Indexing and integration
In principle, one could attempt to index and integrate every single image recorded during a "diffract-and-destroy" experiment. In practice, only a small fraction of the recorded images exhibit an indexable lattice, so a hit-finder is needed to triage the stream of images and conserve computational resources.
Indexing and integration are tightly coupled; it is impossible to integrate an image without first obtaining an indexing solution. The result of successfully indexing an image is essentially a model of crystal–beam interaction. The model describes not only how the crystal is oriented with respect to the direction of the X-ray pulse, but also its degree of imperfection (i.e. mosaicity) and qualities of the illuminating pulse (e.g. bandwidth). The integration algorithm uses this model to predict where Bragg spots appear on the image, and integrates the intensities around those predictions. The integration algorithm thus exploits the regularity of the underlying crystal lattice to recover signal where it was too weak to be detected.
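To make the idea of integrating around a predicted position concrete, here is a minimal sketch in Python. It is not the algorithm used by cctbx.xfel: it assumes the image is a plain list of rows, takes a predicted spot centre as given, and estimates the local background from the ring of pixels bordering a small box. The function name integrate_spot and its parameters are hypothetical.

```python
def integrate_spot(image, row, col, half_width=1):
    """Sum background-subtracted counts in a box around a predicted spot.

    image is a list of rows of pixel values and (row, col) is the
    predicted spot centre. The background is estimated as the mean of
    the one-pixel ring bordering the box. Illustrative only.
    """
    inner = half_width
    outer = half_width + 1
    signal, background = [], []
    for r in range(row - outer, row + outer + 1):
        for c in range(col - outer, col + outer + 1):
            # Ignore pixels that fall off the edge of the image.
            if not (0 <= r < len(image) and 0 <= c < len(image[0])):
                continue
            if abs(r - row) <= inner and abs(c - col) <= inner:
                signal.append(image[r][c])
            else:
                background.append(image[r][c])
    mean_bg = sum(background) / len(background) if background else 0.0
    return sum(signal) - mean_bg * len(signal)


# A flat background of 10 counts with a single bright pixel at (3, 3).
image = [[10] * 7 for _ in range(7)]
image[3][3] = 110
intensity = integrate_spot(image, 3, 3)
```

Because the box is placed at the predicted position rather than at a detected peak, the same sum can be evaluated for spots too weak to be found by a peak search.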
pyana section
For historical reasons, hit-finding and indexing/integration are both implemented in the same analysis module, called mod_hitfind. To chain the two functions together, pyana must be configured to pass each event through two different instances of the same module. The two instances are differentiated by their name following the colon. In the example below, the first instance is called hitfind, and the second is index.
[pyana]
modules = my_ana_pkg.mod_hitfind:hitfind \
          my_ana_pkg.mod_hitfind:index
Any instance of an analysis module can signal pyana to skip further processing of the event. In the case above, the hitfind instance of mod_hitfind would signal pyana to skip the event if it determines that the image associated with the event is not a hit. In that case, the second instance, index, will never see the event. Otherwise, the event is fed to the second instance, which will attempt indexing and integration on the associated image.
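The skip behaviour can be pictured with a small Python sketch. This is not pyana's actual API: the classes, the SKIP sentinel, the event dictionaries, and the run_chain function are all hypothetical stand-ins that only illustrate how an earlier module in the chain can prevent later modules from seeing an event.

```python
SKIP = "skip"  # hypothetical sentinel meaning "drop this event"


class HitFind:
    """Stand-in for the hitfind instance: rejects events with too few peaks."""

    def __init__(self, min_peaks):
        self.min_peaks = min_peaks

    def event(self, evt):
        if evt["n_peaks"] < self.min_peaks:
            return SKIP
        return None


class Index:
    """Stand-in for the index instance: records every event it receives."""

    def __init__(self):
        self.seen = []

    def event(self, evt):
        self.seen.append(evt["id"])
        return None


def run_chain(modules, events):
    """Pass each event through the modules in order, stopping on SKIP."""
    for evt in events:
        for module in modules:
            if module.event(evt) == SKIP:
                break


indexer = Index()
events = [{"id": "a", "n_peaks": 3}, {"id": "b", "n_peaks": 20}]
run_chain([HitFind(min_peaks=16), indexer], events)
```

In this sketch only event "b" reaches the second module, which mirrors how a non-hit never reaches the index instance.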
The pyana section of the processing configuration used for this part of the tutorial has one more feature.
[pyana]
modules = my_ana_pkg.mod_hitfind:hitfind \
          my_ana_pkg.mod_dump:hit \
          my_ana_pkg.mod_hitfind:index \
          my_ana_pkg.mod_dump:indexed
Each instance of mod_hitfind is followed by an instance of mod_dump. mod_dump is a simple module that just outputs the image associated with the event to the file system. In the above configuration, images are written to disk if they are determined to be hits, and then written again (possibly under a different name) if indexing and integration succeed.
Hit-finding
Hit-finding may be viewed as a pattern-recognition problem, separate from whatever methods are used for actual processing of a diffraction image. The hit-finder in cctbx.xfel, however, is based on the same algorithms as are used for data analysis. In particular, the hit-finder implemented in mod_hitfind uses the Spotfinder (Zhang et al., 2006) algorithm to detect strong, low-resolution peaks.
The configuration for the hitfind instance of mod_hitfind is shown below.
[my_ana_pkg.mod_hitfind:hitfind]
address = CxiDs1-0|Cspad-0
calib_dir = /reg/d/ffb/cxi/temp/cctbx/cctbx_xfel/sources/cctbx_project/xfel/metrology/CSPad/run4/CxiDs1.0:Cspad.0
dark_path = /reg/d/ffb/cxi/temp/cctbx/tutorials/darks/Ds1-r0002-avg.pickle
dark_stddev = /reg/d/ffb/cxi/temp/cctbx/tutorials/darks/Ds1-r0002-stddev.pickle
detz_offset = 581
dispatch = nop
distl_flags = permissive
distl_min_peaks = 16
threshold = 450
xtal_target = /reg/d/ffb/cxi/temp/cctbx/cctbx_xfel/sources/labelit_regression/xfel/hitfind-7.1.phil
address
- Full data source address of the DAQ device, see Data Source Address in the Pyana User Manual. This identifies the detector from which images are to be extracted.
calib_dir
- This value defines the initial placement of the detector elements. cctbx.xfel implements its metrology corrections with respect to these initial definitions. As a result, the value of this option should not be changed.
dark_path
- Path to an average image from a dark run (i.e. pedestal) to use for dark-subtraction.
dark_stddev
- Path to a standard-deviation image from a dark run.
detz_offset
- The distance from the interaction region, where the X-ray pulse and the sample jet intersect, to the back of the detector stage, in mm.
dispatch
- What action mod_hitfind is to take. The value should be nop for hit-finding.
distl_flags
- General behavior of the hit-finder, either permissive or restrictive.
distl_min_peaks
- How many strong, low-resolution peaks are required to classify the image as a hit.
threshold
- How high a peak must be above the background to be classified as strong, in analog-to-digital units (ADU).
xtal_target
- phil-file containing further configuration options for the hit-finder.
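The roles of threshold and distl_min_peaks can be illustrated with a deliberately simplified Python sketch. It is not the Spotfinder algorithm: it assumes a list of peak heights above background (in ADU) has already been extracted, and it ignores resolution limits and the distl_flags setting. The function name is_hit is hypothetical; the default values are those from the configuration above.

```python
def is_hit(peak_heights, threshold=450, distl_min_peaks=16):
    """Classify an image as a hit from its peak heights above background.

    A peak counts as strong if it exceeds threshold (ADU). The image is a
    hit if at least distl_min_peaks strong peaks are present. Illustrative
    only; real peak detection is considerably more involved.
    """
    strong = [h for h in peak_heights if h > threshold]
    return len(strong) >= distl_min_peaks


enough = is_hit([500] * 16)            # sixteen strong peaks
too_few = is_hit([500] * 15 + [100])   # only fifteen exceed the threshold
```

Raising threshold or distl_min_peaks makes the classification stricter, so fewer images are passed on to indexing.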
Image output
The configuration section for image output is given below.
[my_ana_pkg.mod_dump:hit]
address = CxiDs1-0|Cspad-0
calib_dir = /reg/d/ffb/cxi/temp/cctbx/cctbx_xfel/sources/cctbx_project/xfel/metrology/CSPad/run4/CxiDs1.0:Cspad.0
out_dirname = /reg/d/ffb/cxi/temp/cctbx/tutorials/scratch/<username>/lysozyme
out_basename = shot-
The address and calib_dir options were explained in Hit-finding above.
out_dirname
- Directory portion of output image pathname.
out_basename
- Filename prefix of output image pathname.
The actual pathname of the image as it is written to the file system is determined by joining out_dirname and out_basename using the directory separator (/ on Unix), and appending a textual representation of the timestamp when the image was recorded.
Indexing and integration
The configuration for the index instance of mod_hitfind is shown below.
[my_ana_pkg.mod_hitfind:index]
address = CxiDs1-0|Cspad-0
calib_dir = /reg/d/ffb/cxi/temp/cctbx/cctbx_xfel/sources/cctbx_project/xfel/metrology/CSPad/run4/CxiDs1.0:Cspad.0
dark_path = /reg/d/ffb/cxi/temp/cctbx/tutorials/darks/Ds1-r0002-avg.pickle
dark_stddev = /reg/d/ffb/cxi/temp/cctbx/tutorials/darks/Ds1-r0002-stddev.pickle
detz_offset = 581
dispatch = index
integration_dirname = /reg/d/ffb/cxi/temp/cctbx/tutorials/scratch/<username>/lysozyme
integration_basename = int-
xtal_target = /reg/d/ffb/cxi/temp/cctbx/cctbx_xfel/sources/labelit_regression/xfel/LXXX-lysozyme.phil
The options that were not explained in Hit-finding, or that take a different value here, are:
dispatch
- What action mod_hitfind is to take. The value should be index for indexing.
integration_dirname
- Directory portion of output integration file pathname.
integration_basename
- Filename prefix of output integration file pathname.
xtal_target
- phil-file containing further configuration options for the indexing and integration algorithms.
The integration file is not an image file, but a list of Miller indices, their integrated intensities, and estimated uncertainties. The actual pathname of the integration file as it is written to the file system is determined by joining integration_dirname and integration_basename using the directory separator (/ on Unix), and appending a textual representation of the timestamp when the corresponding image was recorded.
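As a rough picture of what one entry in such a list carries, the following Python sketch models a single reflection. The on-disk format of the integration file is not described here, so the Reflection type, its field names, and the signal_to_noise helper are hypothetical and serve only to illustrate the three quantities mentioned above.

```python
from typing import NamedTuple, Optional, Tuple


class Reflection(NamedTuple):
    """One integrated reflection: Miller index, intensity, uncertainty."""

    miller_index: Tuple[int, int, int]
    intensity: float
    sigma: float


def signal_to_noise(refl: Reflection) -> Optional[float]:
    """Return intensity divided by sigma, or None if sigma is not positive."""
    if refl.sigma <= 0:
        return None
    return refl.intensity / refl.sigma


refl = Reflection(miller_index=(1, 2, 3), intensity=200.0, sigma=10.0)
ratio = signal_to_noise(refl)
```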
Processing the tutorial data
The complete configuration is stored at XXX. To submit a processing job to the cluster, run:
$ ./lsf.sh -c LXXX-lysozyme.cfg -o /reg/d/ffb/cxi/temp/cctbx/tutorials/scratch/<username>/lysozyme -p 6 -q psfehq -r 69 -x cxiXXXXX
Note that lsf.sh must be patched first. Still to be documented: how to monitor progress, how to check the number of hits and the number of integrated images, and how to inspect sample images.