Indexing and integration

From cctbx_xfel

Latest revision as of 17:35, 8 May 2015

In principle, one could attempt to index and integrate every single image recorded during a "diffract-and-destroy" experiment. In practice, only a small fraction of the recorded images typically exhibit an indexable lattice, so a hit-finder is needed to triage the stream of images and conserve computational resources.

Indexing and integration are tightly coupled; it is impossible to integrate an image without first obtaining an indexing solution. The result of successfully indexing an image is essentially a model of crystal–beam interaction. The model describes not only how the crystal is oriented with respect to the direction of the X-ray pulse, but also its degree of imperfection (i.e. mosaicity) and qualities of the illuminating pulse (e.g. bandwidth). The integration algorithm uses this model to predict where Bragg spots appear on the image, and integrates the intensities around those predictions. The integration algorithm thus exploits the regularity of the underlying crystal lattice to recover signal from spots that were too weak to be detected on their own.

psana section

For historical reasons, hit-finding and indexing/integration are both implemented in the same analysis module, called mod_hitfind. To chain the two functions together, psana must be configured to pass each event through two different instances of the same module. The two instances are differentiated by their name following the colon. In the example below, the first instance is called hitfind, and the second is index.

[psana]
modules = my_ana_pkg.mod_hitfind:hitfind \
          my_ana_pkg.mod_hitfind:index

Any instance of an analysis module can signal psana to skip further processing of the event. In the case above, the hitfind instance of mod_hitfind would signal psana to skip the event if it determines that the image associated with the event is not a hit. In that case, the second instance, index, will never see the event. Otherwise, the event is fed to the second instance, which will attempt indexing and integration on the associated image.
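The skip behaviour described above can be illustrated with a minimal sketch. This is not the psana API: the Skip exception, the hitfind and index functions, the dictionary-based events and the peak-count test are all hypothetical stand-ins that only mimic the control flow.

```python
class Skip(Exception):
    """Hypothetical signal meaning: stop processing the current event."""


def hitfind(event):
    # Hypothetical hit test: reject events with too few strong peaks.
    if event["n_peaks"] < 16:
        raise Skip
    return event


def index(event):
    # Stand-in for indexing and integration; it only ever sees hits.
    event["indexed"] = True
    return event


def process(events, modules):
    """Pass each event through the modules in order, stopping at the first skip."""
    survivors = []
    for event in events:
        try:
            for module in modules:
                event = module(event)
        except Skip:
            continue  # later modules never see this event
        survivors.append(event)
    return survivors


events = [{"id": 1, "n_peaks": 3}, {"id": 2, "n_peaks": 40}]
result = process(events, [hitfind, index])
print([e["id"] for e in result])
```

In this sketch the first event is dropped by hitfind, so index is only called for the second event.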

The psana section of the processing configuration used for this part of the tutorial has one more feature.

[psana]
modules = my_ana_pkg.mod_hitfind:hitfind \
          my_ana_pkg.mod_dump:hit        \
          my_ana_pkg.mod_hitfind:index   \
          my_ana_pkg.mod_dump:indexed

Each instance of mod_hitfind is followed by an instance of mod_dump. mod_dump is a simple module that writes the image associated with the event to the file system. In the above configuration, images are written to disk if they are determined to be hits, and then written again (possibly under a different name) if indexing and integration succeed.
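To make the module:instance naming concrete, the following sketch reads the modules entry with Python's standard configparser and splits it into (module, instance) pairs. It is only an illustration of the file layout, under the assumption that the entry is whitespace-separated with backslash line continuations; psana's own parser may behave differently, and the module_chain helper is hypothetical.

```python
import configparser

CFG = """\
[psana]
modules = my_ana_pkg.mod_hitfind:hitfind \\
          my_ana_pkg.mod_dump:hit        \\
          my_ana_pkg.mod_hitfind:index   \\
          my_ana_pkg.mod_dump:indexed
"""


def module_chain(text):
    """Return the configured chain as a list of (module, instance) pairs."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw = parser["psana"]["modules"]
    # Indented continuation lines are joined into one value; the trailing
    # backslashes are kept verbatim, so drop them before splitting.
    tokens = raw.replace("\\", " ").split()
    return [tuple(token.split(":", 1)) for token in tokens]


for module, instance in module_chain(CFG):
    print(module, "->", instance)
```

Each instance name also selects the configuration section that applies to it, such as [my_ana_pkg.mod_hitfind:hitfind].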

Hit-finding

Hit-finding may be viewed as a pattern-recognition problem, separate from whatever methods are used for actual processing of a diffraction image. The hit-finder in cctbx.xfel, however, is based on the same algorithms as are used for data analysis. In particular, the hit-finder implemented in mod_hitfind uses the Spotfinder (Zhang et al., 2006) algorithm to detect strong, low-resolution peaks.

The configuration for the hitfind instance of mod_hitfind is shown below.

[my_ana_pkg.mod_hitfind:hitfind]
address         = CxiDs1-0|Cspad-0
calib_dir       = /reg/g/cctbx/sources/cctbx_project/xfel/metrology/CSPad/run4/CxiDs1.0_Cspad.0
dark_path       = /reg/g/cctbx/tutorials/darks/Ds1-r0002-avg.pickle
dark_stddev     = /reg/g/cctbx/tutorials/darks/Ds1-r0002-stddev.pickle
detz_offset     = 581
dispatch        = nop
distl_flags     = permissive
distl_min_peaks = 16
threshold       = 450
xtal_target     = /reg/g/cctbx/tutorials/indexing/hitfind-7.1.phil

address
    Full data source address of the DAQ device, see Data Source Address in the Pyana User Manual. This identifies the detector from which images are to be extracted.
calib_dir
    This value defines the initial placement of the detector elements. cctbx.xfel implements its metrology corrections with respect to these initial definitions. As a result, the value of this option should not be changed.
dark_path
    Path to an average image from a dark run (i.e. pedestal) to use for dark-subtraction.
dark_stddev
    Path to a standard-deviation image from a dark run.
detz_offset
    The distance from the interaction region, where the X-ray pulse and the sample jet intersect, to the back of the detector stage, in mm.
dispatch
    What action mod_hitfind is to take. The value should be nop for hit-finding.
distl_flags
    General behavior of the hit-finder, either permissive or restrictive.
distl_min_peaks
    How many strong, low-resolution peaks are required to classify the image as a hit.
threshold
    How high a peak must be above the background to be classified as strong, in analog-to-digital units (ADU).
xtal_target
    phil-file containing further configuration options for the hit-finder.
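The interplay of threshold and distl_min_peaks can be sketched as a simple counting rule. This is a deliberately simplified illustration, not the Spotfinder algorithm: the is_hit function and its input (a list of peak heights above background, in ADU) are hypothetical, the strict comparison against the threshold is an assumption, and the low-resolution restriction is ignored.

```python
def is_hit(peak_heights, threshold=450, distl_min_peaks=16):
    """Classify an image as a hit from its peak heights above background (ADU).

    A peak counts as strong if it exceeds the threshold (assumed strict),
    and the image is a hit if it has at least distl_min_peaks strong peaks.
    """
    strong = [height for height in peak_heights if height > threshold]
    return len(strong) >= distl_min_peaks


# Sixteen strong peaks are enough; fifteen strong peaks plus a weak one are not.
print(is_hit([500] * 16))
print(is_hit([500] * 15 + [100]))
```

The defaults mirror the values in the configuration above; raising either value makes the hit-finder more selective.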

Image output

The configuration section for image output is given below.

[my_ana_pkg.mod_dump:hit]
address      = CxiDs1-0|Cspad-0
calib_dir    = /reg/g/cctbx/sources/cctbx_project/xfel/metrology/CSPad/run4/CxiDs1.0_Cspad.0
out_dirname  = out
out_basename = shot-

The address and calib_dir options were explained in Hit-finding above.

out_dirname
    Directory portion of output image pathname.
out_basename
    Filename prefix of output image pathname.

The actual pathname of the image as it is written to the file system is determined by joining out_dirname and out_basename using the directory separator (/ on Unix), and appending a textual representation of the timestamp when the image was recorded.
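The naming rule can be sketched as follows. The output_path function is hypothetical, and the timestamp format used here is an assumption chosen for illustration; the textual representation actually produced by mod_dump is not specified in this tutorial.

```python
import os.path
from datetime import datetime


def output_path(out_dirname, out_basename, timestamp):
    """Join directory and prefix, then append a textual timestamp."""
    # Hypothetical format; the real representation may differ.
    stamp = timestamp.strftime("%Y%m%d%H%M%S")
    return os.path.join(out_dirname, out_basename + stamp)


print(output_path("out", "shot-", datetime(2013, 3, 1, 12, 0, 0)))
```

With the configuration above, every image written by the hit instance of mod_dump lands in the out directory with a name starting with shot-.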

Indexing and integration

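The configuration for the index instance of mod_hitfind is shown below.

[my_ana_pkg.mod_hitfind:index]
address              = CxiDs1-0|Cspad-0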
calib_dir            = /reg/g/cctbx/sources/cctbx_project/xfel/metrology/CSPad/run4/CxiDs1.0_Cspad.0
dark_path            = /reg/g/cctbx/tutorials/darks/Ds1-r0002-avg.pickle
dark_stddev          = /reg/g/cctbx/tutorials/darks/Ds1-r0002-stddev.pickle
detz_offset          = 581
dispatch             = index
integration_dirname  = integration
integration_basename = int-
xtal_target          = Ls04-lysozyme.phil

The options not already explained in Hit-finding are:

dispatch
    What action mod_hitfind is to take. The value should be index for indexing and integration.
integration_dirname
    Directory portion of output integration file pathname.
integration_basename
    Filename prefix of output integration file pathname.
xtal_target
    phil-file containing further configuration options for the indexing and integration algorithms. Descriptions of these parameters are found in the phil section of the tutorial.

The integration file is not an image file, but a list of Miller indices, their integrated intensities, and estimated uncertainties. The actual pathname of the integration file as it is written to the file system is determined by joining integration_dirname and integration_basename using the directory separator (/ on Unix), and appending a textual representation of the timestamp when the corresponding image was recorded.
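Conceptually, the contents of an integration file can be pictured as a table keyed by Miller index. The sketch below is only a mental model: the Reflection class, the by_miller_index helper and the numbers are made up for illustration, and the on-disk format written by mod_hitfind is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reflection:
    hkl: tuple        # Miller indices (h, k, l)
    intensity: float  # integrated intensity
    sigma: float      # estimated uncertainty of the intensity


def by_miller_index(reflections):
    """Build a lookup table from Miller index to its reflection record."""
    return {r.hkl: r for r in reflections}


# Made-up values, purely illustrative.
observations = [
    Reflection((1, 0, 0), 1250.0, 40.0),
    Reflection((2, 1, -3), 310.5, 25.2),
]
table = by_miller_index(observations)
print(table[(2, 1, -3)].intensity)
```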

Processing the tutorial data

To submit a processing job to the cluster, execute the following commands:

$ cd ~/myrelease
$ sit_setup
$ cp /reg/g/cctbx/tutorials/indexing/Ls04-lysozyme.cfg .
$ cxi.lsf \
    -c Ls04-lysozyme.cfg -i /reg/d/ana11/cxi/data/Mar2013calib/xtc \
    -o /reg/g/cctbx/tutorials/scratch/<username>/lysozyme \
    -p 6 -q psanaq  -r 3

To minimize the load on the cluster during the tutorial, please do not submit more than one run.