Merging: Difference between revisions

From cctbx_xfel


Revision as of 02:16, 4 October 2013

The result of indexing and integration is a set of Python pickle files, each of which essentially contains a table of the Miller indices of the observed reflections, their integrated intensities, and their estimated errors. In the general case, each file holds the measurements from a single shot, in which a different crystal is exposed to a unique pulse of X-rays. Merging refers to the procedure that unites all these observations into a single data set. During merging, a distinct multiplicative factor, which accounts for the variation in pulse intensity and crystal size, is applied to the observations from each shot to bring all observations onto a common scale. The scaled intensities for each individual reflection are then summed, and their errors are propagated in quadrature. The result of merging is an MTZ file suited for further processing, e.g. molecular replacement.
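The scale-then-sum step described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code used by cxi.merge: the function name merge_observations, the input layout (one scale factor plus a list of observations per image), and the example numbers are all assumptions made for this sketch.

```python
import math
from collections import defaultdict


def merge_observations(images):
    """Sum scaled intensities per Miller index (hypothetical helper).

    images: list of (scale, observations) pairs, one per shot, where
    observations is a list of (hkl, intensity, sigma) tuples.
    Returns {hkl: (summed_intensity, propagated_sigma)}.
    """
    sums = defaultdict(float)
    variances = defaultdict(float)
    for scale, observations in images:
        for hkl, intensity, sigma in observations:
            # Bring the observation onto the common scale.
            sums[hkl] += scale * intensity
            # Errors of a sum add in quadrature; the scale applies to sigma too.
            variances[hkl] += (scale * sigma) ** 2
    return {hkl: (sums[hkl], math.sqrt(variances[hkl])) for hkl in sums}


# Two made-up shots; the second one needs a scale factor of 2.
images = [
    (1.0, [((1, 0, 0), 100.0, 3.0), ((0, 1, 0), 50.0, 2.0)]),
    (2.0, [((1, 0, 0), 45.0, 2.0)]),
]
merged = merge_observations(images)
print(merged[(1, 0, 0)])  # (190.0, 5.0)
```

Reflection (1, 0, 0) is observed in both shots, so its scaled intensities 100 and 90 add to 190, and the scaled errors 3 and 4 combine to 5.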


Merging a set of integration files

In cctbx.xfel, the per-image scale factors are determined using a scaling reference. This scaling reference is expected to be a previously solved, isomorphous data set. The scale factor is determined by a least-squares fit of the observations to the reference intensities, after applying a polarization correction<ref>Kahn, R, et al. Macromolecular Crystallography with Synchrotron Radiation: Photographic Data Collection and Polarization Correction. J Appl Cryst 15, 330–337 (1982). http://dx.doi.org/10.1107/S0021889882012060</ref> and a significance filter, which limits the resolution of each diffraction pattern based on its signal-to-noise ratio. Images that do not conform to the symmetry of the scaling reference are rejected as outliers, as are images that correlate poorly with the scaling reference and images whose unit cell deviates too far from that of the scaling reference.
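As a rough illustration of the fit and the correlation-based rejection, the sketch below computes a single multiplicative scale factor k that minimizes the sum of (I_ref - k * I_obs)^2 over matched reflections, and rejects an image whose Pearson correlation with the reference falls below a cutoff. The exact functional form used by cxi.merge is not given here, so this particular least-squares formulation, the function names, and the example intensities are assumptions. The polarization correction and significance filter are omitted.

```python
import math


def fit_scale(observed, reference):
    """Least-squares k minimizing sum((ref - k * obs) ** 2)."""
    numerator = sum(o * r for o, r in zip(observed, reference))
    denominator = sum(o * o for o in observed)
    return numerator / denominator


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def scale_image(observed, reference, min_corr):
    """Return the scale factor, or None if the image is rejected."""
    if correlation(observed, reference) < min_corr:
        return None
    return fit_scale(observed, reference)


# Made-up intensities for four reflections matched to the reference.
reference = [21.0, 39.0, 61.0, 79.0]
good_image = [10.0, 20.0, 30.0, 40.0]
bad_image = [40.0, 30.0, 20.0, 10.0]  # anti-correlated with the reference

k_good = scale_image(good_image, reference, min_corr=0.1)
k_bad = scale_image(bad_image, reference, min_corr=0.1)
print(k_good, k_bad)
```

The first image receives a scale factor close to 2, while the second is rejected because its correlation with the reference is negative.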

Compared to indexing and integration, merging is a relatively quick procedure. However, particularly for large data sets, it may significantly strain computational resources. It is therefore recommended to merge data on SLAC's interactive nodes.

$ ssh psanacs.slac.stanford.edu
$ cd myrelease

In cctbx.xfel, images are merged using the cxi.merge command.

$ cxi.merge Ls04-lysozyme-merge.phil

Here, Ls04-lysozyme-merge.phil is a phil file containing the parameters that control the merging procedure. Only a subset of the available options is described in this tutorial:

; backend
: Back-end database: FS for flat-file ASCII data storage, or MySQL or SQLite for the respective database back ends. Note that the SQLite back end does not appear to work on the Lustre file system.
; d_min
: Limiting resolution for scaling and merging.
; data
: Directory containing integrated data in pickle format. Repeat to specify additional directories.
; merge_anomalous
: True to merge anomalous contributors (i.e. Bijvoet mates), False to preserve them.
; min_corr
: Correlation cutoff for rejecting individual frames.
; model
: The scaling reference: the name of a PDB file containing atomic coordinates and an isomorphous CRYST1 record.
; nproc
: Maximum number of scaling processes cxi.merge may have running at any one time.
; output.prefix
: Prefix for all output file names.
; rawdata.sdfac_auto
: True to apply the SDFAC correction to each image, assuming negative intensities are normally distributed noise.
; rescale_with_average_cell
: Rescale the images a second time, requiring images to conform to the average unit cell. If set to True, set_average_unit_cell must also be set to True.
; set_average_unit_cell
: If True, set the unit cell of the merged data to the average of the merged images; otherwise use the unit cell of the scaling reference.
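To make the parameter list more concrete, the following Python sketch assembles the text of a phil file from a dictionary and checks the one dependency stated above (rescale_with_average_cell requires set_average_unit_cell). The helper build_merge_phil is hypothetical and not part of cctbx.xfel. Every value shown (resolution, directories, PDB name, process count, prefix) is a made-up placeholder, and the flat "name = value" layout with dotted names is an assumption about what the phil parser accepts.

```python
def build_merge_phil(params):
    """Render a dict of merging parameters as phil text (hypothetical helper).

    A list value is written as one line per item, which is how a repeated
    parameter such as "data" is expressed here.
    """
    if params.get("rescale_with_average_cell") and not params.get(
        "set_average_unit_cell"
    ):
        raise ValueError(
            "rescale_with_average_cell=True requires set_average_unit_cell=True"
        )
    lines = []
    for name, value in params.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            lines.append("%s = %s" % (name, item))
    return "\n".join(lines) + "\n"


# All values below are placeholders chosen for illustration only.
example_params = {
    "backend": "FS",
    "d_min": 2.0,
    "data": ["/path/to/integration/run1", "/path/to/integration/run2"],
    "merge_anomalous": False,
    "min_corr": 0.1,
    "model": "reference.pdb",
    "nproc": 8,
    "output.prefix": "Ls04-lysozyme",
    "rawdata.sdfac_auto": True,
    "rescale_with_average_cell": True,
    "set_average_unit_cell": True,
}

phil_text = build_merge_phil(example_params)
print(phil_text)
```

The printed text could be saved under a name such as Ls04-lysozyme-merge.phil and passed to cxi.merge, after replacing the placeholder values with real ones.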

XXX mention output here, table, rejected images summary

The merged data are written to output.prefix.mtz, where output.prefix is the prefix set in the phil file.

Calculating the CC1/2 statistic

$ cxi.xmerge