Merging
Revision as of 05:43, 4 October 2013
The result of Indexing and integration is a set of Python pickle files, each of which essentially contains a table of Miller indices of the observed reflections, their integrated intensities, and estimated errors. In the general case, these files reflect the measurements from single shots, each exposing different crystals with a unique pulse of X-rays. Merging refers to the procedure applied to unite all these observations into a single data set. During merging, a distinct multiplicative factor, which accounts for the variance in pulse intensity and crystal size, is applied to the observations from a single shot to bring all the observations onto a common scale. The intensities for individual reflections are then summed, and their errors are propagated in quadrature. The result of merging is an mtz file suited for further processing, e.g. molecular replacement.
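The merging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the cctbx.xfel implementation: the function name `merge_observations`, the tuple layout `(hkl, intensity, sigma)`, and the example values are all hypothetical.

```python
import math
from collections import defaultdict


def merge_observations(shots, scale_factors):
    """Merge per-shot observations into one data set.

    shots: list of shots, each a list of (hkl, intensity, sigma) tuples.
    scale_factors: one multiplicative factor per shot (hypothetical values).
    Returns a dict mapping hkl -> (summed intensity, propagated sigma).
    """
    sums = defaultdict(float)
    variances = defaultdict(float)
    for observations, scale in zip(shots, scale_factors):
        for hkl, intensity, sigma in observations:
            # Bring the observation onto the common scale.
            sums[hkl] += scale * intensity
            # Errors are propagated in quadrature: variances add.
            variances[hkl] += (scale * sigma) ** 2
    return {hkl: (sums[hkl], math.sqrt(variances[hkl])) for hkl in sums}


# Two hypothetical shots; the second is weaker and is scaled up by 2.
shots = [
    [((1, 0, 0), 100.0, 3.0), ((0, 1, 0), 50.0, 2.0)],
    [((1, 0, 0), 40.0, 2.0)],
]
merged = merge_observations(shots, scale_factors=[1.0, 2.0])
print(merged[(1, 0, 0)])
```

Here the reflection (1, 0, 0) is observed in both shots, so its scaled intensities are summed and its scaled errors combine as the square root of the summed variances.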
Merging a set of integration files
In cctbx.xfel the per-image scale factors are determined using a scaling reference, which is expected to be a previously solved, isomorphous data set. The scale factor is determined by a least-squares fit of the observations to the reference intensities, after applying a polarization correction<ref>Kahn, R, et al. Macromolecular Crystallography with Synchrotron Radiation: Photographic Data Collection and Polarization Correction. J Appl Cryst 15, 330–337 (1982).</ref> and a significance filter, which limits the resolution of each diffraction pattern based on its signal-to-noise ratio. Lattices that do not conform to the Bravais symmetry of the scaling reference are rejected. By default, non-isomorphous lattices whose cell lengths differ from the mean by more than 10%, or whose cell angles differ from the mean by more than 2°, are also rejected, as are lattices that correlate poorly with the scaling reference.
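As a rough illustration of two of these steps, the sketch below fits a single least-squares scale factor to reference intensities and applies the default unit cell tolerances (10% in the lengths, 2° in the angles). The function names, the exact form of the least-squares target, and the example cell are assumptions made for illustration; the actual cxi.merge code may differ.

```python
def fit_scale(observed, reference):
    """Scale k minimizing sum((k * I_obs - I_ref)**2) over matched reflections.

    This particular target function is an assumption for the sketch.
    """
    numerator = sum(o * r for o, r in zip(observed, reference))
    denominator = sum(o * o for o in observed)
    return numerator / denominator


def cell_is_outlier(cell, mean_cell, length_tol=0.10, angle_tol=2.0):
    """Return True if a unit cell deviates too much from the mean cell.

    cell, mean_cell: (a, b, c, alpha, beta, gamma), lengths in Å and
    angles in degrees. Tolerances follow the defaults quoted in the text.
    """
    for length, mean_length in zip(cell[:3], mean_cell[:3]):
        if abs(length - mean_length) > length_tol * mean_length:
            return True
    for angle, mean_angle in zip(cell[3:], mean_cell[3:]):
        if abs(angle - mean_angle) > angle_tol:
            return True
    return False


# Hypothetical matched intensities: the image is half as strong as the reference.
scale = fit_scale([10.0, 20.0, 30.0], [20.0, 40.0, 60.0])

# Hypothetical mean cell and three candidate lattices.
mean_cell = (77.0, 77.0, 37.0, 90.0, 90.0, 90.0)
print(scale)
print(cell_is_outlier((78.0, 77.0, 37.5, 90.0, 90.0, 90.0), mean_cell))
print(cell_is_outlier((90.0, 77.0, 37.0, 90.0, 90.0, 90.0), mean_cell))
print(cell_is_outlier((77.0, 77.0, 37.0, 90.0, 93.0, 90.0), mean_cell))
```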
cxi.merge may take several passes over the integrated images. In order to speed up processing, cxi.merge will write the scaled data to a back end database during each pass. Currently three database back ends are implemented.
- FS is the simplest back end. It stores the scaled intensities, their Miller indices, and information about the shots they were observed during in three flat files on the file system.
- The MySQL back end stores data in a MySQL database. The database must be set up beforehand, and credentials to access it must be supplied in the parameters passed to cxi.merge.
- The SQLite back end uses a simple SQLite database, which is written to a single file on the file system. It is easier to use than the MySQL back end and more efficient than the FS back end. Regrettably, the SQLite back end does not appear to work on the Lustre file system.
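To give a feel for what a single-file database back end involves, here is a minimal sketch using Python's standard sqlite3 module. The table name, column layout, and example rows are hypothetical and are not the schema that cxi.merge actually writes; the example also uses an in-memory database so that it leaves no file behind, whereas a real back end would pass a file path.

```python
import sqlite3


def store_observations(path, observations):
    """Store scaled observations in an SQLite database.

    path: database file path, or ":memory:" for a throwaway database.
    observations: iterable of (frame_id, h, k, l, intensity, sigma) rows.
    The schema below is a hypothetical illustration only.
    """
    connection = sqlite3.connect(path)
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "CREATE TABLE IF NOT EXISTS observation ("
            "frame_id INTEGER, h INTEGER, k INTEGER, l INTEGER, "
            "intensity REAL, sigma REAL)"
        )
        connection.executemany(
            "INSERT INTO observation VALUES (?, ?, ?, ?, ?, ?)", observations
        )
    return connection


connection = store_observations(
    ":memory:",
    [(0, 1, 0, 0, 100.0, 3.0), (1, 1, 0, 0, 80.0, 4.0)],
)
# A later pass can retrieve all observations of each reflection with one query.
rows = connection.execute(
    "SELECT h, k, l, COUNT(*), SUM(intensity) FROM observation GROUP BY h, k, l"
).fetchall()
print(rows)
```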
Compared to indexing and integration, merging is a relatively quick procedure. However, particularly for large datasets, it may significantly strain computational resources. Therefore, it is recommended to merge data on SLAC's interactive nodes.
$ ssh psanacs.slac.stanford.edu
$ cd myrelease
In cctbx.xfel images are merged using the cxi.merge command. The results are logged to standard output, so it is advisable to redirect the stream to a file.
$ cxi.merge Ls04-lysozyme-merge.phil > merge.out
Here, Ls04-lysozyme-merge.phil is a phil-file with the parameters that control the merging procedure. Only the subset of the available options used in this tutorial is described here.
- backend: Back end database; FS for flat-file ASCII data storage, MySQL and SQLite for the respective proper database back ends.
- d_min: Limiting resolution for scaling and merging.
- data: Directory containing integrated data in pickle format. Repeat to specify additional directories.
- merge_anomalous: True to merge anomalous contributors (i.e. Bijvoet mates), False to preserve them.
- min_corr: Correlation cutoff for rejecting individual frames.
- model: The scaling reference, a PDB file containing atomic coordinates and an isomorphous CRYST1 record.
- nproc: The number of scaling processes cxi.merge may have running at any one time.
- output.prefix: Prefix for all output file names.
- rawdata.sdfac_auto: True to apply the SDFAC correction to each image, assuming negative intensities are normally distributed noise.
- rescale_with_average_cell: Rescale the images a second time, requiring images to conform to the average unit cell. If set to True, set_average_unit_cell must also be set to True.
- set_average_unit_cell: If True, set the unit cell of the merged data to the average of the merged images; otherwise use the unit cell of the scaling reference.
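The options listed above could be collected into a parameter file with a small helper such as the following sketch. The parameter names are those described above, but every value (paths, cutoffs, file names) is a hypothetical placeholder, and the flat `name = value` layout is assumed to be acceptable to cxi.merge rather than taken from its documentation.

```python
def format_phil(parameters):
    """Render (name, value) pairs as 'name = value' lines.

    A list of pairs is used instead of a dict so that 'data' can be repeated.
    """
    settings = dict(parameters)
    # Constraint stated in the option list: a second rescaling pass requires
    # the average unit cell to be set as well.
    if settings.get("rescale_with_average_cell") and not settings.get(
        "set_average_unit_cell"
    ):
        raise ValueError(
            "rescale_with_average_cell requires set_average_unit_cell = True"
        )
    return "\n".join(f"{name} = {value}" for name, value in parameters) + "\n"


# All values below are hypothetical placeholders.
parameters = [
    ("backend", "SQLite"),
    ("d_min", 3.0),
    ("data", "/path/to/integration/run1"),
    ("data", "/path/to/integration/run2"),
    ("merge_anomalous", True),
    ("min_corr", 0.1),
    ("model", "reference.pdb"),
    ("nproc", 8),
    ("output.prefix", "Ls04-lysozyme"),
    ("rawdata.sdfac_auto", True),
    ("rescale_with_average_cell", True),
    ("set_average_unit_cell", True),
]
phil_text = format_phil(parameters)
print(phil_text)
```

The printed text could then be saved under a name such as Ls04-lysozyme-merge.phil and inspected before being passed to cxi.merge.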
The merged data is written to an mtz-file named output.prefix.mtz. The output, redirected to merge.out above, mainly consists of statistics for each individual image as it is scaled. The output concludes with a section labelled FINISHED MERGING, which first lists the number of accepted and rejected images:
962 of 1185 integration files were accepted
127 rejected due to wrong Bravais group
1 rejected for unit cell outliers
12 rejected for low signal
83 rejected due to poor correlation
This is followed by histograms of the unit cell distribution, and the merging table:
---------------------------------------------------------------------------------
                                        <asu    <obs
Bin  Resolution Range   Completeness  redun>  redun>  n_meas    <I>  <I/sig(I)>
---------------------------------------------------------------------------------
  1  -1.0000 - 6.4633      [306/309]   67.61   68.28   20893  35621      27.044
  2   6.4633 - 5.1299      [285/285]   43.18   43.18   12305  19272      12.490
  3   5.1299 - 4.4814      [275/275]   46.77   46.77   12862  24391      14.261
  4   4.4814 - 4.0716      [269/269]   43.38   43.38   11670  29436      14.819
  5   4.0716 - 3.7798      [249/249]   35.50   35.50    8840  29811      12.716
  6   3.7798 - 3.5569      [267/267]   32.07   32.07    8562  24212      10.523
  7   3.5569 - 3.3787      [274/274]   20.99   20.99    5750  23626       8.171
  8   3.3787 - 3.2317      [256/256]   15.96   15.96    4085  24168       6.746
  9   3.2317 - 3.1072      [263/264]   12.16   12.20    3209  23936       5.859
 10   3.1072 - 3.0000      [263/263]    8.74    8.74    2298  20613       4.556
All                      [2707/2711]   33.37   33.42   90474  25594      11.978
---------------------------------------------------------------------------------
In this case, the overall completeness to 3.0 Å is 2,707 / 2,711, or approximately 99.9%, and each observed reflection is measured 33.42 times on average (<obs redun>). In total, 90,474 observations, with an average scaled and integrated intensity of 25,594 analog-to-digital units (ADU), were merged, and the mean I / σ(I) was determined to be 11.978.
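The summary figures can be checked by hand from the "All" row of the table. The short calculation below assumes, based on the numbers in the table, that <asu redun> is the number of measurements divided by the number of unique reflections expected in the asymmetric unit, and that <obs redun> is the number of measurements divided by the number of unique reflections actually observed.

```python
# Values taken from the "All" row of the merging table.
n_observed = 2707   # unique reflections with at least one measurement
n_expected = 2711   # unique reflections expected to 3.0 Å
n_meas = 90474      # total number of measurements merged

completeness = n_observed / n_expected
asu_redundancy = n_meas / n_expected   # assumed definition of <asu redun>
obs_redundancy = n_meas / n_observed   # assumed definition of <obs redun>

print(f"completeness = {100 * completeness:.2f}%")
print(f"<asu redun>  = {asu_redundancy:.2f}")
print(f"<obs redun>  = {obs_redundancy:.2f}")
```

Under these assumptions the computed redundancies reproduce the tabulated 33.37 and 33.42, and the completeness comes out at about 99.85%.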
Additional merging statistics
cxi.xmerge retrieves the scaled, unmerged intensities from the database back end, and calculates the CC1/2<ref>Karplus, P. A. & Diederichs, K. Linking Crystallographic Model and Data Quality. Science 336, 1030–1033 (2012).</ref> and CCiso statistics. CC1/2 is defined as Pearson's correlation coefficient between two sets, constructed such that for each unique reflection its independent observations are split randomly into two halves, and the average intensity of each half is assigned to a different set. CCiso is the correlation coefficient between the merged data and the isomorphous scaling reference. Both statistics are computed in each resolution bin, as well as for the full set of reflections.
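The definition of CC1/2 can be illustrated with a short, self-contained sketch. This is not the cxi.xmerge code: the function names, the handling of reflections with fewer than two observations, and the synthetic data are assumptions made for the example.

```python
import random


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / (var_x * var_y) ** 0.5


def cc_half(observations, rng):
    """CC1/2 from a dict mapping hkl -> list of scaled intensities."""
    half_a, half_b = [], []
    for intensities in observations.values():
        if len(intensities) < 2:
            continue  # cannot be split into two non-empty halves
        shuffled = list(intensities)
        rng.shuffle(shuffled)
        middle = len(shuffled) // 2
        # Average each random half and assign the means to different sets.
        half_a.append(sum(shuffled[:middle]) / middle)
        half_b.append(sum(shuffled[middle:]) / (len(shuffled) - middle))
    return pearson(half_a, half_b)


# Synthetic data: 50 reflections, 10 noisy observations each.
rng = random.Random(0)
observations = {
    (i, 0, 0): [100.0 + 50.0 * i + rng.gauss(0.0, 5.0) for _ in range(10)]
    for i in range(50)
}
cc = cc_half(observations, rng)
print(f"CC1/2 = {cc:.4f}")
```

With noise this small compared to the spread of the true intensities, the two half-set averages agree closely and CC1/2 is close to 1; noisier data drive it towards 0.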
Once the database has been populated using cxi.merge, cxi.xmerge can be run, using the parameters defined in a phil-file, Ls04-lysozyme-xmerge.phil.
$ cxi.xmerge Ls04-lysozyme-xmerge.phil
The options used in this tutorial that are not already described in Merging a set of integration files are:
- scaling.mtz_column_F: Column name in the reference structure mtz-file with structure factors.
- scaling.mtz_file: mtz-file with reference structure factors, must have data type F.
- scaling.log_cutoff: Intensities less than exp(scaling.log_cutoff) will not be included in the calculation.
References
<references/>