Advanced Merging


Here we describe two use cases for merging: one where an isomorphous structure is already known, and one where the structure is new. In the examples below, the same command-line parameters are passed to both the cxi.merge and cxi.xmerge steps, so the command script can be written in a condensed, non-redundant form. Not every parameter is used by both steps: nproc is used only by merge, and the scaling.* parameters are used only by xmerge.

Use case 1: Isomorphous replacement

In this case, the new XFEL data are scaled to a known isomorphous reference, either a synchrotron-solved structure or a previous XFEL structure:

$ vi psII_merge.csh
#!/bin/csh -f
set trial=${1}

set runs = 127,130,132,134,135,140,141,142,144,145,146,148,151,152,157,162,163
set datastring = \
`python -c "print(' '.join(['data=/my_result_directory/L785/r%04d/${trial}/integration'%i for i in [${runs}]]))"`
set tag = L785_2flash_${trial}

set effective_params = "d_min=4.7 \
output.n_bins=20 \
${datastring} \
model=/my_work/3bz1_3bz2_core.pdb \
nproc=16 \
merge_anomalous=True \
plot_single_index_histograms=False \
raw_data.sdfac_auto=True \
mysql.runtag=${tag} \
mysql.passwd=terp888 \
mysql.user=nick \
mysql.database=xfelnks \
scaling.mtz_file=./3bz1-sf.mtz \
scaling.show_plots=False \
scaling.algorithm=mark0 \
scaling.log_cutoff=9. \
set_average_unit_cell=True \
rescale_with_average_cell=True \
pixel_size=0.11 \
output.prefix=${tag}"

cxi.merge ${effective_params}
cxi.xmerge ${effective_params}

$ ./psII_merge.csh 009 # merge the data from trial 009
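
The python -c one-liner in the script above expands a run list into one data= argument per run. The same expansion is written out below as a small standalone Python function, purely for illustration. The function name build_data_args is hypothetical and is not part of cctbx.xfel; the directory layout (r%04d/trial/integration) is taken from the script.

```python
def build_data_args(base_dir, runs, trial):
    """Return one 'data=...' argument per run number.

    Mirrors the one-liner in the csh script: each run is zero-padded
    to four digits and combined with the trial directory.
    """
    return ["data=%s/r%04d/%s/integration" % (base_dir, run, trial)
            for run in runs]


# Example: three of the runs from use case 1, trial 009.
example = build_data_args("/my_result_directory/L785", [127, 130, 132], "009")
print(" ".join(example))
```

Joining the list with spaces gives the same string that the script stores in datastring.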

Use case 2: New structure

This is a completely unknown structure with no isomorphous reference:

$ vi bt_allmerge.csh
#!/bin/csh -f

set trial=${1}
set datadir = /my_work_area/LCLS/cxis9913
set runs = 62,63,64,65,66,67,68,69,70
set datastring = `python -c "print(' '.join(['data=${datadir}/r%04d/${trial}/integration'%i for i in [${runs}]]))"`
set tag = last_BT_${trial}

set effective_params = "d_min=3.0 \
output.n_bins=13 \
${datastring} \
target_unit_cell=81.8,94.0,123.0,90,90,90 \
target_space_group=P212121 \
nproc=16 \
merge_anomalous=True \
plot_single_index_histograms=False \
raw_data.sdfac_auto=True \
mysql.runtag=${tag} \
mysql.passwd=terp888 \
mysql.user=nick \
mysql.database=xfelnks \
scaling.mtz_file=fake_fake.mtz \
scaling.show_plots=True \
scaling.algorithm=mark1 \
scaling.log_cutoff=3. \
scaling.mtz_column_F=f-obs \
set_average_unit_cell=True \
rescale_with_average_cell=True \
pixel_size=0.11 \
output.prefix=${tag}"

cxi.merge ${effective_params}
cxi.xmerge ${effective_params}

$ ./bt_allmerge.csh 001
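
Both scripts above define one parameter string and hand it unchanged to cxi.merge and then cxi.xmerge. As a rough sketch of the same pattern outside csh, the Python snippet below builds the two command lines from a single string. The helper name build_commands is hypothetical, and the snippet only prints the commands; actually running them would require a working cctbx.xfel installation.

```python
import shlex


def build_commands(effective_params):
    """Split one parameter string and attach it to both merging commands.

    The same argument list is reused for each step; parameters that a
    step does not use (nproc for xmerge, scaling.* for merge) are simply
    passed along, as in the csh scripts.
    """
    args = shlex.split(effective_params)
    return [["cxi.merge"] + args, ["cxi.xmerge"] + args]


# A shortened parameter string using values from use case 2.
params = "d_min=3.0 output.n_bins=13 nproc=16 scaling.algorithm=mark1"
for command in build_commands(params):
    # To execute instead of print, one could pass `command` to
    # subprocess.check_call (assumes the commands are on the PATH).
    print(" ".join(command))
```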

The two cases contrasted

Note the following differences in the command lines for the two use cases:

  • model [pdb file | mtz file] is provided only in case #1 as the scaling reference, not in case #2.
  • min_corr is ignored in case #2; images are not discarded for lack of correlation to a reference.
  • target_unit_cell is mandatory in case #2, where it is used to form the list of reflections in the asymmetric unit. It is not used in case #1.
  • target_space_group [symbol] is mandatory in case #2 and is carried through to the mtz output.
  • scaling.mtz_file is supplied in both cases. In case #2 it is a dummy file name, used to output fake structure factor amplitudes for dummy scaling (do not supply the name of an existing file). In case #1 it is the actual file containing the reference Iobs used to calculate CCiso.
  • scaling.algorithm is set to mark1 (no scaling) for case #2; mark0 (isomorphous reference) for case #1.
  • scaling.mtz_column_F is set to "f-obs" in case #2; this is the label used by the fake structure factor generator. In case #1, set it to the column label used to look up the Iobs for calculating CCiso.
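
The differences listed above can be turned into a simple pre-flight check for the new-structure case. The sketch below is only an illustration of those rules: parse_params and check_new_structure are hypothetical helpers, not part of cctbx.xfel, and the check covers only the points named in the list.

```python
import os


def parse_params(args):
    """Turn ['key=value', ...] into a dict; repeated keys keep the last value."""
    return dict(arg.split("=", 1) for arg in args)


def check_new_structure(params):
    """Return a list of problems for use case 2 (no isomorphous reference)."""
    problems = []
    for key in ("target_unit_cell", "target_space_group"):
        if key not in params:
            problems.append("%s is mandatory" % key)
    if "model" in params:
        problems.append("model is not supplied in this case")
    if params.get("scaling.algorithm") != "mark1":
        problems.append("scaling.algorithm should be mark1")
    mtz = params.get("scaling.mtz_file")
    if mtz is not None and os.path.exists(mtz):
        problems.append("scaling.mtz_file must not name an existing file")
    return problems


# Parameters taken from the use case 2 script.
case2 = parse_params([
    "target_unit_cell=81.8,94.0,123.0,90,90,90",
    "target_space_group=P212121",
    "scaling.algorithm=mark1",
    "scaling.mtz_file=fake_fake.mtz",
])
for problem in check_new_structure(case2):
    print(problem)
```

An empty result means that none of the listed rules is violated; it says nothing about the remaining parameters.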

Additional explanation of parameters

model
a PDB file (heavy atoms omitted) used to compute Fmodel intensities for frame-by-frame scaling.
nproc
the xmerge script is fast and uses a single core, so this parameter affects only merge and has no effect on xmerge.
scaling.mtz_file
isomorphous experimental structure factors used to compute the isomorphous correlation coefficient CCiso; this measures data quality only and does not affect scaling.
scaling.algorithm
mark0: the usual method, rejects some frames based on low correlation with reference PDB model, then scales frame-by-frame with data scaled to the isomorphous reference.

mark1: no scaling.

scaling.log_cutoff
For the calculation of correlation coefficients, ignore the weak data (affects only the reported quality measures, not the scaling algorithm). Cutoff is expressed on a log scale. The only way to determine a good cutoff is to use scaling.show_plots=True the first time through, and experiment with different values.
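
To make the effect of a log-scale cutoff concrete, here is a minimal sketch of a correlation coefficient computed only over sufficiently strong data. It is not the cctbx.xfel implementation. It assumes that the cutoff is compared with the natural logarithm of each intensity and that a pair is dropped if either value falls below it; the function name cc_with_log_cutoff and the sample numbers are made up for illustration.

```python
import math


def cc_with_log_cutoff(reference, observed, log_cutoff):
    """Pearson correlation over pairs whose intensities both pass the cutoff.

    Assumption: the cutoff applies to the natural log of each intensity.
    Returns None if fewer than two pairs survive or a variance is zero.
    """
    pairs = [(r, o) for r, o in zip(reference, observed)
             if r > 0 and o > 0
             and math.log(r) >= log_cutoff and math.log(o) >= log_cutoff]
    n = float(len(pairs))
    if n < 2:
        return None
    mean_r = sum(r for r, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n
    cov = sum((r - mean_r) * (o - mean_o) for r, o in pairs)
    var_r = sum((r - mean_r) ** 2 for r, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)
    if var_r == 0 or var_o == 0:
        return None
    return cov / math.sqrt(var_r * var_o)


# Made-up intensities: the first pair has a weak reference value.
reference = [10.0, 100.0, 1000.0, 10000.0]
observed = [5000.0, 200.0, 2000.0, 20000.0]
for cutoff in (0.0, 3.0):
    print(cutoff, cc_with_log_cutoff(reference, observed, cutoff))
```

Raising the cutoff removes the weak pair and changes the reported correlation, which is why the value is best chosen by inspecting the plots rather than fixed in advance.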