Resolving an Indexing Ambiguity

From cctbx_xfel

Revision as of 00:48, 29 December 2013

Here we describe the use of Brehm & Diederichs algorithm 2 to resolve the indexing ambiguity for XFEL data. This is applicable whenever the Laue symmetry of the crystal is lower than the symmetry of its Bravais lattice (as for the polar space group P6 used below), and also to cases with pseudosymmetry (e.g., a monoclinic cell with a beta angle near 90 degrees).

Brief Description of the Workflow

It is assumed that the reader is familiar with the tutorial for Merging. The usual workflow of cxi.merge followed by cxi.xmerge will fail at the xmerge step if the individual indexed lattices are related to one another by reindexing operators other than the identity (h,k,l). We resolve this by (1) running the cxi.merge step to generate a database containing all the observations; (2) using Brehm-Diederichs algorithm 2, as implemented in cxi.brehm_diederichs, to identify the reindexing operator for each lattice; and (3) re-running cxi.merge and cxi.xmerge with the resulting list of reindexing operators as additional input.
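As a concrete case, in space group P6 (the example used below) a single lattice can be indexed in two equally valid ways: the identity h,k,l and the alternative k,h,-l. The following plain-Python sketch applies such an operator to a list of Miller indices. It is only an illustration: the helper names and the restricted operator syntax (signed single letters only) are assumptions, and this is not the cctbx API.

```python
from typing import Iterable, List, Tuple

MillerIndex = Tuple[int, int, int]


def parse_operator(op: str):
    """Parse a simple reindexing operator such as 'k,h,-l' (hypothetical helper).

    Only signed single-letter terms (h, k, l) are supported; general
    change-of-basis operators with sums or fractions are not handled.
    """
    terms = [t.strip() for t in op.split(",")]
    if len(terms) != 3:
        raise ValueError("expected three comma-separated terms: %r" % op)
    position = {"h": 0, "k": 1, "l": 2}
    parsed = []
    for term in terms:
        sign = -1 if term.startswith("-") else 1
        letter = term.lstrip("+-")
        if letter not in position:
            raise ValueError("unsupported term: %r" % term)
        parsed.append((sign, position[letter]))
    return parsed


def reindex(indices: Iterable[MillerIndex], op: str) -> List[MillerIndex]:
    """Apply the reindexing operator to each (h, k, l) triple."""
    parsed = parse_operator(op)
    return [tuple(sign * hkl[pos] for sign, pos in parsed) for hkl in indices]


print(reindex([(1, 2, 3), (4, 0, -1)], "k,h,-l"))  # [(2, 1, -3), (0, 4, 1)]
```

Applying k,h,-l twice returns the original indices, which is why the P6 case reduces to a choice between exactly two indexing modes per lattice.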

Detailed Step-By-Step Instructions

Create a Database of Observations

$ vi myoglobin_step1.csh
#!/bin/csh -f
set trial=${1}

set runs = 127,130,132,134,135,140,141,142,144
set datastring = \
`python -c "print(' '.join(['data=/my_results/L785/r%04d/${trial}/integration'%i for i in [${runs}]]))"`
set tag = myoglobin_${trial}

set effective_params = "d_min=2.0 \
output.n_bins=20 \
${datastring} \
scaling.algorithm=mark1 \
target_unit_cell=90.3,90.3,45.2,90,90,120 \
target_space_group=P6 \
nproc=16 \
merge_anomalous=True \
mysql.runtag=${tag} \
mysql.passwd=terp888 \
mysql.user=nick \
mysql.database=xfelnks \
scaling.mtz_file=fake_filename.mtz \
output.prefix=${tag}"

cxi.merge ${effective_params} # Note the xmerge script is NOT run here

$ ./myoglobin_step1.csh 009   # create the database from trial 009
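The backquoted python one-liner in this script expands the run list into one data= argument per run. A standalone sketch of the same expansion is shown here for readability; the function name build_data_args is hypothetical, and the /my_results/L785/r%04d/<trial>/integration layout is copied from the script and should be adapted to your own results directory.

```python
def build_data_args(runs, trial, root="/my_results/L785"):
    """Return one 'data=' argument per run, as the csh one-liner does (hypothetical helper)."""
    # %04d zero-pads the run number to four digits, e.g. 127 -> r0127
    return " ".join(
        "data=%s/r%04d/%s/integration" % (root, run, trial) for run in runs
    )


runs = [127, 130, 132, 134, 135, 140, 141, 142, 144]
args = build_data_args(runs, "009")
print(args)
```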

Noteworthy parameters:

  • d_min is the high-resolution limit used for resolving the indexing ambiguity, not for final merging. There is a trade-off: including too much data slows down the calculation of lattice-lattice correlation coefficients, whose cost scales with the number of common Miller indices, while including too little data makes the correlation coefficients unreliable or undefined. Future: the program will print <L>, the average number of observation pairs per correlation coefficient, so that the resolution limit can be adjusted sensibly.
  • scaling.algorithm is set to mark1 here (no scaling) to illustrate how data from an unknown structure would be processed. As a consequence (see the Advanced Merging page), we set target_unit_cell and target_space_group but do not provide a PDB model. Also, scaling.mtz_file is set to a dummy value.
  • nproc is the number of processors used for writing the database. Use as many processors on a single host as are available, up to about 16; beyond that there is no further benefit, at least on Linux.
  • merge_anomalous=True. Since we are trying to maximize the number of common Miller indices for each lattice pair, we merge the Bijvoet pairs here, even if we will look for anomalous differences in the final merging step.
  • backend. Use whatever backend is available on your system; choices are FS, MySQL (default), and SQLite.
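The d_min trade-off can be illustrated with a small sketch. Each lattice-lattice correlation coefficient is computed only over the Miller indices that the two lattices have in common, so a lower-resolution cutoff leaves fewer pairs per coefficient. Here each lattice is assumed to be a plain dictionary mapping a Miller index to an intensity (the real program reads the observations from the database), the cell is assumed hexagonal as in this example, and the reported mean number of common reflections per pair is only roughly analogous to the <L> statistic mentioned above. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def d_spacing_hexagonal(hkl, a, c):
    """Resolution (Angstrom) of reflection hkl for a hexagonal cell (a = b, gamma = 120 deg)."""
    h, k, l = hkl
    inv_d2 = 4.0 * (h * h + h * k + k * k) / (3.0 * a * a) + (l * l) / (c * c)
    return 1.0 / sqrt(inv_d2)


def apply_d_min(lattice, d_min, a, c):
    """Keep only reflections with d-spacing >= d_min."""
    return {hkl: i for hkl, i in lattice.items() if d_spacing_hexagonal(hkl, a, c) >= d_min}


def pearson_cc(x, y):
    """Pearson correlation coefficient, or None when it is undefined."""
    n = len(x)
    if n < 2:
        return None
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def pairwise_cc(lattices):
    """Correlate every lattice pair over their common Miller indices.

    Returns ({(i, j): (cc, n_common)}, mean n_common over all pairs).
    """
    results = {}
    for i, j in combinations(range(len(lattices)), 2):
        common = sorted(set(lattices[i]) & set(lattices[j]))
        x = [lattices[i][hkl] for hkl in common]
        y = [lattices[j][hkl] for hkl in common]
        results[(i, j)] = (pearson_cc(x, y), len(common))
    counts = [n for _, n in results.values()]
    mean_common = sum(counts) / len(counts) if counts else 0.0
    return results, mean_common


# Toy data: three lattices with partially overlapping reflections.
lattices = [
    {(1, 0, 0): 10.0, (1, 1, 0): 20.0, (2, 0, 0): 5.0},
    {(1, 0, 0): 12.0, (1, 1, 0): 18.0, (2, 0, 0): 6.0, (2, 1, 0): 3.0},
    {(1, 0, 0): 11.0, (2, 1, 0): 4.0},
]
results, mean_common = pairwise_cc(lattices)
print(results)
print(mean_common)  # 2.0
```

The pair (0, 2) shares only one reflection, so its coefficient is undefined (None); raising d_min would make such pairs more common.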

Sort the Lattices

$ vi myoglobin_step2.csh
#!/bin/csh -f
set trial=${1}
set tag = myoglobin_${trial}

set effective_params = "d_min=2.0 \
target_unit_cell=90.3,90.3,45.2,90,90,120 \
target_space_group=P6 \
nproc=32 \
merge_anomalous=True \
mysql.runtag=${tag} \
mysql.passwd=terp888 \
mysql.user=nick \
mysql.database=xfelnks \
output.prefix=${tag}"

cxi.brehm_diederichs ${effective_params}

$ ./myoglobin_step2.csh 009   # sort the lattices from trial 009

Noteworthy parameters:

  • d_min is the high-resolution limit used for resolving the indexing ambiguity, not for final merging. There is a trade-off: including too much data slows down the calculation of lattice-lattice correlation coefficients, whose cost scales with the number of common Miller indices, while including too little data makes the correlation coefficients unreliable or undefined. Future: the program will print <L>, the average number of observation pairs per correlation coefficient, so that the resolution limit can be adjusted sensibly.
  • nproc is the number of processors used on a single host. It must be set either to 1 or to 5 or more; values of 2-4 are not implemented. If there are many images (more than 2000), it is advantageous to set this number as high as possible (we use 64 on our AMD Linux machine).
  • target_unit_cell, target_space_group and merge_anomalous are mandatory and should be set to the same values used in the myoglobin_step1.csh script above.
  • Likewise, mysql.runtag and output.prefix should use the same values.
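For a two-way ambiguity such as P6 (operators h,k,l and k,h,-l), sorting the lattices amounts to choosing, for each lattice, the indexing mode that gives the larger summed correlation with all the others, and iterating until no lattice changes. The sketch below illustrates that idea in plain Python. It is a simplified illustration, not the code in cxi.brehm_diederichs: it assumes the correlation matrices r and s have already been computed over common Miller indices, and sort_two_way and to_lookup are hypothetical names.

```python
import random


def sort_two_way(r, s, max_cycles=100, seed=0):
    """Assign each lattice an indexing mode, +1 (h,k,l) or -1 (k,h,-l).

    r[i][j]: CC between lattices i and j, both as originally indexed.
    s[i][j]: CC between lattice i as indexed and lattice j reindexed.
    Two lattices in the same mode correlate as r[i][j]; in opposite
    modes, as s[i][j] (both matrices are assumed symmetric).
    """
    n = len(r)
    rng = random.Random(seed)
    mode = [rng.choice((1, -1)) for _ in range(n)]  # random starting assignment
    for _ in range(max_cycles):
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            # Summed CC against all other lattices if i is in mode +1 or mode -1.
            score_plus = sum(r[i][j] if mode[j] == 1 else s[i][j] for j in others)
            score_minus = sum(s[i][j] if mode[j] == 1 else r[i][j] for j in others)
            best = 1 if score_plus >= score_minus else -1
            if best != mode[i]:
                mode[i] = best
                changed = True
        if not changed:  # converged: no lattice wants to switch
            break
    # The overall labelling is arbitrary; keep any majority in the original indexing.
    if sum(mode) < 0:
        mode = [-m for m in mode]
    return mode


def to_lookup(mode):
    """Group lattice numbers by operator, in an {operator: [lattice numbers]} shape."""
    return {
        "h,k,l": [i for i, m in enumerate(mode) if m == 1],
        "k,h,-l": [i for i, m in enumerate(mode) if m == -1],
    }


# Toy example: lattices 0 and 1 were indexed one way, lattices 2 and 3 the other.
group = [0, 0, 1, 1]
r = [[0.9 if group[i] == group[j] else 0.1 for j in range(4)] for i in range(4)]
s = [[0.1 if group[i] == group[j] else 0.9 for j in range(4)] for i in range(4)]
mode = sort_two_way(r, s)
print(to_lookup(mode))
```

Each single-lattice switch can only increase the total pairwise correlation, so the loop terminates; with noisy real data, a greedy search like this may still stop in a local optimum, which is one reason to compare it against the actual program output rather than rely on it.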

Output files from this script:

  • ${tag}_intensities_presort.pickle contains a Python tuple with two Miller arrays, the first giving all intensities input into the calculation, and the second assigning each intensity to a lattice [crystal] number. It is written for development purposes and is not used by any later step.
  • ${tag}_lookup.pickle is a Python dictionary whose keys are the reindexing operators, and values are lists of lattice [crystal] numbers to be reindexed accordingly.
  • ${tag}_reverse_lookup.pickle is a Python dictionary whose keys are the original integrated data pickle file names, and values are the reindexing operator to be applied to each file. This is the output used in the next step.
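The lookup and reverse-lookup dictionaries are two views of the same assignment. The following sketch loads ${tag}_reverse_lookup.pickle and groups the integration pickles by operator, which can be a quick sanity check before the next step. It assumes the file was written with the standard pickle module and that the operator values are plain strings such as "h,k,l"; check both against your own output. The function names and example file names are hypothetical.

```python
import pickle
from collections import defaultdict


def load_reverse_lookup(path):
    """Load {integration_pickle_filename: reindexing_operator} from disk."""
    with open(path, "rb") as handle:
        # encoding="latin1" lets Python 3 read pickles written by Python 2
        return pickle.load(handle, encoding="latin1")


def group_by_operator(reverse_lookup):
    """Invert {filename: operator} into {operator: sorted list of filenames}."""
    groups = defaultdict(list)
    for filename, operator in reverse_lookup.items():
        groups[operator].append(filename)
    return {op: sorted(files) for op, files in groups.items()}


# reverse_lookup = load_reverse_lookup("myoglobin_009_reverse_lookup.pickle")
example = {
    "int-0001.pickle": "h,k,l",   # hypothetical file names
    "int-0002.pickle": "k,h,-l",
    "int-0003.pickle": "h,k,l",
}
for op, files in sorted(group_by_operator(example).items()):
    print(op, len(files))
```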