Resolving an Indexing Ambiguity

From cctbx_xfel
Revision as of 23:53, 28 December 2013

Here we describe the use of Brehm & Diederichs algorithm 2 to resolve the indexing ambiguity for XFEL data. It is applicable to all polar space groups (where the Bravais symmetry is higher than the space-group symmetry) and also to cases with pseudosymmetry (e.g., a monoclinic cell with a beta angle near 90 degrees).

Brief Description of the Workflow

It is assumed that the reader is familiar with the tutorial for Merging. The usual workflow of cxi.merge followed by cxi.xmerge will fail in the xmerge step if there are reindexing operators (other than the identity H,K,L) that relate the individual indexed lattices to one another. We resolve this by (1) running the cxi.merge step to generate a database with all the observations; (2) using Brehm-Diederichs algorithm 2 (cxi.brehm_diederichs) to identify the reindexing operators; and (3) re-running the cxi.merge + cxi.xmerge process with the additional list of reindexing operators as input.
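The core idea of step (2) can be sketched in a few lines of Python. This is only a toy, reference-based illustration; the real cxi.brehm_diederichs works on all lattice-lattice pairs without a reference lattice, and the index grid, intensities, and ambiguity operator below are invented purely for the demonstration:

```python
# Toy illustration: a lattice indexed on the wrong arm of the ambiguity
# correlates poorly with a correctly indexed reference until its Miller
# indices are transformed by the right reindexing operator.
import random

def reindex(obs, op):
    """Apply a reindexing operator to every Miller index of a lattice."""
    return {op(hkl): i for hkl, i in obs.items()}

def cc(a, b):
    """Pearson correlation over the Miller indices common to both lattices."""
    common = sorted(set(a) & set(b))
    xs = [a[h] for h in common]
    ys = [b[h] for h in common]
    n = len(common)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

random.seed(0)
# A reference lattice: random intensities on a small grid of indices.
ref = {(h, k, l): random.random()
       for h in range(4) for k in range(4) for l in range(-3, 4)}

identity = lambda hkl: hkl
# One possible two-fold ambiguity operator for a hexagonal cell: (h,k,l) -> (k,h,-l).
twin_op = lambda hkl: (hkl[1], hkl[0], -hkl[2])

# A lattice that happened to be indexed on the other arm of the ambiguity:
wrong = reindex(ref, twin_op)

# Choosing the operator that maximizes the correlation coefficient
# recovers a consistent indexing for the whole data set.
best = max([identity, twin_op], key=lambda op: cc(ref, reindex(wrong, op)))
print(best is twin_op)
```

In the real algorithm the correlation coefficients are computed for every pair of lattices, which is why the number of common Miller indices (controlled by d_min below) matters so much.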

Detailed Step-By-Step Instructions

Create a Database of Observations

$ vi myoglobin_step1.csh
#!/bin/csh -f
set trial=${1}

set runs = 127,130,132,134,135,140,141,142,144
set datastring = \
`python -c "print ' '.join(['data=/my_results/L785/r%04d/${trial}/integration'%i for i in [${runs}]])"`
set tag = myoglobin_${trial}

set effective_params = "d_min=2.0 \
output.n_bins=20 \
${datastring} \
scaling.algorithm=mark1 \
target_unit_cell=90.3,90.3,45.2,90,90,120 \
target_space_group=P6 \
nproc=16 \
merge_anomalous=True \
mysql.runtag=${tag} \
mysql.passwd=terp888 \
mysql.user=nick \
mysql.database=xfelnks \
scaling.mtz_file=fake_filename.mtz \
output.prefix=${tag}"

cxi.merge ${effective_params} # Note the xmerge script is NOT run here

$ ./myoglobin_step1.csh 009 # create the database from trial 009
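The backquoted python one-liner in the script only expands the comma-separated run list into one data= parameter per run; in Python 3 syntax (standalone, with the trial number hard-coded for illustration) it is equivalent to:

```python
# Python 3 equivalent of the backquoted one-liner above: expand the run
# list into one data=... parameter per run, zero-padding the run number.
trial = "009"
runs = [127, 130, 132, 134, 135, 140, 141, 142, 144]
datastring = " ".join(
    "data=/my_results/L785/r%04d/%s/integration" % (run, trial) for run in runs
)
print(datastring.split()[0])  # data=/my_results/L785/r0127/009/integration
```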

Noteworthy parameters:

  • d_min is the high-resolution limit to be used for resolving the indexing ambiguity, not for the final merging. There is a trade-off: including too many data slows down the determination of lattice-lattice correlation coefficients, which scales as the number of common Miller indices, while including too few data makes the determination of correlation coefficients unreliable or undefined. Future: the program will print out <L>, the average number of observation pairs per correlation coefficient, so that the resolution limit can be sensibly adjusted.
  • scaling.algorithm is set to mark1 here (no scaling) to illustrate how data would be processed from an unknown structure. As a consequence (see the Advanced Merging page) we set the target_unit_cell and target_space_group but do not provide a PDB model. Also, scaling.mtz_file is set to a dummy value.
  • nproc is the number of processors to be used for writing the database. Use as many single-host processors as are available, up to a limit of about 16, beyond which there is no further benefit, at least on Linux.
  • merge_anomalous=True. Since we are trying to maximize the number of common Miller indices for each lattice pair, we want to merge the Bijvoet pairs, even if we will look for dispersive differences in the final merging step.
  • backend. Use whichever backend is available on your system; the choices are FS, MySQL (the default), and SQLite.
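The effect of merge_anomalous=True on the number of common indices can be sketched as follows (the helper below is hypothetical and not part of cxi.merge; it simply shows Bijvoet/Friedel mates being collapsed onto one canonical index):

```python
# Sketch of why merging Bijvoet pairs helps at this step: map each Miller
# index and its Friedel mate (-h,-k,-l) to one canonical key, so that two
# lattices that each observed only one mate still share a common index.

def friedel_canonical(hkl):
    """Return the lexicographically larger of hkl and its Friedel mate."""
    mate = tuple(-x for x in hkl)
    return max(hkl, mate)

a = {(1, 2, 3): 10.0}    # lattice A observed hkl
b = {(-1, -2, -3): 11.0} # lattice B observed only the Friedel mate

# Without merging, the two lattices share no index:
assert not set(a) & set(b)

# After merging Friedel mates, the observation pair becomes comparable:
a_merged = {friedel_canonical(h): i for h, i in a.items()}
b_merged = {friedel_canonical(h): i for h, i in b.items()}
assert set(a_merged) & set(b_merged) == {(1, 2, 3)}
```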