Cppxfel Indexing

From cctbx_xfel

Latest revision as of 14:26, 4 May 2016

This page is under development!

The original version of cppxfel indexed using DIALS, but if the unit cell dimensions are known, cppxfel itself can also be used to index images using the TakeTwo algorithm. This page details how to index using cppxfel.

Downloading test data

The test data can be downloaded as a compressed archive (ginn_jac_cpv17.tgz) from the DIALS website and should be extracted into a new folder:

tar zxvf ginn_jac_cpv17.tgz
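To keep the extracted images apart from other files, the archive can be unpacked directly into a fresh folder. The helper below is a minimal sketch; the function name extract_into and the folder name in the usage line are illustrative choices, not part of cppxfel.

```shell
# Unpack a gzipped tar archive into a destination folder, creating it if needed.
extract_into() {
  # $1 = archive, $2 = destination folder (name is your choice)
  mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```

For example, extract_into ginn_jac_cpv17.tgz cppxfel_test followed by cd cppxfel_test.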

Running DIALS spot-finding on the set of 1000 images

The scripts included in cppxfel to run DIALS use four cores by default. In this tutorial DIALS is used only for spot-finding, with parameters chosen to give good indexing rates. If more cores are available on the current machine, the core count can be changed by setting the environment variable NSLOTS:

export NSLOTS=16
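Rather than hard-coding the value, the core count can be read from the system. This is a sketch that assumes getconf is available (it is on most Linux and macOS systems) and falls back to the default of four otherwise.

```shell
# Ask the system for the number of online cores; fall back to 4 if unavailable.
NSLOTS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4)
export NSLOTS
echo "Using $NSLOTS cores"
```

On a shared machine it may be better to set a smaller number by hand.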

DIALS uses the options found in the file find_spots.options for spot-finding. Create this file with the following parameters:

cat > find_spots.options << EOF
gain=14.0 min_spot_size=2 global_threshold=100
EOF

Note, in the tcsh shell, this is not a valid way of creating a text file. In this case, create an appropriate file in your favourite text editor with the text gain=14.0 min_spot_size=2 global_threshold=100.
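As an alternative to a text editor, a single echo with output redirection should produce the same one-line file in most shells, since it does not rely on here-document syntax. A minimal sketch:

```shell
# Write the spot-finding options as a single line, overwriting any existing file.
echo "gain=14.0 min_spot_size=2 global_threshold=100" > find_spots.options
# Show the result to confirm the contents.
cat find_spots.options
```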

The DIALS scripts can now be run as follows:

cppxfel.run_dials shot*.pickle index=no

The term index=no prevents DIALS from running indexing on these images, as we plan to index using cppxfel.

The spot-finding results made by DIALS should be converted to a new format for reading into cppxfel. This is achieved by running the command:

cppxfel.gen shot*.pickle

This will generate a very simple _XXX_strong.list text file for every _XXX_strong.pickle file, as well as prepare the images for cppxfel. For a quick look at the number of spots per image, you can use the command:

wc -l *strong.list
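To pick out the images with the most spots, the line counts can be filtered and sorted. The sketch below assumes each line of a strong.list file describes one spot; the function name list_rich_images and any cut-off passed to it are hypothetical choices, not cppxfel settings.

```shell
# List images with at least a given number of strong spots, richest first.
# Output columns: spot count, file name (tab-separated).
list_rich_images() {
  min_spots=$1
  for f in *strong.list; do
    [ -e "$f" ] || continue            # skip the literal pattern if nothing matches
    n=$(awk 'END { print NR }' "$f")   # line count without padding
    if [ "$n" -ge "$min_spots" ]; then
      printf '%s\t%s\n' "$n" "$f"
    fi
  done | sort -rn
}
```

For example, list_rich_images 20 prints only the images with 20 or more lines in their spot list.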

Running cppxfel.gen will also create shells (templates to be filled in) of index.txt, integrate.txt, refine.txt and merge.txt.

Preparing for TakeTwo

Have a look at the contents of the index.txt shell.

cat index.txt

The output will look something like this:

ORIENTATION_MATRIX_LIST matrices.dat
NEW_MATRIX_LIST indexed.dat
# Be sure to set the UNIT_CELL and SPACE_GROUP for indexing. cppxfel cannot index without this knowledge.
SPACE_GROUP 0
UNIT_CELL 0 0 0 0 0 0 

MM_PER_PIXEL 0.11
BEAM_CENTRE 881.755 881.5075
DETECTOR_DISTANCE 90.9988
INTEGRATION_WAVELENGTH 1.45825667181 

PANEL_LIST panels.txt
METROLOGY_SEARCH_SIZE 2

# If your crystal is highly mosaic or the detector is quite far back you may need to increase the padding values.
SHOEBOX_FOREGROUND_PADDING 1
SHOEBOX_NEITHER_PADDING 2
SHOEBOX_BACKGROUND_PADDING 3

# If you see too many spots, increase the intensity threshold.
INTENSITY_THRESHOLD 12
ABSOLUTE_INTENSITY OFF

OVER_PRED_BANDWIDTH 0.07

REFINE_ORIENTATIONS ON
ROUGH_CALCULATION ON

# Specifies maximum multiple lattices to index in total
SOLUTION_ATTEMPTS 1

# Maximum reciprocal distance from spot to spot to consider for analysis.
# A maximum reciprocal distance of 0.1 would be equivalent separation
# between the beam centre and the 10 Angstrom resolution ring.
MAX_RECIPROCAL_DISTANCE 0.15

# Initial rlp size: used to determine the tolerances for the vector lengths in the crystal.
# For a 1 micron crystal with no mosaicity, the initial rlp size is 0.0001 Ang^-1 (i.e.,
# 1 / 10000 Ang). To be more strict for indexing, lower this number; to be less strict increase it.
INITIAL_RLP_SIZE 0.0001

# If you wish to see more verbose output, change to 1 (moderate), or 2 (debug, usually too much).
VERBOSITY_LEVEL 0

COMMANDS

INDEX
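The comment on MAX_RECIPROCAL_DISTANCE in the file above implies that a reciprocal distance s (in 1/Angstrom) corresponds to a resolution of 1/s Angstrom. The conversion can be checked with a short awk call; the function name recip_to_resolution is illustrative only.

```shell
# Convert a reciprocal distance (1/Angstrom) to a resolution (Angstrom).
recip_to_resolution() {
  awk -v s="$1" 'BEGIN { printf "%.2f\n", 1 / s }'
}

recip_to_resolution 0.1    # prints 10.00, the example given in the comment
recip_to_resolution 0.15   # prints 6.67, the value set in this file
```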

Note that some parameters have not been initialised. Edit these lines in order to supply the correct information. The edited lines are shown below, but check the entire input. The space group and unit cell are essential for cppxfel indexing: it cannot currently index without a known unit cell and space group.

SPACE_GROUP 197
UNIT_CELL 106.1 106.1 106.1 90 90 90
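These two lines can be edited by hand, or with sed when the same change is needed in several folders. The following is a sketch only: set_index_cell is a hypothetical helper, and it rewrites any line beginning with SPACE_GROUP or UNIT_CELL, so check the file afterwards.

```shell
# Replace the SPACE_GROUP and UNIT_CELL lines of a cppxfel input file.
set_index_cell() {
  # $1 = input file, $2 = space group number, $3 = six unit cell parameters
  tmp=$(mktemp) || return 1
  sed -e "s/^SPACE_GROUP .*/SPACE_GROUP $2/" \
      -e "s/^UNIT_CELL .*/UNIT_CELL $3/" "$1" > "$tmp" && mv "$tmp" "$1"
}
```

For this data set the call would be set_index_cell index.txt 197 "106.1 106.1 106.1 90 90 90".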

When the index.txt file is ready, you may run indexing on the data:

cppxfel.run -i index.txt
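Since indexing cannot proceed without a space group and unit cell, a small guard can catch an input file that still holds the zero placeholders. This is a sketch under the assumption that the two keywords appear one per line as in the listing above; index_ready is a hypothetical helper, not a cppxfel command.

```shell
# Succeed only if SPACE_GROUP is non-zero and UNIT_CELL has six values
# with positive cell lengths.
index_ready() {
  awk '
    $1 == "SPACE_GROUP" && $2 != 0 { sg = 1 }
    $1 == "UNIT_CELL" && NF == 7 && $2 > 0 && $3 > 0 && $4 > 0 { uc = 1 }
    END { exit !(sg && uc) }
  ' "$1"
}
```

It can be chained with the run itself, for example index_ready index.txt && cppxfel.run -i index.txt.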

Wavelength histograms should appear every time an image is successfully indexed:

Wavelength histogram for shot-s00-20130316164947655.img
1.356	
1.366	
1.377	
1.387	...
1.397	....
1.407	......
1.417	..
1.428	....
1.438	..
1.448	.....
1.458	................................................................
1.468	................................................................
1.479	........
1.489	..
1.499	....
1.509	....
1.52	...
1.53	
1.54	
1.55	
1.56	
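Because a histogram is printed for each successfully indexed image, counting the histogram headers gives a rough tally of indexed images. The sketch below assumes the program output was saved to a file, for instance with cppxfel.run -i index.txt | tee index.log; the log file name and the function count_indexed are hypothetical.

```shell
# Count "Wavelength histogram for ..." header lines in a saved cppxfel log.
count_indexed() {
  # $1 = file holding the captured output
  grep -c '^Wavelength histogram for ' "$1"
}
```

If more than one lattice is indexed per image, the same image may be counted more than once.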

At the end of the run, cppxfel should create a file called integrate-indexed.dat, which can be fed into integration.