Cctbx.prime
Prime: post-refinement and merging
With the latest update, prime can be used to process data on multiple nodes (on a queuing system). At the moment, only LSF (bsub) is supported. See documentation below for more information on how to use the queuing system.
This major update replaces prime.postrefine with
prime.run
For auto mode, you can still use prime.run with your parameter phil file as before. For manual mode, the available subcommands in prime are:
prime.genref      # generates a reference set from given integration results
prime.postrefine  # refines all images
prime.merge       # merges all refined results for an mtz file
You can choose to run these commands independently (ideally in the above order) using the same phil file. See "PRIME flowchart". This will give you the freedom to change something (e.g. set of parameters to refine, resolution cut-off, etc.) at different stages of the post-refinement and merging. See running prime in manual mode for more detail.
Step-by-step guidelines to post-refine and merge XFEL diffraction images. For more detail and citation, see "Enabling X-ray Free Electron Laser Crystallography for Challenging Biological Systems from a Limited Number of Crystals" "DOI: http://dx.doi.org/10.7554/eLife.05421"
Prime is gui-ed
Thanks to Dr. Lyubimov, PRIME is also available as a graphical user interface (GUI) program. Try it by running
prime
Click to see "PRIME main gui" and "Advanced options"
Getting started
Generating input phil file
Like most programs developed under the cctbx framework, prime reads an input .phil file, which stores all the parameters needed to run the post-refinement and merging steps. To generate a template .phil file, do a dry run by calling
$ prime.run
An example of the template .phil file:
data = None
run_no = None
title = None
scale {
  d_min = 0.1
  d_max = 99
  sigma_min = 1.5
}
...
You can save the content of the output to any file name - in this tutorial, let's save it to thermolysin.phil.
First look at your phil file
To run prime, set the required parameters to match your experiment (you can leave the other parameters at their default values, or just delete them from your .phil file). The most interesting parameters are shown below:
data = /path/to/your/integration/result/pickle_files
run_no = 001
title = First trial for thermolysin
scale {
  d_min = 2.1
  d_max = 45
  sigma_min = 1.5
}
postref {
  scale {
    d_min = 2.1
    d_max = 45
    sigma_min = 1.5
    partiality_min = 0.1
  }
  crystal_orientation {
    flag_on = True
    d_min = 2.1
    d_max = 45
    sigma_min = 1.5
    partiality_min = 0.1
  }
  reflecting_range {
    flag_on = True
    d_min = 2.1
    d_max = 45
    sigma_min = 1.5
    partiality_min = 0.1
  }
  unit_cell {
    flag_on = True
    d_min = 2.1
    d_max = 45
    sigma_min = 1.5
    partiality_min = 0.1
    uc_tolerance = 3
  }
  allparams {
    flag_on = False
    d_min = 2.1
    d_max = 45
    sigma_min = 1.5
    partiality_min = 0.1
    uc_tolerance = 3
  }
}
merge {
  d_min = 2.1
  d_max = 45
  sigma_min = 1.5
  partiality_min = 0.1
  uc_tolerance = 3
}
target_unit_cell = 93.99,93.99,130.87,90,90,120
target_space_group = P 61 2 2
pixel_size_mm = 0.102
Pay attention to d_min and d_max in the refinement and merging parameters. If you use IOTA to integrate the images, IOTA will output a .phil file for prime that has the optimal resolution range. If not, a few trial-and-error runs may be required to find the best resolution range for your dataset. Use the merging statistics output by prime and check the values of CC1/2 and I/sigI to find your optimal resolution range.
Cell parameters (target_unit_cell and target_space_group) are required to run prime. The target unit cell is used to remove outlier images: the uc_tolerance parameter sets how far an image's unit cell may differ from the target (the default tolerance is 3%). The space group is used both in removing outliers and in merging with the given symmetry.
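To make the uc_tolerance idea concrete, here is a minimal Python sketch of a percent-difference test against the target cell. The function name within_uc_tolerance and the two trial cells are hypothetical, and comparing each cell parameter independently is an assumption; prime's own criterion may differ.

```python
# Hypothetical helper for illustration only; this is not prime's implementation.
def within_uc_tolerance(cell, target_cell, uc_tolerance=3.0):
    """Return True if every cell parameter is within uc_tolerance percent of the target."""
    for observed, target in zip(cell, target_cell):
        percent_diff = abs(observed - target) / target * 100.0
        if percent_diff > uc_tolerance:
            return False
    return True

# Target cell taken from the example .phil file (a, b, c, alpha, beta, gamma).
target = (93.99, 93.99, 130.87, 90.0, 90.0, 120.0)
# Two made-up image cells: one close to the target, one with a and b about 4% too long.
good = (94.50, 94.50, 131.20, 90.0, 90.0, 120.0)
bad = (98.00, 98.00, 131.20, 90.0, 90.0, 120.0)

print(within_uc_tolerance(good, target))  # accepted at the 3% default
print(within_uc_tolerance(bad, target))   # rejected at the 3% default
```

Raising uc_tolerance in this sketch (for example to 5) would accept the second cell as well, which mirrors the effect of loosening the parameter in your .phil file.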
Also remember to set the pixel size (pixel_size_mm) in millimeters. Check which detector you used and note down its pixel size.
Running post-refinement in automatic mode
Once you have the input .phil file, you can run prime by calling
prime.run thermolysin.phil
Prime will post-refine and merge the reflection sets using three macrocycles (the default value). At the end of the run, you can obtain the merging statistics for the last cycle; the statistics for all other cycles are also available in log.txt.
An example of merging statistics:
Summary for 001/postref_cycle_1_merge.mtz
Bin  Resolution Range  Completeness          <N_obs> | Rsplit   CC1/2  N_ind | CCanom  N_ind | <I/sigI>      <I>
----------------------------------------------------------------------------------------------------------------
02   5.70 - 4.52       100.0   1055 /  1055    65.89 |  16.02   89.15   1055 |   0.00      0 |    20.17  2101.97
03   4.52 - 3.95       100.0   1032 /  1032    61.53 |  14.48   92.03   1032 |   0.00      0 |    20.39  2529.90
04   3.95 - 3.59       100.0   1016 /  1016    54.15 |  15.61   90.13   1016 |   0.00      0 |    16.69  1971.43
05   3.59 - 3.33       100.0   1004 /  1004    42.67 |  17.66   89.23   1004 |   0.00      0 |    14.21  1502.14
06   3.33 - 3.14       100.0   1013 /  1013    32.77 |  20.40   84.26   1013 |   0.00      0 |    11.76  1077.60
07   3.14 - 2.98       100.0    995 /   995    27.36 |  23.00   78.72    995 |   0.00      0 |    11.58   935.37
08   2.98 - 2.85       100.0   1006 /  1006    23.57 |  22.63   82.26   1006 |   0.00      0 |    10.56   722.62
09   2.85 - 2.74       100.0    986 /   986    16.64 |  28.51   72.90    985 |   0.00      0 |    10.01   591.56
10   2.74 - 2.65        99.9    989 /   990    12.41 |  31.35   72.95    987 |   0.00      0 |     9.91   515.07
11   2.65 - 2.56        99.7    979 /   982     9.35 |  37.14   65.31    970 |   0.00      0 |     9.31   438.96
12   2.56 - 2.49        98.0    979 /   999     6.06 |  45.98   45.37    930 |   0.00      0 |     9.45   390.05
13   2.49 - 2.42        95.1    931 /   979     4.46 |  50.68   34.20    834 |   0.00      0 |     8.93   334.80
14   2.42 - 2.37        91.7    896 /   977     3.35 |  55.66   37.15    729 |   0.00      0 |     9.27   320.17
15   2.37 - 2.31        83.9    829 /   988     2.61 |  56.92   43.21    600 |   0.00      0 |     9.60   296.67
16   2.31 - 2.26        72.4    702 /   969     1.97 |  65.81   26.89    386 |   0.00      0 |    10.29   284.39
17   2.26 - 2.22        59.1    582 /   985     1.75 |  64.72   31.28    275 |   0.00      0 |     9.87   284.06
18   2.22 - 2.18        52.9    513 /   970     1.51 |  71.27   16.86    188 |   0.00      0 |     8.93   215.31
19   2.18 - 2.14        35.7    349 /   978     1.32 |  62.26   68.25     90 |   0.00      0 |     8.22   199.09
20   2.14 - 2.10        23.1    227 /   981     1.20 |  92.14   -9.20     42 |   0.00      0 |     8.59   224.44
----------------------------------------------------------------------------------------------------------------
TOTAL                   85.9  17224 / 20046    27.11 |  21.11   92.07  15305 |   0.00      0 |    12.87   999.53
----------------------------------------------------------------------------------------------------------------
Summary of refinement and merging
No. good frames: 1809
No. bad cc frames: 153
No. bad G frames) : 0
No. bad unit cell frames: 5
No. bad gamma_e frames: 0
No. bad SE: 0
No. observations: 466997
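One way to read a resolution cut-off from merging statistics like the ones above is to walk from low to high resolution and stop at the first bin that fails your CC1/2 or I/sigI criterion. The sketch below does this for five bins copied from the table. The function name suggest_d_min and the thresholds (CC1/2 of 50% and I/sigI of 1.0) are assumptions chosen for illustration; they are not values recommended by prime.

```python
# (d_max, d_min, CC1/2 in percent, <I/sigI>) for bins 09 to 13 of the table above.
bins = [
    (2.85, 2.74, 72.90, 10.01),
    (2.74, 2.65, 72.95, 9.91),
    (2.65, 2.56, 65.31, 9.31),
    (2.56, 2.49, 45.37, 9.45),
    (2.49, 2.42, 34.20, 8.93),
]

def suggest_d_min(bins, min_cc_half=50.0, min_i_over_sigi=1.0):
    """Return d_min of the last bin passing both thresholds, scanning low to high resolution.

    Returns None if even the first bin fails. The thresholds are illustrative assumptions.
    """
    d_min = None
    for _d_max_bin, d_min_bin, cc_half, i_over_sigi in bins:
        if cc_half < min_cc_half or i_over_sigi < min_i_over_sigi:
            break  # stop at the first failing bin
        d_min = d_min_bin
    return d_min

print(suggest_d_min(bins))
```

With these assumed thresholds the scan stops at bin 12, where CC1/2 first drops below 50%, so the suggested d_min is the high-resolution edge of bin 11. You would then put that value into the d_min fields of your .phil file and rerun.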
Solving indexing ambiguity (New)
* SOFTWARE UPDATE REQUIRED *
With the latest version (Aug 31, 2016), you can solve the indexing ambiguity problem directly in prime. The Brehm & Diederichs algorithms (doi:10.1107/S1399004713025431) have been implemented with bootstrap capability to handle large datasets.
For merohedral twinning groups, prime determines the indexing choices automatically. Use this default setting in your .phil file:
indexing_ambiguity {
  mode = Auto
  index_basis_in = None
  assigned_basis = None
  n_sample_frames = 300
  n_selected_frames = 100
}
The n_sample_frames parameter sets the number of images used to calculate the scoring function. After that, only n_selected_frames images are passed to the B&D algorithms. This saves a lot of computing time, since only the selected images are used to determine the ambiguity. You can change these two parameters to suit your experiment. The default values are 300 and 100 (give 300, use 100).
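The "give 300, use 100" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name pick_frames, the random draw of the sample, the "keep the highest scores" rule, and the made-up scores are all hypothetical, and prime's actual scoring function and selection may work differently.

```python
import random

def pick_frames(frame_scores, n_sample_frames=300, n_selected_frames=100, seed=0):
    """Draw n_sample_frames frames at random, then keep the n_selected_frames best-scoring ones.

    frame_scores maps a frame name to a score (hypothetical; higher is assumed better).
    """
    rng = random.Random(seed)
    names = sorted(frame_scores)  # fixed order so the draw is reproducible
    sampled = rng.sample(names, min(n_sample_frames, len(names)))
    ranked = sorted(sampled, key=lambda name: frame_scores[name], reverse=True)
    return ranked[:n_selected_frames]

# 1000 made-up frames with made-up scores.
frame_scores = {"frame_%04d" % i: float(i) for i in range(1000)}
selected = pick_frames(frame_scores)
print(len(selected))
```

The point of the two-stage design is cost: scoring is done on the larger sample, while the more expensive ambiguity determination only sees the smaller selected set.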
For pseudo-merohedral twinning, there are several possible indexing choices, so prime does not determine them automatically. If you suspect that you may have pseudo-merohedral twinning (for example, b and c are similar and the beta angle is almost, but not quite, 90 degrees), you can force prime to resolve the ambiguity using the choices you supply.
indexing_ambiguity {
  mode = Forced
  index_basis_in = None
  assigned_basis = -h,l,k
  assigned_basis = -k, l, h
  n_sample_frames = 300
  n_selected_frames = 100
}
When you set indexing_ambiguity.mode to Forced, you can assign indexing choices that fit your problem. In this example, two more choices (-h, l, k and -k, l, h) were assigned as indexing choices.
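To see what an assigned_basis entry does to a reflection, here is a small Python sketch that applies such an operator to one Miller index. It assumes the usual reading of the notation, where "-h,l,k" means the new index is (-h, l, k). The helper names parse_basis and reindex are hypothetical, and the parser only handles simple signed permutations like the two used in this example, not general operators.

```python
def parse_basis(basis):
    """Parse a simple operator such as '-h,l,k' into three (sign, source_axis) pairs."""
    axes = {"h": 0, "k": 1, "l": 2}
    ops = []
    for term in basis.split(","):
        term = term.strip()
        sign = -1 if term.startswith("-") else 1
        letter = term.lstrip("+-").strip()
        if letter not in axes:
            raise ValueError("unsupported term: %r" % term)
        ops.append((sign, axes[letter]))
    if len(ops) != 3:
        raise ValueError("expected three comma-separated terms: %r" % basis)
    return ops

def reindex(hkl, basis):
    """Apply the operator to a Miller index given as an (h, k, l) tuple."""
    return tuple(sign * hkl[axis] for sign, axis in parse_basis(basis))

print(reindex((1, 2, 3), "-h,l,k"))    # (-1, 3, 2)
print(reindex((1, 2, 3), "-k, l, h"))  # (-2, 3, 1)
```

Each alternative basis gives a different relabeling of the same observed reflections, which is why the images must be brought to one consistent choice before merging.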
At the end of the run, your solution pickle is saved to your_run_no/index_ambiguity/solution_pickle.pickle. If you don't want to spend time solving the ambiguity again in the next run, you can reuse this solution pickle by setting these parameters:
indexing_ambiguity {
  mode = Auto
  index_basis_in = your_run_no/index_ambiguity/solution_pickle.pickle
}
This will bypass the indexing ambiguity module. Prime will use the solution file to perform normal post-refinement and merging.
To use another isomorphous dataset (e.g. from a synchrotron experiment), you can specify its mtz file in these parameters:
indexing_ambiguity {
  mode = Auto
  index_basis_in = path/to/your/mtz/file.mtz
}
Again, you can choose to do Auto or Forced (with a list of assigned_basis parameters) depending on your problem.
More detail with input parameters
Now that you have your first trial merged data set, you can explore different parameter settings to merge or to obtain the Bijvoet pairs (I+/I-) for your anomalous data set.
Anomalous data:
target_anomalous_flag = True
In the last cycle, prime will output a reflection set with I+ and I-.
Number of micro- and macrocycles
n_postref_cycle = 3
n_postref_sub_cycle = 1
Number of bins for merging statistics
n_bins = 20
Help with input parameters
Most input parameters are self-explanatory. You can also use the -h switch to view help information for each parameter.
prime.run -h
Running in manual mode
With the same phil file, you can run prime manually. This gives you more freedom to change parameter settings at different stages (generating the reference set, post-refining images, and merging) or at different cycles of post-refinement.
Example A: I want to generate a reference set, post-refine all the images on the scale factors only for three cycles, and then refine all parameters in the 4th cycle. To do this, follow these steps:
To generate a reference set,
prime.genref prime.phil
To post-refine on scale factors only, modify your .phil file so that flag_on is set to False for all other parameter groups (crystal_orientation, reflecting_range, unit_cell, and allparams).
...
scale {
  d_min = 2.5
  d_max = 45
  sigma_min = 1.5
}
postref {
  residual_threshold = 5
  residual_threshold_xy = 5
  scale {
    d_min = 2.5
    d_max = 45
    sigma_min = 1.5
    partiality_min = 0.1
  }
  crystal_orientation {
    flag_on = False
    d_min = 2.5
    d_max = 45
    sigma_min = 1.5
    partiality_min = 0.1
  }
  reflecting_range {
    flag_on = False
    d_min = 2.5
    d_max = 45
    sigma_min = 1.5
    partiality_min = 0.1
  }
  unit_cell {
    flag_on = False
    d_min = 2.5
    d_max = 45
    sigma_min = 1.5
    partiality_min = 0.1
    uc_tolerance = 5
  }
  allparams {
    flag_on = False
    d_min = 2.5
    d_max = 45
    sigma_min = 1.5
    partiality_min = 0.1
    uc_tolerance = 5
  }
}
...
n_postref_cycle = 3
...
Then run,
prime.postrefine prime.phil
To refine all parameters for one more cycle, update your .phil file again (set allparams.flag_on = True and n_postref_cycle = 1):
...
  allparams {
    flag_on = True
    d_min = 2.5
    d_max = 45
    sigma_min = 1.5
    partiality_min = 0.1
    uc_tolerance = 5
  }
}
...
n_postref_cycle = 1
...
Then run,
prime.postrefine prime.phil
To obtain the final merged mtz, run
prime.merge prime.phil
Running on multiple nodes
For LCLS users (or other users with LSF bsub), you can use the psana queuing system (or your own) to parallelize the entire process. For example, if you want to run your job on 100 nodes using psanaq, you can specify:
queue {
  mode = bsub
  qname = psanaq
  n_nodes = 100
}
timeout_seconds = 300
Prime will divide all the images into 100 batches and submit them to different nodes. It will wait until all images in every batch are done before returning to the merging step (or the exit step in manual mode). Use the timeout_seconds parameter to tell prime how long it should wait for all the image batches to finish. Usually this timeout is not reached (all images should return within 300 seconds), but if you need to wait longer or shorter, you can modify this parameter.
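As a rough picture of the batching step, the following Python sketch splits a list of image names into n_nodes batches whose sizes differ by at most one. The function name split_into_batches and the even-split rule are assumptions for illustration; the source does not say how prime actually assigns images to batches, and the frame names here are made up.

```python
def split_into_batches(items, n_nodes):
    """Split items into n_nodes consecutive batches whose sizes differ by at most one."""
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    base, extra = divmod(len(items), n_nodes)
    batches = []
    start = 0
    for i in range(n_nodes):
        size = base + (1 if i < extra else 0)  # the first `extra` batches get one more item
        batches.append(items[start:start + size])
        start += size
    return batches

# 1005 made-up image names split across 100 nodes.
frames = ["frame_%04d.pickle" % i for i in range(1005)]
batches = split_into_batches(frames, 100)
sizes = [len(batch) for batch in batches]
print(len(batches), min(sizes), max(sizes))
```

Because the merging step waits for the slowest batch, keeping the batches close to equal in size is what makes a single timeout_seconds value meaningful for the whole job.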