Post-refine and Merge Sample Data Set with PRIME (2017 Tutorial)
In this tutorial, we will work on the integration results from the first part of Tutorial 2 (Myoglobin Data). Before running the program, we will build an input file for PRIME that suits this data set.
Generating Input File
PRIME input files contain the information needed for successful post-refinement and merging. You can review the list of input parameters and their descriptions by running prime.run or prime.run -h. For this tutorial we'll build the input file from scratch.
- Location of integration results
In this case, we know where the integration results (pickle files) are, so we can set:
data = /net/viper/raid1/mu238/XfelProject/dials17/extracted
Note that you can supply the data parameter more than once. Its value can be either a folder or a file containing a list of integration results.
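If the integration results are spread over several folders, one option is to gather their paths into a single list file and point data at that file. The sketch below is plain Python and not part of PRIME; the helper name write_pickle_list, the ".pickle" extension, and the one-path-per-line layout are assumptions made for illustration.

```python
import os


def write_pickle_list(folders, list_path, extension=".pickle"):
    """Collect files ending in `extension` under `folders` into a list file.

    Hypothetical helper: the extension and the one-path-per-line layout
    are assumptions, not something PRIME is documented here to require.
    Returns the sorted list of absolute paths that were written.
    """
    paths = []
    for folder in folders:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                if name.endswith(extension):
                    paths.append(os.path.abspath(os.path.join(root, name)))
    paths.sort()
    with open(list_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return paths
```

Calling write_pickle_list(["run_a", "run_b"], "integration_list.txt") with your own folder names would produce a file that can then be given as the value of data.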
- Unit cell information
You can obtain the mean (or median) unit-cell dimensions from either IOTA or DIALS. With IOTA, a PRIME .phil file is generated automatically and already contains this information. For n_residues, enter the number of residues in the asymmetric unit of your molecule.
target_unit_cell = 91.7 91.7 46 90 90 120
target_space_group = P6
n_residues = 128
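If you only have a list of per-image unit cells rather than a summary from IOTA or DIALS, the median can be computed directly. The following is a minimal sketch in plain Python; the function names and the three example cells are made up for illustration and do not come from this data set.

```python
from statistics import median


def median_unit_cell(cells):
    """Return the per-parameter median of (a, b, c, alpha, beta, gamma) tuples."""
    if not cells:
        raise ValueError("need at least one unit cell")
    return tuple(median(column) for column in zip(*cells))


def format_target_unit_cell(cell):
    """Format a unit cell as a target_unit_cell line for the .phil file."""
    return "target_unit_cell = " + " ".join(f"{value:g}" for value in cell)


# Made-up per-image cells, for illustration only.
cells = [
    (91.6, 91.6, 45.9, 90.0, 90.0, 120.0),
    (91.7, 91.7, 46.0, 90.0, 90.0, 120.0),
    (91.9, 91.9, 46.2, 90.0, 90.0, 120.0),
]
print(format_target_unit_cell(median_unit_cell(cells)))
```

The median is less sensitive than the mean to a few badly indexed images, which is one reason to prefer it when the cell distribution has outliers.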
- Detector information
pixel_size_mm = 0.172
- Post-refinement and Scaling information
This is where you specify the resolution cutoffs for post-refinement and merging. When running for the first time on newly collected data, you can start from the "expected" values (the resolution at which you see spots at the corner or edge of the detector). You can then adjust these parameters after inspecting the I/sigI values in the high-resolution shells of the merging statistics, and rerun the program. Note that sigma_min is set to 1.5 in the scaling and post-refinement steps, while in the merge step it is set to -3.0 so that negative values are included in the merged reflection set.
scale {
  d_min = 2.5
  d_max = 20
  sigma_min = 1.5
}
postref {
  scale {
    d_min = 2.5
    d_max = 20
    sigma_min = 1.5
    partiality_min = 0.1
  }
  allparams {
    flag_on = True
    d_min = 2.5
    d_max = 20
    sigma_min = 1.5
    partiality_min = 0.1
    uc_tolerance = 5
  }
}
merge {
  d_min = 2.5
  d_max = 20
  sigma_min = -3.0
  partiality_min = 0.1
  uc_tolerance = 5
}
- Indexing ambiguity
For data sets that are not in a polar space group and have no other source of indexing ambiguity (for example, unit-cell dimensions that are very similar but not identical), the .phil parameters given so far are enough to run post-refinement. This data set, however, is in P6 (a polar space group), so the indexing ambiguity must be resolved before the other refinement and merging steps.
Note also that for any polar space group, PRIME resolves the ambiguity automatically using default parameters. Because this data set has about 5,000 integration results, we adjust the number of images used for the random and best selections.
indexing_ambiguity {
  mode = Auto
  index_basis_in = None
  assigned_basis = None
  d_min = 3.0
  d_max = 10.0
  sigma_min = 1.5
  n_sample_frames = 1000
  n_selected_frames = 100
}
We left the other parameters at their default values and set n_sample_frames to 1000 and n_selected_frames to 100.
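The two frame counts describe a two-stage selection: a random sample of n_sample_frames images is scored, and the best n_selected_frames of those are kept. The sketch below illustrates that pattern in plain Python. It is not PRIME's implementation; the function name select_frames and the score callable are hypothetical stand-ins for whatever quality measure is actually used.

```python
import random


def select_frames(frames, score, n_sample_frames=1000, n_selected_frames=100, seed=0):
    """Randomly sample frames, then keep the highest-scoring ones.

    `score` is a hypothetical callable returning a number per frame
    (higher is better). The sample size is capped at the number of
    frames available.
    """
    rng = random.Random(seed)
    sample = rng.sample(frames, min(n_sample_frames, len(frames)))
    ranked = sorted(sample, key=score, reverse=True)
    return ranked[:n_selected_frames]
```

With about 5,000 images, sampling 1,000 keeps the scoring step affordable while still leaving a wide pool from which to pick the best 100.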
- Number of bins
n_bins = 10
Now we have a complete .phil file ready to run.
data = /net/viper/raid1/mu238/XfelProject/dials17/extracted
target_unit_cell = 91.7 91.7 46 90 90 120
target_space_group = P6
n_residues = 128
pixel_size_mm = 0.172
scale {
  d_min = 2.5
  d_max = 20
  sigma_min = 1.5
}
postref {
  scale {
    d_min = 2.5
    d_max = 20
    sigma_min = 1.5
    partiality_min = 0.1
  }
  allparams {
    flag_on = True
    d_min = 2.5
    d_max = 20
    sigma_min = 1.5
    partiality_min = 0.1
    uc_tolerance = 5
  }
}
merge {
  d_min = 2.5
  d_max = 20
  sigma_min = -3.0
  partiality_min = 0.1
  uc_tolerance = 5
}
indexing_ambiguity {
  mode = Auto
  index_basis_in = None
  assigned_basis = None
  d_min = 3.0
  d_max = 10.0
  sigma_min = 1.5
  n_sample_frames = 1000
  n_selected_frames = 100
}
n_bins = 10
Copy and paste this set of parameters into an editor, then save the file as "prime.phil".
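Because the .phil file nests scopes with braces, a dropped closing brace is an easy mistake to make when copying. A quick check before running is sketched below. It is a plain-Python helper written for this tutorial (check_braces is a hypothetical name), it only counts braces, and it does not validate parameter names or values or treat braces inside comments or quoted strings specially.

```python
def check_braces(phil_text):
    """Return (ok, message) describing whether '{' and '}' are balanced."""
    depth = 0
    for line_number, line in enumerate(phil_text.splitlines(), start=1):
        for char in line:
            if char == "{":
                depth += 1
            elif char == "}":
                depth -= 1
                if depth < 0:
                    return False, f"unexpected '}}' on line {line_number}"
    if depth != 0:
        return False, f"{depth} scope(s) left open"
    return True, "braces balanced"
```

For example, check_braces(open("prime.phil").read()) returns a pair whose first element is False if a scope such as postref.scale was never closed.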
Running the Program
You can run the program by giving it an input file:
prime.run prime.phil
For this tutorial, PRIME will score 1,000 randomly selected images, then select the best 100 for running the Brehm & Diederichs algorithm in Bootstrap mode. If you run the program with flag_plot=True, you'll see a plot showing two separate clusters, each representing images that share the same assigned basis.
Results of Running Indexing Ambiguity with Bootstrap
PRIME will select one of these two clusters and merge it to obtain a reference set for the Bootstrap step. Each remaining image is then assigned the basis that makes it correlate best with the reference set.
Once every image has been assigned an appropriate basis, PRIME proceeds to the scaling and post-refinement steps. After three post-refinement cycles (the default), the run is complete and the program prints the following output.
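The assignment step can be pictured as follows: for each image, compare its intensities with the reference set once as indexed and once after reindexing, and keep whichever gives the higher correlation. The sketch below is plain Python written under stated assumptions, not PRIME's code: images and the reference are simple dictionaries from (h, k, l) to intensity, and the reindexing operator k,h,-l is assumed here purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either list is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def reindex(hkl):
    """Illustrative alternative basis k,h,-l (an assumption for this sketch)."""
    h, k, l = hkl
    return (k, h, -l)


def assign_basis(image, reference, min_common=3):
    """Return (basis_name, cc) for the basis that best matches the reference.

    Returns None if no basis shares at least `min_common` reflections
    with the reference.
    """
    candidates = {
        "h,k,l": image,
        "k,h,-l": {reindex(hkl): value for hkl, value in image.items()},
    }
    best = None
    for name, observed in candidates.items():
        common = [hkl for hkl in observed if hkl in reference]
        if len(common) < min_common:
            continue
        cc = pearson([observed[hkl] for hkl in common],
                     [reference[hkl] for hkl in common])
        if best is None or cc > best[1]:
            best = (name, cc)
    return best
```

Using the merged cluster as the reference avoids comparing every image with every other image, which is what makes this step tractable for thousands of frames.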
Isotropic B-factor:    5.30
No. of reflections
  all:                 7786
  outside resolution:  51
  outliers:            0
  total left:          7735
Summary for Prime_Run_1/postref_cycle_3_merge.mtz
Bin  Resolution Range  Completeness           N_obs | Rmerge  Rsplit  CC1/2  N_ind | CCiso  N_ind | CCanoma  N_ind | I/sigI      I   sigI  I**2
-----------------------------------------------------------------------------------------------------------------------------------------------
01   19.88 -  5.35     100.0   807 /  807    189.78    85.75    8.87  98.43    807    0.00      0      0.00      0     4.46  684.2  136.2  3.57
02    5.35 -  4.26     100.0   782 /  782    140.90    73.61    8.89  97.67    782    0.00      0      0.00      0     5.23  794.3  140.3  2.05
03    4.26 -  3.73     100.0   788 /  788    129.32    69.32    8.70  98.11    788    0.00      0      0.00      0     5.41  878.5  150.3  1.95
04    3.73 -  3.39     100.0   765 /  765    117.75    70.72    9.67  97.55    765    0.00      0      0.00      0     4.12  712.3  162.4  1.88
05    3.39 -  3.15     100.0   770 /  770    113.22    71.61   11.22  88.54    770    0.00      0      0.00      0     2.73  500.1  173.7  2.19
06    3.15 -  2.96     100.0   767 /  767    106.12    73.19   11.07  97.21    767    0.00      0      0.00      0     2.09  404.6  183.8  2.02
07    2.96 -  2.81     100.0   766 /  766    103.73    75.79   12.53  96.62    766    0.00      0      0.00      0     1.72  345.3  193.9  1.89
08    2.81 -  2.69     100.0   745 /  745    101.51    76.11   12.84  96.21    745    0.00      0      0.00      0     1.49  317.3  204.1  1.98
09    2.69 -  2.59     100.0   786 /  786     97.88    77.59   14.19  95.40    786    0.00      0      0.00      0     1.38  299.0  209.3  1.86
10    2.59 -  2.50     100.0   759 /  759     92.53    78.61   14.72  96.51    759    0.00      0      0.00      0     1.36  312.4  218.8  2.07
-----------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                  100.0  7735 / 7735    119.73    75.51   10.45  97.21   7735    0.00      0      0.00      0     3.02  527.3  176.9  2.56
-----------------------------------------------------------------------------------------------------------------------------------------------
Summary of CC1/2 on three crystal axes
Bin  Resolution Range         CC1/2                     I                  N_refl
                          a*     b*     c* |     a*      b*      c* |  a*   b*   c*
-------------------------------------------------------------------------------------
01   19.88 -  5.35     97.01  98.64  98.23    528.6   559.6  1216.7    42   51   47
02    5.35 -  4.26     97.64  98.43  99.08    817.6   527.5   964.9    43   44   40
03    4.26 -  3.73     96.31  98.02  97.68    605.7   682.9   856.0    39   39   41
04    3.73 -  3.39     98.49  98.55  97.73    961.9   532.6   729.1    42   37   45
05    3.39 -  3.15     96.88  98.38  92.69    449.5   492.1   721.6    39   39   40
06    3.15 -  2.96     98.48  93.58  98.61    389.9   303.4   391.7    39   37   39
07    2.96 -  2.81     96.98  98.02  95.35    361.3   331.7   383.2    42   37   43
08    2.81 -  2.69     95.29  94.02  94.69    290.8   194.7   292.7    41   35   36
09    2.69 -  2.59     96.55  91.88  98.57    265.7   341.4   290.2    41   35   44
10    2.59 -  2.50     94.44  97.81  96.67    249.0   400.1   236.2    42   36   40
-------------------------------------------------------------------------------------
total                  97.57  97.74  94.58    494.1   446.0   619.4   410  390  415
-------------------------------------------------------------------------------------
Summary of refinement and merging
  No. good frames:          4733
  No. bad cc frames:        113
  No. bad G frames) :       109
  No. bad unit cell frames: 20
  No. bad gamma_e frames:   22
  No. bad SE:               2
  No. observations:         935265
  Mean target value (BEFORE: Mean Median (Std.))
    post-refinement:        301.22 259.10 ( 171.56)
    (x,y) restraints:       1679.63 1573.15 ( 657.49)
  Mean target value (AFTER: Mean Median (Std.))
    post-refinement:        300.02 257.53 ( 170.98)
    (x,y) restraints:       1679.90 1572.77 ( 660.19)
  SE:                       1915.60 776.84 ( 33765.97)
  G:                        1.000e+00 8.971e-01 ( 8.15e-01)
  B:                        11.83 14.45 ( 11.95)
  Rot.x:                    -0.08 0.00 ( 12.10)
  Rot.y:                    0.14 0.00 ( 9.62)
  gamma_y:                  0.00000 0.00000 ( 0.00000)
  gamma_z:                  0.00000 0.00000 ( 0.00000)
  gamma_0:                  0.03793 0.00019 ( 0.60820)
  gamma_e:                  -0.12824 0.00145 ( 0.60227)
  voigt_nu:                 0.50000 0.50000 ( 0.00000)
  unit cell
    a:                      91.45 91.45 ( 0.11)
    b:                      91.45 91.45 ( 0.11)
    c:                      45.96 45.96 ( 0.12)
    alpha:                  90.00 90.00 ( 0.00)
    beta:                   90.00 90.00 ( 0.00)
    gamma:                  120.00 120.00 ( 0.00)
  Parmeters from integration (not-refined)
    Wavelength:             0.96861 0.96861 ( 0.00000)
    Detector distance:      303.81868 303.81868 ( 0.00000)
  * (standard deviation)
Total calculation time: 542.00 seconds
Finished: Tuesday 14. February 2017 10:53:18
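The per-bin I/sigI values in the merging table can guide the choice of d_min for a rerun. One simple rule, sketched below in plain Python, walks the bins from low to high resolution and stops at the first bin whose I/sigI falls below a chosen threshold. The function name suggest_d_min and the threshold values are assumptions for illustration, not a PRIME feature or a recommended cutoff; the bin values are copied from the table above.

```python
def suggest_d_min(bins, min_i_over_sigma):
    """Suggest a high-resolution cutoff from per-bin I/sigI.

    `bins` holds (d_low, d_high, i_over_sigma) tuples ordered from low
    to high resolution. Returns the high-resolution edge of the last bin
    before I/sigI first drops below the threshold, or None if even the
    first bin fails.
    """
    d_min = None
    for _d_low, d_high, i_over_sigma in bins:
        if i_over_sigma < min_i_over_sigma:
            break
        d_min = d_high
    return d_min


# (d_low, d_high, I/sigI) for bins 01 to 10 of the merging table.
bins = [
    (19.88, 5.35, 4.46), (5.35, 4.26, 5.23), (4.26, 3.73, 5.41),
    (3.73, 3.39, 4.12), (3.39, 3.15, 2.73), (3.15, 2.96, 2.09),
    (2.96, 2.81, 1.72), (2.81, 2.69, 1.49), (2.69, 2.59, 1.38),
    (2.59, 2.50, 1.36),
]
print(suggest_d_min(bins, 1.5))  # bin 08 (1.49) is the first below 1.5
```

Which threshold is appropriate depends on the data and on the other statistics (CC1/2 in particular), so treat the suggestion as a starting point rather than a rule.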
Obtaining the Output
Your output will be in Prime_Run_n (where n is the run number).
-bash-4.1$ ls Prime_Run_1/ -l
total 9076
-rw-r--r-- 1 mu238 camb  879638 Feb 14 10:53 crystal.o
drwxr-xr-x 2 mu238 camb     104 Feb 14 10:46 index_ambiguity
drwxr-xr-x 2 mu238 camb       6 Feb 14 10:44 isoform_cluster
-rw-r--r-- 1 mu238 camb   32704 Feb 14 10:53 log.txt
-rw-r--r-- 1 mu238 camb  324556 Feb 14 10:47 mean_scaled_merge.hkl
-rw-r--r-- 1 mu238 camb  157260 Feb 14 10:47 mean_scaled_merge.mtz
-rw-r--r-- 1 mu238 camb   15753 Feb 14 10:53 pickle.stat
-rw-r--r-- 1 mu238 camb  324381 Feb 14 10:49 postref_cycle_1_merge.hkl
-rw-r--r-- 1 mu238 camb  157260 Feb 14 10:49 postref_cycle_1_merge.mtz
-rw-r--r-- 1 mu238 camb  324515 Feb 14 10:51 postref_cycle_2_merge.hkl
-rw-r--r-- 1 mu238 camb  157260 Feb 14 10:51 postref_cycle_2_merge.mtz
-rw-r--r-- 1 mu238 camb  324716 Feb 14 10:53 postref_cycle_3_merge.hkl
-rw-r--r-- 1 mu238 camb  157340 Feb 14 10:53 postref_cycle_3_merge.mtz
-rw-r--r-- 1 mu238 camb 6412200 Feb 14 10:53 rejections.txt
The file log.txt contains all the merging statistics. The final merged reflection set is postref_cycle_3_merge.mtz (or .hkl).
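If the number of post-refinement cycles is changed, the name of the final file changes with it. A small helper like the one below can pick the merged file with the highest cycle number from a run folder. This is a plain-Python sketch that relies only on the file-name pattern visible in the listing above; final_merge_file is a hypothetical name, not a PRIME command.

```python
import os
import re

# File-name pattern taken from the directory listing in this tutorial.
_MERGE_NAME = re.compile(r"^postref_cycle_(\d+)_merge\.mtz$")


def final_merge_file(run_dir):
    """Return the path of the postref_cycle_N_merge.mtz with the largest N.

    Returns None if the folder contains no file matching the pattern.
    """
    best = None
    for name in os.listdir(run_dir):
        match = _MERGE_NAME.match(name)
        if match:
            cycle = int(match.group(1))
            if best is None or cycle > best[0]:
                best = (cycle, name)
    return os.path.join(run_dir, best[1]) if best else None
```

Comparing cycle numbers as integers rather than as strings keeps cycle 10 ahead of cycle 9 should a run use that many cycles.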