Difference between revisions of "2017 prime tutorial"

From cctbx_xfel
Latest revision as of 20:38, 14 February 2017

Post-refine and Merge Sample Data Set with PRIME (2017 Tutorial)

In this tutorial, we will work on the integration results from the first part of Tutorial 2 (Myoglobin Data). Before running the program, we'll build the PRIME input file based on the characteristics of this data set.

Generating Input File

PRIME input files contain the information needed for the post-refinement and merging steps. You can review the list of input parameters by running prime.run, or run prime.run -h to view their descriptions. For this tutorial, we'll build the input file from scratch.

  • Location of integration results

In this case, we know where the integration results (pickle files) are located, so we can set:

data = /net/viper/raid1/mu238/XfelProject/dials17/extracted

Note that you can supply the data parameter more than once. Its value can be either a folder or a file containing a list of integration results.
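If the integration results are spread over several folders, a list file can be generated with a short script. The following is a minimal sketch, not part of PRIME: the helper name `write_pickle_list`, the `.pickle` suffix, the one-path-per-line layout, and the output name `pickles.lst` are all assumptions made for illustration.

```python
import os
import tempfile


def write_pickle_list(folders, list_path, suffix=".pickle"):
    """Write one integration-result path per line; return how many were written.

    Hypothetical helper: the suffix and the one-path-per-line layout are
    assumptions, not something prescribed by PRIME.
    """
    paths = []
    for folder in folders:
        for root, _dirs, files in os.walk(folder):
            for name in sorted(files):
                if name.endswith(suffix):
                    paths.append(os.path.join(root, name))
    with open(list_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return len(paths)


# Demo on a throwaway folder so the sketch runs anywhere.
demo_dir = tempfile.mkdtemp()
for demo_name in ("int_0001.pickle", "int_0002.pickle", "notes.txt"):
    with open(os.path.join(demo_dir, demo_name), "w"):
        pass
n_listed = write_pickle_list([demo_dir], os.path.join(demo_dir, "pickles.lst"))
print(n_listed, "integration results listed")
```

The resulting file could then be given as the value of data in place of the folder.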

  • Unit cell information

You can obtain the mean (or median) unit-cell dimensions from either IOTA or DIALS. If you used IOTA, a PRIME .phil file is generated automatically and already contains this information. For n_residues, enter the number of residues in the asymmetric unit.

target_unit_cell = 91.7 91.7 46 90 90 120
target_space_group = P6
n_residues = 128

  • Detector information

pixel_size_mm = 0.172

  • Post-refinement and Scaling information

This is where you specify the resolution cutoffs for post-refinement and merging. When running for the first time on newly collected data, you can start from the "expected" resolution (the resolution at which you still see spots at the corners or edges of the detector). After inspecting the merging statistics, in particular the I/sigI values in the high-resolution shells, you can adjust these parameters and rerun the program. Note that the sigma cutoff (sigma_min) is set to 1.5 for the scaling and post-refinement steps, while for merging it is set to -3.0 so that negative intensities are included in the merged reflection set.

scale {
  d_min = 2.5
  d_max = 20
  sigma_min = 1.5
}
postref {
  scale {
    d_min = 2.5
    d_max = 20
    sigma_min = 1.5
    partiality_min = 0.1
  }
  allparams {
    flag_on = True
    d_min = 2.5
    d_max = 20
    sigma_min = 1.5
    partiality_min = 0.1
    uc_tolerance = 5
  }
}
merge {
  d_min = 2.5
  d_max = 20
  sigma_min = -3.0
  partiality_min = 0.1
  uc_tolerance = 5
}
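
The effect of the two different sigma_min values can be illustrated with a small sketch. It assumes the cutoff is applied as I/sigI >= sigma_min; the function name `passes_sigma_cutoff` and the sample observations are made up, and PRIME's internal test may differ in detail.

```python
def passes_sigma_cutoff(intensity, sigma, sigma_min):
    """True when I/sigI is at least sigma_min (assumed form of the cutoff)."""
    return sigma > 0 and intensity / sigma >= sigma_min


# Made-up (I, sigI) pairs with I/sigI = 3.0, 1.2, -0.5 and -4.5.
observations = [(120.0, 40.0), (30.0, 25.0), (-10.0, 20.0), (-90.0, 20.0)]

# sigma_min = 1.5 (scaling, post-refinement): only strong observations pass.
for_scaling = [o for o in observations if passes_sigma_cutoff(o[0], o[1], 1.5)]

# sigma_min = -3.0 (merging): weak and mildly negative observations are kept.
for_merging = [o for o in observations if passes_sigma_cutoff(o[0], o[1], -3.0)]

print(len(for_scaling), "used for scaling;", len(for_merging), "used for merging")
```

With a negative cutoff, only strongly negative outliers are excluded from the merge, while the refinement steps still work from well-measured reflections.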
  • Indexing ambiguity

For data sets that are not in a polar space group and have no other source of indexing ambiguity (for example, unit-cell dimensions that are very similar but not identical), you can use the .phil parameters given so far and proceed directly to post-refinement. However, this data set is in P6 (a polar space group), so the indexing ambiguity needs to be resolved before the other refinement and merging steps.

Note also that for any polar space group, PRIME will resolve the ambiguity automatically using the default parameters. However, this data set has about 5,000 integration results, so we modify the number of images used for the random and best selections.

indexing_ambiguity {
 mode = Auto
 index_basis_in = None
 assigned_basis = None
 d_min = 3.0
 d_max = 10.0
 sigma_min = 1.5
 n_sample_frames = 1000
 n_selected_frames = 100
}

We left the other parameters at their default values and set n_sample_frames to 1000 and n_selected_frames to 100.
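
The roles of the two numbers can be sketched as a two-stage selection: draw a random sample, then keep the best-scoring frames from it. This is only an illustration under stated assumptions; the function `sample_then_select` and the stand-in score are hypothetical, and PRIME's actual scoring of images is not reproduced here.

```python
import random


def sample_then_select(frames, score, n_sample, n_selected, seed=0):
    """Randomly sample n_sample frames, then keep the n_selected best by score."""
    rng = random.Random(seed)
    sampled = rng.sample(frames, min(n_sample, len(frames)))
    ranked = sorted(sampled, key=score, reverse=True)
    return ranked[:n_selected]


# Stand-in data: 5,000 frame ids and an arbitrary score function.
frames = list(range(5000))


def stand_in_score(frame_id):
    return frame_id % 997


selected = sample_then_select(frames, stand_in_score, n_sample=1000, n_selected=100)
print(len(selected), "frames selected from a random sample of 1000")
```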

  • Number of bins
n_bins = 10

Now we have a complete .phil file ready to run.

data = /net/viper/raid1/mu238/XfelProject/dials17/extracted
target_unit_cell = 91.7 91.7 46 90 90 120
target_space_group = P6
n_residues = 128
pixel_size_mm = 0.172
scale {
 d_min = 2.5
 d_max = 20
 sigma_min = 1.5
}
postref {
 scale {
   d_min = 2.5
   d_max = 20
   sigma_min = 1.5
   partiality_min = 0.1
 }
 allparams {
   flag_on = True
   d_min = 2.5
   d_max = 20
   sigma_min = 1.5
   partiality_min = 0.1
   uc_tolerance = 5
 }
}
merge {
 d_min = 2.5
 d_max = 20
 sigma_min = -3.0
 partiality_min = 0.1
 uc_tolerance = 5
}
indexing_ambiguity {
 mode = Auto
 index_basis_in = None
 assigned_basis = None
 d_min = 3.0
 d_max = 10.0
 sigma_min = 1.5
 n_sample_frames = 1000
 n_selected_frames = 100
}
n_bins = 10

Copy and paste this set of parameters into an editor, then save the file as "prime.phil".
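
Because a missing closing brace is an easy mistake when editing nested scopes by hand, a quick check before running can help. The sketch below is not the phil parser used by cctbx; `braces_balanced` is a hypothetical helper that only counts braces.

```python
def braces_balanced(phil_text):
    """Return True when every '{' has a matching '}' in the right order."""
    depth = 0
    for ch in phil_text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # a closing brace appeared before its opener
                return False
    return depth == 0


good = "postref {\n scale {\n  d_min = 2.5\n }\n allparams {\n  flag_on = True\n }\n}\n"
bad = "postref {\n scale {\n  d_min = 2.5\n allparams {\n  flag_on = True\n }\n}\n"
print(braces_balanced(good), braces_balanced(bad))
```

To check a saved file, read it with open("prime.phil").read() and pass the text to the function.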

Running the Program

You can run the program by giving it an input file:

prime.run prime.phil

For this tutorial, PRIME will score 1,000 randomly selected images, then select the best 100 for the Brehm & Diederichs algorithm, run in Bootstrap mode. If you run the program with flag_plot=True, you'll see a plot showing two separate clusters, each representing images that share the same assigned basis.

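Results of Running Indexing Ambiguity with Bootstrap (https://commons.wikimedia.org/wiki/File:Dials17_myo_indexing_ambiguity.png)

Results of Image Clustering (https://commons.wikimedia.org/wiki/File:Dials17_indexing_ambiguity_clustering.gif)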

PRIME will select one of these two clusters and merge it to obtain a reference set for the Bootstrap step. Each remaining image is then assigned the basis that makes it correlate best with the reference set.
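
The assignment step can be sketched as follows: reindex an image's observations under each candidate basis, correlate the intensities with the reference set over the common reflections, and keep the basis with the highest correlation. This is a simplified illustration, not PRIME's code. The operator (k, h, -l), the helper names, and the intensities are assumptions chosen for the example.

```python
import math


def pearson_cc(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def reindex(obs, op):
    """Apply a reindexing operator to every Miller index in {hkl: I}."""
    return {op(hkl): intensity for hkl, intensity in obs.items()}


def best_basis(obs, reference, ops):
    """Return (name, cc) of the operator that best matches the reference."""
    best_name, best_cc = None, -2.0
    for name, op in ops.items():
        candidate = reindex(obs, op)
        common = [hkl for hkl in candidate if hkl in reference]
        if len(common) < 3:  # too few shared reflections to correlate
            continue
        cc = pearson_cc([candidate[h] for h in common],
                        [reference[h] for h in common])
        if cc > best_cc:
            best_name, best_cc = name, cc
    return best_name, best_cc


# Illustrative operators only; the alternative basis is an assumption.
ops = {
    "h,k,l": lambda hkl: hkl,
    "k,h,-l": lambda hkl: (hkl[1], hkl[0], -hkl[2]),
}

# Made-up reference intensities and an image indexed in the alternative basis.
reference = {(1, 2, 3): 100.0, (2, 1, -3): 10.0, (1, 3, 2): 50.0,
             (3, 1, -2): 500.0, (2, 3, 1): 5.0, (3, 2, -1): 80.0}
image = reindex(reference, ops["k,h,-l"])

chosen_name, chosen_cc = best_basis(image, reference, ops)
print(chosen_name)
```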

Once all images have been assigned an appropriate basis, PRIME proceeds to the scaling and post-refinement steps. After three post-refinement cycles (the default), the process is done. The output of the program is shown here.


Isotropic B-factor:     5.30
No. of reflections
 all:                   7786
 outside resolution:      51
 outliers:                 0
 total left:            7735
Summary for Prime_Run_1/postref_cycle_3_merge.mtz
Bin Resolution Range     Completeness       N_obs  |Rmerge  Rsplit   CC1/2   N_ind |CCiso   N_ind|CCanoma  N_ind|  I/sigI     I      sigI      I**2 
--------------------------------------------------------------------------------------------------------------------------------------------------
01   19.88 -    5.35 100.0    807 /    807  189.78   85.75    8.87   98.43    807    0.00      0    0.00      0     4.46      684.2    136.2   3.57
02    5.35 -    4.26 100.0    782 /    782  140.90   73.61    8.89   97.67    782    0.00      0    0.00      0     5.23      794.3    140.3   2.05
03    4.26 -    3.73 100.0    788 /    788  129.32   69.32    8.70   98.11    788    0.00      0    0.00      0     5.41      878.5    150.3   1.95
04    3.73 -    3.39 100.0    765 /    765  117.75   70.72    9.67   97.55    765    0.00      0    0.00      0     4.12      712.3    162.4   1.88
05    3.39 -    3.15 100.0    770 /    770  113.22   71.61   11.22   88.54    770    0.00      0    0.00      0     2.73      500.1    173.7   2.19
06    3.15 -    2.96 100.0    767 /    767  106.12   73.19   11.07   97.21    767    0.00      0    0.00      0     2.09      404.6    183.8   2.02
07    2.96 -    2.81 100.0    766 /    766  103.73   75.79   12.53   96.62    766    0.00      0    0.00      0     1.72      345.3    193.9   1.89
08    2.81 -    2.69 100.0    745 /    745  101.51   76.11   12.84   96.21    745    0.00      0    0.00      0     1.49      317.3    204.1   1.98
09    2.69 -    2.59 100.0    786 /    786   97.88   77.59   14.19   95.40    786    0.00      0    0.00      0     1.38      299.0    209.3   1.86
10    2.59 -    2.50 100.0    759 /    759   92.53   78.61   14.72   96.51    759    0.00      0    0.00      0     1.36      312.4    218.8   2.07
--------------------------------------------------------------------------------------------------------------------------------------------------
        TOTAL        100.0   7735 /   7735  119.73   75.51   10.45   97.21   7735    0.00      0    0.00      0     3.02      527.3    176.9   2.56
--------------------------------------------------------------------------------------------------------------------------------------------------
Summary of CC1/2 on three crystal axes
Bin Resolution Range           CC1/2                       I                           N_refl           
                       a*      b*      c*  |      a*        b*       c*    |    a*      b*     c*      
---------------------------------------------------------------------------------------------------------
01   19.88 -    5.35   97.01   98.64   98.23      528.6      559.6     1216.7     42     51     47
02    5.35 -    4.26   97.64   98.43   99.08      817.6      527.5      964.9     43     44     40
03    4.26 -    3.73   96.31   98.02   97.68      605.7      682.9      856.0     39     39     41
04    3.73 -    3.39   98.49   98.55   97.73      961.9      532.6      729.1     42     37     45
05    3.39 -    3.15   96.88   98.38   92.69      449.5      492.1      721.6     39     39     40
06    3.15 -    2.96   98.48   93.58   98.61      389.9      303.4      391.7     39     37     39
07    2.96 -    2.81   96.98   98.02   95.35      361.3      331.7      383.2     42     37     43
08    2.81 -    2.69   95.29   94.02   94.69      290.8      194.7      292.7     41     35     36
09    2.69 -    2.59   96.55   91.88   98.57      265.7      341.4      290.2     41     35     44
10    2.59 -    2.50   94.44   97.81   96.67      249.0      400.1      236.2     42     36     40
----------------------------------------------------------------------------------------------------------
       total           97.57   97.74   94.58      494.1      446.0      619.4    410    390    415
----------------------------------------------------------------------------------------------------------
Summary of refinement and merging
No. good frames:                  4733
No. bad cc frames:                 113
No. bad G frames) :                109
No. bad unit cell frames:           20
No. bad gamma_e frames:             22
No. bad SE:                          2
No. observations:               935265
Mean target value (BEFORE: Mean Median (Std.))
post-refinement:                301.22       259.10 (   171.56)
(x,y) restraints:              1679.63      1573.15 (   657.49)
Mean target value (AFTER: Mean Median (Std.))
post-refinement:                300.02       257.53 (   170.98)
(x,y) restraints:              1679.90      1572.77 (   660.19)
SE:                            1915.60       776.84 ( 33765.97)
G:                           1.000e+00    8.971e-01 ( 8.15e-01)
B:                               11.83        14.45 (    11.95)
Rot.x:                           -0.08         0.00 (    12.10)
Rot.y:                            0.14         0.00 (     9.62)
gamma_y:                       0.00000      0.00000 (  0.00000)
gamma_z:                       0.00000      0.00000 (  0.00000)
gamma_0:                       0.03793      0.00019 (  0.60820)
gamma_e:                      -0.12824      0.00145 (  0.60227)
voigt_nu:                      0.50000      0.50000 (  0.00000)
unit cell
  a:                             91.45        91.45 (     0.11)
  b:                             91.45        91.45 (     0.11)
  c:                             45.96        45.96 (     0.12)
  alpha:                         90.00        90.00 (     0.00)
  beta:                          90.00        90.00 (     0.00)
  gamma:                        120.00       120.00 (     0.00)
Parmeters from integration (not-refined)
 Wavelength:                   0.96861      0.96861 (  0.00000)
 Detector distance:          303.81868    303.81868 (  0.00000)
* (standard deviation)
Total calculation time: 542.00 seconds
Finished: Tuesday 14. February 2017 10:53:18
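
For readers unfamiliar with the half-data-set statistics reported in the summary, the sketch below computes CC1/2 and Rsplit using their commonly quoted textbook definitions. It is an assumption that PRIME follows exactly these formulas. The half-set intensities are made up, and the values are printed as percentages only for readability.

```python
import math


def cc_half(half1, half2):
    """Pearson correlation between intensities of two half data sets."""
    n = len(half1)
    m1, m2 = sum(half1) / n, sum(half2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(half1, half2))
    s11 = sum((a - m1) ** 2 for a in half1)
    s22 = sum((b - m2) ** 2 for b in half2)
    if s11 == 0 or s22 == 0:
        return 0.0
    return s12 / math.sqrt(s11 * s22)


def r_split(half1, half2):
    """Rsplit = (1/sqrt(2)) * sum|I1 - I2| / (0.5 * sum(I1 + I2))."""
    numerator = sum(abs(a - b) for a, b in zip(half1, half2))
    denominator = 0.5 * sum(a + b for a, b in zip(half1, half2))
    return numerator / (math.sqrt(2) * denominator)


# Made-up merged intensities of the same reflections in two random halves.
half_a = [100.0, 200.0, 300.0, 400.0]
half_b = [110.0, 190.0, 310.0, 390.0]

print("CC1/2  = %.2f %%" % (100 * cc_half(half_a, half_b)))
print("Rsplit = %.2f %%" % (100 * r_split(half_a, half_b)))
```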

Obtaining the Output

Your output will be in Prime_Run_n (where n is the run number).

-bash-4.1$ ls Prime_Run_1/ -l
total 9076
-rw-r--r-- 1 mu238 camb  879638 Feb 14 10:53 crystal.o
drwxr-xr-x 2 mu238 camb     104 Feb 14 10:46 index_ambiguity
drwxr-xr-x 2 mu238 camb       6 Feb 14 10:44 isoform_cluster
-rw-r--r-- 1 mu238 camb   32704 Feb 14 10:53 log.txt
-rw-r--r-- 1 mu238 camb  324556 Feb 14 10:47 mean_scaled_merge.hkl
-rw-r--r-- 1 mu238 camb  157260 Feb 14 10:47 mean_scaled_merge.mtz
-rw-r--r-- 1 mu238 camb   15753 Feb 14 10:53 pickle.stat
-rw-r--r-- 1 mu238 camb  324381 Feb 14 10:49 postref_cycle_1_merge.hkl
-rw-r--r-- 1 mu238 camb  157260 Feb 14 10:49 postref_cycle_1_merge.mtz
-rw-r--r-- 1 mu238 camb  324515 Feb 14 10:51 postref_cycle_2_merge.hkl
-rw-r--r-- 1 mu238 camb  157260 Feb 14 10:51 postref_cycle_2_merge.mtz
-rw-r--r-- 1 mu238 camb  324716 Feb 14 10:53 postref_cycle_3_merge.hkl
-rw-r--r-- 1 mu238 camb  157340 Feb 14 10:53 postref_cycle_3_merge.mtz
-rw-r--r-- 1 mu238 camb 6412200 Feb 14 10:53 rejections.txt

The file log.txt contains all the merging statistics. The final merged reflection set is postref_cycle_3_merge.mtz (or .hkl).
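
If the number of post-refinement cycles is changed from the default, the final file will carry a different cycle number. The following sketch picks the highest-numbered merge file in a run folder. It relies only on the file-name pattern visible in the listing above; `final_merge_file` is a hypothetical helper and not part of PRIME.

```python
import os
import re
import tempfile


def final_merge_file(run_dir, ext=".mtz"):
    """Return the postref_cycle_N_merge file with the largest N, or None."""
    pattern = re.compile(r"^postref_cycle_(\d+)_merge" + re.escape(ext) + "$")
    best = None
    for name in os.listdir(run_dir):
        match = pattern.match(name)
        if match:
            cycle = int(match.group(1))
            if best is None or cycle > best[0]:
                best = (cycle, name)
    return None if best is None else os.path.join(run_dir, best[1])


# Demo on a throwaway folder that mimics the listing above.
demo_run = tempfile.mkdtemp()
for demo_file in ("mean_scaled_merge.mtz", "postref_cycle_1_merge.mtz",
                  "postref_cycle_2_merge.mtz", "postref_cycle_3_merge.mtz",
                  "postref_cycle_3_merge.hkl", "log.txt"):
    with open(os.path.join(demo_run, demo_file), "w"):
        pass
final_path = final_merge_file(demo_run)
print(os.path.basename(final_path))
```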