2017 cxi merge tutorial

From cctbx_xfel
Revision as of 05:07, 9 February 2017 by Nicksauter (Talk | contribs)


This is an updated, worked example of data merging using cxi.merge. Previous documentation sets are here and here.

Initial characterization

In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code 5M3S.

  • /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
  • /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:

  • all integration pickles have space group P6 (good)
  • distance and beam center are fixed throughout the integrated dataset
  • Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120
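
The clustering of the unit cells can be checked numerically once the cell parameters have been pulled out of the printed output. A minimal sketch, assuming the cells have already been parsed into six-number tuples; the helper name `median_cell` is hypothetical, and the three cells below are made-up illustrative values, not values taken from this dataset:

```python
from statistics import median

def median_cell(cells):
    """Return the per-parameter median of (a, b, c, alpha, beta, gamma) tuples."""
    return tuple(median(column) for column in zip(*cells))

# Illustrative values only (an assumption); real cells would be parsed
# from the cxi.print_pickle output.
cells = [
    (91.3, 91.3, 45.8, 90.0, 90.0, 120.0),
    (91.4, 91.4, 45.9, 90.0, 90.0, 120.0),
    (91.6, 91.6, 46.1, 90.0, 90.0, 120.0),
]
print(median_cell(cells))
```

The median is used here rather than the mean so that a few outlier cells do not shift the estimate.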

Fetch the reference model (5m3s.pdb) and structure factors (5m3s.mtz) used for scaling:

$ phenix.fetch_pdb --mtz 5m3s

Merge command file:

#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
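
The sed pipeline in the script is plain placeholder substitution: each of FS, DMIN, NEG and TAG in the parameter template is replaced in turn by its value. A minimal sketch of the same idea in Python, assuming a shortened template string; the function name `fill_template` is hypothetical and is not part of cxi.merge:

```python
def fill_template(template, substitutions):
    """Apply each (placeholder, value) replacement in order,
    like a chain of sed "s,PLACEHOLDER,value,g" commands."""
    for placeholder, value in substitutions:
        template = template.replace(placeholder, str(value))
    return template

# Shortened template (an assumption); the real one carries every parameter.
template = "d_min=DMIN backend=FS include_negatives=NEG output.prefix=TAG"
substitutions = [("FS", "Flex"), ("DMIN", 2.5), ("NEG", True), ("TAG", "p6m")]
print(fill_template(template, substitutions))
```

As with sed, the replacement is a case-sensitive substring match, so a placeholder must not occur anywhere else in the template.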

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. The data= line cannot take a *.pickle glob; give it the directory instead.

Scale-up trial with nproc=60 and no postrefinement. Set the MTZ column flag to iobs (scaling.mtz_column_F=iobs; see the usage message below).

 4493 of 5031 integration files were accepted
 0 rejected due to wrong Bravais group
 11 rejected for unit cell outliers
 22 rejected for low signal
 505 rejected due to up-front poor correlation under min_corr parameter
 0 rejected for file errors or no reindex matrix
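
As a sanity check, the accepted and rejected counts above should add up to the 5031 integration files. A short sketch of that bookkeeping, with the numbers copied from the log (the dictionary keys are just labels chosen here):

```python
accepted = 4493
rejected = {
    "wrong Bravais group": 0,
    "unit cell outliers": 11,
    "low signal": 22,
    "poor correlation (min_corr)": 505,
    "file errors or no reindex matrix": 0,
}

total = accepted + sum(rejected.values())
print(total)  # 5031
print(f"{accepted / total:.1%} of files accepted")
```

The up-front correlation filter accounts for nearly all of the rejections (505 of 538).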

Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].

 File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
   return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1

ValueError: invalid literal for int() with base 10: 'd'

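The even/odd test on the file name fails: for a name ending in "_extracted.pickle", splitting on ".pickle" leaves 'd' as the last character, which is not a digit. Added "_extracted.pickle" to the allowable suffixes in the code; it had to go first so that it is tried before ".pickle".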
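
The ordering issue can be illustrated with a small stand-alone sketch. This is not the code in xfel/cxi/util.py; the suffix list and the example file names are assumptions made for illustration:

```python
import os

# Suffixes are tried in order. The longer "_extracted.pickle" must come
# before ".pickle"; otherwise ".pickle" matches first and the character
# in front of it is "d" rather than a digit.
# Hypothetical list, not the actual one in xfel/cxi/util.py.
ALLOWABLE_SUFFIXES = ["_extracted.pickle", ".pickle"]

def is_odd_numbered(file_name):
    """Return True if the last digit before the recognized suffix is odd."""
    base = os.path.basename(file_name)
    for allowable in ALLOWABLE_SUFFIXES:
        if base.endswith(allowable):
            stem = base[: -len(allowable)]
            return int(stem[-1]) % 2 == 1
    raise ValueError("unrecognized file name: %s" % file_name)

# Hypothetical file names.
print(is_odd_numbered("int-0059_extracted.pickle"))
print(is_odd_numbered("int-0060.pickle"))
```

With the two suffixes in the opposite order, the first call would raise the same ValueError as in the traceback above.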

Table of Scaling Results:

---------------------------------------------------------------------------------------------------------
                                      CC      N     CC     N     R     R     R   Scale  Scale    SpSig
Bin  Resolution Range  Completeness  int    int    iso   iso    int  split  iso   int    iso      Test
---------------------------------------------------------------------------------------------------------
  1 -1.0000 -  5.3861     [809/809] 80.0%     809 75.2%    805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
  2  5.3861 -  4.2749     [791/791] 54.9%     791 74.5%    791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
  3  4.2749 -  3.7345     [781/781] 65.8%     781 81.6%    781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
  4  3.7345 -  3.3930     [776/776] 63.9%     776 74.5%    776 49.3% 36.4% 48.6% 0.764 283.109  758.0388
  5  3.3930 -  3.1498     [765/765] 67.1%     765 81.9%    765 48.4% 35.6% 43.4% 0.795 338.091  533.7650
  6  3.1498 -  2.9641     [771/771] 58.6%     771 72.4%    771 49.3% 36.6% 50.7% 0.759 286.707  222.4718
  7  2.9641 -  2.8156     [765/765] 56.0%     765 72.3%    765 48.5% 35.3% 46.7% 0.765 320.954  154.5299
  8  2.8156 -  2.6930     [746/746] 63.0%     746 76.1%    746 46.4% 34.3% 42.6% 0.867 357.183  99.4430
  9  2.6930 -  2.5894     [790/790] 52.1%     790 69.4%    790 50.4% 37.4% 47.5% 0.814 314.326  113.1264
 10  2.5894 -  2.5000     [757/757] 54.9%     757 78.6%    757 52.4% 38.9% 44.4% 0.794 306.403  109.0768

All                     [7751/7751] 74.9%    7751 78.8%   7747 51.9% 36.9% 50.1% 0.680 266.538   1298.0
---------------------------------------------------------------------------------------------------------

Of course, we know the data do not scale: P6 is a polar space group, so each still can be indexed in either of two settings, and the images must first be sorted into a consistent setting by the Brehm/Diederichs method.

Breaking the indexing ambiguity