2017 cxi merge tutorial
Revision as of 05:07, 9 February 2017 (Nicksauter)
This is an updated, worked example of data merging using cxi.merge. Previous documentation is on the Merging and Advanced Merging pages.
Initial characterization
In this example, we start from integrated still-shot data for P6 myoglobin (PDB code 5M3S), collected by Danny Axford at Diamond. The data are in two directories:
- /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
- /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration
Unix ls reveals 5031 *.pickle files in each directory.
Immediately there is a problem:
$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle
This fails on image 0059 with a traceback; the file appears to be corrupted.
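cxi.print_pickle stops at the first bad file. To list every unreadable pickle in one pass, one could try to load each file and record the failures. This is a sketch only: find_unreadable is a hypothetical helper, not part of cctbx, and it checks only that pickle.load() succeeds. Because the integration pickles contain cctbx objects, it would have to run under cctbx.python so that unpickling can import those classes; otherwise every file would be reported.

```python
import glob
import pickle


def find_unreadable(pattern):
    """Return (path, error) pairs for files matching `pattern` that fail to unpickle.

    Hypothetical helper: it only checks that pickle.load() succeeds, not that
    the contents make sense as integration results.
    """
    failures = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, "rb") as handle:
                pickle.load(handle)
        except Exception as exc:  # truncated files raise EOFError or UnpicklingError
            failures.append((path, "%s: %s" % (type(exc).__name__, exc)))
    return failures
```

For example, under cctbx.python, looping over find_unreadable("/net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle") and printing each pair would report every damaged file rather than just the first one. The pattern is expanded by glob inside Python, so it does not need to survive shell expansion.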
So we focus on the data without the per-image resolution cutoff:
$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle
Some conclusions with the aid of grep:
- all integration pickles have space group P6 (good)
- distance and beam center are fixed throughout the integrated dataset
- unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120 (a, b, c in Å; α, β, γ in degrees)
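Instead of eyeballing grep output, the clustering of unit cells can be summarized numerically. A minimal sketch, assuming the six cell parameters of each image have already been collected into tuples (extracting them from the cxi.print_pickle output is not shown). The 2% tolerance is a hypothetical choice for illustration, not the unit-cell outlier criterion that cxi.merge applies.

```python
import statistics


def median_cell(cells):
    """Component-wise median of (a, b, c, alpha, beta, gamma) tuples."""
    return tuple(statistics.median(column) for column in zip(*cells))


def cell_outliers(cells, reference, frac_tol=0.02):
    """Indices of cells whose a, b or c deviates from `reference` by more than frac_tol.

    frac_tol is a hypothetical threshold chosen for illustration.
    """
    outliers = []
    for index, cell in enumerate(cells):
        lengths = zip(cell[:3], reference[:3])
        if any(abs(value - ref) / ref > frac_tol for value, ref in lengths):
            outliers.append(index)
    return outliers
```

With four illustrative cells, (91.4, 91.4, 45.9), (91.2, 91.2, 45.8), (91.6, 91.6, 46.0) and (95.0, 95.0, 45.9) (angles 90, 90, 120 throughout), the median cell has a = 91.5 Å and c = 45.9 Å, and only the 95.0 Å cell is flagged. The median is used rather than the mean so that a few outliers do not drag the reference cell toward themselves.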
Fetch the reference model and structure factors (converted to MTZ) from the PDB:
phenix.fetch_pdb --mtz 5m3s
Merge command file:
#!/bin/csh -f
set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
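set tag = p6m
set dmin = 2.5
set neg = True
# replace the FS, DMIN, NEG and TAG placeholders in effective_params
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`
cxi.merge ${eff}
# exit stops the script after merging; delete it to go on and run cxi.xmerge with the same parameters
exit
cxi.xmerge ${eff}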
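The sed pipeline in the command file is plain text templating: each placeholder (FS, DMIN, NEG, TAG) is replaced everywhere in the parameter string, and the result is split into arguments. The same idea in Python, as a sketch only; the abbreviated template and the /path/to/extracted directory are hypothetical stand-ins for the full effective_params string.

```python
import shlex

# Hypothetical, abbreviated version of the csh effective_params string.
TEMPLATE = ("d_min=DMIN data=/path/to/extracted backend=FS "
            "include_negatives=NEG output.prefix=TAG")


def fill_template(template, substitutions):
    """Apply (old, new) replacements in order, like chained sed 's,OLD,NEW,g'.

    Returns the result split into shell-style words, ready to pass to a
    command as an argument list.
    """
    for old, new in substitutions:
        template = template.replace(old, new)
    return shlex.split(template)


ARGS = fill_template(TEMPLATE, [("FS", "Flex"), ("DMIN", "2.5"),
                                ("NEG", "True"), ("TAG", "p6m")])
```

The resulting list could be handed to subprocess.run(["cxi.merge"] + ARGS). As with sed's g flag, every occurrence is replaced, so a path that happened to contain one of the placeholder strings would be silently altered; distinctive placeholder names avoid that.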
Initial trial with nproc=1, just to see whether it runs. Had to fix the PDB reference. Cannot use *.pickle on the data= line.
Scale-up trial with nproc=60, no postrefinement. Set the MTZ flag = jobs.
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
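The rejection categories account for every input file, which is a quick sanity check on the bookkeeping:

```latex
\begin{aligned}
4493 + 0 + 11 + 22 + 505 + 0 &= 5031,\\
4493 / 5031 &\approx 89.3\%\ \text{accepted},\\
505 / 5031 &\approx 10.0\%\ \text{rejected by the up-front } \texttt{min\_corr} \text{ test}.
\end{aligned}
```

The up-front correlation test is therefore by far the dominant rejection criterion for this dataset.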
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
  File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
    return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'
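Something is wrong with the determination of even/odd-numbered files: these file names end in "_extracted.pickle", so the last character left before the recognized suffix is "d" rather than a digit. Added "_extracted.pickle" to the allowable suffixes in the code; it had to be placed first.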
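A minimal reconstruction of the failure and the fix, for illustration only: the real function in xfel/cxi/util.py may be organized differently, and the suffix list and example file names here are hypothetical. It shows why the order matters: if the plain ".pickle" suffix is tried first, it also matches names ending in "_extracted.pickle", the remaining name ends in "d", and int() raises the ValueError shown in the traceback. The odd/even test is presumably what splits the images into two half-datasets.

```python
import os

# Hypothetical list of recognized file-name endings; most specific first.
ALLOWABLE_SUFFIXES = ["_extracted.pickle", ".pickle"]


def is_odd_numbered(file_name, suffixes=ALLOWABLE_SUFFIXES):
    """Return True if the digit just before the recognized suffix is odd.

    The first suffix that matches wins, so "_extracted.pickle" must precede
    ".pickle"; otherwise the stem ends in "d" and int() raises ValueError.
    """
    base = os.path.basename(file_name)
    for allowable in suffixes:
        if base.endswith(allowable):
            stem = base[: -len(allowable)]
            return int(stem[-1]) % 2 == 1
    raise ValueError("unrecognized file name: %s" % file_name)
```

For example, is_odd_numbered("run_0059_extracted.pickle") returns True, while the same call with suffixes=[".pickle", "_extracted.pickle"] reproduces "invalid literal for int() with base 10: 'd'".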
Table of Scaling Results:
------------------------------------------------------------------------------------------------------------------------
                                         CC      N      CC      N       R       R       R   Scale     Scale        SpSig
Bin  Resolution Range  Completeness     int    int     iso    iso     int   split     iso     int       iso         Test
------------------------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861  [809/809]      80.0%    809   75.2%    805   61.0%   40.1%   52.9%   0.551   214.059   12489.8850
  2   5.3861 - 4.2749  [791/791]      54.9%    791   74.5%    791   53.0%   38.8%   49.7%   0.693   270.307    1785.4625
  3   4.2749 - 3.7345  [781/781]      65.8%    781   81.6%    781   46.5%   33.6%   40.7%   0.762   337.287    1149.4218
  4   3.7345 - 3.3930  [776/776]      63.9%    776   74.5%    776   49.3%   36.4%   48.6%   0.764   283.109     758.0388
  5   3.3930 - 3.1498  [765/765]      67.1%    765   81.9%    765   48.4%   35.6%   43.4%   0.795   338.091     533.7650
  6   3.1498 - 2.9641  [771/771]      58.6%    771   72.4%    771   49.3%   36.6%   50.7%   0.759   286.707     222.4718
  7   2.9641 - 2.8156  [765/765]      56.0%    765   72.3%    765   48.5%   35.3%   46.7%   0.765   320.954     154.5299
  8   2.8156 - 2.6930  [746/746]      63.0%    746   76.1%    746   46.4%   34.3%   42.6%   0.867   357.183      99.4430
  9   2.6930 - 2.5894  [790/790]      52.1%    790   69.4%    790   50.4%   37.4%   47.5%   0.814   314.326     113.1264
 10   2.5894 - 2.5000  [757/757]      54.9%    757   78.6%    757   52.4%   38.9%   44.4%   0.794   306.403     109.0768
All                    [7751/7751]    74.9%   7751   78.8%   7747   51.9%   36.9%   50.1%   0.680   266.538       1298.0
------------------------------------------------------------------------------------------------------------------------
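For reading the table: CC int and R split are presumably half-dataset statistics, obtained by splitting the images into odd- and even-numbered halves and comparing the two merged halves, while the iso columns compare against the 5m3s.mtz reference. A minimal sketch of the standard definitions, a Pearson correlation for CC and R_split = Σ|I_odd - I_even| / (√2 · ½ Σ(I_odd + I_even)); this is not cxi.merge's own code, and it assumes the two lists hold merged intensities for the same reflections in the same order.

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def r_split(i_odd, i_even):
    """R_split = sum|I_odd - I_even| / (sqrt(2) * 0.5 * sum(I_odd + I_even))."""
    numerator = sum(abs(a - b) for a, b in zip(i_odd, i_even))
    denominator = 0.5 * sum(a + b for a, b in zip(i_odd, i_even))
    return numerator / (math.sqrt(2) * denominator)
```

Identical halves give CC = 1 and R_split = 0. A pure scale difference between the halves leaves CC at 1 but inflates R_split, which is one reason the two statistics are reported side by side.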
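As expected, the data do not scale well yet: P6 is a polar space group whose point group (6) has lower symmetry than its hexagonal lattice (6/mmm), so every still shot can be indexed in two inequivalent ways. The images must first be sorted into a consistent indexing setting by the Brehm/Diederichs method.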
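To make the indexing ambiguity concrete: for point group 6 on a hexagonal lattice, the alternative indexing of an image is related to the original one by (h, k, l) → (k, h, -l). The sketch below resolves the ambiguity in the simplest possible way, by keeping whichever indexing of each image correlates better with a reference. This is a swapped-in, reference-based approach, not the Brehm/Diederichs algorithm (which works from image-to-image correlations without a reference). It also assumes, hypothetically, that the reference has been expanded to all symmetry-equivalent indices so that any Miller index can be looked up directly.

```python
import math


def reindex_alternative(hkl):
    """Alternative indexing for point group 6 on a hexagonal lattice: (h,k,l) -> (k,h,-l)."""
    h, k, l = hkl
    return (k, h, -l)


def pearson_cc(x, y):
    """Pearson correlation; constant data are treated as uncorrelated in this sketch."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def choose_indexing(image, reference, min_common=3):
    """Pick the indexing of one image that best matches the reference.

    image, reference: dicts mapping (h, k, l) -> intensity; the reference is
    assumed to contain every symmetry-equivalent index (hypothetical setup).
    Returns (label, cc, reindexed_image), or None if too few common reflections.
    """
    choices = [("as indexed", lambda hkl: hkl),
               ("(k,h,-l)", reindex_alternative)]
    best = None
    for label, operator in choices:
        mapped = {operator(hkl): value for hkl, value in image.items()}
        common = [hkl for hkl in mapped if hkl in reference]
        if len(common) < min_common:
            continue
        cc = pearson_cc([mapped[hkl] for hkl in common],
                        [reference[hkl] for hkl in common])
        if best is None or cc > best[1]:
            best = (label, cc, mapped)
    return best
```

Since (k, h, -l) applied twice returns (h, k, l), an image recorded in the alternative setting is brought back into agreement with the reference by a single application of the operator.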