2017 cxi merge tutorial
This is an updated, worked example of data merging using cxi.merge. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].
== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code 5M3S.
- /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
- /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration
Unix ls reveals 5031 *.pickle files in each directory.
Immediately there is a problem:
$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle
...fails on image 0059 with a traceback; it looks like the file is corrupted.
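To find every unreadable file up front rather than stopping at the first one, each pickle can be loaded individually. A minimal sketch, assuming the files load with libtbx.easy_pickle (the usual way cctbx-style pickles are written); the scan loop itself is not part of cxi.print_pickle:

import glob, os
from libtbx import easy_pickle

# Try to load every integration pickle and report the ones that raise,
# so corrupted files can be set aside before merging.
pattern = "/net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle"
for path in sorted(glob.glob(pattern)):
    try:
        easy_pickle.load(path)
    except Exception as e:
        print("corrupt: %s (%s)" % (os.path.basename(path), e))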
So focus on the data without integration resolution cutoff:
$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle
Some conclusions with the aid of grep:
- all integration pickles have space group P6 (good)
- the distance and beam center are fixed throughout the integrated dataset
- unit cells are variable, but do seem to cluster around 91.4 91.4 45.9 90 90 120 (see the sketch after this list)
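As a quick check on that clustering, the cells can be tabulated directly from the pickles. A minimal sketch, assuming each pickle holds a dict whose "observations" entry is a list of miller arrays carrying the crystal symmetry (the usual layout of cctbx-style integration pickles, not verified here):

import glob
from libtbx import easy_pickle

# Collect the six unit cell parameters from each integration pickle.
pattern = "/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle"
cells = []
for path in sorted(glob.glob(pattern)):
    d = easy_pickle.load(path)
    cells.append(d["observations"][0].unit_cell().parameters())

# Report the median of each parameter; expect something near
# 91.4 91.4 45.9 90 90 120 for this dataset.
for i, name in enumerate(["a", "b", "c", "alpha", "beta", "gamma"]):
    values = sorted(c[i] for c in cells)
    print(name, values[len(values) // 2])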
Fetch the reference model and structure-factor data that will serve as the mark0 scaling target:

phenix.fetch_pdb --mtz 5m3s
Merge command file:
#!/bin/csh -f
# Parameter template for cxi.merge; FS, DMIN, NEG and TAG are placeholders
# substituted by sed below.
set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`
cxi.merge ${eff}
# The early exit leaves the cxi.xmerge step disabled for now.
exit
cxi.xmerge ${eff}
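The script is a simple template: sed swaps the FS, DMIN, NEG and TAG placeholders into the parameter string before the program is called, so variants (different resolution cutoffs, tags, negative handling) can be run from one file. Purely for illustration, the same substitution step in Python (not part of any cctbx tool):

# Illustration only: the placeholder substitution the csh script performs
# with sed. Placeholder names and values are taken from the script above.
template = "d_min=DMIN backend=FS include_negatives=NEG output.prefix=TAG"
substitutions = {"FS": "Flex", "DMIN": "2.5", "NEG": "True", "TAG": "p6m"}
for placeholder, value in substitutions.items():
    template = template.replace(placeholder, value)
print(template)  # d_min=2.5 backend=Flex include_negatives=True output.prefix=p6m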
Initial trial with nproc=1, just to see whether it runs at all. The PDB reference had to be fixed, and the *.pickle wildcard cannot be used on the data= line.
Scale-up trial with nproc=60 and no postrefinement; set the MTZ flag = jobs.
4493 of 5031 integration files were accepted
   0 rejected due to wrong Bravais group
  11 rejected for unit cell outliers
  22 rejected for low signal
 505 rejected due to up-front poor correlation under min_corr parameter
   0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
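So the fetched MTZ carries the label iobs rather than i-obs, and scaling.mtz_column_F=iobs is what the error message asks for. To see which labels an MTZ actually contains, one option is to read it with iotbx (part of cctbx); the exact label strings printed may vary:

import iotbx.mtz

# List the data arrays (and their column labels) present in the fetched MTZ,
# to pick the right value for scaling.mtz_column_F.
mtz_obj = iotbx.mtz.object("5m3s.mtz")
for array in mtz_obj.as_miller_arrays():
    print(array.info().label_string())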
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'
Something is wrong in the ability to determine even/odd numbered-ness. Added "_extracted.pickle" in the code; had to put it first.
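The traceback shows the cause: splitting a name like int_0059_extracted.pickle on the generic .pickle suffix leaves the letter 'd', not a digit, as the last character. A hedged reconstruction of the logic and the fix; only the quoted line 13 is verbatim from xfel/cxi/util.py, and the example file name is hypothetical:

import os

# Paraphrase of is_odd_numbered(): strip a known suffix, then test the parity
# of the last remaining character, which must be a digit.
# "_extracted.pickle" must come FIRST: otherwise ".pickle" matches
# "int_0059_extracted.pickle" and leaves 'd' as the final character,
# producing the ValueError seen above.
ALLOWABLE_SUFFIXES = ["_extracted.pickle", ".pickle"]

def is_odd_numbered(file_name):
    for allowable in ALLOWABLE_SUFFIXES:
        if file_name.endswith(allowable):
            return int(os.path.basename(file_name).split(allowable)[0][-1]) % 2 == 1
    raise ValueError("no allowable suffix: %s" % file_name)

print(is_odd_numbered("int_0059_extracted.pickle"))  # True: 9 is odd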
Table of Scaling Results:
---------------------------------------------------------------------------------------------------------
                                           CC     N    CC     N     R      R      R  Scale    Scale       SpSig
Bin  Resolution Range  Completeness       int   int   iso   iso   int  split    iso    int      iso        Test
---------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861  [809/809]        80.0%   809  75.2%   805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
  2   5.3861 - 4.2749  [791/791]        54.9%   791  74.5%   791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
  3   4.2749 - 3.7345  [781/781]        65.8%   781  81.6%   781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
  4   3.7345 - 3.3930  [776/776]        63.9%   776  74.5%   776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
  5   3.3930 - 3.1498  [765/765]        67.1%   765  81.9%   765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
  6   3.1498 - 2.9641  [771/771]        58.6%   771  72.4%   771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
  7   2.9641 - 2.8156  [765/765]        56.0%   765  72.3%   765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
  8   2.8156 - 2.6930  [746/746]        63.0%   746  76.1%   746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
  9   2.6930 - 2.5894  [790/790]        52.1%   790  69.4%   790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
 10   2.5894 - 2.5000  [757/757]        54.9%   757  78.6%   757  52.4%  38.9%  44.4%  0.794  306.403    109.0768
All                    [7751/7751]      74.9%  7751  78.8%  7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
---------------------------------------------------------------------------------------------------------
Of course we know the data do not scale, because this is a polar space group and the data must first be sorted by the Brehm/Diederichs method.
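For completeness: in P6 the hexagonal lattice admits two indexing modes related by the operator (k, h, -l), which give different intensities in this point group, so each image must be assigned to one mode before merging; that assignment is what the Brehm/Diederichs procedure provides. A small cctbx sketch of the ambiguity operator (illustration only, not part of cxi.merge):

from cctbx import crystal, miller, sgtbx

# The averaged cell from this dataset, in the polar space group P6.
symm = crystal.symmetry(unit_cell=(91.4, 91.4, 45.9, 90, 90, 120),
                        space_group_symbol="P6")
ms = miller.build_set(crystal_symmetry=symm, anomalous_flag=False, d_min=2.5)

# Reindexing by (k, h, -l) maps the set onto the alternative, equally valid
# indexing of the same lattice; the two modes are indistinguishable from the
# lattice alone but not from P6 intensities.
reindexed = ms.change_basis(sgtbx.change_of_basis_op("k,h,-l"))
print(ms.size(), reindexed.size())  # same lattice, same number of reflections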