Cppxfel Statistics

From cctbx_xfel
Revision as of 17:56, 25 November 2015 by Helenginn

cppxfel can calculate a number of merging statistics, and several of its commands write CSV files which can be plotted in your favourite graph-drawing software.

Correlation between two images

cppxfel can compare the intensities of reflections shared by two MTZ files and write the matched pairs to a CSV file for plotting. To calculate the correlation between the two halves of the data set in the first merge:

cppxfel.run -cc half1Merge0.mtz half2Merge0.mtz

This creates a new CSV file named correlation.csv. The same comparison can be made between a merged data set and an individual image:

cppxfel.run -cc allMerge5.mtz ref-img-shot-s00-20130316165414164_0.mtz

The beginning of correlation.csv looks like this. The "First intensity" and "Second intensity" columns can be plotted against each other in a suitable program (e.g. R, Veusz, etc.).

h k l,First intensity,Second intensity,Resolution
0 1 29,6788.76,6681.7,3.64553
0 1 49,113.671,234.905,2.15839
0 1 53,24.636,44.4067,1.99555
0 1 57,110.308,175.868,1.85556
0 2 4,346.637,351.928,23.6538
0 2 18,11660.6,11678.3,5.8409
0 2 36,143.598,30.5506,2.9339
0 2 42,1503.52,1370.33,2.5158
0 2 50,29.0206,20.7457,2.11397
0 2 52,59.3656,110.755,2.03279
0 3 5,5267.67,5281.37,18.1417
0 3 15,786.865,2939.76,6.91526
0 3 47,82.9657,111.247,2.24614
0 3 49,218.114,361.618,2.15481
0 4 16,130.598,202.681,6.41405
0 4 34,9586.69,8670.65,3.08996
0 4 42,619.911,235.344,2.5073
0 4 44,3396.41,3825.67,2.39429
0 4 50,661.039,709.513,2.10893
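
If a single correlation coefficient is wanted alongside the scatter plot, it can be computed directly from the two intensity columns. The following is a minimal sketch, assuming the column headers shown in the excerpt above; the helper names `read_intensity_pairs` and `pearson` are illustrative and are not part of cppxfel, and the value obtained this way is a plain Pearson coefficient that may not match any figure cppxfel itself reports.

```python
import csv
import io
import math


def read_intensity_pairs(handle):
    """Return (first, second) intensity lists from a correlation.csv-style file."""
    first, second = [], []
    for row in csv.DictReader(handle):
        first.append(float(row["First intensity"]))
        second.append(float(row["Second intensity"]))
    return first, second


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# A few rows copied from the excerpt above, standing in for the real file.
sample = io.StringIO(
    "h k l,First intensity,Second intensity,Resolution\n"
    "0 1 29,6788.76,6681.7,3.64553\n"
    "0 1 49,113.671,234.905,2.15839\n"
    "0 2 4,346.637,351.928,23.6538\n"
    "0 2 18,11660.6,11678.3,5.8409\n"
)
first, second = read_intensity_pairs(sample)
print(f"{len(first)} pairs, correlation = {pearson(first, second):.4f}")
```

For a real run, replace the `io.StringIO` sample with `open("correlation.csv", newline="")`.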

Rsplit between two halves of the data set

As well as CC1/2, Rsplit can be calculated between two halves of the data set. For example, to compare the two halves of the final merge:

cppxfel.run -rsplit half1Merge.mtz half2Merge.mtz

For the 1000-image data set provided in this tutorial, this will produce an overall Rsplit of approximately 13%, reported in the final row of the output:

Running cppxfel...
Welcome to cppxfel!
 SYMINFO file set to /apps/strubi/ccp4/ccp4-6.5/lib/data/syminfo.lib 
Loaded 14594 reflections (14594 accepted).
 SYMINFO file set to /apps/strubi/ccp4/ccp4-6.5/lib/data/syminfo.lib 
Loaded 14200 reflections (14200 accepted).
N: lowRes	highRes	Value	Hits	Multiplicity
N: inf	4.33175	0.0571572	383	2
N: 4.33175	3.43811	0.0529099	358	2
N: 3.43811	3.00347	0.0949427	474	2
N: 3.00347	2.72883	0.151788	511	2
N: 2.72883	2.53322	0.161476	652	2
N: 2.53322	2.38385	0.18049	640	2
N: 2.38385	2.26446	0.21476	686	2
N: 2.26446	2.16587	0.224589	735	2
N: 2.16587	2.08249	0.235197	791	2
N: 2.08249	2.01062	0.252543	899	2
N: 2.01062	1.94775	0.280977	816	2
N: 1.94775	1.89207	0.33236	713	2
N: 1.89207	1.84225	0.389915	577	2
N: 1.84225	1.7973	0.43699	354	2
N: 1.7973	1.75644	0.527985	255	2
N: 1.75644	1.71906	0.589053	169	2
N: 1.71906	1.68467	0.693728	68	2
N: 1.68467	1.65287	0.662898	37	2
N: 1.65287	1.62335	0.893698	15	2
N: 1.62335	1.59583	1.25962	5	2
N: *** Overall ***
N: 0	1.59583	0.129852	9138	2
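
For reference, the conventional definition of Rsplit is the summed absolute difference between the two half-set intensities, divided by the sum of their means and scaled by 1/√2. The sketch below implements that textbook formula for a list of paired intensities. It assumes the pairs have already been matched by Miller index; whether cppxfel applies any additional scaling or weighting before this step is not stated here, and the intensities used are invented for illustration.

```python
import math


def rsplit(half1, half2):
    """Rsplit for paired intensities from two half data sets.

    Rsplit = (1 / sqrt(2)) * sum(|I1 - I2|) / (0.5 * sum(I1 + I2))
    """
    if len(half1) != len(half2):
        raise ValueError("half data sets must contain the same reflections")
    numerator = sum(abs(a - b) for a, b in zip(half1, half2))
    denominator = 0.5 * sum(a + b for a, b in zip(half1, half2))
    return numerator / denominator / math.sqrt(2)


# Invented intensities for three reflections measured in both halves.
half1 = [100.0, 250.0, 40.0]
half2 = [110.0, 240.0, 50.0]
print(f"Rsplit = {rsplit(half1, half2):.4f}")
```

Applying the same function to the reflections within each resolution shell would give a per-bin table comparable in layout to the one above.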

Partiality plot for an individual image

cppxfel can produce a CSV file showing how well the partiality model describes a particular image. This requires a reference MTZ (merged with a multiplicity greater than about 2.0-3.0) and an image MTZ named in the form ref-img*.mtz, as created by the standard input file refine.txt.

This can be generated as follows:

cppxfel.run -partiality allMerge5.mtz ref-img-shot-s00-20130316165414164_0.mtz

Alternatively, a maximum resolution can be specified:

cppxfel.run -partiality allMerge5.mtz ref-img-shot-s00-20130316165414164_0.mtz 2.0

This creates the following output on the screen:

Running cppxfel...
Welcome to cppxfel!
 SYMINFO file set to /apps/strubi/ccp4/ccp4-6.5/lib/data/syminfo.lib 
Loaded 23822 reflections (23822 accepted).
Setting reference to allMerge5.mtz
Partiality plot for ref-img-shot-s00-20130316165414164_0.mtz
 SYMINFO file set to /apps/strubi/ccp4/ccp4-6.5/lib/data/syminfo.lib 
Loaded 3002 reflections (3002 accepted).
Ambiguity 0: 0.622248, ambiguity 1: 0.931926
2754 reflections in common with reference MTZ.
Total number of reflections in MTZ: 2890
     Low res    High res   Num refl.
         inf     2.53984         718
     2.53984     2.01587        1100
     2.01587     1.76103         879
     1.76103         1.6         167
N: Total time: 0 minutes, 1 seconds (1 seconds).
Done

This shows the four resolution bins used to generate the data, and outputs the appropriate data to partiality_[n].csv where [n] is the number of generated bins. The format of the partiality CSV files is as follows:

h,k,l,wavelength,partiality,percentage,intensity,resolution
4,2,-6,1.37061,0,3.76223,263.63,0.070742
2,2,-4,1.39565,0.419087,20.8099,142.831,0.0463115
-9,4,9,1.4039,0,3.41021,24.9747,0.126123
-1,9,-6,1.41084,0,3.08305,35.6135,0.102689
3,20,-15,1.41234,0,2.05453,94.6172,0.238028
-10,28,-4,1.41258,0,96.443,158.223,0.283599
17,24,-25,1.41401,0,94.9401,74.3205,0.364902
18,-6,-10,1.41447,0,0.439069,56.5138,0.202751
11,25,-22,1.41478,0,1.55951,39.9897,0.33154
-15,35,2,1.41483,0,1.32867,9.05428,0.360467
-19,8,35,1.41484,0,0,-1.88631,0.383995
-17,4,35,1.41519,0,35.4585,59.3055,0.369768

The wavelength column gives the wavelength, in Å, of the Ewald sphere on which the centre of the reciprocal lattice point lies. The partiality column is the theoretically calculated partiality value. The percentage column is the observed intensity for the given image expressed as a percentage of the intensity in the reference data set. The intensity column is the raw integrated intensity for that reflection, and the resolution column is d* (that is, 1 / d) in Å⁻¹.

Plot the wavelength on the X axis and the partiality and percentage columns on separate Y axes. If the partiality model describes the image well, the partiality and percentage curves should match each other as closely as possible.
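
The plot described above can be sketched in Python as follows. This is only one possible approach: it assumes the column headers shown in the excerpt, uses the third-party matplotlib library for the twin-axis plot, and the function names and the output file name `partiality.png` are hypothetical choices rather than anything cppxfel provides.

```python
import csv
import io


def load_partiality_rows(handle):
    """Read a partiality CSV and return its rows sorted by wavelength."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "wavelength": float(row["wavelength"]),
            "partiality": float(row["partiality"]),
            "percentage": float(row["percentage"]),
        })
    rows.sort(key=lambda r: r["wavelength"])
    return rows


def plot_partiality(rows, out_path="partiality.png"):
    """Plot partiality and percentage against wavelength on twin Y axes."""
    import matplotlib  # imported here so that loading works without it
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    wavelengths = [r["wavelength"] for r in rows]
    fig, ax_left = plt.subplots()
    ax_right = ax_left.twinx()  # second Y axis sharing the same X axis
    ax_left.plot(wavelengths, [r["partiality"] for r in rows], "b.-")
    ax_right.plot(wavelengths, [r["percentage"] for r in rows], "r.")
    ax_left.set_xlabel("Wavelength (Å)")
    ax_left.set_ylabel("Calculated partiality")
    ax_right.set_ylabel("Percentage of reference intensity")
    fig.savefig(out_path)


# Three rows copied from the excerpt above, standing in for a real file.
sample = io.StringIO(
    "h,k,l,wavelength,partiality,percentage,intensity,resolution\n"
    "4,2,-6,1.37061,0,3.76223,263.63,0.070742\n"
    "2,2,-4,1.39565,0.419087,20.8099,142.831,0.0463115\n"
    "-9,4,9,1.4039,0,3.41021,24.9747,0.126123\n"
)
rows = load_partiality_rows(sample)
print(f"Loaded {len(rows)} reflections")
```

Calling `plot_partiality(rows)` then writes the figure; the two Y axes keep the 0-1 partiality scale and the 0-100 percentage scale visually comparable.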

Rmerge, Rmeas, Rpim

The R values Rmerge, Rmeas and Rpim can be calculated from the unmerged*.mtz files written during refinement (refine.txt) or from the unmerged.mtz file written during the final merge (merge.txt).

This can be carried out as follows:

cppxfel.run -rmerge unmerged.mtz
cppxfel.run -rmeas unmerged.mtz
cppxfel.run -rpim unmerged.mtz

These statistics should be used to judge the quality of post-refinement, not the quality of the high-resolution data, and should decrease as the post-refinement strategy improves. Each command prints a table of values by resolution bin, for example:

$ cppxfel.run -rpim unmerged.mtz
Running cppxfel...
Welcome to cppxfel!
 SYMINFO file set to /apps/strubi/ccp4/ccp4-6.5/lib/data/syminfo.lib 
Loaded 1520200 reflections (102129 accepted).
N: lowRes	highRes	Value	Hits	Multiplicity
N: inf	4.28806	0.0138179	1142	3.75862
N: 4.28806	3.40343	0.0111124	1175	3.62434
N: 3.40343	2.97317	0.0189678	1223	4.25449
N: 2.97317	2.70131	0.0270932	1231	4.18195
N: 2.70131	2.50767	0.0314556	1306	5.0643
N: 2.50767	2.35981	0.0350953	1306	4.94366
N: 2.35981	2.24162	0.0398659	1322	5.34516
N: 2.24162	2.14403	0.0437967	1313	5.49478
N: 2.14403	2.06148	0.0463546	1319	6.04176
N: 2.06148	1.99034	0.0486673	1312	6.70868
N: 1.99034	1.9281	0.0563663	1350	5.81406
N: 1.9281	1.87298	0.0641532	1291	5.29523
N: 1.87298	1.82367	0.0746328	1207	4.17778
N: 1.82367	1.77917	0.0858512	1189	3.44205
N: 1.77917	1.73872	0.104914	997	2.93709
N: 1.73872	1.70172	0.127369	826	2.38837
N: 1.70172	1.66767	0.160804	613	1.98459
N: 1.66767	1.6362	0.187585	418	1.68597
N: 1.6362	1.60698	0.266887	163	1.36477
N: 1.60698	1.57973	0.391058	23	1.17532
N: *** Overall ***
N: 0	1.57973	0.0239934	20726	4.29492
N: Total time: 0 minutes, 23 seconds (23 seconds).

This shows an overall Rpim of 2.4%, with a significant increase in Rpim at resolutions beyond 1.8 Å.
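
To make the relationship between the three statistics concrete, the sketch below computes them from their conventional definitions: each sums the absolute deviations of repeated observations from their mean, with Rmeas weighting each reflection by √(n/(n−1)) and Rpim by √(1/(n−1)), where n is the number of observations of that reflection. It is an illustration only. The function name `r_factors` and the input layout are assumptions, the intensities are invented, and cppxfel's own handling of scaling, partiality correction and singly observed reflections may differ.

```python
import math


def r_factors(observations):
    """Return (Rmerge, Rmeas, Rpim) for a dict of {hkl: [intensities]}.

    Reflections observed only once are skipped, since the n - 1 terms
    are undefined for them (an assumption made for this sketch).
    """
    sum_merge = sum_meas = sum_pim = 0.0
    sum_intensity = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue
        mean = sum(intensities) / n
        deviation = sum(abs(i - mean) for i in intensities)
        sum_merge += deviation
        sum_meas += math.sqrt(n / (n - 1)) * deviation
        sum_pim += math.sqrt(1 / (n - 1)) * deviation
        sum_intensity += sum(intensities)
    return (sum_merge / sum_intensity,
            sum_meas / sum_intensity,
            sum_pim / sum_intensity)


# Invented observations: one reflection seen twice, another three times.
observations = {
    (0, 1, 29): [100.0, 120.0],
    (0, 2, 4): [50.0, 40.0, 60.0],
}
rmerge, rmeas, rpim = r_factors(observations)
print(f"Rmerge = {rmerge:.4f}, Rmeas = {rmeas:.4f}, Rpim = {rpim:.4f}")
```

Because of the √(1/(n−1)) factor, Rpim falls as multiplicity rises, which is consistent with it being lowest in the well-measured bins of the table above.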