Gd-Lysozyme: Difference between revisions

From cctbx_xfel
 
(153 intermediate revisions by the same user not shown)
Line 19: Line 19:
  ls /reg/d/psdm/cxi/cxi84914/xtc/e240
  ls /reg/d/psdm/cxi/cxi84914/xtc/e240


Notice that there are numerous runs in the directory.  Now we will create composite averages for each run.  Grab this configuration file: [[mkdark.cfg]] and put it in your cxi84914 directory.  For one run only from each experiment directory:
Notice that there are numerous runs in the directory.  Now we will create composite averages for each run.  Grab this configuration file: [[mkdark_Gd-Lysozyme.cfg]] and put it in your cxi84914 directory.  For one run only from each experiment directory:


  cxi.lsf -c ~/myrelease/cxi84914/mkdark.cfg \
  cxi.lsf -c ~/myrelease/cxi84914/mkdark_Gd-Lysozyme.cfg \
  -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e239/ \
  -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e239/ \
  -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq -s -p 8 -x 239 -r 16 -t 0
  -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq -s -p 8 -x 239 -r 16 -t 0


  cxi.lsf -c ~/myrelease/cxi84914/mkdark.cfg \
  cxi.lsf -c ~/myrelease/cxi84914/mkdark_Gd-Lysozyme.cfg \
  -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e240/ \
  -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e240/ \
  -i /reg/d/psdm/cxi/cxi84914/xtc/e240 -q psanacsq -s -p 8 -x 240 -r 16 -t 0
  -i /reg/d/psdm/cxi/cxi84914/xtc/e240 -q psanacsq -s -p 8 -x 240 -r 16 -t 0
Line 43: Line 43:
  aklog
  aklog
  for m in 27 28 29 30 31; \
  for m in 27 28 29 30 31; \
   do echo $m; cxi.lsf -c ~/myrelease/cxi84914/mkdark.cfg \
   do echo $m; cxi.lsf -c ~/myrelease/cxi84914/mkdark_Gd-Lysozyme.cfg \
   -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e239/ \
   -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e239/ \
   -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq -s -p 8 -x 239 -r ${m} -t 0; done
   -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq -s -p 8 -x 239 -r ${m} -t 0; done
Line 51: Line 51:
  aklog
  aklog
  for m in 1 2 3 4 5 6 7 8 9 10 12 13 14 16 17 18 19 21 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40; \
  for m in 1 2 3 4 5 6 7 8 9 10 12 13 14 16 17 18 19 21 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40; \
   do echo $m; cxi.lsf -c ~/myrelease/cxi84914/mkdark.cfg \
   do echo $m; cxi.lsf -c ~/myrelease/cxi84914/mkdark_Gd-Lysozyme.cfg \
   -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e240/ \
   -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e240/ \
   -i /reg/d/psdm/cxi/cxi84914/xtc/e240 -q psanacsq -s -p 8 -x 240 -r ${m} -t 0; done
   -i /reg/d/psdm/cxi/cxi84914/xtc/e240 -q psanacsq -s -p 8 -x 240 -r ${m} -t 0; done
Line 383: Line 383:
Now we'll figure out which pixels are untrusted, and thus should not be integrated.  Three criteria will be used:
Now we'll figure out which pixels are untrusted, and thus should not be integrated.  Three criteria will be used:
* Hot pixels--on the average-dark the pixel values exceed 1250 (should be fine tuned by inspecting the dark & using trial and error)
* Hot pixels--on the average-dark the pixel values exceed 1250 (should be fine tuned by inspecting the dark & using trial and error)
* Hot pixels--on the standard deviation-dark the stddev exceeds 5 and therefore unreliable (also should be fine tuned by trial and error)
* Hot pixels--on the standard deviation-dark the stddev exceeds 4 and therefore unreliable (also should be fine tuned by trial and error)
* Cold pixels or shadows--on a maximum-composite data image, inspect values and set a minimum threshold value (we choose 14 here)
* Cold pixels or shadows--on a maximum-composite data image, inspect values and set a minimum threshold value (we choose 14 here)
For more information on masking parameters see [[Preparatory_steps|creating a mask image]]
For more information on masking parameters see [[Preparatory_steps|creating a mask image]]
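
As a rough illustration of how the three criteria combine, here is a minimal sketch. It assumes a hypothetical whitespace-separated text dump with one pixel per line (columns: x, y, dark average, dark standard deviation, max-composite value); this is not a cctbx file format, and the function name is invented for the example. The thresholds are the ones quoted above (1250, 4 from the newer revision, and 14).

```shell
# Flag untrusted pixels from a HYPOTHETICAL text dump read on stdin with
# columns: x y dark_average dark_stddev max_composite (not a cctbx format).
flag_untrusted() {
  awk -v hot=1250 -v noisy=4 -v cold=14 '
    $3 > hot   { print $1, $2, "hot-average"; next }  # average-dark too high
    $4 > noisy { print $1, $2, "hot-stddev";  next }  # stddev-dark too high
    $5 < cold  { print $1, $2, "cold";        next }  # shadowed or dead pixel
  '
}
```

Each flagged pixel is reported once, with the first criterion it fails; the thresholds would still need tuning by inspection as described above.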
Line 413: Line 413:
The electron microscopy step above determines only the sensor positions relative to the frames of reference of each quadrant; but not the absolute position of each quadrant in space.  At the CXI instrument, the forward detector DS1 has rail-mounted quadrants to allow re-sizing of the central hole.  The quadrant placement should be assessed for both the forward and back (DS2) detectors.   
The electron microscopy step above determines only the sensor positions relative to the frames of reference of each quadrant; but not the absolute position of each quadrant in space.  At the CXI instrument, the forward detector DS1 has rail-mounted quadrants to allow re-sizing of the central hole.  The quadrant placement should be assessed for both the forward and back (DS2) detectors.   


For lysozyme we examine one of the strong images (maximum composite).  Students may use the instructor's files ($USER=nksauter).  We first determine that our image has a timestamp that identifies it within cctbx as being from run 7 (2013):
For lysozyme we examine one of the strong images (maximum composite).  Students may use the instructor's files ($USER=tmmclark).  We first determine that our image has a timestamp that identifies it within cctbx as being from run 7 (2013):


  cxi.print_pickle /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e239/r0028/000/out/max-r0028.pickle  
  cxi.print_pickle /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e239/r0028/000/out/max-r0028.pickle  
Line 421: Line 421:
  cxi.view /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e239/r0028/000/out/max-r0028.pickle \
  cxi.view /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e239/r0028/000/out/max-r0028.pickle \
   distl.detector_format_version="CXI 7.1" viewer.calibrate_unitcell.d_min=10 \
   distl.detector_format_version="CXI 7.1" viewer.calibrate_unitcell.d_min=10 \
   viewer.calibrate_unitcell.unitcell=79,79,39,90,90,90 viewer.calibrate_unitcell.spacegroup=P43212
   viewer.calibrate_unitcell.unitcell=79,79,38,90,90,90 viewer.calibrate_unitcell.spacegroup=P43212


The "Settings" GUI panel shows detector distance as well as the hard-coded quadrant positions corresponding to "CXI 7.1", namely [2, -6, 3, -6, -7, 0, -1, -4]. Tile translations have been zeroed out in the code. The settings can be changed in the panel, or alternately given as a separate command line parameter (distl.quad_translations=-3,-1,-1,-5,-13,2,-7,-4).  The object is to align the powder pattern with the predicted rings (red circles) based on the unit cell parameters.  It can be seen that the alignment is not quite perfect.
The "Settings" GUI panel shows detector distance as well as the hard-coded quadrant positions corresponding to "CXI 7.1", namely [2, -6, 3, -6, -7, 0, -1, -4]. Tile translations have been zeroed out in the code. The settings can be changed in the panel, or alternately given as a separate command line parameter (distl.quad_translations=2,-6,3,-6,-7,0,-1,-4).  The object is to align the powder pattern with the predicted rings (red circles) based on the unit cell parameters.  It can be seen that the alignment is not quite perfect.


Since we have well-formed powder rings, we can run the automatic quadrant positioning tool:
Since we have well-formed powder rings, we can run the automatic quadrant positioning tool:
Line 433: Line 433:
Try a few different max-composites from runs 27, 29 and 31:
Try a few different max-composites from runs 27, 29 and 31:


The NEW QUAD translations are: [2, -6, 4, -6, -9, 2, 0, -4]
The NEW QUAD translations are: [2, -6, 4, -6, -9, 2, 0, -4] (poor self-correlation value, approx. 0.06; weak powder diffraction for this run)


The NEW QUAD translations are: [3, -4, 5, -6, -6, 1, 0, -4]
The NEW QUAD translations are: [3, -4, 5, -6, -6, 1, 0, -4]
Line 442: Line 442:
  cxi.view /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e239/r0028/000/out/max-r0028.pickle \
  cxi.view /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e239/r0028/000/out/max-r0028.pickle \
   distl.detector_format_version="CXI 7.1" viewer.calibrate_unitcell.d_min=10 \
   distl.detector_format_version="CXI 7.1" viewer.calibrate_unitcell.d_min=10 \
   viewer.calibrate_unitcell.unitcell=79,79,39,90,90,90 viewer.calibrate_unitcell.spacegroup=P43212\
   viewer.calibrate_unitcell.unitcell=79,79,38,90,90,90 viewer.calibrate_unitcell.spacegroup=P43212\
   distl.quad_translations=2, -6, 4, -6, -6, 1, 0, -4
   distl.quad_translations=2, -6, 4, -6, -6, 1, 0, -4


Line 451: Line 451:
For lysozyme we examine one of the strong images (maximum composite).  Students may use the instructor's files ($USER=nksauter).  We first determine that our image has a timestamp that identifies it within cctbx as being from run 7 (2013):
For lysozyme we examine one of the strong images (maximum composite).  Students may use the instructor's files ($USER=nksauter).  We first determine that our image has a timestamp that identifies it within cctbx as being from run 7 (2013):


  cxi.print_pickle /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e240/r0019/000/out/max-r0019.pickle  
  cxi.print_pickle /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e240/r0029/000/out/max-r0029.pickle  
   > Detector format version: CXI 7.1
   > Detector format version: CXI 7.1
  cxi.detector_format_versions
  cxi.detector_format_versions
  cxi.detector_format_versions "CXI 7.1"
  cxi.detector_format_versions "CXI 7.1"
  cxi.view /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e240/r0019/000/out/max-r0019.pickle \
  cxi.view /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e240/r0029/000/out/max-r0029.pickle \
   distl.detector_format_version="CXI 7.1" viewer.calibrate_unitcell.d_min=10 \
   distl.detector_format_version="CXI 7.1" viewer.calibrate_unitcell.d_min=10 \
   viewer.calibrate_unitcell.unitcell=79,79,39,90,90,90 viewer.calibrate_unitcell.spacegroup=P43212
   viewer.calibrate_unitcell.unitcell=79,79,38,90,90,90 viewer.calibrate_unitcell.spacegroup=P43212


The "Settings" GUI panel shows detector distance as well as the hard-coded quadrant positions corresponding to "CXI 7.1", namely [2, -6, 3, -6, -7, 0, -1, -4]. Tile translations have been zeroed out in the code. The settings can be changed in the panel, or alternately given as a separate command line parameter (distl.quad_translations=-3,-1,-1,-5,-13,2,-7,-4).  The object is to align the powder pattern with the predicted rings (red circles) based on the unit cell parameters.  It can be seen that the alignment is not quite perfect.
The "Settings" GUI panel shows detector distance as well as the hard-coded quadrant positions corresponding to "CXI 7.1", namely [2, -6, 3, -6, -7, 0, -1, -4]. Tile translations have been zeroed out in the code. The settings can be changed in the panel, or alternatively given as a separate command line parameter (distl.quad_translations=2,-6,3,-6,-7,0,-1,-4).  The object is to align the powder pattern with the predicted rings (red circles) based on the unit cell parameters.  It can be seen that the alignment is not quite perfect.


Since we have well-formed powder rings, we can run the automatic quadrant positioning tool:
Since we have well-formed powder rings, we can run the automatic quadrant positioning tool:


  cspad.quadrants /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e240/r0019/000/out/max-r0019.pickle \
  cspad.quadrants /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e240/r0029/000/out/max-r0029.pickle \
   distl.detector_format_version="CXI 7.1"
   distl.detector_format_version="CXI 7.1"
  > The NEW QUAD translations are: [3, -4, 4, -6, -6, 1, 0, -4]
  > The NEW QUAD translations are: [2, -6, 4, -6, -6, 1, 0, -4]


Try a few different max-composites from runs 3, 7, 17, 27, 29:
Try a few different max-composites from runs 3, 7, 17, 27:


The NEW QUAD translations are:  
The NEW QUAD translations are: [2, -5, 4, -6, -6, 1, 0, -3]


The NEW QUAD translations are:  
The NEW QUAD translations are: [2, -5, 4, -6, -6, 1, 0, -4]


The NEW QUAD translations are:  
The NEW QUAD translations are: [19, -6, 3, -6, -11, 11, 1, -1] (very poor self-correlation coefficient, around 0.06; run 17 is weak powder and not a good run to use for the metrology)


The NEW QUAD translations are:  
The NEW QUAD translations are: [3, -4, 4, -6, -6, 1, 0, -4]


The NEW QUAD translations are:


These can be pasted on to the command line for graphical review:
These can be pasted on to the command line for graphical review:
  cxi.view /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e240/r0019/000/out/max-r0019.pickle \
  cxi.view /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e240/r0029/000/out/max-r0029.pickle \
   distl.detector_format_version="CXI 7.1" viewer.calibrate_unitcell.d_min=10 \
   distl.detector_format_version="CXI 7.1" viewer.calibrate_unitcell.d_min=10 \
   viewer.calibrate_unitcell.unitcell=79,79,39,90,90,90 viewer.calibrate_unitcell.spacegroup=P43212\
   viewer.calibrate_unitcell.unitcell=79,79,38,90,90,90 viewer.calibrate_unitcell.spacegroup=P43212\
   distl.quad_translations=3, -4, 4, -6, -6, 1, 0, -4
   distl.quad_translations= 2, -6, 4, -6, -6, 1, 0, -4
 
This looks slightly better. We will continue in the next section to test the distance and find the best value.
 
==Data Discovery Phase==
Before we can move forward with metrology at the pixel and sub-pixel level, we must test whether the data can be indexed, starting with a single test image. Beginning with e239, we choose run 27, run hit finding, and dump the images for trial 0. The configuration file [[Gd-Lysozyme-27_discover.cfg]] (for configuration run 27) names our phil parameter file [[Gd-Lysozyme-t000.phil]] (t000 means trial 0); for trial 0 we comment out the target cell and setting. Modules that are commented out in the config file can be uncommented for indexing once we discover the correct parameters.
 
  cxi.lsf -c ~/myrelease/cxi84914/e239/[[Gd-Lysozyme-27_discover.cfg]] \
  -o /path to home/myrelease/cxi84914/e239/ \
  -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq -p 8 -x 239 -r 27 -t 0
 
Nothing can progress until we successfully index an image.
The next steps involve adjusting the configuration file so that we can simply look at the images and find one that should index. First we make a new configuration file that does only a hit find and a raw image dump, then examine the images to find one with "good" diffraction. We choose the s00 stream, as this gives a thin slice of the entire run, and we can examine all of its images using the command:


This looks slightly better. From the GUI it also appears that distance=81 fits; meaning that the detz_offset for the configuration file of 571 is ok.
  cctbx.image_viewer shot-s00-20130315225*
 
Looking through these images the best is the third image (shot-s00-20130315225354116.pickle). First we check using this image if spot finder has the correct parameters by using the .phil file parameters in image viewer as follows:
 
  distl.image_viewer distl.minimum_signal_height=5 distl.minimum_spot_height=10 \
  distl.minimum_spot_area=1 shot-s00-20130315225354116.pickle
 
The spot_finder is finding the spots correctly, so there must be another issue. On examining the images it is clear that the water rings are not at the expected resolution of approximately 3.5 angstroms but rather around 1.8 angstroms, indicating that the detz_offset is incorrect (way off); indexing is not going to work until this is fixed.
 
This brings us to trial 1. The detz_offset is incorrect and we aren't in the ballpark, so first we try detz_offset = 591, which puts the detector 100 mm away and brings the water rings closer to 3.5 angstrom resolution. To do this we edit [[Gd-Lysozyme-27_discover.cfg]] by uncommenting the indexing module (also comment out my_ana_pkg.mod_dump:img_thresh if you don't want to dump the raw images a second time) and changing detz_offset to 591. Then run the following command:
  cxi.lsf -c ~/myrelease/cxi84914/e239/[[Gd-Lysozyme-27_discover.cfg]] \
  -o /reg/d/psdm/cxi/cxi84914/scratch/USERNAME/results/e239/ \
  -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq \
  -p 8 -x 239 -r 27 -t 1
 
Looking at the stdout log file s00.out in ~/myrelease/cxi84914/e239/r0027, the unit cell found (86.3581, 86.3581, 42.8606, 90, 90, 90) is not the correct one, and many images did not index at all. We need a value for detz_offset that is in the correct range. For trial 2 we will use a detz_offset of 581, so the detz_offset in [[Gd-Lysozyme-27_discover.cfg]] is changed to 581 and we run the following command:
 
  cxi.lsf -c ~/myrelease/cxi84914/e239/[[Gd-Lysozyme-27_discover.cfg]] \
  -o /reg/d/psdm/cxi/cxi84914/scratch/USERNAME/results/e239/ \
  -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq -p 8 -x 239 -r 27 -t 2
 
Checking the log file, the indexed unit cell is (78.1073, 78.1073, 39.1608, 90, 90, 90), which is much closer to what we expect. To check how many indexed images are in r0027/002/out, run
  ls -l |wc -l
which counts all files in the current directory. We get 188 indexed images out of 303 total images in our image dump of r0027 for trial 0. Before accepting a detz_offset of 581 we will try a series of detz_offset values in the range 575 - 595. Go to your myrelease directory in your home directory and create a new directory called dist_trials; cd into it and run this command to create new configuration files with the test detz_offset values.
  for i in `seq 575 595`; do vi -c "%s/581/$i/g" -c "w Gd-Lysozyme-27_discover0$i.cfg" \
  -c q\! ../cxi84914/e239/Gd-Lysozyme-27_discover0.cfg ; done
Here, a vi command is executed repeatedly that searches for the number 581 in your config file and replaces it with a number from 575 to 595, then writes out the new file with an appropriate file name.  Note that the number was originally 571, and has already been optimized to 581 through earlier trials.
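
The same substitution can also be sketched non-interactively with sed. This is only an illustration under assumptions: the helper name make_detz_configs is invented here, and it assumes (as the vi loop does) that every occurrence of "581" in the template is the detz_offset and nothing else.

```shell
# Write one config per candidate detz_offset by replacing every "581" in the
# template, mirroring the vi loop above. make_detz_configs is a hypothetical
# helper name, not part of cctbx.xfel.
make_detz_configs() {
  template=$1   # config file that currently contains detz_offset 581
  outdir=$2     # directory that receives the generated configs
  first=$3      # first candidate detz_offset
  last=$4       # last candidate detz_offset
  mkdir -p "$outdir"
  for i in $(seq "$first" "$last"); do
    sed "s/581/$i/g" "$template" > "$outdir/Gd-Lysozyme-27_discover0$i.cfg"
  done
}
```

From inside dist_trials this would be called as: make_detz_configs ../cxi84914/e239/Gd-Lysozyme-27_discover0.cfg . 575 595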
 
Next, submit indexing jobs for each candidate detz_offset from your myrelease folder:
 
  for i in `seq 575 595`; do cxi.lsf -c dist_trials/Gd-Lysozyme-27_discover0$i.cfg \
  -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/dist_trials/ -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -r 27 \
  -q psanacsq -p 8 -t $i; done
 
When complete, go to your results directory:
  cd /reg/d/psdm/cxi/cxi84914/scratch/$USER/dist_trials/r0027
 
Then determine which detz_offset is best:
  for i in `ls`; do echo -n "$i "; ls $i/out | wc -l; done
Output for r0027:
<pre>
detz_offset:        Indexed images:
575                          80
576                          98
577                          120
578                          153
579                          182
580                          185
581                          187
582                          159
583                          154
584                          142
585                          120
586                          116
587                          103
588                          85
589                          81
590                          68
591                          64
592                          58
593                          60
594                          53
595                          50
</pre>
The detz_offset of 581 gives the most indexed images, so we now move to the next step in the metrology (unit-pixel tile positions). Repeating this test for e240 using the same range of values yields a detz_offset of 580.
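
Rather than reading the table by eye, the best trial can be picked out automatically. The following is a minimal sketch, assuming the layout used above (one sub-directory per trial, each with an out directory holding one file per indexed image); best_detz_offset is a hypothetical helper name.

```shell
# Print "<trial> <count>" for the trial directory whose out/ holds the most
# files. Ties are resolved in favour of the smallest trial number.
best_detz_offset() {
  # $1: directory containing one sub-directory per trial (e.g. 575 ... 595)
  for d in "$1"/*/; do
    n=$(find "${d}out" -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d ' ')
    echo "$(basename "$d") $n"
  done | sort -k2,2nr -k1,1n | head -n 1
}
```

Run as best_detz_offset /reg/d/psdm/cxi/cxi84914/scratch/$USER/dist_trials/r0027 once all jobs have finished.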


==Unit-pixel tile positions==
==Unit-pixel tile positions==
Now we will index the data to derive model lattices. [The configuration file [L498-thermolysin-17.cfg]] names our phil parameter file [[L498-thermolysin-t000.phil]] (t000 means trial 0)].  We'll then compare model and observation, from which we can deduce better metrology.
We now know the correct detz_offset (581), so we can change the Gd-Lysozyme-t000.phil file by uncommenting the target unit cell and known setting, and change the config file to do only the indexing and integration with detz_offset=581.
Now we will index the data to derive model lattices. The configuration file [[Gd-Lysozyme-27.cfg]] (for configuration run 27) names our phil parameter file [[Gd-Lysozyme-t003.phil]] (t003 means trial 3).  We'll then compare model and observation, from which we can deduce better metrology. This will be done for both experiments 239 and 240 separately.
 
for m in 27 28 29 31; \
  do echo $m; cxi.lsf -c ~/myrelease/cxi84914/e239/[[Gd-Lysozyme-27.cfg]] \
  -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e239/ \
  -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq -p 8 -x 239 -r ${m} -t 3; done
 
bkill 0 # stop all jobs; wrong file path


for m in 17 18 19 20; \
Now for e240, with a detz_offset of 580 and a different dark path, we need a different config file, [[Gd-Lysozyme-3.cfg]] (for configuration run 3); the .phil file for the two experiments is the same in this trial 1. I have separated the configuration and phil files for each experiment in ~/myrelease/cxi84914 by creating e239 and e240 directories to hold the files.
  do echo $m; cxi.lsf -c ~/myrelease/cxi84914/L498-thermolysin-17.cfg \
  -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e157/ \
  -i /reg/d/psdm/cxi/cxi84914/xtc/e157 -q psanacsq -p 8 -x 157 -r ${m} -t 0; done


for m in 21 22 23 24 25 26 27; \
  for m in 3 4 5 6 7 10 12 14 15 16 17 18 19 21 24 25 26 27 28 29 30 32 33 34 35 36 37 38 39 40; \
  do echo $m; cxi.lsf -c ~/myrelease/cxi84914/[[L498-thermolysin-21.cfg]] \
  do echo $m; cxi.lsf -c ~/myrelease/cxi84914/e240/Gd-Lysozyme-3.cfg   \
   -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e157/ \
   -o /reg/d/psdm/cxi/cxi84914/scratch/tmmclark/results/e240/ -i /reg/d/psdm/cxi/cxi84914/xtc/e240/ \
  -i /reg/d/psdm/cxi/cxi84914/xtc/e157 -q psanacsq -p 8 -x 157 -r ${m} -t 0; done
  -q psanacsq -p 8 -x 240 -r ${m} -t 1; done


bkill 0 # stop all jobs; wrong file path
A quick command to count how many integration files there are in e239 and e240:


A quick command to count how many integration files there are:
  find /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e239/*/003/integration -name "int*.pickle"|wc -l
for m in `seq 17 27`; \
  find /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e240/*/001/integration -name "int*.pickle"|wc -l
  do echo $m `find /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e157/r00$m/000/integration -name "int*.pickle"|wc -l` ; done; \
  find /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e157/*/000/integration -name "int*.pickle"|wc -l


Determine whole-pixel translations for all sensors on the CSPAD. 
We have 6427 and 98808 integrated pickles for e239 and e240 respectively.  
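
Since this count is repeated after every trial, it can be wrapped in a small function. This is a sketch only: count_integrated is a hypothetical name, and it assumes the results/<experiment>/<run>/<trial>/integration layout used by the find commands above.

```shell
# Count integration pickles across all runs of one experiment for one trial.
count_integrated() {
  # $1: results directory for one experiment
  # $2: three-digit trial number, e.g. 003
  find "$1"/*/"$2"/integration -name 'int*.pickle' 2>/dev/null | wc -l | tr -d ' '
}
```

For example: count_integrated /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e239 003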
cspad.metrology data=/reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e157/r*/000/integration bravais_setting_id=12 max_frames=1000 min_count=25 detector_format_version="CXI 5.1" cxi84914/L498-thermolysin-t000.phil | tee L498-17-t000.unit
List out the new unit translations:
cat L498-17-t000.unit |grep -A21 "Unit translations"
Results from three refinement cycles are listed; capture the last one and incorporate it into a new version of the integration phil file, [[L498-thermolysin-t001.phil]].  Also in this phil file, for the next integration round we'll increase our integration limits (3 places) to 1.8 Angstroms.  After editing the configuration file to use this new phil file, submit the new round of integration jobs.


For the purpose of the tutorial, we'll drop runs 17-20 since they contribute only 1095 lattices compared with >18000 for runs 21-27. Fine-tuning of the metrology for runs 17-20 would have to be performed separately since the detector is at a different distance.
Determine whole-pixel translations for all sensors on the CSPAD for e239. 
cspad.metrology data=/reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e239/r*/003/integration \
bravais_setting_id=9 max_frames=1000 min_count=25 \
detector_format_version="CXI 7.1" \
~/myrelease/cxi84914/e239/Gd-Lysozyme-t003.phil | tee Gd-27-t003.unit


  for m in 21 22 23 24 25 26 27; \
Determine whole-pixel translations for all sensors on the CSPAD for e240.  
  do echo $m; cxi.lsf -c ~/myrelease/cxi84914/L498-thermolysin-21.cfg \
cspad.metrology data=/reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e240/r*/001/integration \
  -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e157/ \
bravais_setting_id=9 max_frames=1000 min_count=25 \
  -i /reg/d/psdm/cxi/cxi84914/xtc/e157 -q psanacsq -p 8 -x 157 -r ${m} -t 1; done
detector_format_version="CXI 7.1" \
~/myrelease/cxi84914/e240/Gd-Lysozyme-t001.phil | tee Gd-3-t001.unit


Another round of metrology refinement, this time using trial 001 as the basis:
List out the new unit translations for e239 and e240:
  cspad.metrology data=/reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e157/r002[1-7]/001/integration bravais_setting_id=12 max_frames=1000 min_count=25 detector_format_version="CXI 5.1" cxi84914/L498-thermolysin-t001.phil | tee L498-21-t001.unit
  cat Gd-27-t003.unit |grep -A21 "Unit translations"
cat Gd-3-t001.unit |grep -A21 "Unit translations"
Results from three refinement cycles are listed; capture the last one and incorporate it into a new version of the integration phil file, [[Gd-Lysozyme-t004.phil]].
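
Because grep -A21 prints the block from every refinement cycle, the last one has to be picked out by hand. A minimal sketch that prints only the final block is given below; it assumes nothing about the log beyond the "Unit translations" marker and the 21 following lines already used in the grep above, and last_unit_translations is a hypothetical helper name.

```shell
# Print the last "Unit translations" line in a log plus the 21 lines after it.
last_unit_translations() {
  # $1: log file written by tee, e.g. Gd-27-t003.unit
  start=$(grep -n "Unit translations" "$1" | tail -n 1 | cut -d: -f1)
  [ -n "$start" ] || return 1          # marker not found in the log
  sed -n "${start},$((start + 21))p" "$1"
}
```

For example: last_unit_translations Gd-27-t003.unit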


…proving we have not yet converged.  Take this output and construct trial 002: [[L498-thermolysin-t002.phil]]; edit the *.cfg file. Submit another round of integration jobs:
Repeat this process for both experiments, incrementing the trial numbers and creating a new trial phil file each time until the unit pixel translations have converged and the rmsd is less than 1 unit pixel. Once the final unit pixel metrology for experiments e239 and e240 is complete, we have 6669 and 99073 integrated pickle files.   
  for m in 21 22 23 24 25 26 27; \
  do echo $m; cxi.lsf -c ~/myrelease/cxi84914/L498-thermolysin-21.cfg \
  -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e157/ \
  -i /reg/d/psdm/cxi/cxi84914/xtc/e157 -q psanacsq -p 8 -x 157 -r ${m} -t 2; done


Evaluate metrology again:
The next step is to incorporate the sub pixel translations and rotations into a new phil file and integrate each experiment once more.
cspad.metrology data=/reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e157/r002[1-7]/002/integration bravais_setting_id=12 max_frames=1500 min_count=25 detector_format_version="CXI 5.1" cxi84914/L498-thermolysin-t002.phil | tee L498-21-t002.unit


==Add sub-pixel corrections==  
==Add sub-pixel corrections==  
There are negligible changes in the unit-translations (non-zero only at large radius), and the rmsd is now 0.96 pixels.  We'll leave the unit-translations exactly where they are now, and add subpixel translations and rotations to the phil file.  These are taken from the very end of the log file (L498-21-t002.unit).  Incorporate these into [[L498-thermolysin-t003.phil]].
The rmsd is now at the sub-pixel level (less than 0.8 for both e239 and e240).  We'll leave the unit-translations exactly where they are now. No sub-pixel translations were applied to these data, so we move on to merging; however, for this integration round we'll increase our integration limits to 1.8 Angstroms.


=Integrate the data=
=Integrate the data=
We're ready for the final integration trial, which will be t003.  Again edit the configuration file Gd-thermolysin-27.cfg so that it points to the latest phil file, L498-thermolysin-t003.phil.  Submit integration jobs:
We're ready for the final integration trial.  Again edit the configuration files Gd-Lysozyme-27.cfg and Gd-Lysozyme-3.cfg so that they point to the latest phil files with the converged unit pixel metrologies, [[Gd-Lysozyme-27-t007.phil]] and [[Gd-Lysozyme-3-t006.phil]].
We use the trials from each experiment where the most images were integrated (the only difference was that e239 was indexed at 1.8 Angstrom resolution and e240 at 1.9 Angstrom resolution). Start with merging these indexed images first.
 
Submit integration jobs:
  for m in 27 28 29 31; \
  for m in 27 28 29 31; \
   do echo $m; cxi.lsf -c ~/myrelease/cxi84914/L498-thermolysin-27.cfg \
   do echo $m; cxi.lsf -c ~/myrelease/cxi84914/Gd-Lysozyme-27.cfg \
   -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e157/ \
   -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e239/ \
   -i /reg/d/psdm/cxi/cxi84914/xtc/e157 -q psanacsq -p 8 -x 157 -r ${m} -t 3; done
   -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq -p 8 -x 239 -r ${m} -t 7; done


…19980 lattices indexed and integrated.  If the second lattice is desired (about 10% of images have two) set indexing.outlier_detection.switch=True in the phil file, and integrate again <em> under a new trial number (-t 4)</em>.
  for m in 3 4 5 6 7 10 12 14 16 17 18 19 21 24 25 26 27 28 29 30 32 33 34 35 36 37 38 39 40; \
  do echo $m; cxi.lsf -c ~/myrelease/cxi84914/Gd-Lysozyme-3.cfg \
  -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e240/ \
  -i /reg/d/psdm/cxi/cxi84914/xtc/e240 -q psanacsq -p 8 -x 240 -r ${m} -t 6; done


Trial 005: Try first-lattice again, correcting an omission of the mask_pixel_value=-2 phil parameter in trial 003.
=Merge the data=


…19987 lattices indexed and integrated in trial 005.
On pslogin, download the reference structure 4etc from the pdb (must be done on pslogin, the only outward-facing host).  4etc.pdb will be used for per-image scaling; 4etc.mtz will be used for reporting correlation to isomorphous synchrotron data.


=Merge the data=
phenix.fetch_pdb --mtz 4etc
on pslogin, download the reference structure 2tli from the pdb (must be done on pslogin, the only outward-facing host)2tli.pdb will be used for per-image scaling; 2tli.mtz will be used for reporting correlation to isomorphous synchrotron data.
 
Back on psana, get the command file [[mergeLysozyme0.csh]].
  mkdir /reg/d/psdm/cxi/cxi84914/scratch/$USER/merge/
  cd /reg/d/psdm/cxi/cxi84914/scratch/$USER/merge/
  ./mergeLysozyme0.csh e239 trial# e240 trial#
 
[[GdLysozymenoamo_4etc.log | Log file with model scaling and no anomalous flag]]
 
[[GdLysozymeanom_4etc.log | Log file with model scaling and anomalous flag]]
 
 
Trying to merge without scaling to a model:
We will merge the data as a new structure; see the [http://viper.lbl.gov/cctbx.xfel/index.php/Advanced_Merging  Advanced Merging Tutorial], in particular use case 2 (new structure).


phenix.fetch_pdb --mtz 2tli
Get the command file [[mergeLysozyme.csh]]. We are merging all the runs from both experiments.  


back to psana.  Get the command file [[mergethermo.csh]].
  mkdir /reg/d/psdm/cxi/cxi84914/scratch/$USER/merge
  mkdir /reg/d/psdm/cxi/cxi84914/scratch/$USER/merge/e157
  cd /reg/d/psdm/cxi/cxi84914/scratch/$USER/merge
  cd /reg/d/psdm/cxi/cxi84914/scratch/$USER/merge/e157
  ./mergeLysozyme.csh e239 trial# e240 trial#
  ./mergethermo.csh


[[mergethermo.log | Log file]]
The statistics resulting from merging over 105,000 images are contained here in the [[mergeLysozyme.log | Log file]].


[[mergethermo005.log | Better log file from trial 005--using untrusted pixel mask]]
In summary, out of 105,912 indexed images there were:
  105526 of 105912 integration files were accepted
  284 rejected due to wrong Bravais group
  3 rejected for unit cell outliers
  99 rejected for low signal
  0 rejected due to poor correlation
  0 rejected for file errors or no reindex matrix
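
As a quick consistency check on the summary above, the accepted and rejected counts should add up to the number of integration files. A short sketch using only the figures quoted in the log summary:

```shell
# Figures copied from the merging summary above.
accepted=105526
rejected=$((284 + 3 + 99 + 0 + 0))   # Bravais + unit cell + low signal + correlation + file errors
total=$((accepted + rejected))
echo "accepted + rejected = $total"  # 105912, matching the number of integration files
```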


=Solve the structure=
=Solve the structure=
Line 566: Line 666:
Some quick commands to evaluate the data.  Here the result *.mtz files must be moved back to your $HOME directory and transferred to your laptop to run PHENIX:
Some quick commands to evaluate the data.  Here the result *.mtz files must be moved back to your $HOME directory and transferred to your laptop to run PHENIX:


  cp /reg/d/psdm/cxi/cxi84914/scratch/$USER/merge/e157/thermoanom_2tli_s0_mark0.mtz $HOME
  cp /reg/d/psdm/cxi/cxi84914/scratch/$USER/merge/GdLysozymeanom_4etc_s0_mark0.mtz $HOME
  cp /reg/d/psdm/cxi/cxi84914/scratch/$USER/merge/e157/thermonoanom_2tli_s0_mark0.mtz $HOME
  cp /reg/d/psdm/cxi/cxi84914/scratch/$USER/merge/GdLysozymenoanom_4etc_s0_mark0.mtz $HOME


Always use the "s0" files; "s1" and "s2" are the semi-datasets used only to calculate CC1/2.
Always use the "s0" files; "s1" and "s2" are the semi-datasets used only to calculate CC1/2.
  phenix.fetch_pdb --mtz 2tli
  phenix.fetch_pdb --mtz 4etc
  phenix.xtriage thermonoanom_2tli_s0_mark0.mtz scaling.input.xray_data.obs_labels=Iobs > triage_noanom.log
  phenix.xtriage GdLysozymenoanom_4etc_s0_mark0.mtz scaling.input.xray_data.obs_labels=Iobs > triage_noanom.log
phenix.xtriage GdLysozymeanom_4etc_s0_mark0.mtz scaling.input.xray_data.obs_labels=Iobs > triage_anom.log


Wilson B factor:14.8
Wilson B factor: 15.58 no anomalous


phenix.automr 2tli.pdb thermonoanom_2tli_s0_mark0.mtz seq_file=2tli.fa identity=100 copies=1 build=False
Wilson B factor: 15.12 anomalous


The MR-placed model is in ./AutoMR_run_1_/MR.1.pdb.
Use Gd-Lysozyme anomalous structure to get Gd sites:
phenix.fetch_pdb --mtz 4n5r


Obtain the set of R-free-flags used in the Nature Methods paper
 
wget http://cci.lbl.gov/publications/download/4ow3_original_iobs_flags.mtz
Obtain the set of R-free-flags using those generated in the refinement from 4n5r.pdb and extending the resolution from 1.5 -  100 using phenix
(new R-free-flags file is reflections.mtz)
generate restraints for the "DO3" ligands using phenix.elbow


Refine the model
Refine the model
  phenix.refine ./AutoMR_run_1_/MR.1.pdb refinement.output.prefix=001 \
  phenix.refine 4n5r.pdb [[DO3.cif]] refinement.output.prefix=001 \
   xray_data.file_name=thermonoanom_2tli_s0_mark0.mtz \
   xray_data.file_name=GdLysozymeanom_4etc_s0_mark0.mtz \
   xray_data.r_free_flags.file_name=4ow3_original_iobs_flags.mtz \
   xray_data.r_free_flags.file_name=reflections.mtz \
   xray_data.r_free_flags.label=R-free-flags \
   xray_data.r_free_flags.label=FreeR_flag \
   main.number_of_macro_cycles=6 optimize_xyz_weight=True \
   main.number_of_macro_cycles=7 optimize_xyz_weight=True \
   optimize_adp_weight=True nproc=20 refinement.input.xray_data.labels=IMEAN \
   optimize_adp_weight=True nproc=20 refinement.input.xray_data.labels=Iobs unit_cell="78.033, 78.033, 38.833" \
   ordered_solvent=true ordered_solvent.mode=every_macro_cycle
   ordered_solvent=true ordered_solvent.mode=every_macro_cycle
trial 003: RWORK = 23.1% RFREE = 26.5% out to 2.1 Angstrom


trial 005: RWORK = 22.0% RFREE = 26.3% out to 2.1 Angstrom (taking untrusted pixels into account)
 
RWORK = 20.25% RFREE = 23.7% out to 1.9 Angstrom


Run an ersatz script (provided by Nat Echols) to measure the peak heights of the anomalous scatterers.
Run an ersatz script (provided by Nat Echols) to measure the peak heights of the anomalous scatterers.


  libtbx.python [[map_height_at_atoms.py]] \
  libtbx.python [[map_height_at_atoms.py]] \
   001_001.pdb thermoanom_2tli_s0_mark0.mtz \
   001_001.pdb GdLysozymeanom_4etc_s0_mark0.mtz \
  selection="element GD"
   input.xray_data.labels=Iobs \
   input.xray_data.labels=Iobs \
   xray_data.r_free_flags.file_name=4ow3_original_iobs_flags.mtz \
   xray_data.r_free_flags.file_name=001_data.mtz \
   xray_data.r_free_flags.label=R-free-flags
   xray_data.r_free_flags.label=R-free-flags


Promising results for the Zn and one Ca:
Promising results for the two Gd sites:
<pre>
pdb="ZN    ZN A 317 " :  5.97 sigma
pdb="CA    CA A 318 " :  0.93 sigma
pdb="CA    CA A 319 " :  2.74 sigma
pdb="CA    CA A 320 " :  1.08 sigma
pdb="CA    CA A 321 " :  0.20 sigma
</pre>
 
Better results from trial 005:
<pre>
<pre>
pdb="ZN   ZN A 317 " :   7.66 sigma
pdb="GD   GD A 201 " : 17.85 sigma
pdb="CA   CA A 318 " :   1.10 sigma
pdb="GD   GD A 202 " : 23.46 sigma
pdb="CA    CA A 319 " :  2.52 sigma
pdb="CA    CA A 320 " :  1.40 sigma
pdb="CA    CA A 321 " :  1.01 sigma


</pre>
</pre>

Latest revision as of 21:55, 6 October 2014

In this tutorial, we assume that we are handed an SFX dataset containing Lysozyme diffraction, but are not told anything else. We will have to go through all the data runs, figure out which one is to be used for dark subtraction, and account for untrusted pixels and detector metrology. At this point, we will be prepared to integrate and merge the data. Finally, we will perform simple molecular replacement and ask whether there is any Gd signal in the anomalous difference Fourier.

Discovery of data collection parameters

Log in to pslogin.slac.stanford.edu, and then to psana. Carry the flags through so that X-windows will work:

ssh -YAC $USER@pslogin.slac.stanford.edu
ssh -YAC psana

Go in to the working directory and source the package manager:

cd ~/myrelease
sit_setup

Create a subdirectory for the 2014 tutorial files if not already done:

mkdir -p cxi84914

List out the Gd-Lysozyme XTC files (this could take time since there are many images):

ls /reg/d/psdm/cxi/cxi84914/xtc/e239
ls /reg/d/psdm/cxi/cxi84914/xtc/e240

Notice that there are numerous runs in the directory. Now we will create composite averages for each run. Grab this configuration file: mkdark_Gd-Lysozyme.cfg and put it in your cxi84914 directory. For one run only from each experiment directory:

cxi.lsf -c ~/myrelease/cxi84914/mkdark_Gd-Lysozyme.cfg \
-o /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e239/ \
-i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq -s -p 8 -x 239 -r 16 -t 0
cxi.lsf -c ~/myrelease/cxi84914/mkdark_Gd-Lysozyme.cfg \
-o /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e240/ \
-i /reg/d/psdm/cxi/cxi84914/xtc/e240 -q psanacsq -s -p 8 -x 240 -r 16 -t 0

Take note:

  • -c configuration file
  • -o output directory (will be created)
  • -i input files (directory containing the XTC streams)
  • -q which batch queue to use
  • -s funnel all streams for the run into one node (takes longer, but necessary for averaging)
  • -p number of cores to use on the node
  • -x which experiment number
  • -r which run number
  • -t which processing trial (auto increments from 0 if not given)
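To illustrate how these options fit together, here is a minimal Python sketch that assembles the same cxi.lsf argument list for a given experiment and run. The helper name build_cxi_lsf_args is hypothetical and not part of cctbx; only the flags listed above are used, and nothing is submitted to the queue.

```python
def build_cxi_lsf_args(cfg, out_dir, xtc_dir, experiment, run, trial=0,
                       queue="psanacsq", cores=8, single_node=True):
    """Assemble a cxi.lsf argument list (hypothetical helper, not part of cctbx)."""
    args = ["cxi.lsf", "-c", cfg, "-o", out_dir, "-i", xtc_dir, "-q", queue]
    if single_node:
        # -s funnels all streams for the run into one node (needed for averaging)
        args.append("-s")
    args += ["-p", str(cores), "-x", str(experiment),
             "-r", str(run), "-t", str(trial)]
    return args

# $USER and ~ are left unexpanded here; the shell expands them when the line is pasted.
args = build_cxi_lsf_args(
    cfg="~/myrelease/cxi84914/mkdark_Gd-Lysozyme.cfg",
    out_dir="/reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e239/",
    xtc_dir="/reg/d/psdm/cxi/cxi84914/xtc/e239",
    experiment=239, run=16)
print(" ".join(args))
```

Printing the joined list gives a single line that can be compared against the commands above before anything is run.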

For all the runs in the Gd-Lysozyme data set 239:

kinit
aklog
for m in 27 28 29 30 31; \
 do echo $m; cxi.lsf -c ~/myrelease/cxi84914/mkdark_Gd-Lysozyme.cfg \
 -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e239/ \
 -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq -s -p 8 -x 239 -r ${m} -t 0; done

For all the runs in the Gd-Lysozyme data set 240:

kinit
aklog
for m in 1 2 3 4 5 6 7 8 9 10 12 13 14 16 17 18 19 21 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40; \
 do echo $m; cxi.lsf -c ~/myrelease/cxi84914/mkdark_Gd-Lysozyme.cfg \
 -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e240/ \
 -i /reg/d/psdm/cxi/cxi84914/xtc/e240 -q psanacsq -s -p 8 -x 240 -r ${m} -t 0; done


bjobs lists all your batch jobs; use this form for more information including other-user load:

bjobs -w -u all -q psanacsq
bkill [number] # stops unwanted job

Some runs take up to 2 hrs wall time to average. Find the averages, view the max-composites, and list out header information for each experiment separately and create a table of the results:

ls /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e239/r*/000/out/*.pickle
cctbx.image_viewer `find /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e239/r*/000 -name "max*.pickle"`
for m in `find /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e239/r*/000 -name "max*.pickle"`; 
 do echo $m; cxi.print_pickle $m; echo; done

Let's make a table of the results for experiment 239:

Run Distance(mm) Wavelength(Å) Diffraction Comments
27 81.0 1.456964 weak powder
28 81.0 1.456955 strong powder
29 81.0 1.4569557 strong powder
30 81.0 1.4569529 dark
31 81.0 1.456949 strong powder

The next Gd-Lysozyme experiment:

ls /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e240/r*/000/out/*.pickle
cctbx.image_viewer `find /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e240/r*/000 -name "max*.pickle"`
for m in `find /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e240/r*/000 -name "max*.pickle"`; 
 do echo $m; cxi.print_pickle $m; echo; done

Let's make a table of the results for experiment 240:

Run Distance(mm) Wavelength(Å) Diffraction Comments
1 156.0 1.145418 dark
2 101.0 1.454165 strong powder
3 81.0 1.454175 strong powder
4 81.0 1.454175 strong powder
5 81.0 1.454175 strong powder
6 81.0 1.454176 strong powder
7 81.0 1.454176 strong powder
8 97.735 1.453730 dark
9 171.0 1.4569401 weak powder
10 81.0 1.456952 strong powder
12 81.0 1.45695 weak powder
13 81.0 1.45815 dark
14 81.0 1.456952 strong powder
16 81.0 1.45697 strong powder
17 81.0 1.457426 weak powder
18 81.0 1.467101 very weak powder
19 81.0 1.456950 strong powder
21 81.0 1.456949 strong powder
24 81.0 1.456950 very weak powder
25 81.0 1.456958 strong powder
26 81.0 1.4569358 strong powder
27 81.0 1.4569524 strong powder
28 81.0 1.456964 very weak powder
29 81.0 1.4570738 strong powder saturated spots
30 81.0 1.4569528 strong powder saturated spots
31 81.0 1.4569488 dark
32 81.0 1.456953 strong powder
33 81.0 1.4569476 strong powder
34 81.0 1.4569505 strong powder
35 81.0 1.456966 strong powder
36 81.0 1.456948 strong powder
37 81.0 1.4573264 strong powder
38 81.0 1.4569576 strong powder
39 81.0 1.4569520 strong powder
40 81.0 1.4569505 strong powder
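A table like this can also be screened programmatically. The minimal Python sketch below assumes the rows have been typed in by hand (only a subset of the experiment 240 table is reproduced) and uses a hypothetical helper name. It lists the dark runs recorded at the 81.0 distance shared by most of the data runs.

```python
# (run, distance, wavelength, diffraction) rows copied from the experiment 240
# table; only a subset is reproduced to keep the sketch short.
runs_e240 = [
    (1, 156.0, 1.145418, "dark"),
    (2, 101.0, 1.454165, "strong powder"),
    (3, 81.0, 1.454175, "strong powder"),
    (8, 97.735, 1.453730, "dark"),
    (9, 171.0, 1.4569401, "weak powder"),
    (13, 81.0, 1.45815, "dark"),
    (31, 81.0, 1.4569488, "dark"),
    (32, 81.0, 1.456953, "strong powder"),
]

def dark_runs_at_distance(rows, distance, tol=0.01):
    """Return run numbers flagged 'dark' whose distance matches within tol."""
    return [run for run, dist, _wavelength, kind in rows
            if kind == "dark" and abs(dist - distance) <= tol]

usable_darks = dark_runs_at_distance(runs_e240, 81.0)
print(usable_darks)  # [13, 31]
```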

Some conclusions:

  • Run 30 was the dark run for experiment 239. We'll use the average and standard deviation for further processing. Tutorial students can take the results from the instructor's directory:
/reg/d/psdm/cxi/cxi84914/scratch/tmmclark/initial_dark/e239/r0030/000/out/avg-r0030.pickle
/reg/d/psdm/cxi/cxi84914/scratch/tmmclark/initial_dark/e239/r0030/000/out/stddev-r0030.pickle
  • Runs 1, 8, 13 and 31 were dark runs for experiment 240. We will discard runs 1 and 8 since the detector distances are unique and would require separate detector calibration. Either run 13 or 31 will work, so we'll use the average and standard deviation of run 31 for further processing of experiment 240. Tutorial students can take the results from the instructor's directory:
/reg/d/psdm/cxi/cxi84914/scratch/tmmclark/initial_dark/e240/r0031/000/out/avg-r0031.pickle
/reg/d/psdm/cxi/cxi84914/scratch/tmmclark/initial_dark/e240/r0031/000/out/stddev-r0031.pickle
  • We are interested in getting the Gd anomalous signal from lysozyme; the runs were collected at the far remote for f″ at approximately 8500 eV, or 1.457 Angstroms. See X-ray Anomalous Scattering.
  • We'll accept runs 27-29 and 31 ("calibration27") for experiment 239, and in experiment 240 runs 3-7, 10, 11, 14-19, 23-30 and 32-40 ("calibration3"). We'll also discard run 9 of experiment 240 as the diffraction was relatively weak and the unique detector distance would require separate detector calibration.
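The energy and wavelength quoted above are related by E = hc/λ, with hc approximately 12398.42 eV·Angstrom. The short Python sketch below (function names are ours, chosen for illustration) converts the tabulated wavelength to a photon energy, which comes out near 8510 eV, consistent with the "approximately 8500 eV" stated in the conclusions.

```python
# Photon energy (eV) times wavelength (Angstrom) equals hc, about 12398.42 eV*Angstrom.
HC_EV_ANGSTROM = 12398.42

def wavelength_to_energy_ev(wavelength_angstrom):
    """Convert a wavelength in Angstroms to a photon energy in eV."""
    return HC_EV_ANGSTROM / wavelength_angstrom

def energy_ev_to_wavelength(energy_ev):
    """Convert a photon energy in eV to a wavelength in Angstroms."""
    return HC_EV_ANGSTROM / energy_ev

energy = wavelength_to_energy_ev(1.457)
print("%.0f eV" % energy)
```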

Prepare to mask out the untrusted pixels

We'll now calculate dark-subtracted averages for experiment 239.

for m in 27 28 29 31; \
 do echo $m; cxi.lsf -c ~/myrelease/cxi84914/mkavg_e239.cfg \
 -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e239/ \
 -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq -s -p 8 -x 239 -r ${m} -t 0; done

This repeats exactly the same averaging calculations as before, except the dark average from run 30 is subtracted. The dark image to be subtracted (along with its std deviation) is defined in the configuration file mkavg_e239.cfg.

Now we'll figure out which pixels are untrusted, and thus should not be integrated. Three criteria will be used:

  • Hot pixels--on the average-dark the pixel values exceed 1150 (should be fine tuned by inspecting the dark & using trial and error)
  • Hot pixels--on the standard deviation-dark the stddev exceeds 4 and therefore unreliable (also should be fine tuned by trial and error)
  • Cold pixels or shadows--on a maximum-composite data image, inspect values and set a minimum threshold value (we choose 15 here)

For more information on masking parameters see creating a mask image
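The three criteria amount to a per-pixel logical OR. The following Python sketch shows that logic on tiny made-up arrays; it is not the cxi.make_mask implementation, and whether the real tool uses strict or inclusive comparisons is an assumption here.

```python
def untrusted_mask(avg_dark, stddev_dark, max_proj,
                   avg_max, stddev_max, maxproj_min):
    """Return a 2D list of booleans, True where a pixel is untrusted.

    A pixel is flagged if the dark average is too high, the dark standard
    deviation is too high, or the maximum projection is too low.
    """
    mask = []
    for avg_row, sd_row, mp_row in zip(avg_dark, stddev_dark, max_proj):
        mask.append([
            (a > avg_max) or (s > stddev_max) or (m < maxproj_min)
            for a, s, m in zip(avg_row, sd_row, mp_row)
        ])
    return mask

# Tiny made-up 2x3 "images", purely for illustration.
avg_dark = [[1000, 1200, 1100], [1050, 1000, 1000]]
stddev_dark = [[3.0, 3.5, 5.0], [2.0, 3.9, 3.0]]
max_proj = [[40, 50, 60], [10, 30, 15]]

mask = untrusted_mask(avg_dark, stddev_dark, max_proj,
                      avg_max=1150, stddev_max=4, maxproj_min=15)
print(mask)
```

With the thresholds used for experiment 239, the hot-average pixel, the noisy pixel and the cold pixel are flagged, and the remaining three are trusted.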

cxi.make_mask -v --maxproj_min 15 --avg_max 1150 --stddev_max 4 --output mask_base.pickle \
 /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e239/r0030/000/out/avg-r0030.pickle \
 /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e239/r0030/000/out/stddev-r0030.pickle \
 /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e239/r0027/000/out/max-r0027.pickle

Inspect the mask:

cctbx.image_viewer mask_base.pickle show_untrusted=true

Non-bonded pixels are masked, along with untrusted regions of high and low/negative intensity.

Now we need to repeat the procedure above to calculate dark-subtracted averages for experiment 240.

for m in 3 4 5 6 7 10 12 14 16 17 18 19 21 23 24 25 26 27 28 29 30 32 33 34 35 36 37 38 39 40; \
 do echo $m; cxi.lsf -c ~/myrelease/cxi84914/mkavg_e240.cfg \
 -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e240/ \
 -i /reg/d/psdm/cxi/cxi84914/xtc/e240 -q psanacsq -s -p 8 -x 240 -r ${m} -t 0; done

This repeats exactly the same averaging calculations as before, except the dark average from run 31 is subtracted. The dark image to be subtracted (along with its std deviation) is defined in the configuration file mkavg_e240.cfg.

Now we'll figure out which pixels are untrusted, and thus should not be integrated. Three criteria will be used:

  • Hot pixels--on the average-dark the pixel values exceed 1250 (should be fine tuned by inspecting the dark & using trial and error)
  • Hot pixels--on the standard deviation-dark the stddev exceeds 4 and therefore unreliable (also should be fine tuned by trial and error)
  • Cold pixels or shadows--on a maximum-composite data image, inspect values and set a minimum threshold value (we choose 14 here)

For more information on masking parameters see creating a mask image

cxi.make_mask -v --maxproj_min 14 --avg_max 1250 --stddev_max 4 --output mask_base.pickle \
 /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e240/r0031/000/out/avg-r0031.pickle \
 /reg/d/psdm/cxi/cxi84914/scratch/$USER/initial_dark/e240/r0031/000/out/stddev-r0031.pickle \
 /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e240/r0018/000/out/max-r0018.pickle

Inspect the mask:

cctbx.image_viewer mask_base.pickle show_untrusted=true

Non-bonded pixels are masked, along with untrusted regions of high and low/negative intensity.

Correct the detector metrology

Accurate data integration requires highly precise knowledge of pixel positions in laboratory space (metrology). Gaining this knowledge is especially difficult due to the segmented nature of the CSPAD detector, which is tiled into 64 application-specific integrated circuits (ASICs). The 64 ASICs are arranged in quadrants that are approximately 4-fold rotationally symmetric, with 8 sensors per quadrant and 2 ASICs per sensor. The sensors are field-serviceable, and may therefore change from Run to Run.

We thus need to determine positions and rotations for all 64 tiles, ideally down to an accuracy on the order of 10 microns. As a general overview, cctbx takes the following steps:

  • Tile placement in physical space is measured by the beamline operators optically using electron microscopy. This is done at the per-sensor level (2 ASICs per sensor). This is already hard-coded; nothing for the user to do.
  • Relative positions of the quadrants are determined coarsely by considering powder rings.
  • Sensor positions are refined based on Bragg spot diffraction, allowing for whole-pixel translations in x and y.
  • ASIC positions are refined to subpixel accuracy based on Bragg diffraction, allowing for sub-pixel translations and rotations.

Quadrant positions

The electron microscopy step above determines only the sensor positions relative to the frame of reference of each quadrant, not the absolute position of each quadrant in space. At the CXI instrument, the forward detector DS1 has rail-mounted quadrants to allow re-sizing of the central hole. The quadrant placement should be assessed for both the forward and back (DS2) detectors.

For lysozyme we examine one of the strong images (maximum composite). Students may use the instructor's files ($USER=tmmclark). We first determine that our image has a timestamp that identifies it within cctbx as being from run 7 (2013):

cxi.print_pickle /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e239/r0028/000/out/max-r0028.pickle 
 > Detector format version: CXI 7.1
cxi.detector_format_versions
cxi.detector_format_versions "CXI 7.1"
cxi.view /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e239/r0028/000/out/max-r0028.pickle \
 distl.detector_format_version="CXI 7.1" viewer.calibrate_unitcell.d_min=10 \
 viewer.calibrate_unitcell.unitcell=79,79,38,90,90,90 viewer.calibrate_unitcell.spacegroup=P43212

The "Settings" GUI panel shows detector distance as well as the hard-coded quadrant positions corresponding to "CXI 7.1", namely [2, -6, 3, -6, -7, 0, -1, -4]. Tile translations have been zeroed out in the code. The settings can be changed in the panel, or alternately given as a separate command line parameter (distl.quad_translations=2,-6,3,-6,-7,0,-1,-4). The object is to align the powder pattern with the predicted rings (red circles) based on the unit cell parameters. It can be seen that the alignment is not quite perfect.

Since we have well-formed powder rings, we can run the automatic quadrant positioning tool:

cspad.quadrants /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e239/r0028/000/out/max-r0028.pickle \
 distl.detector_format_version="CXI 7.1"
> The NEW QUAD translations are: [2, -6, 4, -6, -6, 1, 0, -4]

Try a few different max-composites from runs 27, 29 and 31:

The NEW QUAD translations are: [2, -6, 4, -6, -9, 2, 0, -4] (poor self-correlation value of approx. 0.06; weak powder diffraction for this run)

The NEW QUAD translations are: [3, -4, 5, -6, -6, 1, 0, -4]

The NEW QUAD translations are: [2, -6, 4, -6, -6, 1, 0, -4]
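One simple way to choose among these results is to discard the run flagged as unreliable and take the most frequently reported set. The Python sketch below does this; it illustrates a majority vote rather than anything cspad.quadrants does itself, and the assignment of the last three results to runs 27, 29 and 31 is assumed from the order in which they are listed.

```python
from collections import Counter

# Quadrant translations reported above, keyed by run number (mapping assumed
# from listing order for runs 27, 29 and 31).
results = {
    28: (2, -6, 4, -6, -6, 1, 0, -4),
    27: (2, -6, 4, -6, -9, 2, 0, -4),  # poor self-correlation, weak powder
    29: (3, -4, 5, -6, -6, 1, 0, -4),
    31: (2, -6, 4, -6, -6, 1, 0, -4),
}
unreliable = {27}

# Count identical translation sets among the reliable runs only.
votes = Counter(t for run, t in results.items() if run not in unreliable)
consensus, count = votes.most_common(1)[0]
print(",".join(str(v) for v in consensus), "reported by", count, "runs")
```

The comma-separated form printed here, with no spaces, is the one that distl.quad_translations expects on the command line.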

These can be pasted on to the command line for graphical review:

cxi.view /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e239/r0028/000/out/max-r0028.pickle \
 distl.detector_format_version="CXI 7.1" viewer.calibrate_unitcell.d_min=10 \
 viewer.calibrate_unitcell.unitcell=79,79,38,90,90,90 viewer.calibrate_unitcell.spacegroup=P43212 \
 distl.quad_translations=2,-6,4,-6,-6,1,0,-4

This looks slightly better. From the GUI it also appears that distance=81 fits; meaning that the detz_offset for the configuration file of 571 is ok.

Now we need to check e240:

For lysozyme we examine one of the strong images (maximum composite). Students may use the instructor's files ($USER=nksauter). We first determine that our image has a timestamp that identifies it within cctbx as being from run 7 (2013):

cxi.print_pickle /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e240/r0029/000/out/max-r0029.pickle 
 > Detector format version: CXI 7.1
cxi.detector_format_versions
cxi.detector_format_versions "CXI 7.1"
cxi.view /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e240/r0029/000/out/max-r0029.pickle \
 distl.detector_format_version="CXI 7.1" viewer.calibrate_unitcell.d_min=10 \
 viewer.calibrate_unitcell.unitcell=79,79,38,90,90,90 viewer.calibrate_unitcell.spacegroup=P43212

The "Settings" GUI panel shows detector distance as well as the hard-coded quadrant positions corresponding to "CXI 7.1", namely [2, -6, 3, -6, -7, 0, -1, -4]. Tile translations have been zeroed out in the code. The settings can be changed in the panel, or alternately given as a separate command line parameter (distl.quad_translations=2,-6,3,-6,-7,0,-1,-4). The object is to align the powder pattern with the predicted rings (red circles) based on the unit cell parameters. It can be seen that the alignment is not quite perfect.

Since we have well-formed powder rings, we can run the automatic quadrant positioning tool:

cspad.quadrants /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e240/r0029/000/out/max-r0029.pickle \
 distl.detector_format_version="CXI 7.1"
> The NEW QUAD translations are:  [2, -6, 4, -6, -6, 1, 0, -4]

Try a few different max-composites from runs 3, 7, 17, 27:

The NEW QUAD translations are: [2, -5, 4, -6, -6, 1, 0, -3]

The NEW QUAD translations are: [2, -5, 4, -6, -6, 1, 0, -4]

The NEW QUAD translations are: [19, -6, 3, -6, -11, 11, 1, -1] (very poor self-correlation coefficient, around 0.06; run 17 is weak powder and not a good run to use for the metrology)

The NEW QUAD translations are: [3, -4, 4, -6, -6, 1, 0, -4]


These can be pasted on to the command line for graphical review:

cxi.view /reg/d/psdm/cxi/cxi84914/scratch/$USER/averages/e240/r0029/000/out/max-r0029.pickle \
 distl.detector_format_version="CXI 7.1" viewer.calibrate_unitcell.d_min=10 \
 viewer.calibrate_unitcell.unitcell=79,79,38,90,90,90 viewer.calibrate_unitcell.spacegroup=P43212 \
 distl.quad_translations=2,-6,4,-6,-6,1,0,-4

This looks slightly better. We will continue in the next section to test the distance and find the best value.

Data Discovery Phase

Before we can move forward with metrology on the pixel and sub-pixel level, we must test whether the data can be indexed, and index a test image. Beginning with e239, we choose run 27, run hit finding, and dump the images for trial 0. The configuration file Gd-Lysozyme-27_discover.cfg (for configuration run 27) names our phil parameter file Gd-Lysozyme-t000.phil (t000 means trial 0). For trial 0 we comment out the target cell and setting. Modules that are commented out in the config file can be uncommented for indexing once we discover the correct parameters.

 cxi.lsf -c ~/myrelease/cxi84914/e239/Gd-Lysozyme-27_discover.cfg \
 -o /path to home/myrelease/cxi84914/e239/ \
 -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq -p 8 -x 239 -r 27 -t 0

Nothing can progress until we successfully index an image. The next steps involve adjusting the configuration file so that we can just look at the images and find one that should index. First we make a new configuration file that does only a hit find and a raw image dump, and examine the images to find one with "good" diffraction. We choose the s00 stream, as this gives a thin slice of the entire run, and we can examine all images from this run using the command:

 cctbx.image_viewer shot-s00-20130315225*

Looking through these images the best is the third image (shot-s00-20130315225354116.pickle). First we check using this image if spot finder has the correct parameters by using the .phil file parameters in image viewer as follows:

 distl.image_viewer distl.minimum_signal_height=5 distl.minimum_spot_height=10 \
 distl.minimum_spot_area=1 shot-s00-20130315225354116.pickle

The spot finder is finding the spots correctly, so it seems there is another issue. On examining the images it is clear that the water rings are not at the expected resolution of approximately 3.5 Angstroms but rather around 1.8 Angstroms, indicating that the detz_offset is incorrect (way off); indexing is not going to work until this is fixed.

This brings us to trial 1. The detz_offset is incorrect and we aren't in the ballpark, so first we will try detz_offset = 591, which puts the detector 100 mm away and brings the water rings closer to 3.5 Angstrom resolution. To do this we edit Gd-Lysozyme-27_discover.cfg by uncommenting the indexing module (also comment out my_ana_pkg.mod_dump:img_thresh if you don't want to dump the raw images a second time) and changing detz_offset to 591. Then run the following command:

  cxi.lsf -c ~/myrelease/cxi84914/e239/Gd-Lysozyme-27_discover.cfg \
 -o /reg/d/psdm/cxi/cxi84914/scratch/USERNAME/results/e239/ \
 -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq \
 -p 8 -x 239 -r 27 -t 1

Looking at the log file s00.out in ~/myrelease/cxi84914/e239/r0027/stdout, the unit cell found (86.3581, 86.3581, 42.8606, 90, 90, 90) is not the correct one, and many images did not index at all. We need to get to a value for detz_offset that is in the correct range. For trial 2 we will use a detz_offset of 581, so the detz_offset in Gd-Lysozyme-27_discover.cfg is changed to 581 and we run the following command:

  cxi.lsf -c ~/myrelease/cxi84914/e239/Gd-Lysozyme-27_discover.cfg \
 -o /reg/d/psdm/cxi/cxi84914/scratch/USERNAME/results/e239/ \
 -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq -p 8 -x 239 -r 27 -t 2

Checking the log file, the indexed unit cell is (78.1073, 78.1073, 39.1608, 90, 90, 90), which is much closer to what we expect. We then check how many indexed images are in r0027/002/out:

 ls -l |wc -l

This counts the lines listed for the current directory, and we get 188 indexed images out of a total of 303 images in our image dump of r0027 for trial 0 (note that ls -l also prints a leading "total" line, so the number of files is one less than the number of lines). Before accepting a detz_offset of 581 we will try a series of detz_offset values in the range 575 - 595. Go to your myrelease directory in your home directory and create a new directory called dist_trials; cd into it and run this command to create new configuration files with the test detz_offset values.

 for i in `seq 575 595`; do vi -c "%s/581/$i/g" -c "w Gd-Lysozyme-27_discover0$i.cfg" \
 -c q\! ../cxi84914/e239/Gd-Lysozyme-27_discover0.cfg ; done

Here, a vi command is executed repeatedly that searches for the number 581 in your config file and replaces it with a number from 575 to 595, then writes out the new file with an appropriate file name. Note that the number was originally 571, and has already been optimized to 581 through earlier trials.
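Because the vi substitution replaces every occurrence of 581 in the file, any unrelated value that happens to contain those digits would be changed too. As a hedged alternative, the Python sketch below rewrites only the detz_offset line. It assumes the config uses "detz_offset = N" syntax; the section name and threshold entry in the example fragment are made up for illustration.

```python
import re

# Matches only lines of the assumed form "detz_offset = <number>".
DETZ_LINE = re.compile(r"^([ \t]*detz_offset[ \t]*=[ \t]*)\d+(?:\.\d+)?",
                       re.MULTILINE)

def set_detz_offset(cfg_text, new_value):
    """Return cfg_text with the value on each detz_offset line replaced."""
    return DETZ_LINE.sub(lambda m: m.group(1) + str(new_value), cfg_text)

# Made-up fragment; a real configuration file has many more entries.
example_cfg = "[some_module]\ndetz_offset = 581\nthreshold = 1581\n"

# One variant per candidate offset, 575 through 595 inclusive.
variants = {i: set_detz_offset(example_cfg, i) for i in range(575, 596)}
print(variants[575])
```

In the printed variant the threshold value 1581 is left alone, whereas a blanket replacement of 581 would have altered it.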

Next, submit indexing jobs for each candidate detz_offset from your myrelease folder:

for i in `seq 575 595`; do cxi.lsf -c dist_trials/Gd-Lysozyme-27_discover0$i.cfg \
-o /reg/d/psdm/cxi/cxi84914/scratch/$USER/dist_trials/ -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -r 27 \
-q psanacsq -p 8 -t $i; done

When complete, go to your results directory:

cd /reg/d/psdm/cxi/cxi84914/scratch/$USER/dist_trials/r0027

Then determine which detz offset is best:

for i in `ls`; do echo -n "$i "; ls $i/out | wc -l; done

Output for r0027:

detz_offset:        Indexed images:
575                          80
576                          98
577                          120
578                          153
579                          182
580                          185
581                          187
582                          159
583                          154
584                          142
585                          120
586                          116
587                          103
588                          85
589                          81
590                          68
591                          64
592                          58
593                          60
594                          53
595                          50

The detz_offset of 581 gives the most indexed images, so now we move to the next step in the metrology (unit-pixel tile positions). Repeating this test for e240 using the same range of values yields a detz_offset of 580.
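The choice can be checked with a few lines of Python using the counts tabulated above for r0027. The 95% cutoff used to list near-optimal offsets is an arbitrary choice for this sketch, not part of the tutorial's procedure.

```python
# Indexed-image counts per detz_offset, copied from the r0027 table.
indexed = {575: 80, 576: 98, 577: 120, 578: 153, 579: 182, 580: 185, 581: 187,
           582: 159, 583: 154, 584: 142, 585: 120, 586: 116, 587: 103, 588: 85,
           589: 81, 590: 68, 591: 64, 592: 58, 593: 60, 594: 53, 595: 50}

# Offset with the largest number of indexed images.
best = max(indexed, key=indexed.get)

# Offsets whose counts lie within 5% of the best (arbitrary cutoff).
near_best = sorted(k for k, v in indexed.items() if v >= 0.95 * indexed[best])

print(best, near_best)  # 581 [579, 580, 581]
```

The maximum is fairly flat between 579 and 581, which fits with e240 settling on the neighbouring value of 580.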

Unit-pixel tile positions

We now know the correct detz_offset (581), so we can change the Gd-Lysozyme-t000.phil file by uncommenting the target unit cell and known setting, and change the config file to do only the indexing and integration with detz_offset=581. Now we will index the data to derive model lattices. The configuration file Gd-Lysozyme-27.cfg (for configuration run 27) names our phil parameter file Gd-Lysozyme-t003.phil (t003 means trial 3). We'll then compare model and observation, from which we can deduce better metrology. This will be done for both experiments 239 and 240 separately.

for m in 27 28 29 31; \
 do echo $m; cxi.lsf -c ~/myrelease/cxi84914/e239/Gd-Lysozyme-27.cfg \
 -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e239/ \
 -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq -p 8 -x 239 -r ${m} -t 3; done
bkill 0 # only if needed: stops all of your jobs (e.g. after submitting with a wrong file path)

Now for e240, with a detz_offset of 580 and a different dark path, we need a different config file, Gd-Lysozyme-3.cfg (for configuration run 3). The .phil file for the two experiments is the same in this trial 1. I have separated the configuration and phil files for each experiment in ~/myrelease/cxi84914 by creating e239 and e240 directories to hold the files.

  for m in 3 4 5 6 7 10 12 14 15 16 17 18 19 21 24 25 26 27 28 29 30 32 33 34 35 36 37 38 39 40; \
  do echo $m; cxi.lsf -c ~/myrelease/cxi84914/e240/Gd-Lysozyme-3.cfg   \
 -o /reg/d/psdm/cxi/cxi84914/scratch/tmmclark/results/e240/ -i /reg/d/psdm/cxi/cxi84914/xtc/e240/ \
 -q psanacsq -p 8 -x 240 -r ${m} -t 1; done

A quick command to count how many integration files there are in e239 and e240:

find /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e239/*/003/integration -name "int*.pickle"|wc -l
find /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e240/*/001/integration -name "int*.pickle"|wc -l

We have 6427 and 98808 integrated pickles for e239 and e240 respectively.

Determine whole-pixel translations for all sensors on the CSPAD for e239.

cspad.metrology data=/reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e239/r*/003/integration \
bravais_setting_id=9 max_frames=1000 min_count=25 \
detector_format_version="CXI 7.1" \
~/myrelease/cxi84914/e239/Gd-Lysozyme-t003.phil | tee Gd-27-t003.unit

Determine whole-pixel translations for all sensors on the CSPAD for e240.

cspad.metrology data=/reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e240/r*/001/integration \
bravais_setting_id=9 max_frames=1000 min_count=25 \
detector_format_version="CXI 7.1" \
~/myrelease/cxi84914/e240/Gd-Lysozyme-t001.phil | tee Gd-3-t001.unit

List out the new unit translations for e239 and e240:

cat Gd-27-t003.unit |grep -A21 "Unit translations"
cat Gd-3-t001.unit |grep -A21 "Unit translations"

Results from three refinement cycles are listed; capture the last one and incorporate it into a new version of the integration phil file, Gd-Lysozyme-t004.phil.

Repeat this process for both experiments, incrementing the trial numbers and creating a new trial phil file each time until the unit pixel translations have converged and the rmsd is less than 1 unit pixel. Once the final unit pixel metrology for experiments e239 and e240 is complete, we have 6669 and 99073 integrated pickle files.

The next step is to incorporate the sub pixel translations and rotations into a new phil file and integrate each experiment once more.

Add sub-pixel corrections

The rmsd is now on the sub-pixel level (less than 0.8 for both e239 and e240). We'll leave the unit translations exactly where they are now. No sub-pixel translations were applied to these data, so we move on to merging; however, for this integration round we'll increase our integration limits to 1.8 Angstroms.

Integrate the data

We're ready for the final integration trial. Again edit the configuration files Gd-Lysozyme-27.cfg and Gd-Lysozyme-3.cfg so that they point to the latest phil files with the converged unit pixel metrologies, Gd-Lysozyme-27-t007.phil and Gd-Lysozyme-3-t006.phil. We use the trials from each experiment where the most images were integrated (the only difference was that e239 was indexed at 1.8 Angstrom resolution and e240 at 1.9 Angstrom resolution). Start with merging these indexed images first.

Submit integration jobs:
for m in 27 28 29 31; \
 do echo $m; cxi.lsf -c ~/myrelease/cxi84914/Gd-Lysozyme-27.cfg \
 -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e239/ \
 -i /reg/d/psdm/cxi/cxi84914/xtc/e239 -q psanacsq -p 8 -x 239 -r ${m} -t 7; done
 for m in 3 4 5 6 7 10 12 14 16 17 18 19 21 24 25 26 27 28 29 30 32 33 34 35 36 37 38 39 40; \
 do echo $m; cxi.lsf -c ~/myrelease/cxi84914/Gd-Lysozyme-3.cfg \
 -o /reg/d/psdm/cxi/cxi84914/scratch/$USER/results/e240/ \
 -i /reg/d/psdm/cxi/cxi84914/xtc/e240 -q psanacsq -p 8 -x 240 -r ${m} -t 6; done

Merge the data

On pslogin, download the reference structure 4etc from the pdb (must be done on pslogin, the only outward-facing host). 4etc.pdb will be used for per-image scaling; 4etc.mtz will be used for reporting correlation to isomorphous synchrotron data.

phenix.fetch_pdb --mtz 4etc

Go back to psana. Get the command file mergeLysozyme0.csh.

mkdir /reg/d/psdm/cxi/cxi84914/scratch/$USER/merge/
cd /reg/d/psdm/cxi/cxi84914/scratch/$USER/merge/
./mergeLysozyme0.csh e239 trial# e240 trial#

Log file with model scaling and no anomalous flag

Log file with model scaling and anomalous flag


Trying to merge without scaling to a model: we will merge the data as a new structure; see the Advanced Merging Tutorial, in particular use case 2 (new structure).

Get the command file mergeLysozyme.csh. We are merging all the runs from both experiments.

mkdir /reg/d/psdm/cxi/cxi84914/scratch/$USER/merge
cd /reg/d/psdm/cxi/cxi84914/scratch/$USER/merge
./mergeLysozyme.csh e239 trial# e240 trial#

The statistics resulting from merging over 105,000 images are contained here in the Log file.

In summary, out of 105,912 indexed images there were:

 105526 of 105912 integration files were accepted
 284 rejected due to wrong Bravais group
 3 rejected for unit cell outliers
 99 rejected for low signal
 0 rejected due to poor correlation
 0 rejected for file errors or no reindex matrix
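
As a quick consistency check, the accepted and rejected counts should add up to the number of indexed images. A minimal sketch using shell arithmetic follows; the variable names are chosen here for illustration, and the numbers are copied from the summary above.

```shell
#!/bin/bash
# Counts copied from the merging summary.
total=105912
accepted=105526
bravais=284      # wrong Bravais group
cell=3           # unit cell outliers
low_signal=99    # low signal
poor_corr=0      # poor correlation
file_err=0       # file errors or no reindex matrix

rejected=$((bravais + cell + low_signal + poor_corr + file_err))
sum=$((accepted + rejected))
echo "rejected: $rejected"
echo "accepted + rejected: $sum (indexed: $total)"

# Fraction accepted; awk is used because shell arithmetic is integer-only.
awk -v a="$accepted" -v t="$total" \
  'BEGIN { printf "accepted: %.2f%%\n", 100 * a / t }'
```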

Solve the structure

Some quick commands to evaluate the data. The resulting *.mtz files must first be copied back to your $HOME directory and transferred to your laptop to run PHENIX:

cp /reg/d/psdm/cxi/cxi84914/scratch/$USER/merge/GdLysozymeanom_4etc_s0_mark0.mtz $HOME
cp /reg/d/psdm/cxi/cxi84914/scratch/$USER/merge/GdLysozymenoanom_4etc_s0_mark0.mtz $HOME

Always use the "s0" files; "s1" and "s2" are the half-datasets used only to calculate CC1/2.

phenix.fetch_pdb --mtz 4etc
phenix.xtriage GdLysozymenoanom_4etc_s0_mark0.mtz scaling.input.xray_data.obs_labels=Iobs > triage_noanom.log
phenix.xtriage GdLysozymeanom_4etc_s0_mark0.mtz scaling.input.xray_data.obs_labels=Iobs > triage_anom.log

Wilson B factor: 15.58 no anomalous

Wilson B factor: 15.12 anomalous
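
The Wilson B values quoted here are read from the two xtriage logs. A minimal sketch for pulling the relevant lines out is given here, assuming only that those lines contain the word "Wilson". The exact wording in the log may differ between PHENIX versions, so the search is case-insensitive and prints whole lines rather than parsing a number. The function name show_wilson is made up for this example.

```shell
#!/bin/bash
# Print every line mentioning "Wilson" from each log file given,
# prefixed with the file name. Missing logs are reported and skipped.
show_wilson() {
  for log in "$@"; do
    if [ -f "$log" ]; then
      grep -i -H "wilson" "$log"
    else
      echo "missing: $log" >&2
    fi
  done
}

show_wilson triage_noanom.log triage_anom.log
```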

Use the Gd-Lysozyme anomalous structure to get the Gd sites:

phenix.fetch_pdb --mtz 4n5r


Obtain the set of R-free-flags by taking those generated in the refinement of 4n5r.pdb and extending them to cover the 1.5 - 100 Angstrom resolution range using phenix (the new R-free-flags file is reflections.mtz). Generate restraints for the "DO3" ligands using phenix.elbow.

Refine the model

phenix.refine 4n5r.pdb DO3.cif refinement.output.prefix=001 \
 xray_data.file_name=GdLysozymeanom_4etc_s0_mark0.mtz \
 xray_data.r_free_flags.file_name=reflections.mtz \
 xray_data.r_free_flags.label=FreeR_flag \
 main.number_of_macro_cycles=7 optimize_xyz_weight=True \
 optimize_adp_weight=True nproc=20 refinement.input.xray_data.labels=Iobs unit_cell="78.033, 78.033, 38.833" \
 ordered_solvent=true ordered_solvent.mode=every_macro_cycle


RWORK = 20.25%, RFREE = 23.7%, out to 1.9 Angstrom

Run an ersatz script (provided by Nat Echols) to measure the peak heights of the anomalous scatterers.

libtbx.python map_height_at_atoms.py \
 001_001.pdb GdLysozymeanom_4etc_s0_mark0.mtz \
 selection="element GD" \
 input.xray_data.labels=Iobs \
 xray_data.r_free_flags.file_name=001_data.mtz \
 xray_data.r_free_flags.label=R-free-flags

Promising results for the two Gd sites:

pdb="GD    GD A 201 " :  17.85 sigma
pdb="GD    GD A 202 " :  23.46 sigma