Processing L498 thermolysin: Difference between revisions
(Created instructions to reproduce L498 thermolysin processing results.) |
(Cleanups, spell checks, etc.) |
||
Line 1: | Line 1: | ||
This page contains instructions for reproducing the results reported in the 2014 ''cctbx.xfel'' paper<ref>[http://dx.doi.org/10.1038/nmeth.2887 Hattne, J <i>et al.</i> Accurate macromolecular structures using minimal measurements from X-ray free-electron lasers. <i>Nat. Methods</i> <b>In press</b> (2014).]</ref>. The data are available from the [http://cxidb.org | This page contains instructions for reproducing the results reported in the 2014 ''cctbx.xfel'' paper<ref name="Hattne:2014">[http://dx.doi.org/10.1038/nmeth.2887 Hattne, J <i>et al.</i> Accurate macromolecular structures using minimal measurements from X-ray free-electron lasers. <i>Nat. Methods</i> <b>In press</b> (2014).]</ref>. The data are available from the [http://cxidb.org Coherent X-ray Imaging Data Bank] ([http://cxidb.org/id-23 CXIDB ID 23]; [http://cxidb.org/id-23-raw.html raw XTC files], 4.0 TiB), and must be downloaded to a local disk prior to starting the analysis. Furthermore, the [https://confluence.slac.stanford.edu/display/PSDM/Software+Distribution PSDM Software Distribution] must be [[Set up PSDM software | set up]], along with a test release and an empty analysis package. A [http://www.mysql.com MySQL] database is required to merge the integrated diffraction intensities. In the interest of keeping this guide as general as possible, no particular queuing system is assumed to be available. Processing time depends strongly on the computational resources available; using 48 processors on a 64-core 1.4 GHz Opteron-based computer, the analysis takes around 24 hours. The instructions assume some familiarity with ''cctbx.xfel'' and its installation and configuration procedure. | ||
== Installing a ''cctbx.xfel'' snapshot from March 28, 2013 == | == Installing a ''cctbx.xfel'' snapshot from March 28, 2013 == | ||
The thermolysin data for the ''cctbx.xfel'' paper was processed around March 28, 2013. Unfortunately, regular nightly releases are not available for that time, but a | The thermolysin data for the ''cctbx.xfel'' paper was processed around March 28, 2013. Unfortunately, regular nightly releases are not available for that time, but a custom source-code bundle has been prepared. This bundle differs from the regular ''cctbx'' bundles in that [http://adder.lbl.gov/labelit ''LABELIT''] is included, and that the directory layout is identical to that of a developer installation. To download and unpack the bundle in the current directory: | ||
$ wget http://adder.lbl.gov/cctbx.xfel/downloads/cctbx.xfel-20130328.tar.gz | $ wget http://adder.lbl.gov/cctbx.xfel/downloads/cctbx.xfel-20130328.tar.gz | ||
$ tar -xpvzf cctbx.xfel-20130328.tar.gz | $ tar -xpvzf cctbx.xfel-20130328.tar.gz | ||
Next, create a build directory (called <code><i>phenix-build-20130328</i></code> below, but that is an arbitrary choice), in which the sources are configured and compiled. The Python interpreter required to complete this step | Next, create a build directory (called <code><i>phenix-build-20130328</i></code> below, but that is an arbitrary choice), in which the sources are configured and compiled. The Python interpreter required to complete this step <em>must</em> be the one supplied by the PSDM Software Distribution. Once the PSDM software has been [[Set up PSDM software | set up]], this interpreter can be located using | ||
$ find $SIT_ROOT/sw/external/python -perm /0111 -type f -wholename "*/$SIT_ARCH/*/python" | $ find $SIT_ROOT/sw/external/python -perm /0111 -type f -wholename "*/$SIT_ARCH/*/python" | ||
To prepare the build directory using this interpreter | To prepare the build directory using this interpreter | ||
Line 14: | Line 14: | ||
$ <b><i>python</i></b> ../phenix-src-20130328/cctbx_project/libtbx/configure.py cxi_xdr_xes xfel | $ <b><i>python</i></b> ../phenix-src-20130328/cctbx_project/libtbx/configure.py cxi_xdr_xes xfel | ||
$ . setpaths.sh | $ . setpaths.sh | ||
where <code><b><i>python</i></b></code> is the path to the Python interpreter located using the previous find-command. Note that csh-users should run <code>source setpaths.csh</code> instead of <code>. setpaths.sh</code>. Then compile the sources | where <code><b><i>python</i></b></code> is the path to the Python interpreter located using the previous find-command. Note that csh-users should run <code>source setpaths.csh</code> instead of <code>. setpaths.sh</code>. Then compile the sources: | ||
$ make | $ make | ||
$ make | $ make | ||
Line 26: | Line 26: | ||
$ cd .. | $ cd .. | ||
$ scons | $ scons | ||
where <code><b><i>/path/to/test/release</i></b></code> and <code><b><i>my_ana_pkg</i></b></code> are the path to the test release and the name of the analysis package chosen while [[Set up PSDM software | setting up the PSDM software distribution]]. <code><b><i>/path/to</i></b></code> denotes the path to the directory containing the unpacked ''cctbx.xfel'' sources. The last step compiles the ''cctbx.xfel'' analysis modules. | where <code><b><i>/path/to/test/release</i></b></code> and <code><b><i>my_ana_pkg</i></b></code> are the path to the test release and the name of the analysis package chosen while [[Set up PSDM software | setting up the PSDM software distribution]], respectively. <code><b><i>/path/to</i></b></code> denotes the path to the directory containing the unpacked ''cctbx.xfel'' sources. The last step compiles the ''cctbx.xfel'' analysis modules. | ||
== | == Creating a dark image == | ||
To meaningfully process the thermolysin diffraction data, an average of all the images in a <i>dark run</i>—a run without any X-rays impinging on the detector—must be subtracted from the individual diffraction images. The configuration file below can be used to produce such an average, as well as an image of the standard deviation of all the pixels over the course of the run. | To meaningfully process the thermolysin diffraction data, an average of all the images in a <i>dark run</i>—a run without any X-rays impinging on the detector—must be subtracted from the individual diffraction images. The configuration file below can be used to produce such an average, as well as an image of the standard deviation of all the pixels over the course of the run. | ||
[pyana] | [pyana] | ||
modules = my_ana_pkg.mod_average | modules = my_ana_pkg.mod_average | ||
Line 47: | Line 45: | ||
<b><i>/path/to</i></b> refers to the directory containing the unpacked ''cctbx.xfel'' sources. All other options with values set in italics can be modified without adversely affecting averaging. The above file will use four simultaneous processes, and write the average and standard deviation images to files whose names start with <code>Ds1-avg</code> and <code>Ds1-stddev</code>, respectively, both in a directory called <code><i>r0031</i></code>. | <b><i>/path/to</i></b> refers to the directory containing the unpacked ''cctbx.xfel'' sources. All other options with values set in italics can be modified without adversely affecting averaging. The above file will use four simultaneous processes, and write the average and standard deviation images to files whose names start with <code>Ds1-avg</code> and <code>Ds1-stddev</code>, respectively, both in a directory called <code><i>r0031</i></code>. | ||
The data deposited at the CXIDB contains a dark run, <code>r0031</code>. To average the images in that run, save the above configuration file to disk, <i>e.g.</i> <code><i>L498-dark.cfg</i></code>, apply modifications as necessary, and execute | The data deposited at the [http://cxidb.org/id-23 CXIDB] contains a dark run, <code>r0031</code>. To average the images in that run, save the above configuration file to disk, <i>e.g.</i> <code><i>L498-dark.cfg</i></code>, apply modifications as necessary, and execute | ||
$ cxi.pyana -c <i>L498-dark.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0031-*.xtc | $ cxi.pyana -c <i>L498-dark.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0031-*.xtc | ||
where <b><i>/path/to/xtc/files</i></b> is the path to the directory containing the raw XTC files | where <b><i>/path/to/xtc/files</i></b> is the path to the directory containing the raw XTC files downloaded from [http://cxidb.org/id-23 CXIDB]. The files written to the <code><i>r0031</i></code> directory will have a current date stamp appended, which can safely be removed to simplify subsequent configuration files. | ||
$ cd <i>r0031</i> | $ cd <i>r0031</i> | ||
$ mv Ds1-avg20140308104336073.pickle Ds1-avg.pickle | $ mv <i>Ds1-avg20140308104336073.pickle</i> Ds1-avg.pickle | ||
$ mv Ds1-stddev20140308104336371.pickle Ds1-stddev.pickle | $ mv <i>Ds1-stddev20140308104336371.pickle</i> Ds1-stddev.pickle | ||
$ cd .. | $ cd .. | ||
Further details are available on the [[Preparatory steps#Create a dark average|Create a dark image]] page of the [[Tutorials|tutorials]]. | Note that the date stamp on the generated files above depends on the time of their creation. Further details are available on the [[Preparatory steps#Create a dark average|Create a dark image]] page of the [[Tutorials|tutorials]]. | ||
== | == Indexing the thermolysin data == | ||
A configuration file for processing the primary lattices in the thermolysin data is shown below. | A configuration file for processing the primary lattices in the thermolysin data is shown below. | ||
[pyana] | [pyana] | ||
modules = my_ana_pkg.mod_hitfind:threshold \ | modules = my_ana_pkg.mod_hitfind:threshold \ | ||
Line 87: | Line 83: | ||
integration_basename = int- | integration_basename = int- | ||
xtal_target = thermolysin27 | xtal_target = thermolysin27 | ||
The configuration file above instructs <code>mod_hitfind</code> to use 48 processes. It disables all image output, which reduces the | The configuration file above instructs <code>mod_hitfind</code> to use 48 processes. It disables all image output, which reduces the amount of disk space required to perform the analysis to about 3.4 GiB. <b><i>/path/to</i></b> again refers to the directory containing the unpacked ''cctbx.xfel'' sources, and <code>dark_path</code> as well as <code>dark_stddev</code> may have to be changed to reflect the location of the previously generated dark images. Integration results will be written to the directory <i>integration-first-lattice</i>. | ||
Due to particularities of the thermolysin measurement, processing | Due to particularities of the thermolysin measurement, processing needs to proceed in two batches. To analyze the first batch, runs 16 through 27, save the above configuration file to disk, <i>e.g.</i> <code><i>L498-indexigrate.cfg</i></code>, apply modifications as necessary, and execute | ||
$ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0016-*.xtc | $ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0016-*.xtc | ||
$ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0017-*.xtc | $ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0017-*.xtc | ||
$ … | $ … | ||
$ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0027-*.xtc | $ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0027-*.xtc | ||
where <b><i>/path/to/xtc/files</i></b> is the path to the directory containing the raw XTC files. The second batch, runs 71 through 73, was recorded using a different distance between the interaction region and the detector. | where <b><i>/path/to/xtc/files</i></b> is the path to the directory containing the raw XTC files. The second batch, runs 71 through 73, was recorded using a different distance between the interaction region and the detector. Whilst the changes to the detector position are automatically handled by ''cctbx.xfel'', the resulting difference in shadowing is not. Different areas of the detector should be ignored at the different distances, and this is accounted for by the value of the <code>xtal_target</code> option in the configuration file. To analyze this set of runs, edit <code><i>L498-indexigrate.cfg</i></code>, change <code>thermolysin27</code> to <code>thermolysin73</code>, and | ||
$ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0071-*.xtc | $ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0071-*.xtc | ||
$ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0072-*.xtc | $ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0072-*.xtc | ||
$ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0073-*.xtc | $ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0073-*.xtc | ||
On successful completion, the number of files in the <i>integration-first-lattice</i> directory corresponds to the number of successfully integrated images. Owing to variations in hardware and compiler internals, it may deviate slightly from 11,583, the number reported in the ''cctbx.xfel'' paper. | On successful completion, the number of files in the <i>integration-first-lattice</i> directory corresponds to the number of successfully integrated images. Owing to variations in hardware and compiler internals, it may deviate slightly from 11,583, the number reported in the ''cctbx.xfel'' paper<ref name="Hattne:2014"/>. Further details are available on the [[Indexing and integration|indexing and integration]] page of the [[Tutorials|tutorials]]. | ||
=== | === Indexing the secondary lattice === | ||
Indexing the secondary lattice is very similar to indexing the primary lattice, but requires a change to the source code. Edit <code><b><i>/path/to</i></b>/phenix-src-20130328/labelit_regression/xfel/xfel_targets.py</code>, and uncomment (<i>i.e.</i> remove the leading <code>#</code> character) <code>"outlier_detection_switch=True"</code> on line 25. Then edit the configuration file, <code><i>L498-indexigrate.cfg</i></code> above, and change <i>integration-first-lattice</i> to <i>integration-second-lattice</i> in order not to overwrite the results of the previous analysis of the primary lattice. Before | Indexing the secondary lattice is very similar to indexing the primary lattice, but requires a change to the source code. Edit <code><b><i>/path/to</i></b>/phenix-src-20130328/labelit_regression/xfel/xfel_targets.py</code>, and uncomment (<i>i.e.</i> remove the leading <code>#</code> character) <code>"outlier_detection_switch=True"</code> on line 25. Then edit the configuration file, <code><i>L498-indexigrate.cfg</i></code> above, and change <i>integration-first-lattice</i> to <i>integration-second-lattice</i> in order not to overwrite the results of the previous analysis of the primary lattice. Before reanalyzing the first batch, ensure that <code>xtal_target</code> is set to <code>thermolysin27</code>. | ||
$ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0016-*.xtc | $ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0016-*.xtc | ||
$ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0017-*.xtc | $ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0017-*.xtc | ||
Line 112: | Line 108: | ||
$ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0072-*.xtc | $ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0072-*.xtc | ||
$ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0073-*.xtc | $ cxi.pyana -c <i>L498-indexigrate.cfg</i> <b><i>/path/to/xtc/files</i></b>/e157-r0073-*.xtc | ||
The number of integrated secondary lattices should be close to 2,021, the number reported in the ''cctbx.xfel'' paper. | The number of integrated secondary lattices should be close to 2,021, the number reported in the ''cctbx.xfel'' paper<ref name="Hattne:2014"/>. | ||
== | == Merging all integrated images == | ||
The phil-file below defines values suitable for merging the primary and secondary lattices previously integrated. | The phil-file below defines values suitable for merging the primary and secondary lattices previously integrated. | ||
Line 146: | Line 142: | ||
log_cutoff = 0.0 | log_cutoff = 0.0 | ||
} | } | ||
<code><i>integration-first-lattice</i></code> and <code><i>integration-second-lattice</i></code> may need to be adjusted to point to the directories where | <code><i>integration-first-lattice</i></code> and <code><i>integration-second-lattice</i></code> may need to be adjusted to point to the directories where the [[#Indexing and integration | indexing and integration step]] left its results. <code><i>db_name</i></code><code>, <i>db_user</i></code>, and <code><i>db_passwd</i></code> must be substituted with the database name and access credentials to a [http://www.mysql.com MySQL] database. Databases on hosts other than the one used to merge the thermolysin data can be accessed by additionally specifying the <code>mysql.host</code> and <code>mysql.port</code> options. The model and structure factors for the scaling reference, <code>model</code> and <code>scaling.mtz_file</code> above, are both available for download from the [http://www.rcsb.org RCSB Protein Data Bank] ([http://www.rcsb.org/pdb/explore/explore.do?structureId=2tli PDB ID 2tli]). If the [http://www.phenix-online.org ''PHENIX''] suite is installed, these are conveniently obtained at the command line using | ||
$ phenix.fetch_pdb --mtz 2tli | $ phenix.fetch_pdb --mtz 2tli | ||
To merge the thermolysin data, save the suitably modified configuration to <i>e.g.</i> <code>L498-merge.phil</code>, and run | To merge the thermolysin data, save the suitably modified configuration file to <i>e.g.</i> <code>L498-merge.phil</code>, and run | ||
$ cxi.merge L498-merge.phil | $ cxi.merge L498-merge.phil | ||
$ cxi.xmerge L498-merge.phil | $ cxi.xmerge L498-merge.phil | ||
Merging statistics are printed on standard output. The merged MTZ-file is written to a file whose name is determined by the value of <code>output.prefix</code> in the configuration file (<code>L498_thermolysin.mtz</code> | Merging statistics are printed on standard output. The merged MTZ-file is written to a file whose name is determined by the value of <code>output.prefix</code> in the configuration file (with the values shown above, the output file would be <code>L498_thermolysin.mtz</code>). Note that the version of merging programs from 28 March, 2013 do <em>not</em> not report the <i>R</i><sub>split</sub> statistic. | ||
Revision as of 18:09, 14 March 2014
This page contains instructions for reproducing the results reported in the 2014 cctbx.xfel paper<ref name="Hattne:2014">Hattne, J et al. Accurate macromolecular structures using minimal measurements from X-ray free-electron lasers. Nat. Methods In press (2014).</ref>. The data are available from the Coherent X-ray Imaging Data Bank (CXIDB ID 23; raw XTC files, 4.0 TiB), and must be downloaded to a local disk prior to starting the analysis. Furthermore, the PSDM Software Distribution must be set up, along with a test release and an empty analysis package. A MySQL database is required to merge the integrated diffraction intensities. In the interest of keeping this guide as general as possible, no particular queuing system is assumed to be available. Processing time depends strongly on the computational resources available; using 48 processors on a 64-core 1.4 GHz Opteron-based computer, the analysis takes around 24 hours. The instructions assume some familiarity with cctbx.xfel and its installation and configuration procedure.
Installing a cctbx.xfel snapshot from March 28, 2013
The thermolysin data for the cctbx.xfel paper was processed around March 28, 2013. Unfortunately, regular nightly releases are not available for that time, but a custom source-code bundle has been prepared. This bundle differs from the regular cctbx bundles in that LABELIT is included, and that the directory layout is identical to that of a developer installation. To download and unpack the bundle in the current directory:
$ wget http://adder.lbl.gov/cctbx.xfel/downloads/cctbx.xfel-20130328.tar.gz
$ tar -xpvzf cctbx.xfel-20130328.tar.gz
Next, create a build directory (called phenix-build-20130328 below, but that is an arbitrary choice), in which the sources are configured and compiled. The Python interpreter required to complete this step must be the one supplied by the PSDM Software Distribution. Once the PSDM software has been set up, this interpreter can be located using
$ find $SIT_ROOT/sw/external/python -perm /0111 -type f -wholename "*/$SIT_ARCH/*/python"
To prepare the build directory using this interpreter:
$ mkdir phenix-build-20130328
$ cd phenix-build-20130328
$ python ../phenix-src-20130328/cctbx_project/libtbx/configure.py cxi_xdr_xes xfel
$ . setpaths.sh
where python is the path to the Python interpreter located using the previous find command. Note that csh users should run source setpaths.csh instead of . setpaths.sh. Then compile the sources:
$ make
$ make
Note that the make command may need to be run twice in order to complete the build. Once make does not produce any output from the compiler, the build is complete.
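The interpreter lookup above can be wrapped in a small function so that the path does not have to be copied by hand. This is only a minimal sketch: it assumes GNU find (as the original command does) and that the first match in sorted order is the interpreter wanted. The function name psdm_python is hypothetical and not part of PSDM or cctbx.xfel.

```shell
#!/bin/sh
# Hypothetical helper (not part of PSDM or cctbx.xfel): print the first
# executable named "python" found below root $1 for architecture $2.
# Uses the same GNU find expression as the command shown earlier.
psdm_python() {
    root=$1
    arch=$2
    find "$root" -perm /0111 -type f -wholename "*/$arch/*/python" | sort | head -n 1
}

# Usage, from inside the build directory once the PSDM software is set up:
#   py=$(psdm_python "$SIT_ROOT/sw/external/python" "$SIT_ARCH")
#   "$py" ../phenix-src-20130328/cctbx_project/libtbx/configure.py cxi_xdr_xes xfel
```

If the function prints nothing, no interpreter matched and the PSDM setup should be checked before configuring.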
To make the cctbx.xfel analysis modules available to PSDM's pyana,
$ cd /path/to/test/release
$ sit_setup
$ cd my_ana_pkg
$ ln -fns /path/to/phenix-src-20130328/cctbx_project/xfel/cxi/cspad_ana src
$ cd ..
$ scons
where /path/to/test/release and my_ana_pkg are the path to the test release and the name of the analysis package chosen while setting up the PSDM software distribution, respectively. /path/to denotes the path to the directory containing the unpacked cctbx.xfel sources. The last step compiles the cctbx.xfel analysis modules.
Creating a dark image
To meaningfully process the thermolysin diffraction data, an average of all the images in a dark run—a run without any X-rays impinging on the detector—must be subtracted from the individual diffraction images. The configuration file below can be used to produce such an average, as well as an image of the standard deviation of all the pixels over the course of the run.
[pyana]
modules = my_ana_pkg.mod_average
num-cpu = 4

[my_ana_pkg.mod_average]
address = CxiDs1-0|Cspad-0
calib_dir = /path/to/phenix-src-20130328/cctbx_project/xfel/metrology/CSPad/run4/CxiDs1.0_Cspad.0
avg_basename = Ds1-avg
avg_dirname = r0031
stddev_basename = Ds1-stddev
stddev_dirname = r0031
/path/to refers to the directory containing the unpacked cctbx.xfel sources. All other options with values set in italics can be modified without adversely affecting averaging. The above file will use four simultaneous processes, and write the average and standard deviation images to files whose names start with Ds1-avg and Ds1-stddev, respectively, both in a directory called r0031.
The data deposited at the CXIDB contains a dark run, r0031. To average the images in that run, save the above configuration file to disk, e.g. L498-dark.cfg, apply modifications as necessary, and execute
$ cxi.pyana -c L498-dark.cfg /path/to/xtc/files/e157-r0031-*.xtc
where /path/to/xtc/files is the path to the directory containing the raw XTC files downloaded from CXIDB. The files written to the r0031 directory will have a current date stamp appended, which can safely be removed to simplify subsequent configuration files.
$ cd r0031
$ mv Ds1-avg20140308104336073.pickle Ds1-avg.pickle
$ mv Ds1-stddev20140308104336371.pickle Ds1-stddev.pickle
$ cd ..
Note that the date stamp on the generated files above depends on the time of their creation. Further details are available on the Create a dark image page of the tutorials.
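Because the date stamp differs on every run, the renaming step can be expressed without typing the stamp. The following is a minimal sketch, assuming the stamp is a run of digits between the base name and the .pickle suffix, as in the file names above. The function name strip_stamp is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rename <dir>/<base><digits...>.pickle to
# <dir>/<base>.pickle. Refuses to act unless exactly one stamped file
# matches, so an existing result is never silently overwritten by a guess.
strip_stamp() {
    dir=$1
    base=$2
    # Re-use the positional parameters to hold the glob matches.
    set -- "$dir/$base"[0-9]*.pickle
    if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
        echo "expected exactly one stamped file for $base in $dir" >&2
        return 1
    fi
    mv "$1" "$dir/$base.pickle"
}

# Usage:
#   strip_stamp r0031 Ds1-avg
#   strip_stamp r0031 Ds1-stddev
```

If the dark run has been averaged more than once, several stamped files will match and the function returns an error instead of choosing one.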
Indexing the thermolysin data
A configuration file for processing the primary lattices in the thermolysin data is shown below.
[pyana]
modules = my_ana_pkg.mod_hitfind:threshold \
          my_ana_pkg.mod_hitfind:index
num-cpu = 48

[my_ana_pkg.mod_hitfind]
address = CxiDs1-0|Cspad-0
calib_dir = /path/to/phenix-src-20130328/cxi_xdr_xes/cftbx/metrology/CSPad/run4/CxiDs1.0:Cspad.0
dark_path = r0031/Ds1-avg.pickle
dark_stddev = r0031/Ds1-stddev.pickle
db_logging = False
detz_offset = 575

[my_ana_pkg.mod_hitfind:threshold]
dispatch = nop
distl_flags = permissive
distl_min_peaks = 16
threshold = 450
xtal_target = hitfind

[my_ana_pkg.mod_hitfind:index]
dispatch = index
integration_dirname = integration-first-lattice
integration_basename = int-
xtal_target = thermolysin27
The configuration file above instructs mod_hitfind to use 48 processes. It disables all image output, which reduces the amount of disk space required to perform the analysis to about 3.4 GiB. /path/to again refers to the directory containing the unpacked cctbx.xfel sources, and dark_path as well as dark_stddev may have to be changed to reflect the location of the previously generated dark images. Integration results will be written to the directory integration-first-lattice.
Due to particularities of the thermolysin measurement, processing needs to proceed in two batches. To analyze the first batch, runs 16 through 27, save the above configuration file to disk, e.g. L498-indexigrate.cfg, apply modifications as necessary, and execute
$ cxi.pyana -c L498-indexigrate.cfg /path/to/xtc/files/e157-r0016-*.xtc
$ cxi.pyana -c L498-indexigrate.cfg /path/to/xtc/files/e157-r0017-*.xtc
$ …
$ cxi.pyana -c L498-indexigrate.cfg /path/to/xtc/files/e157-r0027-*.xtc
where /path/to/xtc/files is the path to the directory containing the raw XTC files. The second batch, runs 71 through 73, was recorded using a different distance between the interaction region and the detector. While the changes to the detector position are automatically handled by cctbx.xfel, the resulting difference in shadowing is not. Different areas of the detector should be ignored at the different distances, and this is accounted for by the value of the xtal_target option in the configuration file. To analyze this set of runs, edit L498-indexigrate.cfg, change thermolysin27 to thermolysin73, and execute
$ cxi.pyana -c L498-indexigrate.cfg /path/to/xtc/files/e157-r0071-*.xtc
$ cxi.pyana -c L498-indexigrate.cfg /path/to/xtc/files/e157-r0072-*.xtc
$ cxi.pyana -c L498-indexigrate.cfg /path/to/xtc/files/e157-r0073-*.xtc
On successful completion, the number of files in the integration-first-lattice directory corresponds to the number of successfully integrated images. Owing to variations in hardware and compiler internals, it may deviate slightly from 11,583, the number reported in the cctbx.xfel paper<ref name="Hattne:2014"/>. Further details are available on the indexing and integration page of the tutorials.
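The per-run commands for the two batches follow a fixed pattern, so they can be generated rather than typed. The sketch below only prints the commands and writes a modified copy of the configuration file; it does not invoke cxi.pyana itself. The function names and the name of the copied configuration file are hypothetical, and the sed expression assumes the xtal_target line looks like the one in the configuration file above.

```shell
#!/bin/sh
# Hypothetical helper: print one cxi.pyana command per run in [first, last].
# Pass run numbers without leading zeros (16, not 0016). The *.xtc glob is
# printed unexpanded so that the output can be reviewed before it is run.
pyana_commands() {
    cfg=$1
    xtc_dir=$2
    first=$3
    last=$4
    run=$first
    while [ "$run" -le "$last" ]; do
        printf 'cxi.pyana -c %s %s/e157-r%04d-*.xtc\n' "$cfg" "$xtc_dir" "$run"
        run=$((run + 1))
    done
}

# Hypothetical helper: write a copy of configuration $1 to $2 in which the
# thermolysin27 target is replaced by thermolysin73 (runs 71 through 73).
# Other xtal_target lines, such as "xtal_target = hitfind", are untouched.
second_batch_config() {
    sed 's/^\([[:space:]]*xtal_target[[:space:]]*=[[:space:]]*\)thermolysin27[[:space:]]*$/\1thermolysin73/' "$1" > "$2"
}

# Usage:
#   pyana_commands L498-indexigrate.cfg /path/to/xtc/files 16 27
#   second_batch_config L498-indexigrate.cfg L498-indexigrate-73.cfg
#   pyana_commands L498-indexigrate-73.cfg /path/to/xtc/files 71 73
```

Keeping a separate configuration file per batch, rather than editing one file in place, leaves a record of which target was used for which runs.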
Indexing the secondary lattice
Indexing the secondary lattice is very similar to indexing the primary lattice, but requires a change to the source code. Edit /path/to/phenix-src-20130328/labelit_regression/xfel/xfel_targets.py, and uncomment (i.e. remove the leading # character) "outlier_detection_switch=True" on line 25. Then edit the configuration file, L498-indexigrate.cfg above, and change integration-first-lattice to integration-second-lattice in order not to overwrite the results of the previous analysis of the primary lattice. Before reanalyzing the first batch, ensure that xtal_target is set to thermolysin27.
$ cxi.pyana -c L498-indexigrate.cfg /path/to/xtc/files/e157-r0016-*.xtc
$ cxi.pyana -c L498-indexigrate.cfg /path/to/xtc/files/e157-r0017-*.xtc
$ …
$ cxi.pyana -c L498-indexigrate.cfg /path/to/xtc/files/e157-r0027-*.xtc
Then set xtal_target to thermolysin73 in the configuration file, and process the second batch.
$ cxi.pyana -c L498-indexigrate.cfg /path/to/xtc/files/e157-r0071-*.xtc
$ cxi.pyana -c L498-indexigrate.cfg /path/to/xtc/files/e157-r0072-*.xtc
$ cxi.pyana -c L498-indexigrate.cfg /path/to/xtc/files/e157-r0073-*.xtc
The number of integrated secondary lattices should be close to 2,021, the number reported in the cctbx.xfel paper<ref name="Hattne:2014"/>.
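The two manual edits described at the start of this section can also be scripted. The sketch below is an illustration under assumptions: it removes the first # on line 25 only if that line contains outlier_detection_switch=True, and it writes the second-lattice configuration to a new file instead of editing the original. The function names are hypothetical, and the exact layout of line 25 in xfel_targets.py has not been checked here, so the result should be inspected before the file is used.

```shell
#!/bin/sh
# Hypothetical helper: remove the first '#' on line 25 of file $1, but only
# when that line mentions outlier_detection_switch=True. Any other line,
# and line 25 itself if it does not match, is left unchanged.
enable_outlier_switch() {
    sed '25{/outlier_detection_switch=True/s/#//;}' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Hypothetical helper: write a copy of configuration $1 to $2 whose
# integration output goes to integration-second-lattice.
second_lattice_config() {
    sed 's/integration-first-lattice/integration-second-lattice/' "$1" > "$2"
}

# Usage:
#   enable_outlier_switch /path/to/phenix-src-20130328/labelit_regression/xfel/xfel_targets.py
#   second_lattice_config L498-indexigrate.cfg L498-indexigrate-second.cfg
```

Restricting the edit to line 25 and to a matching pattern means the function does nothing if the file differs from the March 28, 2013 snapshot, which is safer than uncommenting whatever happens to be on that line.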
Merging all integrated images
The phil-file below defines values suitable for merging the primary and secondary lattices previously integrated.
data = integration-first-lattice
data = integration-second-lattice
d_min = 2.10
merge_anomalous = True
min_corr = -1
model = 2tli.pdb
nproc = 16
plot_single_index_histograms = False
raw_data.sdfac_auto = True
rescale_with_average_cell = True
significance_filter.apply = True
set_average_unit_cell = True
mysql {
  database = db_name
  passwd = db_passwd
  runtag = L498_thermolysin
  user = db_user
}
output {
  n_bins = 10
  prefix = L498_thermolysin
}
scaling {
  algorithm = mark0
  mtz_file = 2tli.mtz
  show_plots = False
  log_cutoff = 0.0
}
integration-first-lattice and integration-second-lattice may need to be adjusted to point to the directories where the indexing and integration step left its results. db_name, db_user, and db_passwd must be substituted with the database name and access credentials to a MySQL database. Databases on hosts other than the one used to merge the thermolysin data can be accessed by additionally specifying the mysql.host and mysql.port options. The model and structure factors for the scaling reference, model and scaling.mtz_file above, are both available for download from the RCSB Protein Data Bank (PDB ID 2tli). If the PHENIX suite is installed, these are conveniently obtained at the command line using
$ phenix.fetch_pdb --mtz 2tli
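Before starting the merge, it may be worth confirming that the inputs named in the phil file exist and that the integration directories hold roughly the expected number of files. The following is a minimal sketch with hypothetical function names; it assumes the reference files are called 2tli.pdb and 2tli.mtz, as in the phil file above, and that each integrated image corresponds to one regular file directly inside its integration directory.

```shell
#!/bin/sh
# Hypothetical helper: print the number of regular files directly inside
# directory $1 (subdirectories are not descended into).
count_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Hypothetical helper: report every argument that does not exist and
# return non-zero if any input is missing.
check_merge_inputs() {
    status=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   check_merge_inputs 2tli.pdb 2tli.mtz integration-first-lattice integration-second-lattice
#   count_files integration-first-lattice    # the paper reports 11,583
#   count_files integration-second-lattice   # the paper reports 2,021
```

The counts are only a sanity check: as noted above, they may deviate slightly from the published numbers.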
To merge the thermolysin data, save the suitably modified configuration file to e.g. L498-merge.phil, and run
$ cxi.merge L498-merge.phil
$ cxi.xmerge L498-merge.phil
Merging statistics are printed on standard output. The merged MTZ-file is written to a file whose name is determined by the value of output.prefix in the configuration file (with the values shown above, the output file would be L498_thermolysin.mtz). Note that the merging programs in the March 28, 2013 snapshot do not report the Rsplit statistic.
References
<references/>