cctbx_xfel - User contributions [en]

Data Processing at SACLA

2017-11-18T00:13:50Z

Nicksauter: /* Obtaining metadata like detector position */

== Obtaining metadata like detector position ==
1) AgBeh (silver behenate). Determine detector distance and beam center
--> update SACLA-provided *.geom file (CrystFEL format)
--> run sacla geom to json on *.geom to get equivalent for DIALS processing

2) h5_mpi_submit --> launches dials.stills_process with process.phil, and queueing options.

Specify what runs (integers)
/work/jkern/2017B8085/xrd/r234567-0/*.h5

2.5) Data visualization.

3) metrology refinement
dials.combine_experiments reference_from_experiment.detector=0.

Takes 1000 images, puts into 1 file. Output: combined_experiments.json + combined_reflections.pickle

1 Experiment = crystal + detector + beam

Must cherry-pick data if there is scarce data out to the corners (not covered here, but note: the largest pickle files come from highly diffracting images).

dials.refine combined* hierarchy_level=[0|1] # Use 0 first (refine detector as a block), then 1 (refine each panel)
To keep detector flat: refinement.parameterisation.detector.fix_list=Tau2,Tau3

Level 0: refine dist, shift1, shift2. Fix: tau

Level 1: refine shift1, shift2, tau1 Fix: dist, tau2, tau3

Evaluation--how do you know if it made a difference?

dev.cctbx.xfel.detector_residuals json pickle # also specify hierarchy_level=1 residuals.plot_max=0.3

program -c -e 10 -a 2 # gets all config parameters for a program at expert level 10, with all help strings.

4) redo integration with reference geometry:

reference_geometry=refined_experiments.json

5) merge

take merging script from LQ79. Take it verbatim. Use cxi.merge
#!/bin/bash
#PBS -q [smp|serial]

smp: lots of memory, up to 44 procs
serial: up to 14 procs, 1 node
b13-occupancy: reserved for you

== DIALS workflow ==
dials.import file.h5 (the h5 will have 1000's of images in it)
--> datablock.json. Has experimental models as abstracted from image header

dials.find_spots datablock.json
--> strong.pickle

dials.index strong.pickle datablock.json
--> indexed.pickle experiments.json

dials.refine
--> refined_experiments.json refined_reflections.pickle

dials.integrate
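
The chain above can be sketched as a small shell function. This is only a sketch under stated assumptions: the notes do not give arguments for dials.refine or dials.integrate, so it assumes each step consumes the files written by the previous step. The names run, dials_chain and DRY_RUN are hypothetical helpers, not part of DIALS.

```shell
#!/bin/bash
# run: echo the command; execute it only when DRY_RUN is unset or 0.
# (Hypothetical helper, used so the chain can be inspected before running it.)
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" = "0" ]; then "$@"; fi
}

# dials_chain: step-by-step processing of one h5 file, stopping at the first failure.
# File names follow the notes above; the arguments to dials.refine and
# dials.integrate are an assumption (outputs of the preceding step).
dials_chain() {
  local h5="$1"
  run dials.import "$h5" &&
  run dials.find_spots datablock.json &&
  run dials.index strong.pickle datablock.json &&
  run dials.refine experiments.json indexed.pickle &&
  run dials.integrate refined_experiments.json refined_reflections.pickle
}

# Example (prints the five commands without running them):
#   DRY_RUN=1 dials_chain file.h5
```

Setting DRY_RUN=1 first is a cheap way to check the file names against what your DIALS version actually writes.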

== Aggregate processing at XFELS ==
Need to submit a single job for each *.h5 file (manually, or write a script)
Instead of running the individual steps:
dials.stills_process *.h5 process.phil

Phil file must have good parameters for data processing. Take one from previous users.
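
"Write a script" could look roughly like the following. It only prints one dials.stills_process command per *.h5 file, because the submission command is specific to the queueing system and is not spelled out in these notes. The function name print_stills_jobs and the example directory are hypothetical.

```shell
#!/bin/bash
# print_stills_jobs: emit one dials.stills_process command line per *.h5 file.
# $1 = directory holding the h5 files, $2 = phil file with processing parameters.
print_stills_jobs() {
  local data_dir="$1" phil="$2" h5
  for h5 in "$data_dir"/*.h5; do
    [ -e "$h5" ] || continue   # skip the literal pattern when nothing matches
    echo "dials.stills_process $h5 $phil"
  done
}

# Example:
#   print_stills_jobs /path/to/run_directory process.phil
```

Each printed line can then be wrapped in a job script for the local queue, or pasted into one by hand.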

Data Processing at SACLA

2017-11-17T23:15:12Z

Nicksauter:

== Obtaining metadata like detector position ==
1) AgBeh (silver behenate). Determine detector distance and beam center
--> update SACLA-provided *.geom file (CrystFEL format)
--> run sacla geom to json on *.geom to get equivalent for DIALS processing

2) h5_mpi_submit --> launches dials.stills_process with process.phil, and queueing options.

Specify what runs (integers)
/work/jkern/2017B8085/xrd/r234567-0/*.h5

3) metrology refinement
dials.combine_experiments reference_from_experiment.detector=0.

Takes 1000 images, puts into 1 file. Output: combined_experiments.json + combined_reflections.pickle

1 Experiment = crystal + detector + beam

Must cherry-pick data if there is scarce data out to the corners (not covered here, but note: the largest pickle files come from highly diffracting images).

dials.refine combined* hierarchy_level=[0|1] # Use 0 first (refine detector as a block) then 1(refine each panel)
To keep detector flat: refinement.parameterisation.detector.fix_list=Tau2,Tau3

Level 0: refine dist, shift1, shift2. Fix: tau

Level 1: refine shift1, shift2, tau1 Fix: dist, tau2, tau3

Evaluation--how do you know if it made a difference?

dev.cctbx.xfel.detector_residuals json pickle # also specify hierarchy_level=1 residuals.plot_max=0.3

program -c -e 10 -a 2 # gets all config parameters for a program at expert level 10, with all help strings.

4) redo integration with reference geometry:

reference_geometry=refined_experiments.json

5) merge

take merging script from LQ79. Take it verbatim. Use cxi.merge
#!/bin/bash
#PBS -q [smp|serial]

smp: lots of memory, up to 44 procs
serial: up to 14 procs, 1 node
b13-occupancy: reserved for you

== DIALS workflow ==
dials.import file.h5 (the h5 will have 1000's of images in it)
--> datablock.json. Has experimental models as abstracted from image header

dials.find_spots datablock.json
--> strong.pickle

dials.index strong.pickle datablock.json
--> indexed.pickle experiments.json

dials.refine
--> refined_experiments.json refined_reflections.pickle

dials.integrate

== Aggregate processing at XFELS ==
Need to submit a single job for each *.h5 file (manually, or write a script)
Instead of running the individual steps:
dials.stills_process *.h5 process.phil

Phil file must have good parameters for data processing. Take one from previous users.

Data Processing at SACLA

2017-11-17T22:25:24Z

Nicksauter: /* Obtaining metadata like detector position */

Data Processing at SACLA

2017-11-17T22:21:53Z

Nicksauter:

== Obtaining metadata like detector position ==
1) AgBeh (silver behenate). Determine detector distance and beam center
--> update SACLA-provided *.geom file (CrystFEL format)
--> run sacla geom to json on *.geom to get equivalent for DIALS processing

== DIALS workflow ==
dials.import file.h5 (the h5 will have 1000's of images in it)
--> datablock.json. Has experimental models as abstracted from image header

dials.find_spots datablock.json
--> strong.pickle

dials.index strong.pickle datablock.json
--> indexed.pickle experiments.json

dials.refine
--> refined_experiments.json refined_reflections.pickle

dials.integrate

== Aggregate processing at XFELS ==
Need to submit a single job for each *.h5 file (manually, or write a script)
Instead of running the individual steps:
dials.stills_process *.h5 process.phil

Phil file must have good parameters for data processing. Take one from previous users.

Data Processing at SACLA

2017-11-17T22:18:52Z

Nicksauter: /* Headline text */

== Obtaining metadata like detector position ==
1) AgBeh (silver behenate). Determine detector distance and beam center
--> update SACLA-provided *.geom file (CrystFEL format)
--> run sacla geom to json on *.geom to get equivalent for DIALS processing

== DIALS workflow ==
dials.import file.h5 (the h5 will have 1000's of images in it)
--> datablock.json. Has experimental models as abstracted from image header

dials.find_spots datablock.json
--> strong.pickle

dials.index strong.pickle datablock.json
--> indexed.pickle experiments.json

== Aggregate processing at XFELS ==
dials.stills_process (substitutes for the individual steps)

Data Processing at SACLA

2017-11-17T22:17:58Z

Nicksauter:

== Headline text ==
1) AgBeh (silver behenate). Determine detector distance and beam center
--> update SACLA-provided *.geom file (CrystFEL format)
--> run sacla geom to json on *.geom to get equivalent for DIALS processing

== DIALS workflow ==
dials.import file.h5 (the h5 will have 1000's of images in it)
--> datablock.json. Has experimental models as abstracted from image header

dials.find_spots datablock.json
--> strong.pickle

dials.index strong.pickle datablock.json
--> indexed.pickle experiments.json

== Aggregate processing at XFELS ==
dials.stills_process (substitutes for the individual steps)

Data Processing at SACLA

2017-11-17T22:12:23Z

Nicksauter: Created page with "1) AgBeh (silver behenate). Determine detector distance and beam center"

1) AgBeh (silver behenate). Determine detector distance and beam center

Main Page

2017-11-17T22:10:45Z

Nicksauter: /* cctbx.xfel resources */

= Open-source tools for free-electron laser data processing =

''cctbx.xfel'' is a suite of software tools designed to process diffraction data from serial femtosecond crystallography (SFX) measurements at an X-ray free-electron laser (XFEL). Built on the Computational Crystallographic Toolbox ([http://cctbx.sourceforge.net ''cctbx'']), the same toolbox on which [http://www.phenix-online.org ''PHENIX''], [http://adder.lbl.gov/labelit ''LABELIT''], and post-refinement and merging program, [http://viper.lbl.gov/cctbx.xfel/index.php/Cctbx.prime ''PRIME''] are built, it enables the user to solve difficult problems relating to processing XFEL data. The programs and modules provided by ''cctbx.xfel'' can reduce a large set of still diffraction images recorded at Stanford’s Linac Coherent Light Source ([http://lcls.slac.stanford.edu LCLS]) to a single MTZ file containing merged reflection intensities suitable for structure solution.

== ''cctbx.xfel'' resources ==

The tutorials on this wiki provide detailed instructions for indexing and integrating still diffraction images extracted from the raw data streams recorded at the LCLS, including pre-processing steps such as dark pedestal generation and refinement of the detector geometry of the Cornell–SLAC pixel array detectors (CSPAD) in use at the CXI and XPP end stations. The tutorials also cover tools to efficiently leverage the LCLS computing cluster to process the thousands to millions of diffraction images that can be recorded in a short time.

* [[Overview]] to the system architecture at LCLS and real-time progress monitoring of data processing
* Installation: there are two ways to get ''cctbx.xfel'':
** ''cctbx.xfel'' is installed for general use at SLAC. To start using an existing installation, first [[Set up PSDM software | set up the PSDM software distribution]] and then [[Setup | set up ''cctbx.xfel'']].
** Configure ''cctbx.xfel'' from an [[Configure cctbx.xfel from existing Phenix install | existing Phenix installation]].
* [[Tutorials]] on pre-processing, data reduction, and merging.
* ''[[cctbx.prime]]'': tutorials on post-refinement using PRIME.
* "[[IOTA]]": tutorial on spotfinding optimization using IOTA.

Other related information:

* [[Serial XFEL Crystallography References]]
* [[Data Processing at SACLA]]
* [[Indexing individual stills]]: how to index stills not necessarily from an XFEL source.
* [[Processing Blog]] describing how we solved typical problems.
* [[Processing L498 thermolysin]]: detailed instructions to reproduce the results published in the forthcoming ''cctbx.xfel'' paper.
* [[2017 Tutorials]]. Berkeley Lab Serial Crystallography Workshop, 16-17 Feb 2017.
* [[2014 Tutorials | 2014 Workshop Tutorial: ''cctbx.xfel'' ]]. Note, due to recent changes at LCLS, many of these tutorials are out of date for data collected late 2014 and onwards. For more up to date information, see above [[tutorials]]. For data collected before late 2014, these workshop tutorials are useful.
* [[2014_workshop | Powerpoint presentations]]
* [[Experiment-day guidelines]]: a short, very hands-on set of notes for online monitoring and data-processing using cctbx.xfel. This page assumes that the setup and tutorial have been completed.
* [[File formats]]: for developers, information on file formats ''cctbx.xfel'' uses
* [[Cppxfel]]: XFEL data processing with the CPPXFEL package from Diamond/Wellcome Trust.
* For developers only: [[Installation]] instructions for cctbx developmental build.

This project is under active development. For any assistance, please contact the authors.

Main Page

2017-11-17T22:10:26Z

Nicksauter: /* cctbx.xfel resources */

= Open-source tools for free-electron laser data processing =

''cctbx.xfel'' is a suite of software tools designed to process diffraction data from serial femtosecond crystallography (SFX) measurements at an X-ray free-electron laser (XFEL). Built on the Computational Crystallographic Toolbox ([http://cctbx.sourceforge.net ''cctbx'']), the same toolbox on which [http://www.phenix-online.org ''PHENIX''], [http://adder.lbl.gov/labelit ''LABELIT''], and post-refinement and merging program, [http://viper.lbl.gov/cctbx.xfel/index.php/Cctbx.prime ''PRIME''] are built, it enables the user to solve difficult problems relating to processing XFEL data. The programs and modules provided by ''cctbx.xfel'' can reduce a large set of still diffraction images recorded at Stanford’s Linac Coherent Light Source ([http://lcls.slac.stanford.edu LCLS]) to a single MTZ file containing merged reflection intensities suitable for structure solution.

== ''cctbx.xfel'' resources ==

The tutorials on this wiki provide detailed instructions for indexing and integrating still diffraction images extracted from the raw data streams recorded at the LCLS, including pre-processing steps such as dark pedestal generation and refinement of the detector geometry of the Cornell–SLAC pixel array detectors (CSPAD) in use at the CXI and XPP end stations. The tutorials also cover tools to efficiently leverage the LCLS computing cluster to process the thousands to millions of diffraction images that can be recorded in a short time.

* [[Overview]] to the system architecture at LCLS and real-time progress monitoring of data processing
* Installation: there are two ways to get ''cctbx.xfel'':
** ''cctbx.xfel'' is installed for general use at SLAC. To start using an existing installation, first [[Set up PSDM software | set up the PSDM software distribution]] and then [[Setup | set up ''cctbx.xfel'']].
** Configure ''cctbx.xfel'' from an [[Configure cctbx.xfel from existing Phenix install | existing Phenix installation]].
* [[Tutorials]] on pre-processing, data reduction, and merging.
* ''[[cctbx.prime]]'': tutorials on post-refinement using PRIME.
* "[[IOTA]]": tutorial on spotfinding optimization using IOTA.

Other related information:

* [[Serial XFEL Crystallography References]]
* [[Data Processing as SACLA]]
* [[Indexing individual stills]]: how to index stills not necessarily from an XFEL source.
* [[Processing Blog]] describing how we solved typical problems.
* [[Processing L498 thermolysin]]: detailed instructions to reproduce the results published in the forthcoming ''cctbx.xfel'' paper.
* [[2017 Tutorials]]. Berkeley Lab Serial Crystallography Workshop, 16-17 Feb 2017.
* [[2014 Tutorials | 2014 Workshop Tutorial: ''cctbx.xfel'' ]]. Note, due to recent changes at LCLS, many of these tutorials are out of date for data collected late 2014 and onwards. For more up to date information, see above [[tutorials]]. For data collected before late 2014, these workshop tutorials are useful.
* [[2014_workshop | Powerpoint presentations]]
* [[Experiment-day guidelines]]: a short, very hands-on set of notes for online monitoring and data-processing using cctbx.xfel. This page assumes that the setup and tutorial have been completed.
* [[File formats]]: for developers, information on file formats ''cctbx.xfel'' uses
* [[Cppxfel]]: XFEL data processing with the CPPXFEL package from Diamond/Wellcome Trust.
* For developers only: [[Installation]] instructions for cctbx developmental build.

This project is under active development. For any assistance, please contact the authors.

Ha14 installation

2017-03-31T18:43:31Z

Nicksauter: /* Finish up the build */

== Prerequisites ==

While these tutorials assume you wish to process XTC streams at SLAC, some users have stills collected from other sources and do not need the full PSDM suite. If this is the case, see the installation directions below for installing on a non PSDM system. Otherwise, it is assumed that the [[Set up PSDM software | PSDM software distribution has been set up]]. Note again that several sites already have ''cctbx.xfel'' installed, and so regular users not involved in the development of the software will not need the instructions here.

Once ''cctbx.xfel'' has been installed it must be [[Setup | set up]] before it can be used. Developers may additionally want to [[Set up ssh-agent | set up an ssh-agent]].

Finally, you must have a user account on cci.lbl.gov in order to proceed, or else you will not be able to download the sources. A sourceforge account is only needed if you wish to commit changes back to the repository. If you don't have or don't want to use a sourceforge account, you can leave it off in the below commands where it is specified.

== Standard installation using PSDM ==
===Quick start===
This step has to be performed on a host with Internet access. Not all hosts at SLAC have that, but the members of e.g. the pslogin pool do. Make and change to a working directory to contain the new source code and build (this directory should be accessible from any computing node on which ''cctbx.xfel'' will be run). Then download the bootstrap module:

wget https://raw.githubusercontent.com/cctbx/cctbx_project/master/libtbx/auto_build/bootstrap.py

Then:

python bootstrap.py hot update --builder=xfel --cciuser=<cciusername> --sfuser=<githubusername>

This command instructs bootstrap.py to download static packages required for ''cctbx.xfel'' (hot), and then to checkout the rest of the sources from cci.lbl.gov and github (update), using the user accounts specified by the cciuser and sfuser parameters, respectively.

===Alternate instructions for non-LBNL developers===
Our non-LBNL colleagues will now realize that the "python bootstrap" step does not work without cci username credentials, needed to obtain source code for the program LABELIT. Developers are instructed to use the following alternate procedure:

python bootstrap.py hot --builder=xfel --sfuser=<githubusername> # download the static packages
python bootstrap.py update --builder=dials --sfuser=<githubusername> # download the github code for dials & cctbx
mkdir modules/cxi_xdr_xes # create a stub directory for experiment-specific python code

Then download phenix from http://phenix-online.org and untar the package. Locate the modules directory in the phenix directory tree, and copy-paste the labelit subdirectory into the modules directory of your current working area.

===Finish up the build===
After downloading the sources, you need to be sure you have the appropriate compilers before executing the next command. At SLAC, that means you need to ssh to one of the psana nodes, as the pslogin nodes do not have the requisite compilers. When ready, configure and compile thusly:

python bootstrap.py build --builder=xfel --with-python=`which python` --nproc=<# cores available for compile>

On SLAC's interactive nodes, this takes just over 6 minutes. To avoid problems with run-time dynamic linking of Python extensions, the Python interpreter required for the above command must be the one provided by the PSDM software distribution. That interpreter can be located using find, e.g.
$ find $SIT_ROOT/sw/external/python -perm /0111 -type f -wholename "*/$SIT_ARCH/*/python" 2> /dev/null
At SLAC this interpreter is located somewhere under <code>/reg/g/psdm/sw/external/python</code>. Here we use `which python` to get this path automatically.

Initialise the running shell using the newly created configuration files. bash-users should
$ . build/setpaths.sh
while csh-users will instead need to run
% source build/setpaths.csh

To finalize the installation, see [[Setup]].

Note, you may also follow this procedure if you are on a machine where you have your own python that you want to use, instead of the one provided by PSDM.

== Installation on a non-PSDM system ==

If you find yourself on a machine without PSDM and won't be dealing with XTC directly, for example if you have your own data collected as stills from a non-XFEL source, you can use the below commands in a new directory (again assuming you have a cci user account):

# On Linux:
wget https://raw.githubusercontent.com/cctbx/cctbx_project/master/libtbx/auto_build/bootstrap.py
# On MacOSX:
curl https://raw.githubusercontent.com/cctbx/cctbx_project/master/libtbx/auto_build/bootstrap.py > bootstrap.py

python bootstrap.py hot update base build --builder=xfel --cciuser=<cciusername> --sfuser=<sourceforgeusername>

Here, in addition to hot, update and build, there is a command called 'base', which downloads Python and the needed packages and compiles them. After this process completes, source the installation (for details see above), and then add scipy to it:

. build/setpaths.sh
libtbx.python -m easy_install scipy

Please contact the authors with any issues. Some gotchas:

1) On a Mac, you will need to make sure you have the latest version of xcode installed. Consider the command xcode-select --install. [NOTE: Xcode v. 8.2.1. for Mac OS 10.12 (Sierra) does not include a gFortran compiler. You will need to install gFortran separately if you need scipy.]

2) On linux you will likely need to install some dependencies, including gFortran. [Note: gFortran is needed only for the scipy installation, not for the core build based on cctbx. Therefore, only install gFortran if you need scipy.]

== External links ==

* [http://cctbx.sourceforge.net/current_cvs/installation.html#manually-building-from-sources-under-unix Manually building from sources under Unix]

Ha14 installation

2017-03-31T18:32:51Z

Nicksauter: /* Alternate instructions for non-LBNL developers */

== Prerequisites ==

While these tutorials assume you wish to process XTC streams at SLAC, some users have stills collected from other sources and do not need the full PSDM suite. If this is the case, see the installation directions below for installing on a non PSDM system. Otherwise, it is assumed that the [[Set up PSDM software | PSDM software distribution has been set up]]. Note again that several sites already have ''cctbx.xfel'' installed, and so regular users not involved in the development of the software will not need the instructions here.

Once ''cctbx.xfel'' has been installed it must be [[Setup | set up]] before it can be used. Developers may additionally want to [[Set up ssh-agent | set up an ssh-agent]].

Finally, you must have a user account on cci.lbl.gov in order to proceed, or else you will not be able to download the sources. A sourceforge account is only needed if you wish to commit changes back to the repository. If you don't have or don't want to use a sourceforge account, you can leave it off in the below commands where it is specified.

== Standard installation using PSDM ==
===Quick start===
This step has to be performed on a host with Internet access. Not all hosts at SLAC have that, but the members of e.g. the pslogin pool do. Make and change to a working directory to contain the new source code and build (this directory should be accessible from any computing node on which ''cctbx.xfel'' will be run). Then download the bootstrap module:

wget https://raw.githubusercontent.com/cctbx/cctbx_project/master/libtbx/auto_build/bootstrap.py

Then:

python bootstrap.py hot update --builder=xfel --cciuser=<cciusername> --sfuser=<githubusername>

This command instructs bootstrap.py to download static packages required for ''cctbx.xfel'' (hot), and then to checkout the rest of the sources from cci.lbl.gov and github (update), using the user accounts specified by the cciuser and sfuser parameters, respectively.

===Alternate instructions for non-LBNL developers===
Our non-LBNL colleagues will now realize that the "python bootstrap" step does not work without cci username credentials, needed to obtain source code for the program LABELIT. Developers are instructed to use the following alternate procedure:

python bootstrap.py hot --builder=xfel --sfuser=<githubusername> # download the static packages
python bootstrap.py update --builder=dials --sfuser=<githubusername> # download the github code for dials & cctbx
mkdir modules/cxi_xdr_xes # create a stub directory for experiment-specific python code

Then download phenix from http://phenix-online.org and untar the package. Locate the modules directory in the phenix directory tree, and copy-paste the labelit subdirectory into the modules directory of your current working area.

===Finish up the build===
After downloading the sources, you need to be sure you have the appropriate compilers before executing the next command. At SLAC, that means you need to ssh to one of the psana nodes, as the pslogin nodes do not have the requisite compilers. When ready, configure and compile thusly:

python bootstrap.py build --builder=xfel --with-python=`which python`

On SLAC's interactive nodes, this takes just over 6 minutes. To avoid problems with run-time dynamic linking of Python extensions, the Python interpreter required for the above command must be the one provided by the PSDM software distribution. That interpreter can be located using find, e.g.
$ find $SIT_ROOT/sw/external/python -perm /0111 -type f -wholename "*/$SIT_ARCH/*/python" 2> /dev/null
At SLAC this interpreter is located somewhere under <code>/reg/g/psdm/sw/external/python</code>. Here we use `which python` to get this path automatically.

Initialise the running shell using the newly created configuration files. bash-users should
$ . build/setpaths.sh
while csh-users will instead need to run
% source build/setpaths.csh

To finalize the installation, see [[Setup]].

Note, you may also follow this procedure if you are on a machine where you have your own python that you want to use, instead of the one provided by PSDM.

== Installation on a non-PSDM system ==

If you find yourself on a machine without PSDM and won't be dealing with XTC directly, for example if you have your own data collected as stills from a non-XFEL source, you can use the below commands in a new directory (again assuming you have a cci user account):

# On Linux:
wget https://raw.githubusercontent.com/cctbx/cctbx_project/master/libtbx/auto_build/bootstrap.py
# On MacOSX:
curl https://raw.githubusercontent.com/cctbx/cctbx_project/master/libtbx/auto_build/bootstrap.py > bootstrap.py

python bootstrap.py hot update base build --builder=xfel --cciuser=<cciusername> --sfuser=<sourceforgeusername>

Here, in addition to hot, update and build, there's a command called 'base', which downloads a python and needed packages and compiles it. After this process completes, source the installation (for details see above), and then add scipy to it:

. build/setpaths.sh
libtbx.python -m easy_install scipy

Please contact the authors with any issues. Some gotchas:

1) On a Mac, you will need to make sure you have the latest version of xcode installed. Consider the command xcode-select --install. [NOTE: Xcode v. 8.2.1. for Mac OS 10.12 (Sierra) does not include a gFortran compiler. You will need to install gFortran separately if you need scipy.]

2) On linux you will likely need to install some dependencies, including gFortran. [Note: gFortran is needed only for the scipy installation, not for the core build based on cctbx. Therefore, only install gFortran if you need scipy.]

== External links ==

* [http://cctbx.sourceforge.net/current_cvs/installation.html#manually-building-from-sources-under-unix Manually building from sources under Unix]

Ha14 installation

2017-03-31T18:21:23Z

Nicksauter: /* Alternate instructions for non-LBNL developers */

== Prerequisites ==

While these tutorials assume you wish to process XTC streams at SLAC, some users have stills collected from other sources and do not need the full PSDM suite. If this is the case, see the installation directions below for installing on a non PSDM system. Otherwise, it is assumed that the [[Set up PSDM software | PSDM software distribution has been set up]]. Note again that several sites already have ''cctbx.xfel'' installed, and so regular users not involved in the development of the software will not need the instructions here.

Once ''cctbx.xfel'' has been installed it must be [[Setup | set up]] before it can be used. Developers may additionally want to [[Set up ssh-agent | set up an ssh-agent]].

Finally, you must have a user account on cci.lbl.gov in order to proceed, or else you will not be able to download the sources. A sourceforge account is only needed if you wish to commit changes back to the repository. If you don't have or don't want to use a sourceforge account, you can leave it off in the below commands where it is specified.

== Standard installation using PSDM ==
===Quick start===
This step has to be performed on a host with Internet access. Not all hosts at SLAC have that, but the members of e.g. the pslogin pool do. Make and change to a working directory to contain the new source code and build (this directory should be accessible from any computing node on which ''cctbx.xfel'' will be run). Then download the bootstrap module:

wget https://raw.githubusercontent.com/cctbx/cctbx_project/master/libtbx/auto_build/bootstrap.py

Then:

python bootstrap.py hot update --builder=xfel --cciuser=<cciusername> --sfuser=<githubusername>

This command instructs bootstrap.py to download static packages required for ''cctbx.xfel'' (hot), and then to checkout the rest of the sources from cci.lbl.gov and github (update), using the user accounts specified by the cciuser and sfuser parameters, respectively.

===Alternate instructions for non-LBNL developers===
Our non-LBNL colleagues will now realize that the "python bootstrap" step does not work without cci username credentials, needed to obtain source code for the program LABELIT. Developers are instructed to use the following alternate procedure:

python bootstrap.py hot --builder=xfel --sfuser=<githubusername> # download the static packages
python bootstrap.py update --builder=dials --sfuser=<githubusername> # download the github code for dials & cctbx

Then download phenix from http://phenix-online.org and untar the package. Locate the modules directory in the phenix directory tree, and copy-paste the labelit subdirectory into the modules directory of your current working area.

===Finish up the build===
After downloading the sources, you need to be sure you have the appropriate compilers before executing the next command. At SLAC, that means you need to ssh to one of the psana nodes, as the pslogin nodes do not have the requisite compilers. When ready, configure and compile thusly:

python bootstrap.py build --builder=xfel --with-python=`which python`

On SLAC's interactive nodes, this takes just over 6 minutes. To avoid problems with run-time dynamic linking of Python extensions, the Python interpreter required for the above command must be the one provided by the PSDM software distribution. That interpreter can be located using find, e.g.
$ find $SIT_ROOT/sw/external/python -perm /0111 -type f -wholename "*/$SIT_ARCH/*/python" 2> /dev/null
At SLAC this interpreter is located somewhere under <code>/reg/g/psdm/sw/external/python</code>. Here we use `which python` to get this path automatically.

Initialise the running shell using the newly created configuration files. bash-users should
$ . build/setpaths.sh
while csh-users will instead need to run
% source build/setpaths.csh

To finalize the installation, see [[Setup]].

Note, you may also follow this procedure if you are on a machine where you have your own python that you want to use, instead of the one provided by PSDM.

== Installation on a non-PSDM system ==

If you find yourself on a machine without PSDM and won't be dealing with XTC directly, for example if you have your own data collected as stills from a non-XFEL source, you can use the below commands in a new directory (again assuming you have a cci user account):

# On Linux:
wget https://raw.githubusercontent.com/cctbx/cctbx_project/master/libtbx/auto_build/bootstrap.py
# On MacOSX:
curl https://raw.githubusercontent.com/cctbx/cctbx_project/master/libtbx/auto_build/bootstrap.py > bootstrap.py

python bootstrap.py hot update base build --builder=xfel --cciuser=<cciusername> --sfuser=<sourceforgeusername>

Here, in addition to hot, update and build, there's a command called 'base', which downloads a python and needed packages and compiles it. After this process completes, source the installation (for details see above), and then add scipy to it:

. build/setpaths.sh
libtbx.python -m easy_install scipy

Please contact the authors with any issues. Some gotchas:

1) On a Mac, you will need to make sure you have the latest version of Xcode installed. Consider the command <code>xcode-select --install</code>. [NOTE: Xcode v. 8.2.1 for Mac OS 10.12 (Sierra) does not include a gfortran compiler. You will need to install gfortran separately if you need scipy.]

2) On Linux you will likely need to install some dependencies, including gfortran. [Note: gfortran is needed only for the scipy installation, not for the core build based on cctbx. Therefore, only install gfortran if you need scipy.]

== External links ==

* [http://cctbx.sourceforge.net/current_cvs/installation.html#manually-building-from-sources-under-unix Manually building from sources under Unix]

Ha14 installation

2017-03-31T18:09:10Z

Nicksauter: /* Standard installation using PSDM */

== Prerequisites ==

While these tutorials assume you wish to process XTC streams at SLAC, some users have stills collected from other sources and do not need the full PSDM suite. If this is the case, see the installation directions below for installing on a non PSDM system. Otherwise, it is assumed that the [[Set up PSDM software | PSDM software distribution has been set up]]. Note again that several sites already have ''cctbx.xfel'' installed, and so regular users not involved in the development of the software will not need the instructions here.

Once ''cctbx.xfel'' has been installed it must be [[Setup | set up]] before it can be used. Developers may additionally want to [[Set up ssh-agent | set up an ssh-agent]].

Finally, you must have a user account on cci.lbl.gov in order to proceed, or else you will not be able to download the sources. A sourceforge account is only needed if you wish to commit changes back to the repository. If you don't have or don't want to use a sourceforge account, you can leave it off in the below commands where it is specified.

== Standard installation using PSDM ==
===Quick start===
This step has to be performed on a host with Internet access. Not all hosts at SLAC have that, but members of, e.g., the pslogin pool do. Make and change to a working directory to contain the new source code and build (this directory should be accessible from any computing node on which ''cctbx.xfel'' will be run). Then download the bootstrap module:

wget https://raw.githubusercontent.com/cctbx/cctbx_project/master/libtbx/auto_build/bootstrap.py

Then:

python bootstrap.py hot update --builder=xfel --cciuser=<cciusername> --sfuser=<githubusername>

This command instructs bootstrap.py to download the static packages required for ''cctbx.xfel'' (hot), and then to check out the rest of the sources from cci.lbl.gov and GitHub (update), using the user accounts specified by the cciuser and sfuser parameters, respectively.

===Alternate instructions for non-LBNL developers===
Our non-LBNL colleagues will find that the "python bootstrap" step above does not work without cci user credentials, which are needed to obtain the source code for the program LABELIT. These developers should use the following alternate procedure:

python bootstrap.py hot --builder=xfel --sfuser=<githubusername> # download the static packages
python bootstrap.py update --builder=dials --sfuser=<githubusername> # download the github code for dials & cctbx

Then download phenix from http://phenix-online.org and untar the package. Locate the modules directory in the phenix directory tree, and copy-paste the labelit_sources subdirectory into the modules directory of your current working area.

===Finish up the build===
After downloading the sources, you need to be sure you have the appropriate compilers before executing the next command. At SLAC, that means you need to ssh to one of the psana nodes, as the pslogin nodes do not have the requisite compilers. When ready, configure and compile thusly:

python bootstrap.py build --builder=xfel --with-python=`which python`

On SLAC's interactive nodes, this takes just over 6 minutes. To avoid problems with run-time dynamic linking of Python extensions, the Python interpreter required for the above command must be the one provided by the PSDM software distribution. That interpreter can be located using find, e.g.
$ find $SIT_ROOT/sw/external/python -perm /0111 -type f -wholename "*/$SIT_ARCH/*/python" 2> /dev/null
At SLAC this interpreter is located somewhere under <code>/reg/g/psdm/sw/external/python</code>. Here we use `which python` to get this path automatically.
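
The <code>find</code> search above can also be written in Python. The following is only a minimal sketch under stated assumptions: the function name <code>find_psdm_pythons</code> is made up for illustration and is not part of PSDM or cctbx, and it assumes, as the command above does, that <code>SIT_ROOT</code> and <code>SIT_ARCH</code> are set in the environment.

```python
import os
import stat


def find_psdm_pythons(sit_root, sit_arch):
    """Mimic: find $SIT_ROOT/sw/external/python -perm /0111 -type f
    -wholename "*/$SIT_ARCH/*/python" (hypothetical helper, for illustration)."""
    top = os.path.join(sit_root, "sw", "external", "python")
    marker = os.sep + sit_arch + os.sep  # an architecture directory with something below it
    hits = []
    for dirpath, _dirnames, filenames in os.walk(top):
        if "python" not in filenames or marker not in dirpath:
            continue
        candidate = os.path.join(dirpath, "python")
        mode = os.lstat(candidate).st_mode
        # -type f: regular file (not a symlink); -perm /0111: any execute bit set
        if stat.S_ISREG(mode) and mode & 0o111:
            hits.append(candidate)
    return sorted(hits)


if __name__ == "__main__":
    root = os.environ.get("SIT_ROOT")
    arch = os.environ.get("SIT_ARCH")
    if root and arch:
        for path in find_psdm_pythons(root, arch):
            print(path)
```

If the environment variables are not set, the script prints nothing; any path it does print can be passed to <code>--with-python</code>.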

Initialise the running shell using the newly created configuration files. bash users should run
$ . build/setpaths.sh
while csh-users will instead need to run
% source build/setpaths.csh

To finalize the installation, see [[Setup]].

Note, you may also follow this procedure if you are on a machine where you have your own python that you want to use, instead of the one provided by PSDM.

== Installation on a non-PSDM system ==

If you find yourself on a machine without PSDM and won't be dealing with XTC directly, for example if you have your own data collected as stills from a non-XFEL source, you can use the below commands in a new directory (again assuming you have a cci user account):

# On Linux:
wget https://raw.githubusercontent.com/cctbx/cctbx_project/master/libtbx/auto_build/bootstrap.py
# On MacOSX:
curl https://raw.githubusercontent.com/cctbx/cctbx_project/master/libtbx/auto_build/bootstrap.py > bootstrap.py

python bootstrap.py hot update base build --builder=xfel --cciuser=<cciusername> --sfuser=<sourceforgeusername>

Here, in addition to hot, update and build, there's a command called 'base', which downloads a python and the needed packages and compiles them. After this process completes, source the installation (for details see above), and then add scipy to it:

. build/setpaths.sh
libtbx.python -m easy_install scipy

Please contact the authors with any issues. Some gotchas:

1) On a Mac, you will need to make sure you have the latest version of Xcode installed. Consider the command <code>xcode-select --install</code>. [NOTE: Xcode v. 8.2.1 for Mac OS 10.12 (Sierra) does not include a gfortran compiler. You will need to install gfortran separately if you need scipy.]

2) On Linux you will likely need to install some dependencies, including gfortran. [Note: gfortran is needed only for the scipy installation, not for the core build based on cctbx. Therefore, only install gfortran if you need scipy.]

== External links ==

* [http://cctbx.sourceforge.net/current_cvs/installation.html#manually-building-from-sources-under-unix Manually building from sources under Unix]

2017 cxi merge tutorial

2017-02-25T01:00:12Z

Nicksauter: /* Table of results */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]]. The methods are described in the literature in [http://dx.doi.org/10.1038/nmeth.2887 Hattne (2014)], the [http://dx.doi.org/10.7554/eLife.05421 PRIME paper], [http://dx.doi.org/10.1107/S1399004714024134 Sauter (2014)] and [http://dx.doi.org/10.1107/S1600577514028203 Sauter (2015)]. The math derivations are further described in the file postrefinement_rs_model.pdf in the source code release.

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120
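
One way to put a number on "cluster around" is to take the per-parameter median of the unit cells reported for the individual images. The sketch below assumes the six cell parameters have already been collected into tuples (for example by grepping the cxi.print_pickle output; that parsing step is left out because the exact output format is not reproduced here). The three cells are made-up values for illustration, not data from this set, and <code>median_cell</code> is a hypothetical helper name.

```python
from statistics import median


def median_cell(cells):
    """Per-parameter median of an iterable of (a, b, c, alpha, beta, gamma) tuples."""
    columns = list(zip(*cells))
    if len(columns) != 6:
        raise ValueError("each unit cell needs six parameters")
    return tuple(median(column) for column in columns)


# Made-up example cells, for illustration only.
cells = [
    (91.2, 91.2, 45.8, 90.0, 90.0, 120.0),
    (91.4, 91.4, 45.9, 90.0, 90.0, 120.0),
    (91.7, 91.7, 46.1, 90.0, 90.0, 120.0),
]
print(median_cell(cells))
```

The median is used rather than the mean so that a few outlier lattices do not pull the estimate away from the main cluster.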

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
phenix.xtriage ${tag}_s0_mark0.mtz scaling.input.xray_data.obs_labels=imean
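
The sed pipeline in the script fills four placeholders (FS, DMIN, NEG, TAG) in the parameter string before it is handed to cxi.merge. Purely as an illustration, the same substitution is sketched below in Python on a shortened parameter list; the helper name <code>fill_params</code> is hypothetical, and the csh script itself does not use Python.

```python
def fill_params(template, substitutions):
    """Replace each placeholder globally, in order, like chained `sed -e "s,X,Y,g"`."""
    for placeholder, value in substitutions:
        template = template.replace(placeholder, str(value))
    # Split on whitespace to obtain one command-line argument per parameter.
    return template.split()


# Shortened version of the parameter string used in the script.
template = "d_min=DMIN backend=FS include_negatives=NEG output.prefix=TAG"
args = fill_params(
    template,
    [("FS", "Flex"), ("DMIN", 2.5), ("NEG", True), ("TAG", "p6m")],
)
print(args)
```

Because every replacement is global, a placeholder string must not occur anywhere else in the parameter list, otherwise it would be rewritten there as well.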

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. A *.pickle glob can't be used on the data= line; give the directory itself.

Scale-up trial nproc=60, no postrefinement.
set the MTZ flag = i_obs
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

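The traceback shows that is_odd_numbered cannot tell whether the file is even- or odd-numbered: once the recognized suffix is split off, the last character of the file name is 'd' rather than a digit. Fixed by adding "_extracted.pickle" to the suffixes in the code; it had to be put first.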
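
The failure and the fix can be reproduced with a small stand-alone function. This is a sketch written for this page, not the code in xfel/cxi/util.py: the suffix list and the file name <code>int-0059_extracted.pickle</code> are assumptions chosen to show why the more specific suffix has to be tried first.

```python
import os

# Hypothetical suffix list for illustration; the specific suffix comes first
# so that it is tried before the generic ".pickle".
SUFFIXES = ("_extracted.pickle", ".pickle")


def is_odd_numbered(file_name, suffixes=SUFFIXES):
    """True if the digit just before the first matching suffix is odd."""
    base = os.path.basename(file_name)
    for suffix in suffixes:
        if base.endswith(suffix):
            stem = base[: -len(suffix)]
            return int(stem[-1]) % 2 == 1
    raise ValueError("unrecognised file name: %s" % file_name)


print(is_odd_numbered("int-0059_extracted.pickle"))  # last digit is 9

# With only the generic suffix, the stem ends in the 'd' of "extracted".
try:
    is_odd_numbered("int-0059_extracted.pickle", suffixes=(".pickle",))
except ValueError as err:
    print("generic suffix only:", err)
```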

Table of Scaling Results:

<pre>
-----------------------------------------------------------------------------------------------------------------
                                         CC      N     CC      N      R      R      R  Scale    Scale       SpSig
Bin  Resolution Range   Completeness    int    int    iso    iso    int  split    iso    int      iso        Test
-----------------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861      [809/809]  80.0%    809  75.2%    805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
  2   5.3861 - 4.2749      [791/791]  54.9%    791  74.5%    791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
  3   4.2749 - 3.7345      [781/781]  65.8%    781  81.6%    781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
  4   3.7345 - 3.3930      [776/776]  63.9%    776  74.5%    776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
  5   3.3930 - 3.1498      [765/765]  67.1%    765  81.9%    765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
  6   3.1498 - 2.9641      [771/771]  58.6%    771  72.4%    771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
  7   2.9641 - 2.8156      [765/765]  56.0%    765  72.3%    765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
  8   2.8156 - 2.6930      [746/746]  63.0%    746  76.1%    746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
  9   2.6930 - 2.5894      [790/790]  52.1%    790  69.4%    790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
 10   2.5894 - 2.5000      [757/757]  54.9%    757  78.6%    757  52.4%  38.9%  44.4%  0.794  306.403    109.0768

All                      [7751/7751]  74.9%   7751  78.8%   7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
-----------------------------------------------------------------------------------------------------------------
</pre>

Of course we know why the data do not scale: P6 is a polar space group with an indexing ambiguity, so the lattices must first be sorted into a consistent setting by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

14 plots total. Lattices per setting: h,k,l = 2503; h,-h-k,-l = 2485; total 4988.
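
To make the second setting concrete, the sketch below applies the reindexing operator, read here as h,-h-k,-l, to a single Miller index and checks that applying it twice returns the original index. The function name <code>reindex</code> and the example index (1, 2, 3) are chosen for illustration only; the two counts are the ones reported above.

```python
def reindex(hkl):
    """Apply the operator h,-h-k,-l to one Miller index (h, k, l)."""
    h, k, l = hkl
    return (h, -h - k, -l)


hkl = (1, 2, 3)  # arbitrary example index
alt = reindex(hkl)
print(hkl, "->", alt)

# The operator is its own inverse, so there are exactly two settings.
assert reindex(alt) == hkl

# Lattice counts per setting, as reported by the sorting step.
counts = {"h,k,l": 2503, "h,-h-k,-l": 2485}
print(sum(counts.values()))
```

The two counts add up to the 4988 lattices accepted in step 1, so every lattice was assigned to one of the two settings.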

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
                                                <asu    <obs
Bin  Resolution Range   Completeness       %  multi>  multi>  n_meas     <I>  <I/sig(I)>
----------------------------------------------------------------------------------------
  1  -1.000 - 5.386      [1490/1490]  100.00  102.21  102.21  152295  103994     103.244
  2   5.386 - 4.275      [1500/1500]  100.00   62.76   62.76   94141  128403      95.046
  3   4.275 - 3.735      [1499/1499]  100.00   53.90   53.90   80795  143552      92.607
  4   3.735 - 3.393      [1497/1497]  100.00   47.14   47.14   70571  112723      70.575
  5   3.393 - 3.150      [1477/1477]  100.00   43.96   43.96   64928   76925      51.011
  6   3.150 - 2.964      [1488/1488]  100.00   39.87   39.87   59330   57060      37.899
  7   2.964 - 2.816      [1483/1483]  100.00   38.17   38.17   56611   44079      32.085
  8   2.816 - 2.693      [1455/1455]  100.00   36.34   36.34   52874   37117      27.460
  9   2.693 - 2.589      [1530/1530]  100.00   34.49   34.49   52763   30496      24.443
 10   2.589 - 2.500      [1476/1476]  100.00   31.83   31.83   46974   27147      21.564

All                    [14895/14895]  100.00   49.10   49.10  731282   76275      55.681
----------------------------------------------------------------------------------------
</pre>
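
The multiplicity columns in the table above are consistent with n_meas divided by the number of unique reflections in each bin. The check below simply redoes that arithmetic from the tabulated numbers; it is a reading of the table, not a description of how cxi.merge computes the values internally.

```python
# (unique reflections, n_meas) per resolution bin, copied from the table above.
bins = [
    (1490, 152295), (1500, 94141), (1499, 80795), (1497, 70571), (1477, 64928),
    (1488, 59330), (1483, 56611), (1455, 52874), (1530, 52763), (1476, 46974),
]


def multiplicity(n_unique, n_meas):
    """Average number of measurements per unique reflection."""
    return float(n_meas) / n_unique


for number, (n_unique, n_meas) in enumerate(bins, start=1):
    print("bin %2d: %.2f" % (number, multiplicity(n_unique, n_meas)))

total_unique = sum(n_unique for n_unique, _ in bins)
total_meas = sum(n_meas for _, n_meas in bins)
print("all: %d / %d = %.2f" % (total_meas, total_unique, multiplicity(total_unique, total_meas)))
```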

== cxi.xmerge program output ==
<pre>
----------------------------------------
Bin  Resolution Range  # images  %accept
----------------------------------------
  1  -1.0000 - 5.3861      4712   100.00
  2   5.3861 - 4.2749      4663    98.96
  3   4.2749 - 3.7345      4646    98.60
  4   3.7345 - 3.3930      4614    97.92
  5   3.3930 - 3.1498      4578    97.16
  6   3.1498 - 2.9641      4552    96.60
  7   2.9641 - 2.8156      4521    95.95
  8   2.8156 - 2.6930      4499    95.48
  9   2.6930 - 2.5894      4477    95.01
 10   2.5894 - 2.5000      4416    93.72

All                        4721
----------------------------------------
-----------------------------------------------------------------------------------------------------------------
                                         CC      N     CC      N      R      R      R  Scale    Scale       SpSig
Bin  Resolution Range   Completeness    int    int    iso    iso    int  split    iso    int      iso        Test
-----------------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861    [1490/1490]  87.3%   1490  88.1%   1484  46.3%  32.9%  42.6%  0.772  300.328   8084.8580
  2   5.3861 - 4.2749    [1500/1500]  76.4%   1500  89.3%   1500  43.8%  30.6%  34.5%  0.761  425.498   1728.0907
  3   4.2749 - 3.7345    [1499/1499]  80.1%   1499  91.6%   1499  42.5%  26.7%  34.5%  0.684  430.028   1556.6316
  4   3.7345 - 3.3930    [1497/1497]  80.5%   1497  90.3%   1497  37.9%  27.2%  29.9%  0.846  481.795    600.5001
  5   3.3930 - 3.1498    [1477/1477]  84.2%   1477  90.0%   1477  37.2%  26.4%  31.4%  0.838  477.825    269.5784
  6   3.1498 - 2.9641    [1492/1492]  80.0%   1492  91.5%   1492  39.8%  28.6%  28.3%  0.866  511.386    165.9517
  7   2.9641 - 2.8156    [1483/1483]  76.7%   1483  90.0%   1483  39.3%  28.7%  30.1%  0.865  470.331    102.0659
  8   2.8156 - 2.6930    [1451/1451]  76.8%   1451  90.7%   1451  38.5%  28.2%  27.3%  0.883  492.758     88.6666
  9   2.6930 - 2.5894    [1532/1532]  76.6%   1532  89.4%   1532  40.1%  29.3%  30.5%  0.879  452.831     52.0092
 10   2.5894 - 2.5000    [1472/1472]  77.2%   1472  88.9%   1474  42.9%  31.4%  35.3%  0.801  393.866     52.6667

All                    [14893/14893]  84.7%  14893  88.6%  14889  41.6%  29.0%  39.8%  0.771  378.964       804.8
-----------------------------------------------------------------------------------------------------------------
</pre>
</pre>
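
The %accept column of the first cxi.xmerge table above is consistent with each bin's image count divided by the count in the lowest-resolution bin (4712), rather than by the overall total of 4721. That reading is an inference from the numbers, not a statement about the program's internals; the short check below just redoes the arithmetic.

```python
# Number of images contributing to each resolution bin, copied from the table above.
images = [4712, 4663, 4646, 4614, 4578, 4552, 4521, 4499, 4477, 4416]

reference = images[0]  # assumption: the lowest-resolution bin is the 100% reference
percent = [round(100.0 * n / reference, 2) for n in images]

for number, value in enumerate(percent, start=1):
    print("bin %2d: %6.2f" % (number, value))
```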

== Table of results ==
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.475

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.477
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives better convergence test Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.480
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.508
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.518
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape recycle model
| style="padding: 5px;"| Use mtz from trial 3 as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3716 (3432)
| style="padding: 5px;"| 92.8% (48.1%)
| style="padding: 5px;"| 93.3% (85.7%)
| style="padding: 5px;"| 0.522
|-
|}

Useful:
export BOOST_ADAPTBX_FPE_DEFAULT=1
nproc=1
postrefinement.show_trumpet_plot=True

2017 cxi merge tutorial

2017-02-25T00:59:09Z

Nicksauter: /* Table of results */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]]. The methods are described in the literature in [http://dx.doi.org/10.1038/nmeth.2887 Hattne (2014)], the [http://dx.doi.org/10.7554/eLife.05421 PRIME paper], [http://dx.doi.org/10.1107/S1399004714024134 Sauter (2014)] and [http://dx.doi.org/10.1107/S1600577514028203 Sauter (2015)]. The math derivations are further described in the file postrefinement_rs_model.pdf in the source code release.

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
phenix.xtriage ${tag}_s0_mark0.mtz scaling.input.xray_data.obs_labels=imean

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. A *.pickle glob can't be used on the data= line; give the directory itself.

Scale-up trial nproc=60, no postrefinement.
set the MTZ flag = i_obs
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

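The traceback shows that is_odd_numbered cannot tell whether the file is even- or odd-numbered: once the recognized suffix is split off, the last character of the file name is 'd' rather than a digit. Fixed by adding "_extracted.pickle" to the suffixes in the code; it had to be put first.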

Table of Scaling Results:

<pre>
-----------------------------------------------------------------------------------------------------------------
                                         CC      N     CC      N      R      R      R  Scale    Scale       SpSig
Bin  Resolution Range   Completeness    int    int    iso    iso    int  split    iso    int      iso        Test
-----------------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861      [809/809]  80.0%    809  75.2%    805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
  2   5.3861 - 4.2749      [791/791]  54.9%    791  74.5%    791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
  3   4.2749 - 3.7345      [781/781]  65.8%    781  81.6%    781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
  4   3.7345 - 3.3930      [776/776]  63.9%    776  74.5%    776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
  5   3.3930 - 3.1498      [765/765]  67.1%    765  81.9%    765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
  6   3.1498 - 2.9641      [771/771]  58.6%    771  72.4%    771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
  7   2.9641 - 2.8156      [765/765]  56.0%    765  72.3%    765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
  8   2.8156 - 2.6930      [746/746]  63.0%    746  76.1%    746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
  9   2.6930 - 2.5894      [790/790]  52.1%    790  69.4%    790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
 10   2.5894 - 2.5000      [757/757]  54.9%    757  78.6%    757  52.4%  38.9%  44.4%  0.794  306.403    109.0768

All                      [7751/7751]  74.9%   7751  78.8%   7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
-----------------------------------------------------------------------------------------------------------------
</pre>

Of course we know why the data do not scale: P6 is a polar space group with an indexing ambiguity, so the lattices must first be sorted into a consistent setting by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

14 plots total. Lattices per setting: h,k,l = 2503; h,-h-k,-l = 2485; total 4988.

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
                                                <asu    <obs
Bin  Resolution Range   Completeness       %  multi>  multi>  n_meas     <I>  <I/sig(I)>
----------------------------------------------------------------------------------------
  1  -1.000 - 5.386      [1490/1490]  100.00  102.21  102.21  152295  103994     103.244
  2   5.386 - 4.275      [1500/1500]  100.00   62.76   62.76   94141  128403      95.046
  3   4.275 - 3.735      [1499/1499]  100.00   53.90   53.90   80795  143552      92.607
  4   3.735 - 3.393      [1497/1497]  100.00   47.14   47.14   70571  112723      70.575
  5   3.393 - 3.150      [1477/1477]  100.00   43.96   43.96   64928   76925      51.011
  6   3.150 - 2.964      [1488/1488]  100.00   39.87   39.87   59330   57060      37.899
  7   2.964 - 2.816      [1483/1483]  100.00   38.17   38.17   56611   44079      32.085
  8   2.816 - 2.693      [1455/1455]  100.00   36.34   36.34   52874   37117      27.460
  9   2.693 - 2.589      [1530/1530]  100.00   34.49   34.49   52763   30496      24.443
 10   2.589 - 2.500      [1476/1476]  100.00   31.83   31.83   46974   27147      21.564

All                    [14895/14895]  100.00   49.10   49.10  731282   76275      55.681
----------------------------------------------------------------------------------------
</pre>

== cxi.xmerge program output ==
<pre>
----------------------------------------
Bin  Resolution Range  # images  %accept
----------------------------------------
  1  -1.0000 - 5.3861      4712   100.00
  2   5.3861 - 4.2749      4663    98.96
  3   4.2749 - 3.7345      4646    98.60
  4   3.7345 - 3.3930      4614    97.92
  5   3.3930 - 3.1498      4578    97.16
  6   3.1498 - 2.9641      4552    96.60
  7   2.9641 - 2.8156      4521    95.95
  8   2.8156 - 2.6930      4499    95.48
  9   2.6930 - 2.5894      4477    95.01
 10   2.5894 - 2.5000      4416    93.72

All                        4721
----------------------------------------
-----------------------------------------------------------------------------------------------------------------
                                         CC      N     CC      N      R      R      R  Scale    Scale       SpSig
Bin  Resolution Range   Completeness    int    int    iso    iso    int  split    iso    int      iso        Test
-----------------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861    [1490/1490]  87.3%   1490  88.1%   1484  46.3%  32.9%  42.6%  0.772  300.328   8084.8580
  2   5.3861 - 4.2749    [1500/1500]  76.4%   1500  89.3%   1500  43.8%  30.6%  34.5%  0.761  425.498   1728.0907
  3   4.2749 - 3.7345    [1499/1499]  80.1%   1499  91.6%   1499  42.5%  26.7%  34.5%  0.684  430.028   1556.6316
  4   3.7345 - 3.3930    [1497/1497]  80.5%   1497  90.3%   1497  37.9%  27.2%  29.9%  0.846  481.795    600.5001
  5   3.3930 - 3.1498    [1477/1477]  84.2%   1477  90.0%   1477  37.2%  26.4%  31.4%  0.838  477.825    269.5784
  6   3.1498 - 2.9641    [1492/1492]  80.0%   1492  91.5%   1492  39.8%  28.6%  28.3%  0.866  511.386    165.9517
  7   2.9641 - 2.8156    [1483/1483]  76.7%   1483  90.0%   1483  39.3%  28.7%  30.1%  0.865  470.331    102.0659
  8   2.8156 - 2.6930    [1451/1451]  76.8%   1451  90.7%   1451  38.5%  28.2%  27.3%  0.883  492.758     88.6666
  9   2.6930 - 2.5894    [1532/1532]  76.6%   1532  89.4%   1532  40.1%  29.3%  30.5%  0.879  452.831     52.0092
 10   2.5894 - 2.5000    [1472/1472]  77.2%   1472  88.9%   1474  42.9%  31.4%  35.3%  0.801  393.866     52.6667

All                    [14893/14893]  84.7%  14893  88.6%  14889  41.6%  29.0%  39.8%  0.771  378.964       804.8
-----------------------------------------------------------------------------------------------------------------
</pre>

== Table of results ==
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.477
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives better convergence test Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.480
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.508
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.518
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape recycle model
| style="padding: 5px;"| Use mtz from trial 3 as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3716 (3432)
| style="padding: 5px;"| 92.8% (48.1%)
| style="padding: 5px;"| 93.3% (85.7%)
| style="padding: 5px;"| 0.522
|-
|}

Useful:
export BOOST_ADAPTBX_FPE_DEFAULT=1
nproc=1
postrefinement.show_trumpet_plot=True

2017 cxi merge tutorial

2017-02-25T00:57:30Z

Nicksauter: /* Table of results */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]]. The methods are described in the literature in [http://dx.doi.org/10.1038/nmeth.2887 Hattne (2014)], the [http://dx.doi.org/10.7554/eLife.05421 PRIME paper], [http://dx.doi.org/10.1107/S1399004714024134 Sauter (2014)] and [http://dx.doi.org/10.1107/S1600577514028203 Sauter (2015)]. The math derivations are further described in the file postrefinement_rs_model.pdf in the source code release.

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
phenix.xtriage ${tag}_s0_mark0.mtz scaling.input.xray_data.obs_labels=imean
</pre>

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. A *.pickle glob cannot be used on the data= line; pass the directory itself.

Scale-up trial with nproc=60 and no postrefinement. Set the MTZ flag = i_obs (see the scaling.mtz_column_F usage message below).
<pre>
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
</pre>

Two errors were encountered:
<pre>
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
  File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
    return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'
</pre>

The even/odd test on the file name fails: the code expects a digit immediately before a recognized suffix, but these files end in "_extracted.pickle", so the character it finds is 'd'. Fix: added "_extracted.pickle" to the allowable suffixes in the code; it had to be put first.
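The failure mode and the fix can be illustrated with a minimal stand-alone sketch. This is not the actual code in xfel/cxi/util.py; the suffix tuple, the error messages, and the file names used in the example are assumptions made for illustration. The point is that the more specific suffix must be tried before the generic ".pickle".

```python
import os

# Hypothetical suffix list (an assumption, not copied from cctbx).
# The longer, more specific suffix comes first so it matches before ".pickle".
ALLOWABLE_SUFFIXES = ("_extracted.pickle", ".pickle")


def is_odd_numbered(file_name, suffixes=ALLOWABLE_SUFFIXES):
    """Return True if the digit just before a recognized suffix is odd."""
    base = os.path.basename(file_name)
    for suffix in suffixes:
        if base.endswith(suffix):
            stem = base[: -len(suffix)]
            if stem and stem[-1].isdigit():
                return int(stem[-1]) % 2 == 1
            # Same situation as the traceback above: the character before
            # the suffix is not a digit (e.g. the 'd' of "extracted").
            raise ValueError("no trailing digit before %r in %r" % (suffix, base))
    raise ValueError("unrecognized suffix in %r" % base)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(is_odd_numbered("int-0059_extracted.pickle"))
    print(is_odd_numbered("int-0060_extracted.pickle"))
```

With only ".pickle" in the list, the first file name above raises ValueError, which mirrors the original problem.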

Table of Scaling Results:

<pre>
---------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
---------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [809/809] 80.0% 809 75.2% 805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
2 5.3861 - 4.2749 [791/791] 54.9% 791 74.5% 791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
3 4.2749 - 3.7345 [781/781] 65.8% 781 81.6% 781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
4 3.7345 - 3.3930 [776/776] 63.9% 776 74.5% 776 49.3% 36.4% 48.6% 0.764 283.109 758.0388
5 3.3930 - 3.1498 [765/765] 67.1% 765 81.9% 765 48.4% 35.6% 43.4% 0.795 338.091 533.7650
6 3.1498 - 2.9641 [771/771] 58.6% 771 72.4% 771 49.3% 36.6% 50.7% 0.759 286.707 222.4718
7 2.9641 - 2.8156 [765/765] 56.0% 765 72.3% 765 48.5% 35.3% 46.7% 0.765 320.954 154.5299
8 2.8156 - 2.6930 [746/746] 63.0% 746 76.1% 746 46.4% 34.3% 42.6% 0.867 357.183 99.4430
9 2.6930 - 2.5894 [790/790] 52.1% 790 69.4% 790 50.4% 37.4% 47.5% 0.814 314.326 113.1264
10 2.5894 - 2.5000 [757/757] 54.9% 757 78.6% 757 52.4% 38.9% 44.4% 0.794 306.403 109.0768

All [7751/7751] 74.9% 7751 78.8% 7747 51.9% 36.9% 50.1% 0.680 266.538 1298.0
---------------------------------------------------------------------------------------------------------
</pre>

Of course, we know the data do not scale because P6 is a polar space group with an indexing ambiguity; the lattices must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

14 plots total. Lattices per reindexing operator: h,k,l = 2503; h,-h-k,-l = 2485; total 4988.
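As a small illustration of what the second operator does, here is a sketch that assumes it reads h,-h-k,-l (final index the letter l) and uses plain tuples in place of cctbx Miller arrays. The function name and the example index are hypothetical. Applying the operator twice returns the original index, and the two lattice counts reported above add up to the 4988 accepted files.

```python
def reindex(hkl):
    """Apply the operator h,-h-k,-l to one Miller index (h, k, l)."""
    h, k, l = hkl
    return (h, -h - k, -l)


# Lattice counts per operator, as reported by the sorting step above.
lattice_counts = {"h,k,l": 2503, "h,-h-k,-l": 2485}
total = sum(lattice_counts.values())

if __name__ == "__main__":
    example = (1, 2, 3)  # arbitrary index chosen for illustration
    print(reindex(example))           # (1, -3, -3)
    print(reindex(reindex(example)))  # (1, 2, 3)
    print(total)                      # 4988
```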

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I> <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>
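A quick consistency check on the table above: the multiplicity columns appear to be n_meas divided by the number of unique reflections in the bin (the two multiplicity columns coincide here because every bin is 100% complete). The sketch below only re-derives the numbers already printed; the variable names are illustrative and the interpretation of the columns is an assumption.

```python
# (unique reflections, n_meas, reported multiplicity) per bin, copied from the table
bins = [
    (1490, 152295, 102.21),
    (1500, 94141, 62.76),
    (1499, 80795, 53.90),
    (1497, 70571, 47.14),
    (1477, 64928, 43.96),
    (1488, 59330, 39.87),
    (1483, 56611, 38.17),
    (1455, 52874, 36.34),
    (1530, 52763, 34.49),
    (1476, 46974, 31.83),
]

# Per-bin check: n_meas / n_unique should reproduce the printed multiplicity.
per_bin_ok = all(abs(n_meas / n_unique - multi) < 0.005
                 for n_unique, n_meas, multi in bins)

# Overall figures from the "All" row.
total_unique = sum(n_unique for n_unique, _, _ in bins)
total_meas = sum(n_meas for _, n_meas, _ in bins)
overall_multiplicity = total_meas / total_unique

if __name__ == "__main__":
    print(per_bin_ok, total_unique, total_meas, round(overall_multiplicity, 2))
```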

== cxi.xmerge program output ==
<pre>
--------------------------------------
Bin Resolution Range # images %accept
--------------------------------------
1 -1.0000 - 5.3861 4712 100.00
2 5.3861 - 4.2749 4663 98.96
3 4.2749 - 3.7345 4646 98.60
4 3.7345 - 3.3930 4614 97.92
5 3.3930 - 3.1498 4578 97.16
6 3.1498 - 2.9641 4552 96.60
7 2.9641 - 2.8156 4521 95.95
8 2.8156 - 2.6930 4499 95.48
9 2.6930 - 2.5894 4477 95.01
10 2.5894 - 2.5000 4416 93.72

All 4721
--------------------------------------
--------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
--------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [1490/1490] 87.3% 1490 88.1% 1484 46.3% 32.9% 42.6% 0.772 300.328 8084.8580
2 5.3861 - 4.2749 [1500/1500] 76.4% 1500 89.3% 1500 43.8% 30.6% 34.5% 0.761 425.498 1728.0907
3 4.2749 - 3.7345 [1499/1499] 80.1% 1499 91.6% 1499 42.5% 26.7% 34.5% 0.684 430.028 1556.6316
4 3.7345 - 3.3930 [1497/1497] 80.5% 1497 90.3% 1497 37.9% 27.2% 29.9% 0.846 481.795 600.5001
5 3.3930 - 3.1498 [1477/1477] 84.2% 1477 90.0% 1477 37.2% 26.4% 31.4% 0.838 477.825 269.5784
6 3.1498 - 2.9641 [1492/1492] 80.0% 1492 91.5% 1492 39.8% 28.6% 28.3% 0.866 511.386 165.9517
7 2.9641 - 2.8156 [1483/1483] 76.7% 1483 90.0% 1483 39.3% 28.7% 30.1% 0.865 470.331 102.0659
8 2.8156 - 2.6930 [1451/1451] 76.8% 1451 90.7% 1451 38.5% 28.2% 27.3% 0.883 492.758 88.6666
9 2.6930 - 2.5894 [1532/1532] 76.6% 1532 89.4% 1532 40.1% 29.3% 30.5% 0.879 452.831 52.0092
10 2.5894 - 2.5000 [1472/1472] 77.2% 1472 88.9% 1474 42.9% 31.4% 35.3% 0.801 393.866 52.6667

All [14893/14893] 84.7% 14893 88.6% 14889 41.6% 29.0% 39.8% 0.771 378.964 804.8
--------------------------------------------------------------------------------------------------------
</pre>
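One detail of the first table above is easy to misread. The %accept column appears to be the image count in each bin divided by the count in the lowest-resolution bin (4712), not by the "All" figure of 4721. The following sketch checks that reading against the printed values; it is an inference from the numbers, not a statement about how cxi.xmerge computes the column.

```python
# (images contributing to the bin, reported %accept), copied from the table
images_per_bin = [
    (4712, 100.00),
    (4663, 98.96),
    (4646, 98.60),
    (4614, 97.92),
    (4578, 97.16),
    (4552, 96.60),
    (4521, 95.95),
    (4499, 95.48),
    (4477, 95.01),
    (4416, 93.72),
]

reference = images_per_bin[0][0]  # image count in the lowest-resolution bin

# Recompute the percentage for each bin relative to the first bin.
recomputed = [100.0 * n / reference for n, _ in images_per_bin]
matches = all(abs(value - reported) < 0.005
              for value, (_, reported) in zip(recomputed, images_per_bin))

if __name__ == "__main__":
    for (n, reported), value in zip(images_per_bin, recomputed):
        print(n, reported, round(value, 2))
    print(matches)
```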

== Table of results ==
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives better convergence test Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.480
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.508
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.518
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape recycle model
| style="padding: 5px;"| Use mtz from trial 3 as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3716 (3432)
| style="padding: 5px;"| 92.8% (48.1%)
| style="padding: 5px;"| 93.3% (85.7%)
| style="padding: 5px;"| 0.522
|-
|}

Useful:
* export BOOST_ADAPTBX_FPE_DEFAULT=1
* nproc=1
* postrefinement.show_trumpet_plot=True

2017 cxi merge tutorial

2017-02-25T00:55:58Z

Nicksauter: /* Table of results */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]]. The methods are described in the literature: [http://dx.doi.org/10.1038/nmeth.2887 Hattne (2014)], the [http://dx.doi.org/10.7554/eLife.05421 PRIME paper], [http://dx.doi.org/10.1107/S1399004714024134 Sauter (2014)] and [http://dx.doi.org/10.1107/S1600577514028203 Sauter (2015)]. The math derivations are described further in the file postrefinement_rs_model.pdf, included in the source code release.

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
phenix.xtriage ${tag}_s0_mark0.mtz scaling.input.xray_data.obs_labels=imean
</pre>

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. A *.pickle glob cannot be used on the data= line; pass the directory itself.

Scale-up trial with nproc=60 and no postrefinement. Set the MTZ flag = i_obs (see the scaling.mtz_column_F usage message below).
<pre>
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
</pre>

Two errors were encountered:
<pre>
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
  File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
    return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'
</pre>

The even/odd test on the file name fails: the code expects a digit immediately before a recognized suffix, but these files end in "_extracted.pickle", so the character it finds is 'd'. Fix: added "_extracted.pickle" to the allowable suffixes in the code; it had to be put first.

Table of Scaling Results:

<pre>
---------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
---------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [809/809] 80.0% 809 75.2% 805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
2 5.3861 - 4.2749 [791/791] 54.9% 791 74.5% 791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
3 4.2749 - 3.7345 [781/781] 65.8% 781 81.6% 781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
4 3.7345 - 3.3930 [776/776] 63.9% 776 74.5% 776 49.3% 36.4% 48.6% 0.764 283.109 758.0388
5 3.3930 - 3.1498 [765/765] 67.1% 765 81.9% 765 48.4% 35.6% 43.4% 0.795 338.091 533.7650
6 3.1498 - 2.9641 [771/771] 58.6% 771 72.4% 771 49.3% 36.6% 50.7% 0.759 286.707 222.4718
7 2.9641 - 2.8156 [765/765] 56.0% 765 72.3% 765 48.5% 35.3% 46.7% 0.765 320.954 154.5299
8 2.8156 - 2.6930 [746/746] 63.0% 746 76.1% 746 46.4% 34.3% 42.6% 0.867 357.183 99.4430
9 2.6930 - 2.5894 [790/790] 52.1% 790 69.4% 790 50.4% 37.4% 47.5% 0.814 314.326 113.1264
10 2.5894 - 2.5000 [757/757] 54.9% 757 78.6% 757 52.4% 38.9% 44.4% 0.794 306.403 109.0768

All [7751/7751] 74.9% 7751 78.8% 7747 51.9% 36.9% 50.1% 0.680 266.538 1298.0
---------------------------------------------------------------------------------------------------------
</pre>

Of course, we know the data do not scale because P6 is a polar space group with an indexing ambiguity; the lattices must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

14 plots total. Lattices per reindexing operator: h,k,l = 2503; h,-h-k,-l = 2485; total 4988.

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I> <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>

== cxi.xmerge program output ==
<pre>
--------------------------------------
Bin Resolution Range # images %accept
--------------------------------------
1 -1.0000 - 5.3861 4712 100.00
2 5.3861 - 4.2749 4663 98.96
3 4.2749 - 3.7345 4646 98.60
4 3.7345 - 3.3930 4614 97.92
5 3.3930 - 3.1498 4578 97.16
6 3.1498 - 2.9641 4552 96.60
7 2.9641 - 2.8156 4521 95.95
8 2.8156 - 2.6930 4499 95.48
9 2.6930 - 2.5894 4477 95.01
10 2.5894 - 2.5000 4416 93.72

All 4721
--------------------------------------
--------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
--------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [1490/1490] 87.3% 1490 88.1% 1484 46.3% 32.9% 42.6% 0.772 300.328 8084.8580
2 5.3861 - 4.2749 [1500/1500] 76.4% 1500 89.3% 1500 43.8% 30.6% 34.5% 0.761 425.498 1728.0907
3 4.2749 - 3.7345 [1499/1499] 80.1% 1499 91.6% 1499 42.5% 26.7% 34.5% 0.684 430.028 1556.6316
4 3.7345 - 3.3930 [1497/1497] 80.5% 1497 90.3% 1497 37.9% 27.2% 29.9% 0.846 481.795 600.5001
5 3.3930 - 3.1498 [1477/1477] 84.2% 1477 90.0% 1477 37.2% 26.4% 31.4% 0.838 477.825 269.5784
6 3.1498 - 2.9641 [1492/1492] 80.0% 1492 91.5% 1492 39.8% 28.6% 28.3% 0.866 511.386 165.9517
7 2.9641 - 2.8156 [1483/1483] 76.7% 1483 90.0% 1483 39.3% 28.7% 30.1% 0.865 470.331 102.0659
8 2.8156 - 2.6930 [1451/1451] 76.8% 1451 90.7% 1451 38.5% 28.2% 27.3% 0.883 492.758 88.6666
9 2.6930 - 2.5894 [1532/1532] 76.6% 1532 89.4% 1532 40.1% 29.3% 30.5% 0.879 452.831 52.0092
10 2.5894 - 2.5000 [1472/1472] 77.2% 1472 88.9% 1474 42.9% 31.4% 35.3% 0.801 393.866 52.6667

All [14893/14893] 84.7% 14893 88.6% 14889 41.6% 29.0% 39.8% 0.771 378.964 804.8
--------------------------------------------------------------------------------------------------------
</pre>

== Table of results ==
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives better convergence test Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.508
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.518
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape recycle model
| style="padding: 5px;"| Use mtz from trial 3 as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3716 (3432)
| style="padding: 5px;"| 92.8% (48.1%)
| style="padding: 5px;"| 93.3% (85.7%)
| style="padding: 5px;"| 0.522
|-
|}

Useful:
* export BOOST_ADAPTBX_FPE_DEFAULT=1
* nproc=1
* postrefinement.show_trumpet_plot=True

2017 cxi merge tutorial

2017-02-25T00:53:09Z

Nicksauter: /* Table of results */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]]. The methods are described in the literature: [http://dx.doi.org/10.1038/nmeth.2887 Hattne (2014)], the [http://dx.doi.org/10.7554/eLife.05421 PRIME paper], [http://dx.doi.org/10.1107/S1399004714024134 Sauter (2014)] and [http://dx.doi.org/10.1107/S1600577514028203 Sauter (2015)]. The math derivations are described further in the file postrefinement_rs_model.pdf, included in the source code release.

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
phenix.xtriage ${tag}_s0_mark0.mtz scaling.input.xray_data.obs_labels=imean
</pre>

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. A *.pickle glob cannot be used on the data= line; pass the directory itself.

Scale-up trial with nproc=60 and no postrefinement. Set the MTZ flag = i_obs (see the scaling.mtz_column_F usage message below).
<pre>
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
</pre>

Two errors were encountered:
<pre>
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
  File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
    return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'
</pre>

The even/odd test on the file name fails: the code expects a digit immediately before a recognized suffix, but these files end in "_extracted.pickle", so the character it finds is 'd'. Fix: added "_extracted.pickle" to the allowable suffixes in the code; it had to be put first.

Table of Scaling Results:

<pre>
---------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
---------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [809/809] 80.0% 809 75.2% 805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
2 5.3861 - 4.2749 [791/791] 54.9% 791 74.5% 791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
3 4.2749 - 3.7345 [781/781] 65.8% 781 81.6% 781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
4 3.7345 - 3.3930 [776/776] 63.9% 776 74.5% 776 49.3% 36.4% 48.6% 0.764 283.109 758.0388
5 3.3930 - 3.1498 [765/765] 67.1% 765 81.9% 765 48.4% 35.6% 43.4% 0.795 338.091 533.7650
6 3.1498 - 2.9641 [771/771] 58.6% 771 72.4% 771 49.3% 36.6% 50.7% 0.759 286.707 222.4718
7 2.9641 - 2.8156 [765/765] 56.0% 765 72.3% 765 48.5% 35.3% 46.7% 0.765 320.954 154.5299
8 2.8156 - 2.6930 [746/746] 63.0% 746 76.1% 746 46.4% 34.3% 42.6% 0.867 357.183 99.4430
9 2.6930 - 2.5894 [790/790] 52.1% 790 69.4% 790 50.4% 37.4% 47.5% 0.814 314.326 113.1264
10 2.5894 - 2.5000 [757/757] 54.9% 757 78.6% 757 52.4% 38.9% 44.4% 0.794 306.403 109.0768

All [7751/7751] 74.9% 7751 78.8% 7747 51.9% 36.9% 50.1% 0.680 266.538 1298.0
---------------------------------------------------------------------------------------------------------
</pre>

Of course, we know the data do not scale because P6 is a polar space group with an indexing ambiguity; the lattices must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

14 plots total. Lattices per reindexing operator: h,k,l = 2503; h,-h-k,-l = 2485; total 4988.

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I> <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>

== cxi.xmerge program output ==
<pre>
--------------------------------------
Bin Resolution Range # images %accept
--------------------------------------
1 -1.0000 - 5.3861 4712 100.00
2 5.3861 - 4.2749 4663 98.96
3 4.2749 - 3.7345 4646 98.60
4 3.7345 - 3.3930 4614 97.92
5 3.3930 - 3.1498 4578 97.16
6 3.1498 - 2.9641 4552 96.60
7 2.9641 - 2.8156 4521 95.95
8 2.8156 - 2.6930 4499 95.48
9 2.6930 - 2.5894 4477 95.01
10 2.5894 - 2.5000 4416 93.72

All 4721
--------------------------------------
--------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
--------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [1490/1490] 87.3% 1490 88.1% 1484 46.3% 32.9% 42.6% 0.772 300.328 8084.8580
2 5.3861 - 4.2749 [1500/1500] 76.4% 1500 89.3% 1500 43.8% 30.6% 34.5% 0.761 425.498 1728.0907
3 4.2749 - 3.7345 [1499/1499] 80.1% 1499 91.6% 1499 42.5% 26.7% 34.5% 0.684 430.028 1556.6316
4 3.7345 - 3.3930 [1497/1497] 80.5% 1497 90.3% 1497 37.9% 27.2% 29.9% 0.846 481.795 600.5001
5 3.3930 - 3.1498 [1477/1477] 84.2% 1477 90.0% 1477 37.2% 26.4% 31.4% 0.838 477.825 269.5784
6 3.1498 - 2.9641 [1492/1492] 80.0% 1492 91.5% 1492 39.8% 28.6% 28.3% 0.866 511.386 165.9517
7 2.9641 - 2.8156 [1483/1483] 76.7% 1483 90.0% 1483 39.3% 28.7% 30.1% 0.865 470.331 102.0659
8 2.8156 - 2.6930 [1451/1451] 76.8% 1451 90.7% 1451 38.5% 28.2% 27.3% 0.883 492.758 88.6666
9 2.6930 - 2.5894 [1532/1532] 76.6% 1532 89.4% 1532 40.1% 29.3% 30.5% 0.879 452.831 52.0092
10 2.5894 - 2.5000 [1472/1472] 77.2% 1472 88.9% 1474 42.9% 31.4% 35.3% 0.801 393.866 52.6667

All [14893/14893] 84.7% 14893 88.6% 14889 41.6% 29.0% 39.8% 0.771 378.964 804.8
--------------------------------------------------------------------------------------------------------
</pre>

== Table of results ==
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives better convergence test Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.518
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape recycle model
| style="padding: 5px;"| Use mtz from trial 3 as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3716 (3432)
| style="padding: 5px;"| 92.8% (48.1%)
| style="padding: 5px;"| 93.3% (85.7%)
| style="padding: 5px;"| 0.522
|-
|}

Useful:
export BOOST_ADAPTBX_FPE_DEFAULT=1
nproc=1
postrefinement.show_trumpet_plot=True

2017 cxi merge tutorial

2017-02-25T00:52:38Z

Nicksauter: /* Table of results */

This is an updated, worked example of data merging using cxi.merge, prepared for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]]. The methods are described in the literature in [http://dx.doi.org/10.1038/nmeth.2887 Hattne (2014)], the [http://dx.doi.org/10.7554/eLife.05421 PRIME paper], [http://dx.doi.org/10.1107/S1399004714024134 Sauter (2014)] and [http://dx.doi.org/10.1107/S1600577514028203 Sauter (2015)]. Math derivations are further described in the file postrefinement_rs_model.pdf in the source code release.

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
phenix.xtriage ${tag}_s0_mark0.mtz scaling.input.xray_data.obs_labels=imean
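
The `set eff` line fills in the placeholders of the parameter template with a chain of sed substitutions. A minimal Python sketch of the same idea follows, for illustration only: `expand_params` and the shortened template are hypothetical, and only a few of the parameters from the script are shown.

```python
def expand_params(template, dmin, neg, tag, backend="Flex"):
    """Replace the FS, DMIN, NEG and TAG placeholders, in the same order as the sed chain."""
    substitutions = [
        ("FS", backend),
        ("DMIN", str(dmin)),
        ("NEG", str(neg)),
        ("TAG", tag),
    ]
    for placeholder, value in substitutions:
        # Plain substring replacement, like sed "s,X,Y,g": every occurrence is replaced.
        template = template.replace(placeholder, value)
    # Split on whitespace to get one key=value argument per list item.
    return template.split()

# Shortened, hypothetical template with the same placeholders as the script.
template = "d_min=DMIN backend=FS include_negatives=NEG output.prefix=TAG"
args = expand_params(template, dmin=2.5, neg=True, tag="p6m")
print(args)
```

Because the replacement is a plain substring match, any parameter name or path that happens to contain one of the placeholder strings would be rewritten as well.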

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. A *.pickle glob cannot be used on the data= line; give the directory instead.

Scale-up trial nproc=60, no postrefinement.
set the MTZ flag = i_obs
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

The code that decides whether a file is odd- or even-numbered fails on these file names. Fixed by adding "_extracted.pickle" to the code; it had to be listed first.
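
The traceback is consistent with file names that end in "_extracted.pickle": stripping only ".pickle" leaves a stem ending in "d", which int() cannot parse. The sketch below illustrates why the longer suffix has to be tried first. It is not the code in xfel/cxi/util.py; the suffix list and the example file names are assumptions.

```python
import os

# Assumed suffix list. Order matters: the longer "_extracted.pickle" must be
# tried before ".pickle", otherwise the stem still ends in "..._extracted".
SUFFIXES = ["_extracted.pickle", ".pickle"]

def is_odd_numbered(file_name, suffixes=SUFFIXES):
    """Return True if the last digit before the recognized suffix is odd."""
    base = os.path.basename(file_name)
    for suffix in suffixes:
        if base.endswith(suffix):
            stem = base[:-len(suffix)]
            # Raises ValueError if the last character is not a digit.
            return int(stem[-1]) % 2 == 1
    raise ValueError("unrecognized file name: %s" % base)

# Hypothetical file names, for illustration only.
print(is_odd_numbered("/data/shot_0059_extracted.pickle"))
print(is_odd_numbered("/data/shot_0060.pickle"))
```

With only ".pickle" in the list, the first example raises the same ValueError as in the traceback, because the last character of the stem is "d".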

Table of Scaling Results:

---------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
---------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [809/809] 80.0% 809 75.2% 805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
2 5.3861 - 4.2749 [791/791] 54.9% 791 74.5% 791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
3 4.2749 - 3.7345 [781/781] 65.8% 781 81.6% 781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
4 3.7345 - 3.3930 [776/776] 63.9% 776 74.5% 776 49.3% 36.4% 48.6% 0.764 283.109 758.0388
5 3.3930 - 3.1498 [765/765] 67.1% 765 81.9% 765 48.4% 35.6% 43.4% 0.795 338.091 533.7650
6 3.1498 - 2.9641 [771/771] 58.6% 771 72.4% 771 49.3% 36.6% 50.7% 0.759 286.707 222.4718
7 2.9641 - 2.8156 [765/765] 56.0% 765 72.3% 765 48.5% 35.3% 46.7% 0.765 320.954 154.5299
8 2.8156 - 2.6930 [746/746] 63.0% 746 76.1% 746 46.4% 34.3% 42.6% 0.867 357.183 99.4430
9 2.6930 - 2.5894 [790/790] 52.1% 790 69.4% 790 50.4% 37.4% 47.5% 0.814 314.326 113.1264
10 2.5894 - 2.5000 [757/757] 54.9% 757 78.6% 757 52.4% 38.9% 44.4% 0.794 306.403 109.0768

All [7751/7751] 74.9% 7751 78.8% 7747 51.9% 36.9% 50.1% 0.680 266.538 1298.0
---------------------------------------------------------------------------------------------------------

Of course, the data do not scale well: P6 is a polar space group, so each lattice can be indexed in one of two ways, and the data must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

14 plots total. Lattice counts: h,k,l = 2503; h,-h-k,-l = 2485; total 4988.
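
The two counts correspond to the two possible indexing choices. As a minimal sketch (not the cxi.brehm_diederichs implementation), and assuming the final term of the second operator is -l, applying that operator to a Miller index looks like this; the function name `reindex` is hypothetical.

```python
def reindex(hkl):
    """Apply the operator h,-h-k,-l to one Miller index (h, k, l)."""
    h, k, l = hkl
    return (h, -h - k, -l)

# Applying the operator twice restores the original index, which is
# consistent with the lattices falling into two groups.
original = (1, 2, 3)
flipped = reindex(original)
restored = reindex(flipped)
print(flipped, restored)
```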

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>
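
In the table above, the multiplicity columns appear to equal n_meas divided by the number of unique reflections in the bin. A quick check in Python, with values copied from the table (the helper name `multiplicity` is hypothetical):

```python
# (unique reflections, n_meas, reported multiplicity), copied from the table
rows = [
    (1490, 152295, 102.21),   # bin 1
    (1500, 94141, 62.76),     # bin 2
    (14895, 731282, 49.10),   # all data
]

def multiplicity(n_unique, n_meas):
    """Average number of measurements per unique reflection."""
    return float(n_meas) / n_unique

for n_unique, n_meas, reported in rows:
    # Print the recomputed value next to the value reported by cxi.merge.
    print(round(multiplicity(n_unique, n_meas), 2), reported)
```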

== cxi.xmerge program output ==
<pre>
--------------------------------------
Bin Resolution Range # images %accept
--------------------------------------
1 -1.0000 - 5.3861 4712 100.00
2 5.3861 - 4.2749 4663 98.96
3 4.2749 - 3.7345 4646 98.60
4 3.7345 - 3.3930 4614 97.92
5 3.3930 - 3.1498 4578 97.16
6 3.1498 - 2.9641 4552 96.60
7 2.9641 - 2.8156 4521 95.95
8 2.8156 - 2.6930 4499 95.48
9 2.6930 - 2.5894 4477 95.01
10 2.5894 - 2.5000 4416 93.72

All 4721
--------------------------------------
--------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
--------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [1490/1490] 87.3% 1490 88.1% 1484 46.3% 32.9% 42.6% 0.772 300.328 8084.8580
2 5.3861 - 4.2749 [1500/1500] 76.4% 1500 89.3% 1500 43.8% 30.6% 34.5% 0.761 425.498 1728.0907
3 4.2749 - 3.7345 [1499/1499] 80.1% 1499 91.6% 1499 42.5% 26.7% 34.5% 0.684 430.028 1556.6316
4 3.7345 - 3.3930 [1497/1497] 80.5% 1497 90.3% 1497 37.9% 27.2% 29.9% 0.846 481.795 600.5001
5 3.3930 - 3.1498 [1477/1477] 84.2% 1477 90.0% 1477 37.2% 26.4% 31.4% 0.838 477.825 269.5784
6 3.1498 - 2.9641 [1492/1492] 80.0% 1492 91.5% 1492 39.8% 28.6% 28.3% 0.866 511.386 165.9517
7 2.9641 - 2.8156 [1483/1483] 76.7% 1483 90.0% 1483 39.3% 28.7% 30.1% 0.865 470.331 102.0659
8 2.8156 - 2.6930 [1451/1451] 76.8% 1451 90.7% 1451 38.5% 28.2% 27.3% 0.883 492.758 88.6666
9 2.6930 - 2.5894 [1532/1532] 76.6% 1532 89.4% 1532 40.1% 29.3% 30.5% 0.879 452.831 52.0092
10 2.5894 - 2.5000 [1472/1472] 77.2% 1472 88.9% 1474 42.9% 31.4% 35.3% 0.801 393.866 52.6667

All [14893/14893] 84.7% 14893 88.6% 14889 41.6% 29.0% 39.8% 0.771 378.964 804.8
--------------------------------------------------------------------------------------------------------
</pre>

== Table of results ==
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives better convergence test Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.518
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape recycle model
| style="padding: 5px;"| Use mtz from the trial 2 rs2 as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3716 (3432)
| style="padding: 5px;"| 92.8% (48.1%)
| style="padding: 5px;"| 93.3% (85.7%)
| style="padding: 5px;"| 0.522
|-
|}
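
CC1/2 in the table is a correlation coefficient between two half-datasets. As a rough sketch of the statistic only (not the cxi.xmerge implementation), assuming each half-dataset has already been merged to one intensity per Miller index, a Pearson correlation can be computed as follows; the intensities are made-up illustration values.

```python
import math

def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / float(n)
    mean_y = sum(y) / float(n)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up merged intensities for the same four reflections in each half.
half_1 = [100.0, 250.0, 40.0, 900.0]
half_2 = [110.0, 230.0, 55.0, 870.0]
cc_half = pearson_cc(half_1, half_2)
print("CC1/2 = %.1f%%" % (100 * cc_half))
```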

Useful:
export BOOST_ADAPTBX_FPE_DEFAULT=1
nproc=1
postrefinement.show_trumpet_plot=True

2017 cxi merge tutorial

2017-02-24T23:49:11Z

Nicksauter: /* Table of results */

This is an updated, worked example of data merging using cxi.merge, prepared for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]]. The methods are described in the literature in [http://dx.doi.org/10.1038/nmeth.2887 Hattne (2014)], the [http://dx.doi.org/10.7554/eLife.05421 PRIME paper], [http://dx.doi.org/10.1107/S1399004714024134 Sauter (2014)] and [http://dx.doi.org/10.1107/S1600577514028203 Sauter (2015)]. Math derivations are further described in the file postrefinement_rs_model.pdf in the source code release.

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
phenix.xtriage ${tag}_s0_mark0.mtz scaling.input.xray_data.obs_labels=imean

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. A *.pickle glob cannot be used on the data= line; give the directory instead.

Scale-up trial nproc=60, no postrefinement.
set the MTZ flag = i_obs
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

The code that decides whether a file is odd- or even-numbered fails on these file names. Fixed by adding "_extracted.pickle" to the code; it had to be listed first.
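
The acceptance statistics printed by the scale-up trial can be cross-checked: the accepted and rejected counts should add up to the 5031 input files. A small sketch, with the numbers copied from the log above and hypothetical dictionary keys:

```python
total_files = 5031
accepted = 4493

# Rejection counts copied from the log; the key names are informal labels.
rejected = {
    "wrong Bravais group": 0,
    "unit cell outliers": 11,
    "low signal": 22,
    "poor correlation (min_corr)": 505,
    "file errors or no reindex matrix": 0,
}

n_rejected = sum(rejected.values())
# Every input file should be either accepted or rejected exactly once.
consistent = (accepted + n_rejected == total_files)
print(n_rejected, consistent)
print("accepted: %.1f%%" % (100.0 * accepted / total_files))
```

Most of the rejections (505 of 538) come from the min_corr filter.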

Table of Scaling Results:

---------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
---------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [809/809] 80.0% 809 75.2% 805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
2 5.3861 - 4.2749 [791/791] 54.9% 791 74.5% 791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
3 4.2749 - 3.7345 [781/781] 65.8% 781 81.6% 781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
4 3.7345 - 3.3930 [776/776] 63.9% 776 74.5% 776 49.3% 36.4% 48.6% 0.764 283.109 758.0388
5 3.3930 - 3.1498 [765/765] 67.1% 765 81.9% 765 48.4% 35.6% 43.4% 0.795 338.091 533.7650
6 3.1498 - 2.9641 [771/771] 58.6% 771 72.4% 771 49.3% 36.6% 50.7% 0.759 286.707 222.4718
7 2.9641 - 2.8156 [765/765] 56.0% 765 72.3% 765 48.5% 35.3% 46.7% 0.765 320.954 154.5299
8 2.8156 - 2.6930 [746/746] 63.0% 746 76.1% 746 46.4% 34.3% 42.6% 0.867 357.183 99.4430
9 2.6930 - 2.5894 [790/790] 52.1% 790 69.4% 790 50.4% 37.4% 47.5% 0.814 314.326 113.1264
10 2.5894 - 2.5000 [757/757] 54.9% 757 78.6% 757 52.4% 38.9% 44.4% 0.794 306.403 109.0768

All [7751/7751] 74.9% 7751 78.8% 7747 51.9% 36.9% 50.1% 0.680 266.538 1298.0
---------------------------------------------------------------------------------------------------------

Of course, the data do not scale well: P6 is a polar space group, so each lattice can be indexed in one of two ways, and the data must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

14 plots total. Lattice counts: h,k,l = 2503; h,-h-k,-l = 2485; total 4988.

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>

== cxi.xmerge program output ==
<pre>
--------------------------------------
Bin Resolution Range # images %accept
--------------------------------------
1 -1.0000 - 5.3861 4712 100.00
2 5.3861 - 4.2749 4663 98.96
3 4.2749 - 3.7345 4646 98.60
4 3.7345 - 3.3930 4614 97.92
5 3.3930 - 3.1498 4578 97.16
6 3.1498 - 2.9641 4552 96.60
7 2.9641 - 2.8156 4521 95.95
8 2.8156 - 2.6930 4499 95.48
9 2.6930 - 2.5894 4477 95.01
10 2.5894 - 2.5000 4416 93.72

All 4721
--------------------------------------
--------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
--------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [1490/1490] 87.3% 1490 88.1% 1484 46.3% 32.9% 42.6% 0.772 300.328 8084.8580
2 5.3861 - 4.2749 [1500/1500] 76.4% 1500 89.3% 1500 43.8% 30.6% 34.5% 0.761 425.498 1728.0907
3 4.2749 - 3.7345 [1499/1499] 80.1% 1499 91.6% 1499 42.5% 26.7% 34.5% 0.684 430.028 1556.6316
4 3.7345 - 3.3930 [1497/1497] 80.5% 1497 90.3% 1497 37.9% 27.2% 29.9% 0.846 481.795 600.5001
5 3.3930 - 3.1498 [1477/1477] 84.2% 1477 90.0% 1477 37.2% 26.4% 31.4% 0.838 477.825 269.5784
6 3.1498 - 2.9641 [1492/1492] 80.0% 1492 91.5% 1492 39.8% 28.6% 28.3% 0.866 511.386 165.9517
7 2.9641 - 2.8156 [1483/1483] 76.7% 1483 90.0% 1483 39.3% 28.7% 30.1% 0.865 470.331 102.0659
8 2.8156 - 2.6930 [1451/1451] 76.8% 1451 90.7% 1451 38.5% 28.2% 27.3% 0.883 492.758 88.6666
9 2.6930 - 2.5894 [1532/1532] 76.6% 1532 89.4% 1532 40.1% 29.3% 30.5% 0.879 452.831 52.0092
10 2.5894 - 2.5000 [1472/1472] 77.2% 1472 88.9% 1474 42.9% 31.4% 35.3% 0.801 393.866 52.6667

All [14893/14893] 84.7% 14893 88.6% 14889 41.6% 29.0% 39.8% 0.771 378.964 804.8
--------------------------------------------------------------------------------------------------------
</pre>

== Table of results ==
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives better convergence test Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.518
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape recycle model
| style="padding: 5px;"| Use mtz from the first cycle as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3973 (3700)
| style="padding: 5px;"| 93.3% (55.5%)
| style="padding: 5px;"| 93.6% (87.0%)
| style="padding: 5px;"| 0.509
|-
|}

Useful:
export BOOST_ADAPTBX_FPE_DEFAULT=1
nproc=1
postrefinement.show_trumpet_plot=True

2017 cxi merge tutorial

2017-02-21T01:27:46Z

Nicksauter: /* 2) Sort the lattices */

This is an updated, worked example of data merging using cxi.merge, prepared for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]]. The methods are described in the literature in [http://dx.doi.org/10.1038/nmeth.2887 Hattne (2014)], the [http://dx.doi.org/10.7554/eLife.05421 PRIME paper], [http://dx.doi.org/10.1107/S1399004714024134 Sauter (2014)] and [http://dx.doi.org/10.1107/S1600577514028203 Sauter (2015)]. Math derivations are further described in the file postrefinement_rs_model.pdf in the source code release.

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
phenix.xtriage ${tag}_s0_mark0.mtz scaling.input.xray_data.obs_labels=imean
</pre>
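
The sed pipeline in the command file fills the placeholders FS, DMIN, NEG and TAG in the parameter string. The same substitution can be sketched in Python. This is only an illustration on a shortened parameter string; the helper name fill_placeholders is hypothetical and is not part of cctbx.

```python
# Illustrative only: mirrors the sed placeholder substitution in the csh script.
# `fill_placeholders` is a hypothetical helper, not a cctbx function.

def fill_placeholders(template, values):
    """Replace each placeholder token with its value, in the order given."""
    for token, value in values:
        template = template.replace(token, value)
    return template

# Shortened version of the parameter string from the merge command file.
template = "d_min=DMIN backend=FS include_negatives=NEG output.prefix=TAG"
values = [("FS", "Flex"), ("DMIN", "2.5"), ("NEG", "True"), ("TAG", "p6m")]

effective = fill_placeholders(template, values)
print(effective)
```

Like the sed version, this is plain substring replacement, so a placeholder token must not occur anywhere else in the parameter string (for example inside a data path).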

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. A *.pickle wildcard cannot be used on the data= line; give the directory instead.

Scale-up trial nproc=60, no postrefinement.
set the MTZ flag = i_obs
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

The even/odd file numbering cannot be determined for these file names. Added "_extracted.pickle" to the allowable list in the code; it had to be put first.
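
A minimal sketch of why the order matters, assuming the file names end in a digit followed by "_extracted.pickle". This is not the cctbx implementation; the suffix list and the file names used below are hypothetical. If only ".pickle" is stripped, the last character of the stem is the "d" of "_extracted", which reproduces the ValueError in the traceback.

```python
import os

# Hypothetical suffix list: the more specific suffix must come first,
# otherwise ".pickle" matches and the stem still ends in "_extracted".
ALLOWABLE_SUFFIXES = ["_extracted.pickle", ".pickle"]

def is_odd_numbered(file_name, suffixes=ALLOWABLE_SUFFIXES):
    """Return True if the last digit before the recognized suffix is odd."""
    base = os.path.basename(file_name)
    for suffix in suffixes:
        if base.endswith(suffix):
            stem = base[:-len(suffix)]
            # int() raises ValueError if the stem does not end in a digit.
            return int(stem[-1]) % 2 == 1
    raise ValueError("unrecognized file name: %s" % base)

# Hypothetical file names, for illustration only.
print(is_odd_numbered("/some/dir/int_0059_extracted.pickle"))
print(is_odd_numbered("int_0058.pickle"))
```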

Table of Scaling Results:

---------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
---------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [809/809] 80.0% 809 75.2% 805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
2 5.3861 - 4.2749 [791/791] 54.9% 791 74.5% 791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
3 4.2749 - 3.7345 [781/781] 65.8% 781 81.6% 781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
4 3.7345 - 3.3930 [776/776] 63.9% 776 74.5% 776 49.3% 36.4% 48.6% 0.764 283.109 758.0388
5 3.3930 - 3.1498 [765/765] 67.1% 765 81.9% 765 48.4% 35.6% 43.4% 0.795 338.091 533.7650
6 3.1498 - 2.9641 [771/771] 58.6% 771 72.4% 771 49.3% 36.6% 50.7% 0.759 286.707 222.4718
7 2.9641 - 2.8156 [765/765] 56.0% 765 72.3% 765 48.5% 35.3% 46.7% 0.765 320.954 154.5299
8 2.8156 - 2.6930 [746/746] 63.0% 746 76.1% 746 46.4% 34.3% 42.6% 0.867 357.183 99.4430
9 2.6930 - 2.5894 [790/790] 52.1% 790 69.4% 790 50.4% 37.4% 47.5% 0.814 314.326 113.1264
10 2.5894 - 2.5000 [757/757] 54.9% 757 78.6% 757 52.4% 38.9% 44.4% 0.794 306.403 109.0768

All [7751/7751] 74.9% 7751 78.8% 7747 51.9% 36.9% 50.1% 0.680 266.538 1298.0
---------------------------------------------------------------------------------------------------------

Of course we know the data do not scale, because this is a polar space group and the data must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

14 plots total. h,k,l=2503 h,-h-k,-l=2485 total 4988
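
A small sketch of what the two settings mean, assuming the second operator is the hexagonal reindexing operator h,-h-k,-l. The function name reindex_alt is hypothetical. The operator is its own inverse, so applying it twice returns the original index, and the two populations reported above add up to the 4988 accepted lattices.

```python
# Hypothetical helper: apply the alternative indexing operator h,-h-k,-l
# to a single Miller index.
def reindex_alt(hkl):
    h, k, l = hkl
    return (h, -h - k, -l)

# Lattice counts per indexing choice, taken from the sorting result above.
counts = {"h,k,l": 2503, "h,-h-k,-l": 2485}
total = sum(counts.values())

for op, n in sorted(counts.items()):
    print("%-10s %5d  %.1f%%" % (op, n, 100.0 * n / total))

# Applying the operator twice gives back the original index.
print(reindex_alt(reindex_alt((1, 2, 3))))
```

A roughly even split between the two choices is what one would expect if each crystal lands in either setting at random.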

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I> <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>
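
As a quick consistency check on this table, the multiplicity columns appear to equal n_meas divided by the number of reflections in the bin. That relation is inferred from the numbers rather than taken from the cxi.merge source. The sketch below uses three rows copied from the table.

```python
# (bin label, reflections in bin, n_meas, reported multiplicity), from the table above.
rows = [
    ("1", 1490, 152295, 102.21),
    ("2", 1500, 94141, 62.76),
    ("All", 14895, 731282, 49.10),
]

for label, n_refl, n_meas, reported in rows:
    computed = n_meas / float(n_refl)
    print("%-3s computed %.2f reported %.2f" % (label, computed, reported))
```

At 100% completeness the asu and obs multiplicities coincide, which is why the two columns are identical here.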

== cxi.xmerge program output ==
<pre>
--------------------------------------
Bin Resolution Range # images %accept
--------------------------------------
1 -1.0000 - 5.3861 4712 100.00
2 5.3861 - 4.2749 4663 98.96
3 4.2749 - 3.7345 4646 98.60
4 3.7345 - 3.3930 4614 97.92
5 3.3930 - 3.1498 4578 97.16
6 3.1498 - 2.9641 4552 96.60
7 2.9641 - 2.8156 4521 95.95
8 2.8156 - 2.6930 4499 95.48
9 2.6930 - 2.5894 4477 95.01
10 2.5894 - 2.5000 4416 93.72

All 4721
--------------------------------------
--------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
--------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [1490/1490] 87.3% 1490 88.1% 1484 46.3% 32.9% 42.6% 0.772 300.328 8084.8580
2 5.3861 - 4.2749 [1500/1500] 76.4% 1500 89.3% 1500 43.8% 30.6% 34.5% 0.761 425.498 1728.0907
3 4.2749 - 3.7345 [1499/1499] 80.1% 1499 91.6% 1499 42.5% 26.7% 34.5% 0.684 430.028 1556.6316
4 3.7345 - 3.3930 [1497/1497] 80.5% 1497 90.3% 1497 37.9% 27.2% 29.9% 0.846 481.795 600.5001
5 3.3930 - 3.1498 [1477/1477] 84.2% 1477 90.0% 1477 37.2% 26.4% 31.4% 0.838 477.825 269.5784
6 3.1498 - 2.9641 [1492/1492] 80.0% 1492 91.5% 1492 39.8% 28.6% 28.3% 0.866 511.386 165.9517
7 2.9641 - 2.8156 [1483/1483] 76.7% 1483 90.0% 1483 39.3% 28.7% 30.1% 0.865 470.331 102.0659
8 2.8156 - 2.6930 [1451/1451] 76.8% 1451 90.7% 1451 38.5% 28.2% 27.3% 0.883 492.758 88.6666
9 2.6930 - 2.5894 [1532/1532] 76.6% 1532 89.4% 1532 40.1% 29.3% 30.5% 0.879 452.831 52.0092
10 2.5894 - 2.5000 [1472/1472] 77.2% 1472 88.9% 1474 42.9% 31.4% 35.3% 0.801 393.866 52.6667

All [14893/14893] 84.7% 14893 88.6% 14889 41.6% 29.0% 39.8% 0.771 378.964 804.8
--------------------------------------------------------------------------------------------------------
</pre>
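
The %accept column in the first table appears to be the image count in each bin relative to the count in bin 1 (4712), not relative to the "All" count (4721). This is an inference from the numbers, not something confirmed from the cxi.xmerge source. A short check with values copied from the table:

```python
# Image counts and reported %accept per resolution bin, from the table above.
n_images = [4712, 4663, 4646, 4614, 4578, 4552, 4521, 4499, 4477, 4416]
reported = [100.00, 98.96, 98.60, 97.92, 97.16, 96.60, 95.95, 95.48, 95.01, 93.72]

reference = n_images[0]  # assumption: the lowest-resolution bin is the reference
for i, (n, pct) in enumerate(zip(n_images, reported), start=1):
    computed = 100.0 * n / reference
    print("bin %2d computed %6.2f reported %6.2f" % (i, computed, pct))
```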

== Table of results ==
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives better convergence test Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.504
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape recycle model
| style="padding: 5px;"| Use mtz from the first cycle as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3973 (3700)
| style="padding: 5px;"| 93.3% (55.5%)
| style="padding: 5px;"| 93.6% (87.0%)
| style="padding: 5px;"| 0.509
|-
|}

Useful:
export BOOST_ADAPTBX_FPE_DEFAULT=1
nproc=1
postrefinement.show_trumpet_plot=True

2017 cxi merge tutorial

2017-02-17T19:24:51Z

Nicksauter: /* cxi.xmerge program output */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]]. Literature description is in the [http://dx.doi.org/10.1038/nmeth.2887 Hattne (2014)], the [http://dx.doi.org/10.7554/eLife.05421 PRIME paper], the [http://dx.doi.org/10.1107/S1399004714024134 Sauter (2014)] and [http://dx.doi.org/10.1107/S1600577514028203 Sauter (2015)] papers. Math derivations are further described in the source code release in file postrefinement_rs_model.pdf.

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
phenix.xtriage ${tag}_s0_mark0.mtz scaling.input.xray_data.obs_labels=imean
</pre>

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. A *.pickle wildcard cannot be used on the data= line; give the directory instead.

Scale-up trial nproc=60, no postrefinement.
set the MTZ flag = i_obs
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

The even/odd file numbering cannot be determined for these file names. Added "_extracted.pickle" to the allowable list in the code; it had to be put first.
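
As a sanity check on the acceptance summary in the log above, the accepted and rejected counts should add up to the 5031 integration files found in the directory. A short sketch with the numbers copied from the log (the dictionary keys are abbreviations of the log messages):

```python
# Counts copied from the cxi.merge log for the scale-up trial.
accepted = 4493
rejected = {
    "wrong Bravais group": 0,
    "unit cell outliers": 11,
    "low signal": 22,
    "poor correlation (min_corr)": 505,
    "file errors or no reindex matrix": 0,
}

n_rejected = sum(rejected.values())
n_total = accepted + n_rejected
print("rejected: %d, total: %d" % (n_rejected, n_total))
print("acceptance rate: %.1f%%" % (100.0 * accepted / n_total))
```

Nearly all rejections come from the min_corr filter, which fits the indexing ambiguity discussed next.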

Table of Scaling Results:

---------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
---------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [809/809] 80.0% 809 75.2% 805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
2 5.3861 - 4.2749 [791/791] 54.9% 791 74.5% 791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
3 4.2749 - 3.7345 [781/781] 65.8% 781 81.6% 781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
4 3.7345 - 3.3930 [776/776] 63.9% 776 74.5% 776 49.3% 36.4% 48.6% 0.764 283.109 758.0388
5 3.3930 - 3.1498 [765/765] 67.1% 765 81.9% 765 48.4% 35.6% 43.4% 0.795 338.091 533.7650
6 3.1498 - 2.9641 [771/771] 58.6% 771 72.4% 771 49.3% 36.6% 50.7% 0.759 286.707 222.4718
7 2.9641 - 2.8156 [765/765] 56.0% 765 72.3% 765 48.5% 35.3% 46.7% 0.765 320.954 154.5299
8 2.8156 - 2.6930 [746/746] 63.0% 746 76.1% 746 46.4% 34.3% 42.6% 0.867 357.183 99.4430
9 2.6930 - 2.5894 [790/790] 52.1% 790 69.4% 790 50.4% 37.4% 47.5% 0.814 314.326 113.1264
10 2.5894 - 2.5000 [757/757] 54.9% 757 78.6% 757 52.4% 38.9% 44.4% 0.794 306.403 109.0768

All [7751/7751] 74.9% 7751 78.8% 7747 51.9% 36.9% 50.1% 0.680 266.538 1298.0
---------------------------------------------------------------------------------------------------------

Of course we know the data do not scale, because this is a polar space group and the data must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try using d_min 3.5 instead of 2.5--still crash

Try using fewer proc; use 30 instead of 60. (increases problem size by 2**2=4) --still crash

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1
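
The note above that halving nproc increases the problem size by 2**2=4 can be illustrated with a rough count. The sketch assumes that the accepted lattices are divided evenly among nproc groups and that the cost is dominated by pairwise comparisons within each group. Both points are assumptions about the algorithm, not statements taken from the cxi.brehm_diederichs source, and the function name is hypothetical.

```python
import math

def pairs_per_group(n_lattices, nproc):
    """Number of lattice pairs in one group if lattices are split evenly."""
    group = int(math.ceil(n_lattices / float(nproc)))
    return group * (group - 1) // 2

n_lattices = 4988  # files accepted in step 1
for nproc in (60, 30, 15):
    print("nproc=%2d  pairs per group: %d" % (nproc, pairs_per_group(n_lattices, nproc)))
```

Each halving of nproc roughly quadruples the number of pairs handled by a single process.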

14 plots total. h,k,l=2503 h,-h-k,-l=2485 total 4988

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I> <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>

== cxi.xmerge program output ==
<pre>
--------------------------------------
Bin Resolution Range # images %accept
--------------------------------------
1 -1.0000 - 5.3861 4712 100.00
2 5.3861 - 4.2749 4663 98.96
3 4.2749 - 3.7345 4646 98.60
4 3.7345 - 3.3930 4614 97.92
5 3.3930 - 3.1498 4578 97.16
6 3.1498 - 2.9641 4552 96.60
7 2.9641 - 2.8156 4521 95.95
8 2.8156 - 2.6930 4499 95.48
9 2.6930 - 2.5894 4477 95.01
10 2.5894 - 2.5000 4416 93.72

All 4721
--------------------------------------
--------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
--------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [1490/1490] 87.3% 1490 88.1% 1484 46.3% 32.9% 42.6% 0.772 300.328 8084.8580
2 5.3861 - 4.2749 [1500/1500] 76.4% 1500 89.3% 1500 43.8% 30.6% 34.5% 0.761 425.498 1728.0907
3 4.2749 - 3.7345 [1499/1499] 80.1% 1499 91.6% 1499 42.5% 26.7% 34.5% 0.684 430.028 1556.6316
4 3.7345 - 3.3930 [1497/1497] 80.5% 1497 90.3% 1497 37.9% 27.2% 29.9% 0.846 481.795 600.5001
5 3.3930 - 3.1498 [1477/1477] 84.2% 1477 90.0% 1477 37.2% 26.4% 31.4% 0.838 477.825 269.5784
6 3.1498 - 2.9641 [1492/1492] 80.0% 1492 91.5% 1492 39.8% 28.6% 28.3% 0.866 511.386 165.9517
7 2.9641 - 2.8156 [1483/1483] 76.7% 1483 90.0% 1483 39.3% 28.7% 30.1% 0.865 470.331 102.0659
8 2.8156 - 2.6930 [1451/1451] 76.8% 1451 90.7% 1451 38.5% 28.2% 27.3% 0.883 492.758 88.6666
9 2.6930 - 2.5894 [1532/1532] 76.6% 1532 89.4% 1532 40.1% 29.3% 30.5% 0.879 452.831 52.0092
10 2.5894 - 2.5000 [1472/1472] 77.2% 1472 88.9% 1474 42.9% 31.4% 35.3% 0.801 393.866 52.6667

All [14893/14893] 84.7% 14893 88.6% 14889 41.6% 29.0% 39.8% 0.771 378.964 804.8
--------------------------------------------------------------------------------------------------------
</pre>

== Table of results ==
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives better convergence test Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.504
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape recycle model
| style="padding: 5px;"| Use mtz from the first cycle as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3973 (3700)
| style="padding: 5px;"| 93.3% (55.5%)
| style="padding: 5px;"| 93.6% (87.0%)
| style="padding: 5px;"| 0.509
|-
|}

Useful:
export BOOST_ADAPTBX_FPE_DEFAULT=1
nproc=1
postrefinement.show_trumpet_plot=True

2017 cxi merge tutorial

2017-02-17T17:57:25Z

Nicksauter:

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]]. Literature description is in the [http://dx.doi.org/10.1038/nmeth.2887 Hattne (2014)], the [http://dx.doi.org/10.7554/eLife.05421 PRIME paper], the [http://dx.doi.org/10.1107/S1399004714024134 Sauter (2014)] and [http://dx.doi.org/10.1107/S1600577514028203 Sauter (2015)] papers. Math derivations are further described in the source code release in file postrefinement_rs_model.pdf.

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
phenix.xtriage ${tag}_s0_mark0.mtz scaling.input.xray_data.obs_labels=imean
</pre>

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. A *.pickle wildcard cannot be used on the data= line; give the directory instead.

Scale-up trial nproc=60, no postrefinement.
set the MTZ flag = i_obs
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

The even/odd file numbering cannot be determined for these file names. Added "_extracted.pickle" to the allowable list in the code; it had to be put first.

Table of Scaling Results:

---------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
---------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [809/809] 80.0% 809 75.2% 805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
2 5.3861 - 4.2749 [791/791] 54.9% 791 74.5% 791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
3 4.2749 - 3.7345 [781/781] 65.8% 781 81.6% 781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
4 3.7345 - 3.3930 [776/776] 63.9% 776 74.5% 776 49.3% 36.4% 48.6% 0.764 283.109 758.0388
5 3.3930 - 3.1498 [765/765] 67.1% 765 81.9% 765 48.4% 35.6% 43.4% 0.795 338.091 533.7650
6 3.1498 - 2.9641 [771/771] 58.6% 771 72.4% 771 49.3% 36.6% 50.7% 0.759 286.707 222.4718
7 2.9641 - 2.8156 [765/765] 56.0% 765 72.3% 765 48.5% 35.3% 46.7% 0.765 320.954 154.5299
8 2.8156 - 2.6930 [746/746] 63.0% 746 76.1% 746 46.4% 34.3% 42.6% 0.867 357.183 99.4430
9 2.6930 - 2.5894 [790/790] 52.1% 790 69.4% 790 50.4% 37.4% 47.5% 0.814 314.326 113.1264
10 2.5894 - 2.5000 [757/757] 54.9% 757 78.6% 757 52.4% 38.9% 44.4% 0.794 306.403 109.0768

All [7751/7751] 74.9% 7751 78.8% 7747 51.9% 36.9% 50.1% 0.680 266.538 1298.0
---------------------------------------------------------------------------------------------------------

Of course we know the data do not scale, because this is a polar space group and the data must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try using d_min 3.5 instead of 2.5--still crash

Try using fewer proc; use 30 instead of 60. (increases problem size by 2**2=4) --still crash

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. h,k,l=2503 h,-h-k,-l=2485 total 4988

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I> <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>

== cxi.xmerge program output ==
<pre>
--------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
--------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [1490/1490] 87.3% 1490 88.1% 1484 46.3% 32.9% 42.6% 0.772 300.328 8084.8580
2 5.3861 - 4.2749 [1500/1500] 76.4% 1500 89.3% 1500 43.8% 30.6% 34.5% 0.761 425.498 1728.0907
3 4.2749 - 3.7345 [1499/1499] 80.1% 1499 91.6% 1499 42.5% 26.7% 34.5% 0.684 430.028 1556.6316
4 3.7345 - 3.3930 [1497/1497] 80.5% 1497 90.3% 1497 37.9% 27.2% 29.9% 0.846 481.795 600.5001
5 3.3930 - 3.1498 [1477/1477] 84.2% 1477 90.0% 1477 37.2% 26.4% 31.4% 0.838 477.825 269.5784
6 3.1498 - 2.9641 [1492/1492] 80.0% 1492 91.5% 1492 39.8% 28.6% 28.3% 0.866 511.386 165.9517
7 2.9641 - 2.8156 [1483/1483] 76.7% 1483 90.0% 1483 39.3% 28.7% 30.1% 0.865 470.331 102.0659
8 2.8156 - 2.6930 [1451/1451] 76.8% 1451 90.7% 1451 38.5% 28.2% 27.3% 0.883 492.758 88.6666
9 2.6930 - 2.5894 [1532/1532] 76.6% 1532 89.4% 1532 40.1% 29.3% 30.5% 0.879 452.831 52.0092
10 2.5894 - 2.5000 [1472/1472] 77.2% 1472 88.9% 1474 42.9% 31.4% 35.3% 0.801 393.866 52.6667

All [14893/14893] 84.7% 14893 88.6% 14889 41.6% 29.0% 39.8% 0.771 378.964 804.8
--------------------------------------------------------------------------------------------------------
</pre>

== Table of results ==
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives better convergence test Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.504
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape recycle model
| style="padding: 5px;"| Use mtz from the first cycle as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3973 (3700)
| style="padding: 5px;"| 93.3% (55.5%)
| style="padding: 5px;"| 93.6% (87.0%)
| style="padding: 5px;"| 0.509
|-
|}

Useful settings:
* export BOOST_ADAPTBX_FPE_DEFAULT=1 (bash syntax; in csh use setenv BOOST_ADAPTBX_FPE_DEFAULT 1)
* nproc=1
* postrefinement.show_trumpet_plot=True

2017 Tutorials

2017-02-17T17:32:04Z

Nicksauter: /* Feb 16th */

= Feb 16th =

9:00am: Session 1

*Nick Sauter: Welcome and "Trials and tribulations merging still image data"

*Aaron Brewster: "Metrology and non-isomorphism: hidden challenges in still image data reduction"

*Axel Brunger: "Data processing of XFEL data from a limited number of crystals"

10:20am: Break

10:40am: Session 2

*Art Lyubimov: "IOTA: Integration optimization, triage and analysis tool for XFEL data processing"

*Monarin Uervirojnangkoorn: "Up and Running with Prime"

*Jacques-Philippe Colletier: "Mosquito larvicide BinAB revealed by de novo phasing with an X-ray laser"

*James Holton: "What if? Using at-scale image simulations to optimize data processing algorithms"

Noon: Working Lunch - Roundtable discussion of data processing challenges

1:00pm: Session 3

*Graeme Winter and Richard Gildea: "DIALS - new methods for processing X-ray diffraction data"

*James Parkhurst: "Robust background modelling in the presence of outliers in DIALS"

*Jan Kern: TBD

*Franklin Fuller: "Drop-by-Drop Transient Serial Crystallography of Metalloenzymes at an X-ray free electron laser"

*Iris Young: "Room temperature studies of the oxygen-evolving complex of photosystem II using an X-ray free electron laser (XFEL)"

2:40pm: Break

3:00pm: Session 4

*Aina Cohen: TBD

*Rahel Woldeyes: "Using X-ray Free Electron lasers to visualize solvent in the M2 proton channel"

*Danny Axford: "Highly efficient serial data collection from high-density fixed targets"

*Christoph Mueller-Dieckmann: "Serial Synchrotron Crystallography at the ESRF using a high viscosity extruder"

*Allen Orville: TBD

= Feb 17th =

== 9am: Tutorials 1 ==
* Aaron Brewster and Iris Young: cctbx.xfel

Break: 10:00 am
== 10:15 am: Tutorials 2 ==
* Aaron Brewster and James Parkhurst: dials.stills_process
* Art Lyubimov: IOTA
* Monarin Uervirojnangkoorn: [[2017_prime_tutorial | PRIME]]
* Nick Sauter: [[2017_cxi_merge_tutorial | cxi.merge]]

== 12:15pm-2:30pm: Round table discussion. ==
Topics will include:
* Is the current software meeting needs?
* Which essential or timely avenues are most useful for near-term development (first half of 2017)?
* Where should we be going next (longer-term future)?
* Time-resolved experiments?
* Synchrotron serial crystallography?
== 2:30-4pm: Breakout sessions: ==
Users work with developers and instructors on their own data, with hands-on walkthroughs and data analysis.

2017 cxi merge tutorial

2017-02-17T17:22:30Z

Nicksauter:

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Earlier documentation is available under [[Merging]] and [[Advanced Merging]]. The methods are described in [http://dx.doi.org/10.1038/nmeth.2887 Hattne (2014)], the [http://dx.doi.org/10.7554/eLife.05421 PRIME paper], [http://dx.doi.org/10.1107/S1399004714024134 Sauter (2014)] and [http://dx.doi.org/10.1107/S1600577514028203 Sauter (2015)]. The math derivations are described further in the file postrefinement_rs_model.pdf in the source code release.

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.
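The same count can be made from Python, which is convenient when the result feeds a later consistency check. This is a minimal sketch using only the standard library; the function name <code>count_pickles</code> is an illustrative choice and is not part of cctbx.

```python
import glob
import os


def count_pickles(directory):
    """Return the number of *.pickle files directly inside directory."""
    return len(glob.glob(os.path.join(directory, "*.pickle")))


if __name__ == "__main__":
    # Directory names are the two listed in the tutorial text.
    base = "/net/dials/raid1/aaron/zurich0038/jr_006_batches"
    for sub in ("split_reintegrated/extracted",
                "sig_filter/split_reintegrated/extracted"):
        path = os.path.join(base, sub)
        print(path, count_pickles(path))
```

A directory that does not exist simply yields a count of zero, so the sketch can be run safely on a machine without the data.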

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* the distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120
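One way to put a number on "cluster around" is a component-wise median of the per-image unit cells, which is insensitive to a few outlier lattices. The sketch below assumes the cells have already been parsed into 6-tuples (the parsing of the cxi.print_pickle output is not shown, since its exact format is not given here); the function name and the example values are made up for illustration.

```python
from statistics import median


def median_cell(cells):
    """Component-wise median of (a, b, c, alpha, beta, gamma) tuples."""
    if not cells:
        raise ValueError("no unit cells given")
    return tuple(median(column) for column in zip(*cells))


# Made-up cells for illustration only, not values read from the dataset.
example = [
    (91.2, 91.2, 45.8, 90.0, 90.0, 120.0),
    (91.4, 91.4, 45.9, 90.0, 90.0, 120.0),
    (91.7, 91.7, 46.1, 90.0, 90.0, 120.0),
]
print(median_cell(example))
```

A value obtained this way could serve as a starting point for a parameter such as target_unit_cell, although the tutorial itself reads the cluster center off the grep output.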

Fetch the reference model and structure factors from the PDB:

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
phenix.xtriage ${tag}_s0_mark0.mtz scaling.input.xray_data.obs_labels=imean

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. The data= parameter cannot take a *.pickle glob; give it the directory instead (as in step1.csh below).
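The command file builds its argument list by substituting the placeholders FS, DMIN, NEG and TAG with sed. The following sketch shows the same substitution idea in Python on a shortened template; the function name <code>fill_placeholders</code> is hypothetical and the template holds only four of the parameters from the script.

```python
def fill_placeholders(template, substitutions):
    """Replace every occurrence of each placeholder, like sed 's,KEY,value,g'."""
    result = template
    for placeholder, value in substitutions.items():
        result = result.replace(placeholder, str(value))
    # One command-line argument per whitespace-separated token.
    return result.split()


# Shortened template; the full script lists many more parameters.
template = """d_min=DMIN
backend=FS
include_negatives=NEG
output.prefix=TAG"""

args = fill_placeholders(
    template, {"FS": "Flex", "DMIN": 2.5, "NEG": True, "TAG": "p6m"})
print(args)
```

As with sed, the replacement is a plain substring match, so a placeholder must not occur inside any parameter name or path in the template.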

Scale-up trial nproc=60, no postrefinement.
set the MTZ flag = jobs
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

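The code that decides whether an image is even- or odd-numbered fails on these file names: the character before ".pickle" is the "d" of "_extracted", not a digit. Added "_extracted.pickle" to the suffixes recognized in the code; it had to be placed first so that it is tried before ".pickle".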
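Why the order of the suffixes matters can be seen in a small stand-alone sketch. This is not the code in xfel/cxi/util.py, only an illustration of the idea under the assumption that the image number sits directly before the suffix; the file names are hypothetical.

```python
import os

# Longest suffix first, so "_extracted.pickle" is tried before ".pickle".
SUFFIXES = ("_extracted.pickle", ".pickle")


def is_odd_numbered(file_name, suffixes=SUFFIXES):
    """True if the last digit of the image number is odd."""
    base = os.path.basename(file_name)
    for suffix in suffixes:
        if base.endswith(suffix):
            stem = base[:len(base) - len(suffix)]
            # Raises ValueError if the last character is not a digit.
            return int(stem[-1]) % 2 == 1
    raise ValueError("unrecognized file name: %s" % file_name)


# Hypothetical file names, for illustration only.
print(is_odd_numbered("/data/int_0059_extracted.pickle"))
print(is_odd_numbered("/data/int_0060.pickle"))

# With only ".pickle" in the list, the stem ends in "d" and int() fails,
# which reproduces the ValueError shown in the traceback.
try:
    is_odd_numbered("/data/int_0059_extracted.pickle", suffixes=(".pickle",))
except ValueError as err:
    print("fails without the longer suffix:", err)
```

Because every name ending in "_extracted.pickle" also ends in ".pickle", the shorter suffix would always match first if it were listed first.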

Table of Scaling Results:

---------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
---------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [809/809] 80.0% 809 75.2% 805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
2 5.3861 - 4.2749 [791/791] 54.9% 791 74.5% 791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
3 4.2749 - 3.7345 [781/781] 65.8% 781 81.6% 781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
4 3.7345 - 3.3930 [776/776] 63.9% 776 74.5% 776 49.3% 36.4% 48.6% 0.764 283.109 758.0388
5 3.3930 - 3.1498 [765/765] 67.1% 765 81.9% 765 48.4% 35.6% 43.4% 0.795 338.091 533.7650
6 3.1498 - 2.9641 [771/771] 58.6% 771 72.4% 771 49.3% 36.6% 50.7% 0.759 286.707 222.4718
7 2.9641 - 2.8156 [765/765] 56.0% 765 72.3% 765 48.5% 35.3% 46.7% 0.765 320.954 154.5299
8 2.8156 - 2.6930 [746/746] 63.0% 746 76.1% 746 46.4% 34.3% 42.6% 0.867 357.183 99.4430
9 2.6930 - 2.5894 [790/790] 52.1% 790 69.4% 790 50.4% 37.4% 47.5% 0.814 314.326 113.1264
10 2.5894 - 2.5000 [757/757] 54.9% 757 78.6% 757 52.4% 38.9% 44.4% 0.794 306.403 109.0768

All [7751/7751] 74.9% 7751 78.8% 7747 51.9% 36.9% 50.1% 0.680 266.538 1298.0
---------------------------------------------------------------------------------------------------------

The data scale poorly, as expected: P6 is a polar space group with an indexing ambiguity, so the lattices must first be sorted into a consistent indexing choice by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

See our detailed instructions on [[Resolving an Indexing Ambiguity]]. The procedure has three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try d_min=3.5 instead of 2.5: still crashes.

Try fewer processors, nproc=30 instead of 60 (increases problem size by 2**2=4): still crashes.
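The factor 2**2=4 follows if the lattices are divided evenly among the processes and the work per process grows with the number of lattice pairs it has to compare. Both points are assumptions about how the program partitions the problem; the sketch below only works through the arithmetic for the 4988 lattices accepted in step 1.

```python
def pairwise_cost(n_lattices, nproc):
    """Lattices per process and unordered pairs per process (assumed cost model)."""
    per_proc = n_lattices / nproc
    pairs = per_proc * (per_proc - 1) / 2
    return per_proc, pairs


for nproc in (60, 30, 15):
    per_proc, pairs = pairwise_cost(4988, nproc)
    print("nproc=%d: %.0f lattices/process, about %.0f pairs"
          % (nproc, per_proc, pairs))
```

Halving nproc doubles the lattices per process and therefore roughly quadruples the pair count per process.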

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. Lattice counts per indexing choice: h,k,l=2503; h,-h-k,-l=2485; total 4988.
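The alternative indexing choice leaves the resolution of every reflection unchanged, which is why the two choices cannot be told apart from the lattice geometry alone. The sketch below assumes the second operator is h,-h-k,-l (with a lowercase L as the last index) and uses the hexagonal cell a=91.4, c=45.9 from the tutorial; the function names are illustrative.

```python
import math


def reindex(hkl):
    """Apply the operator h,-h-k,-l to one Miller index."""
    h, k, l = hkl
    return (h, -h - k, -l)


def d_spacing(hkl, a=91.4, c=45.9):
    """Resolution in Angstrom for a hexagonal cell (a=b, gamma=120 degrees)."""
    h, k, l = hkl
    inv_d2 = 4.0 * (h * h + h * k + k * k) / (3.0 * a * a) + (l * l) / (c * c)
    return 1.0 / math.sqrt(inv_d2)


hkl = (1, 2, 3)
alt = reindex(hkl)
print(hkl, "->", alt)
print(d_spacing(hkl), d_spacing(alt))  # same resolution for both
print(reindex(alt) == hkl)             # applying the operator twice is the identity
```

The quantity h*h + h*k + k*k is unchanged by the operator, so both indexing choices predict the same spot positions, while the measured intensities belong to different reflections.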

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>

== cxi.xmerge program output ==
<pre>
--------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
--------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [1490/1490] 87.3% 1490 88.1% 1484 46.3% 32.9% 42.6% 0.772 300.328 8084.8580
2 5.3861 - 4.2749 [1500/1500] 76.4% 1500 89.3% 1500 43.8% 30.6% 34.5% 0.761 425.498 1728.0907
3 4.2749 - 3.7345 [1499/1499] 80.1% 1499 91.6% 1499 42.5% 26.7% 34.5% 0.684 430.028 1556.6316
4 3.7345 - 3.3930 [1497/1497] 80.5% 1497 90.3% 1497 37.9% 27.2% 29.9% 0.846 481.795 600.5001
5 3.3930 - 3.1498 [1477/1477] 84.2% 1477 90.0% 1477 37.2% 26.4% 31.4% 0.838 477.825 269.5784
6 3.1498 - 2.9641 [1492/1492] 80.0% 1492 91.5% 1492 39.8% 28.6% 28.3% 0.866 511.386 165.9517
7 2.9641 - 2.8156 [1483/1483] 76.7% 1483 90.0% 1483 39.3% 28.7% 30.1% 0.865 470.331 102.0659
8 2.8156 - 2.6930 [1451/1451] 76.8% 1451 90.7% 1451 38.5% 28.2% 27.3% 0.883 492.758 88.6666
9 2.6930 - 2.5894 [1532/1532] 76.6% 1532 89.4% 1532 40.1% 29.3% 30.5% 0.879 452.831 52.0092
10 2.5894 - 2.5000 [1472/1472] 77.2% 1472 88.9% 1474 42.9% 31.4% 35.3% 0.801 393.866 52.6667

All [14893/14893] 84.7% 14893 88.6% 14889 41.6% 29.0% 39.8% 0.771 378.964 804.8
--------------------------------------------------------------------------------------------------------
</pre>
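The "CC int" column is read here as the correlation coefficient between merged intensities from two half-datasets (odd- and even-numbered images), which is what the odd/even file split suggests; that reading is an assumption. Under it, the statistic is a plain Pearson correlation, sketched below with made-up intensities.

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Made-up merged intensities of the same reflections in the two half-sets.
odd_half = [120.0, 45.0, 300.0, 10.0, 80.0]
even_half = [110.0, 50.0, 280.0, 25.0, 95.0]
print(pearson_cc(odd_half, even_half))
```

In practice the correlation is computed per resolution bin over the reflections present in both half-sets, as in the table rows.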

== Table of results ==
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives better convergence test Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.504
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape recycle model
| style="padding: 5px;"| Use mtz from the first cycle as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3973 (3700)
| style="padding: 5px;"| 93.3% (55.5%)
| style="padding: 5px;"| 93.6% (87.0%)
| style="padding: 5px;"| 0.509
|-
|}
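The last column refers to the L test for twinning, in which L = (I1 - I2)/(I1 + I2) is evaluated for pairs of nearby reflections; the expected mean of |L| is 0.5 for untwinned acentric data and 0.375 for a perfect twin. The sketch below takes the intensity pairs as input, because how the pairs are selected is not described here; the function name and the example numbers are illustrative.

```python
def mean_abs_l(pairs):
    """Mean of |L| with L = (I1 - I2) / (I1 + I2) over intensity pairs."""
    values = []
    for i1, i2 in pairs:
        total = i1 + i2
        if total == 0:
            continue  # L is undefined for this pair
        values.append(abs((i1 - i2) / total))
    if not values:
        raise ValueError("no usable intensity pairs")
    return sum(values) / len(values)


# Made-up intensity pairs, for illustration only.
print(mean_abs_l([(3.0, 1.0), (1.0, 1.0)]))
```

Lattices merged under inconsistent indexing choices average unrelated reflections together, which can mimic twinning and lower the statistic; this is one plausible reading of why the values in the table rise toward 0.5 as the treatment improves.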

Useful settings:
* export BOOST_ADAPTBX_FPE_DEFAULT=1 (bash syntax; in csh use setenv BOOST_ADAPTBX_FPE_DEFAULT 1)
* nproc=1
* postrefinement.show_trumpet_plot=True

2017 cxi merge tutorial

2017-02-17T17:21:00Z

Nicksauter:

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Earlier documentation is available under [[Merging]] and [[Advanced Merging]]. The methods are described in [http://dx.doi.org/10.1038/nmeth.2887 Hattne (2014)], the [http://dx.doi.org/10.7554/eLife.05421 PRIME paper], [http://dx.doi.org/10.1107/S1399004714024134 Sauter (2014)] and [http://dx.doi.org/10.1107/S1600577514028203 Sauter (2015)].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* the distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

Fetch the reference model and structure factors from the PDB:

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
phenix.xtriage ${tag}_s0_mark0.mtz scaling.input.xray_data.obs_labels=imean

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. The data= parameter cannot take a *.pickle glob; give it the directory instead (as in step1.csh below).

Scale-up trial nproc=60, no postrefinement.
set the MTZ flag = jobs
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

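The code that decides whether an image is even- or odd-numbered fails on these file names: the character before ".pickle" is the "d" of "_extracted", not a digit. Added "_extracted.pickle" to the suffixes recognized in the code; it had to be placed first so that it is tried before ".pickle".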

Table of Scaling Results:

---------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
---------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [809/809] 80.0% 809 75.2% 805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
2 5.3861 - 4.2749 [791/791] 54.9% 791 74.5% 791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
3 4.2749 - 3.7345 [781/781] 65.8% 781 81.6% 781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
4 3.7345 - 3.3930 [776/776] 63.9% 776 74.5% 776 49.3% 36.4% 48.6% 0.764 283.109 758.0388
5 3.3930 - 3.1498 [765/765] 67.1% 765 81.9% 765 48.4% 35.6% 43.4% 0.795 338.091 533.7650
6 3.1498 - 2.9641 [771/771] 58.6% 771 72.4% 771 49.3% 36.6% 50.7% 0.759 286.707 222.4718
7 2.9641 - 2.8156 [765/765] 56.0% 765 72.3% 765 48.5% 35.3% 46.7% 0.765 320.954 154.5299
8 2.8156 - 2.6930 [746/746] 63.0% 746 76.1% 746 46.4% 34.3% 42.6% 0.867 357.183 99.4430
9 2.6930 - 2.5894 [790/790] 52.1% 790 69.4% 790 50.4% 37.4% 47.5% 0.814 314.326 113.1264
10 2.5894 - 2.5000 [757/757] 54.9% 757 78.6% 757 52.4% 38.9% 44.4% 0.794 306.403 109.0768

All [7751/7751] 74.9% 7751 78.8% 7747 51.9% 36.9% 50.1% 0.680 266.538 1298.0
---------------------------------------------------------------------------------------------------------

The data scale poorly, as expected: P6 is a polar space group with an indexing ambiguity, so the lattices must first be sorted into a consistent indexing choice by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

See our detailed instructions on [[Resolving an Indexing Ambiguity]]. The procedure has three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try d_min=3.5 instead of 2.5: still crashes.

Try fewer processors, nproc=30 instead of 60 (increases problem size by 2**2=4): still crashes.

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. Lattice counts per indexing choice: h,k,l=2503; h,-h-k,-l=2485; total 4988.

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>

== cxi.xmerge program output ==
<pre>
--------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
--------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [1490/1490] 87.3% 1490 88.1% 1484 46.3% 32.9% 42.6% 0.772 300.328 8084.8580
2 5.3861 - 4.2749 [1500/1500] 76.4% 1500 89.3% 1500 43.8% 30.6% 34.5% 0.761 425.498 1728.0907
3 4.2749 - 3.7345 [1499/1499] 80.1% 1499 91.6% 1499 42.5% 26.7% 34.5% 0.684 430.028 1556.6316
4 3.7345 - 3.3930 [1497/1497] 80.5% 1497 90.3% 1497 37.9% 27.2% 29.9% 0.846 481.795 600.5001
5 3.3930 - 3.1498 [1477/1477] 84.2% 1477 90.0% 1477 37.2% 26.4% 31.4% 0.838 477.825 269.5784
6 3.1498 - 2.9641 [1492/1492] 80.0% 1492 91.5% 1492 39.8% 28.6% 28.3% 0.866 511.386 165.9517
7 2.9641 - 2.8156 [1483/1483] 76.7% 1483 90.0% 1483 39.3% 28.7% 30.1% 0.865 470.331 102.0659
8 2.8156 - 2.6930 [1451/1451] 76.8% 1451 90.7% 1451 38.5% 28.2% 27.3% 0.883 492.758 88.6666
9 2.6930 - 2.5894 [1532/1532] 76.6% 1532 89.4% 1532 40.1% 29.3% 30.5% 0.879 452.831 52.0092
10 2.5894 - 2.5000 [1472/1472] 77.2% 1472 88.9% 1474 42.9% 31.4% 35.3% 0.801 393.866 52.6667

All [14893/14893] 84.7% 14893 88.6% 14889 41.6% 29.0% 39.8% 0.771 378.964 804.8
--------------------------------------------------------------------------------------------------------
</pre>

== Table of results ==
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives better convergence test Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.504
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape recycle model
| style="padding: 5px;"| Use mtz from the first cycle as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3973 (3700)
| style="padding: 5px;"| 93.3% (55.5%)
| style="padding: 5px;"| 93.6% (87.0%)
| style="padding: 5px;"| 0.509
|-
|}

Useful settings:
* export BOOST_ADAPTBX_FPE_DEFAULT=1 (bash syntax; in csh use setenv BOOST_ADAPTBX_FPE_DEFAULT 1)
* nproc=1
* postrefinement.show_trumpet_plot=True

2017 cxi merge tutorial

2017-02-17T17:17:44Z

Nicksauter:

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Earlier documentation is available under [[Merging]] and [[Advanced Merging]]. The methods are described in [http://dx.doi.org/10.1038/nmeth.2887 Hattne (2014)], the PRIME paper, Sauter (2014) and Sauter (2015).

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* the distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

Fetch the reference model and structure factors from the PDB:

phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
phenix.xtriage ${tag}_s0_mark0.mtz scaling.input.xray_data.obs_labels=imean
</pre>
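
The command file builds its argument list by substituting the placeholders FS, DMIN, NEG and TAG with chained sed calls. The sketch below mimics that substitution in Python on a shortened parameter string, to show what ends up on the cxi.merge command line. The function name expand_params and the shortened template are illustrative assumptions, not part of cctbx.

```python
def expand_params(template, substitutions):
    """Replace each placeholder in order, like chained sed 's,OLD,NEW,g' calls,
    then split on whitespace to get one command-line argument per parameter."""
    for placeholder, value in substitutions:
        template = template.replace(placeholder, value)
    return template.split()


# Shortened version of effective_params (illustration only)
template = "d_min=DMIN backend=FS include_negatives=NEG output.prefix=TAG"

args = expand_params(
    template,
    [("FS", "Flex"), ("DMIN", "2.5"), ("NEG", "True"), ("TAG", "p6m")],
)
print(args)
# ['d_min=2.5', 'backend=Flex', 'include_negatives=True', 'output.prefix=p6m']
```

Because the replacement is a plain global substitution, a placeholder must not occur inside any other parameter name or value, or it will be rewritten there as well.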

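Initial trial with nproc=1, just to see if it runs. Two things had to be fixed: the PDB reference, and the data= line, which cannot take a *.pickle glob (give it the directory instead, as in step1.csh below).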

Scale-up trial with nproc=60 and no postrefinement. The run stops on the two errors shown at the end of this output. The first is fixed by setting the MTZ column flag to iobs (scaling.mtz_column_F=iobs).
<pre>
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
  File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
    return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'
</pre>

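The traceback comes from the code that decides whether a file is even- or odd-numbered. These file names end in "_extracted.pickle", so the character in front of the recognized suffix is the d of "extracted" rather than a digit. Fix: added "_extracted.pickle" to the allowable suffixes in the code; it had to be put first.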
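
Why the new suffix has to come first can be seen in a small stand-alone sketch. This is not the cctbx code in xfel/cxi/util.py; the suffix list, the function body and the file names are assumptions made for illustration.

```python
import os

# Hypothetical suffix list: the longer suffix must be tried first,
# otherwise ".pickle" matches and leaves "..._extracted" as the stem.
ALLOWABLE_SUFFIXES = ["_extracted.pickle", ".pickle"]


def is_odd_numbered(file_name, suffixes=ALLOWABLE_SUFFIXES):
    """Return True if the last character before the suffix is an odd digit."""
    base = os.path.basename(file_name)
    for suffix in suffixes:
        if base.endswith(suffix):
            stem = base[: -len(suffix)]
            # int() raises ValueError if the stem does not end in a digit
            return int(stem[-1]) % 2 == 1
    raise ValueError("unrecognized file name: %s" % file_name)


print(is_odd_numbered("/some/dir/image_0059_extracted.pickle"))  # True
print(is_odd_numbered("/some/dir/image_0058.pickle"))            # False
```

With the order reversed ([".pickle", "_extracted.pickle"]) the first example raises the same ValueError as in the traceback, because the stem then ends in the letter d.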

Table of Scaling Results:

<pre>
------------------------------------------------------------------------------------------------------------
                                      CC     N     CC     N      R      R      R  Scale    Scale       SpSig
Bin Resolution Range Completeness    int   int    iso   iso    int  split    iso    int      iso        Test
------------------------------------------------------------------------------------------------------------
  1 -1.0000 - 5.3861    [809/809]  80.0%   809  75.2%   805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
  2  5.3861 - 4.2749    [791/791]  54.9%   791  74.5%   791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
  3  4.2749 - 3.7345    [781/781]  65.8%   781  81.6%   781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
  4  3.7345 - 3.3930    [776/776]  63.9%   776  74.5%   776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
  5  3.3930 - 3.1498    [765/765]  67.1%   765  81.9%   765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
  6  3.1498 - 2.9641    [771/771]  58.6%   771  72.4%   771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
  7  2.9641 - 2.8156    [765/765]  56.0%   765  72.3%   765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
  8  2.8156 - 2.6930    [746/746]  63.0%   746  76.1%   746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
  9  2.6930 - 2.5894    [790/790]  52.1%   790  69.4%   790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
 10  2.5894 - 2.5000    [757/757]  54.9%   757  78.6%   757  52.4%  38.9%  44.4%  0.794  306.403    109.0768

All                   [7751/7751]  74.9%  7751  78.8%  7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
------------------------------------------------------------------------------------------------------------
</pre>

Of course, we know why the data do not scale well: this is a polar space group with an indexing ambiguity, and the lattices must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.
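
As a quick check on the file counts quoted so far, the sketch below confirms that the rejection tally of the scale-up trial adds up to the 5031 input files and compares the acceptance rates of the two runs. All numbers are copied from the program output on this page; the variable and function names are made up for illustration.

```python
TOTAL_FILES = 5031

# Tally from the scale-up trial (mark0 scaling against the 5m3s reference)
scale_up_trial = {
    "accepted": 4493,
    "wrong Bravais group": 0,
    "unit cell outliers": 11,
    "low signal": 22,
    "poor correlation (min_corr)": 505,
    "file errors or no reindex matrix": 0,
}
assert sum(scale_up_trial.values()) == TOTAL_FILES


def acceptance_rate(accepted, total=TOTAL_FILES):
    """Percentage of integration files that were accepted."""
    return 100.0 * accepted / total


print("scale-up trial: %.1f%%" % acceptance_rate(4493))  # 89.3%
print("step1.csh:      %.1f%%" % acceptance_rate(4988))  # 99.1%
```

Most of the difference between the two runs is accounted for by the 505 files that the scale-up trial rejected under the min_corr parameter.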

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

The first run crashes with a Boost floating-point error. Stop the job with:

^Z; kill %%

Troubleshooting attempts:
* d_min=3.5 instead of 2.5: still crashes
* nproc=30 instead of 60 (increases the problem size by 2**2=4): still crashes
* nproc=15

It looks like the crash is associated with the matplotlib plot, as I only experience it when I mouse over the plot. Workaround:

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. Lattices per indexing choice: h,k,l = 2503; h,-h-k,-l = 2485; total 4988.
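
To make the second indexing choice concrete, here is a minimal sketch of what the operator h,-h-k,-l does to a single Miller index, written in plain Python. It does not use the cctbx reindexing machinery; the function name is made up for illustration.

```python
def reindex_alternate(hkl):
    """Apply the reindexing operator h,-h-k,-l to one Miller index (h, k, l)."""
    h, k, l = hkl
    return (h, -h - k, -l)


print(reindex_alternate((1, 2, 3)))  # (1, -3, -3)

# Applying the operator twice returns the original index,
# so each lattice is in exactly one of the two settings.
assert reindex_alternate(reindex_alternate((1, 2, 3))) == (1, 2, 3)

# The two groups reported above account for all accepted lattices.
assert 2503 + 2485 == 4988
```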

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
--------------------------------------------------------------------------------
                                            <asu   <obs
Bin Resolution Range  Completeness      % multi> multi> n_meas    <I> <I/sig(I)>
--------------------------------------------------------------------------------
  1   -1.000 - 5.386   [1490/1490] 100.00 102.21 102.21 152295 103994    103.244
  2    5.386 - 4.275   [1500/1500] 100.00  62.76  62.76  94141 128403     95.046
  3    4.275 - 3.735   [1499/1499] 100.00  53.90  53.90  80795 143552     92.607
  4    3.735 - 3.393   [1497/1497] 100.00  47.14  47.14  70571 112723     70.575
  5    3.393 - 3.150   [1477/1477] 100.00  43.96  43.96  64928  76925     51.011
  6    3.150 - 2.964   [1488/1488] 100.00  39.87  39.87  59330  57060     37.899
  7    2.964 - 2.816   [1483/1483] 100.00  38.17  38.17  56611  44079     32.085
  8    2.816 - 2.693   [1455/1455] 100.00  36.34  36.34  52874  37117     27.460
  9    2.693 - 2.589   [1530/1530] 100.00  34.49  34.49  52763  30496     24.443
 10    2.589 - 2.500   [1476/1476] 100.00  31.83  31.83  46974  27147     21.564

All                  [14895/14895] 100.00  49.10  49.10 731282  76275     55.681
--------------------------------------------------------------------------------
</pre>

== cxi.xmerge program output ==
<pre>
-------------------------------------------------------------------------------------------------------------
                                       CC     N     CC     N      R      R      R  Scale    Scale       SpSig
Bin Resolution Range  Completeness    int   int    iso   iso    int  split    iso    int      iso        Test
-------------------------------------------------------------------------------------------------------------
  1 -1.0000 - 5.3861   [1490/1490]  87.3%  1490  88.1%  1484  46.3%  32.9%  42.6%  0.772  300.328   8084.8580
  2  5.3861 - 4.2749   [1500/1500]  76.4%  1500  89.3%  1500  43.8%  30.6%  34.5%  0.761  425.498   1728.0907
  3  4.2749 - 3.7345   [1499/1499]  80.1%  1499  91.6%  1499  42.5%  26.7%  34.5%  0.684  430.028   1556.6316
  4  3.7345 - 3.3930   [1497/1497]  80.5%  1497  90.3%  1497  37.9%  27.2%  29.9%  0.846  481.795    600.5001
  5  3.3930 - 3.1498   [1477/1477]  84.2%  1477  90.0%  1477  37.2%  26.4%  31.4%  0.838  477.825    269.5784
  6  3.1498 - 2.9641   [1492/1492]  80.0%  1492  91.5%  1492  39.8%  28.6%  28.3%  0.866  511.386    165.9517
  7  2.9641 - 2.8156   [1483/1483]  76.7%  1483  90.0%  1483  39.3%  28.7%  30.1%  0.865  470.331    102.0659
  8  2.8156 - 2.6930   [1451/1451]  76.8%  1451  90.7%  1451  38.5%  28.2%  27.3%  0.883  492.758     88.6666
  9  2.6930 - 2.5894   [1532/1532]  76.6%  1532  89.4%  1532  40.1%  29.3%  30.5%  0.879  452.831     52.0092
 10  2.5894 - 2.5000   [1472/1472]  77.2%  1472  88.9%  1474  42.9%  31.4%  35.3%  0.801  393.866     52.6667

All                  [14893/14893]  84.7% 14893  88.6% 14889  41.6%  29.0%  39.8%  0.771  378.964       804.8
-------------------------------------------------------------------------------------------------------------
</pre>

== Table of results ==
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx, roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2; unit weighting; Lorentzian lineshape
| style="padding: 5px;"| analytical derivatives; better convergence test; Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2; unit weighting; Gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid; gentle weighting (|I|/sigma**2); Gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.504
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid; gentle weighting (|I|/sigma**2); Gaussian lineshape; recycle model
| style="padding: 5px;"| Use mtz from the first cycle as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3973 (3700)
| style="padding: 5px;"| 93.3% (55.5%)
| style="padding: 5px;"| 93.6% (87.0%)
| style="padding: 5px;"| 0.509
|-
|}

Useful settings:
* export BOOST_ADAPTBX_FPE_DEFAULT=1
* nproc=1
* postrefinement.show_trumpet_plot=True

2017 cxi merge tutorial

2017-02-17T16:50:17Z

Nicksauter:


2017 cxi merge tutorial

2017-02-17T16:42:15Z

Nicksauter:

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center is fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
phenix.xtriage ${tag}_s0_mark0.mtz scaling.input.xray_data.obs_labels=imean

Initial trial nproc=1 just to see if it runs. Had to fix PDB reference. Can't use *.pickle on the data= line

Scale-up trial nproc=60, no postrefinement.
set the MTZ flag = jobs
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

Something is wrong in the ability to determine even/odd numbered-ness. Added "_extracted.pickle" in the code; had to put it first.

Table of Scaling Results:

---------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
---------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [809/809] 80.0% 809 75.2% 805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
2 5.3861 - 4.2749 [791/791] 54.9% 791 74.5% 791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
3 4.2749 - 3.7345 [781/781] 65.8% 781 81.6% 781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
4 3.7345 - 3.3930 [776/776] 63.9% 776 74.5% 776 49.3% 36.4% 48.6% 0.764 283.109 758.0388
5 3.3930 - 3.1498 [765/765] 67.1% 765 81.9% 765 48.4% 35.6% 43.4% 0.795 338.091 533.7650
6 3.1498 - 2.9641 [771/771] 58.6% 771 72.4% 771 49.3% 36.6% 50.7% 0.759 286.707 222.4718
7 2.9641 - 2.8156 [765/765] 56.0% 765 72.3% 765 48.5% 35.3% 46.7% 0.765 320.954 154.5299
8 2.8156 - 2.6930 [746/746] 63.0% 746 76.1% 746 46.4% 34.3% 42.6% 0.867 357.183 99.4430
9 2.6930 - 2.5894 [790/790] 52.1% 790 69.4% 790 50.4% 37.4% 47.5% 0.814 314.326 113.1264
10 2.5894 - 2.5000 [757/757] 54.9% 757 78.6% 757 52.4% 38.9% 44.4% 0.794 306.403 109.0768

All [7751/7751] 74.9% 7751 78.8% 7747 51.9% 36.9% 50.1% 0.680 266.538 1298.0
---------------------------------------------------------------------------------------------------------

Of course we know the data do not scale because this is a polar space group, and data must be sorted by Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detail instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try using d_min 3.5 instead of 2.5--still crash

Try using fewer proc; use 30 instead of 60. (increases problem size by 2**2=4) --still crash

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. h,k,l=2503 h,-h-k,-1=2485 total 4988

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>

== cxi.xmerge program output ==
<pre>
--------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
--------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [1490/1490] 87.3% 1490 88.1% 1484 46.3% 32.9% 42.6% 0.772 300.328 8084.8580
2 5.3861 - 4.2749 [1500/1500] 76.4% 1500 89.3% 1500 43.8% 30.6% 34.5% 0.761 425.498 1728.0907
3 4.2749 - 3.7345 [1499/1499] 80.1% 1499 91.6% 1499 42.5% 26.7% 34.5% 0.684 430.028 1556.6316
4 3.7345 - 3.3930 [1497/1497] 80.5% 1497 90.3% 1497 37.9% 27.2% 29.9% 0.846 481.795 600.5001
5 3.3930 - 3.1498 [1477/1477] 84.2% 1477 90.0% 1477 37.2% 26.4% 31.4% 0.838 477.825 269.5784
6 3.1498 - 2.9641 [1492/1492] 80.0% 1492 91.5% 1492 39.8% 28.6% 28.3% 0.866 511.386 165.9517
7 2.9641 - 2.8156 [1483/1483] 76.7% 1483 90.0% 1483 39.3% 28.7% 30.1% 0.865 470.331 102.0659
8 2.8156 - 2.6930 [1451/1451] 76.8% 1451 90.7% 1451 38.5% 28.2% 27.3% 0.883 492.758 88.6666
9 2.6930 - 2.5894 [1532/1532] 76.6% 1532 89.4% 1532 40.1% 29.3% 30.5% 0.879 452.831 52.0092
10 2.5894 - 2.5000 [1472/1472] 77.2% 1472 88.9% 1474 42.9% 31.4% 35.3% 0.801 393.866 52.6667

All [14893/14893] 84.7% 14893 88.6% 14889 41.6% 29.0% 39.8% 0.771 378.964 804.8
--------------------------------------------------------------------------------------------------------
</pre>

== Table of results ==
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives, better convergence test, Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.504
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape recycle model
| style="padding: 5px;"| Use mtz from the first cycle as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3973 (3700)
| style="padding: 5px;"| 93.3% (55.5%)
| style="padding: 5px;"| 93.6% (87.0%)
| style="padding: 5px;"| 0.509
|-
|}

2017 cxi merge tutorial

2017-02-17T16:26:04Z

Nicksauter:

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
</pre>

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. The data= parameter cannot take a *.pickle glob; give the directory instead.

Scale-up trial with nproc=60, no postrefinement.
Set the MTZ column flag to iobs (scaling.mtz_column_F=iobs).
<pre>
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
</pre>
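
As a quick consistency check, the accepted and rejected counts in the log above should add up to the 5031 input files. A minimal Python sketch; the dictionary keys are illustrative labels, not cxi.merge identifiers:

```python
# Counts copied from the cxi.merge log above; keys are illustrative labels only.
counts = {
    "accepted": 4493,
    "wrong Bravais group": 0,
    "unit cell outliers": 11,
    "low signal": 22,
    "poor correlation (min_corr)": 505,
    "file errors or no reindex matrix": 0,
}
total_files = 5031

# Everything that is not "accepted" counts as a rejection.
rejected = sum(n for label, n in counts.items() if label != "accepted")
accounted = counts["accepted"] + rejected

print("rejected: %d (%.1f%% of input)" % (rejected, 100.0 * rejected / total_files))
print("accounted for: %d of %d" % (accounted, total_files))
```

Most of the rejections (505 of 538) come from the min_corr filter.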
<pre>
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
  File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
    return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'
</pre>

The traceback shows a failure to determine whether an image is even- or odd-numbered: after the recognized suffix was split off, the last character of the file-name stem was the "d" of "extracted" rather than a digit. Fixed by adding "_extracted.pickle" to the allowable suffixes in the code; it had to be put first.
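
The ordering issue can be illustrated with a small, self-contained Python sketch. This is not the code in xfel/cxi/util.py: the suffix list, the function body, and the file names below are assumptions made for illustration only.

```python
import os

# Hypothetical suffix list for illustration; the real list lives in xfel/cxi/util.py.
# Order matters: the more specific "_extracted.pickle" must be tried before ".pickle".
ALLOWABLE_SUFFIXES = ["_extracted.pickle", ".pickle"]

def is_odd_numbered(file_name, suffixes=ALLOWABLE_SUFFIXES):
    """Return True if the last digit before the first matching suffix is odd."""
    base = os.path.basename(file_name)
    for suffix in suffixes:
        if base.endswith(suffix):
            stem = base[: -len(suffix)]
            return int(stem[-1]) % 2 == 1
    raise ValueError("unrecognized file name: %s" % file_name)

# Hypothetical file names, chosen only to show the two cases.
print(is_odd_numbered("run_0059_extracted.pickle"))  # last digit is 9
print(is_odd_numbered("run_0060_extracted.pickle"))  # last digit is 0

# With ".pickle" tried first, the stem ends in "d" and int() raises ValueError.
try:
    is_odd_numbered("run_0059_extracted.pickle",
                    suffixes=[".pickle", "_extracted.pickle"])
except ValueError as err:
    print("wrong order fails:", err)
```

Trying the longest suffix first is the simplest way to keep a generic ".pickle" rule from shadowing a more specific one.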

Table of Scaling Results:

<pre>
---------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
---------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [809/809] 80.0% 809 75.2% 805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
2 5.3861 - 4.2749 [791/791] 54.9% 791 74.5% 791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
3 4.2749 - 3.7345 [781/781] 65.8% 781 81.6% 781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
4 3.7345 - 3.3930 [776/776] 63.9% 776 74.5% 776 49.3% 36.4% 48.6% 0.764 283.109 758.0388
5 3.3930 - 3.1498 [765/765] 67.1% 765 81.9% 765 48.4% 35.6% 43.4% 0.795 338.091 533.7650
6 3.1498 - 2.9641 [771/771] 58.6% 771 72.4% 771 49.3% 36.6% 50.7% 0.759 286.707 222.4718
7 2.9641 - 2.8156 [765/765] 56.0% 765 72.3% 765 48.5% 35.3% 46.7% 0.765 320.954 154.5299
8 2.8156 - 2.6930 [746/746] 63.0% 746 76.1% 746 46.4% 34.3% 42.6% 0.867 357.183 99.4430
9 2.6930 - 2.5894 [790/790] 52.1% 790 69.4% 790 50.4% 37.4% 47.5% 0.814 314.326 113.1264
10 2.5894 - 2.5000 [757/757] 54.9% 757 78.6% 757 52.4% 38.9% 44.4% 0.794 306.403 109.0768

All [7751/7751] 74.9% 7751 78.8% 7747 51.9% 36.9% 50.1% 0.680 266.538 1298.0
---------------------------------------------------------------------------------------------------------
</pre>

Of course we know the data do not scale, because P6 is a polar space group with an indexing ambiguity, and the lattices must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

This crashes with a Boost floating-point error. Suspend and kill the job:

^Z; kill %%

Using d_min=3.5 instead of 2.5: still crashes.

Using fewer processors, nproc=30 instead of 60 (increases problem size by 2**2=4): still crashes.

Try nproc=15.

The crash appears to be associated with the matplotlib plot, as it only occurs on mouse-over of the plot. Set the following before rerunning:

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. Lattice counts: h,k,l = 2503; h,-h-k,-l = 2485; total 4988.
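
The second lattice setting corresponds to the reindexing operator h,-h-k,-l. A minimal Python sketch, independent of cctbx, shows how that operator acts on a Miller index and that applying it twice restores the original index. The function name and sample indices are illustrative assumptions:

```python
def reindex_alt(hkl):
    """Apply the alternative indexing operator h,-h-k,-l to one Miller index."""
    h, k, l = hkl
    return (h, -h - k, -l)

# Arbitrary sample indices, for illustration only.
sample = [(1, 2, 3), (4, 0, -1), (-2, 5, 7)]
for hkl in sample:
    alt = reindex_alt(hkl)
    # The operator is its own inverse: applying it twice gives back the input.
    assert reindex_alt(alt) == hkl
    print(hkl, "->", alt)

# Lattice counts reported by the sorting step.
n_hkl, n_alt = 2503, 2485
print("total lattices:", n_hkl + n_alt)
print("fraction in alternative setting: %.3f" % (n_alt / float(n_hkl + n_alt)))
```

The near-even split between the two settings is what one would expect if the choice between them is arbitrary at indexing time.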

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>
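
The bin boundaries in the table above are consistent with dividing reciprocal space into 10 shells of equal volume out to d_min=2.5, that is, with evenly spaced values of 1/d**3. The following Python sketch reproduces the boundaries under that assumption; it is not taken from the cxi.merge source, and the function name is hypothetical. The -1.000 in the first row appears to mark the absence of a low-resolution limit.

```python
def equal_volume_bin_edges(d_min, n_bins):
    """High-resolution edge (Angstrom) of each shell, assuming equal steps in 1/d**3."""
    # Shell i ends where 1/d**3 reaches i/n_bins of its value at d_min,
    # which rearranges to d = d_min * (n_bins / i) ** (1/3).
    return [d_min * (float(n_bins) / i) ** (1.0 / 3.0)
            for i in range(1, n_bins + 1)]

edges = equal_volume_bin_edges(d_min=2.5, n_bins=10)
for i, d in enumerate(edges, start=1):
    print("%2d  %.3f" % (i, d))
```

Equal-volume shells hold roughly equal numbers of unique reflections, which matches the nearly constant count of about 1490 per bin.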

== cxi.xmerge program output ==
<pre>
--------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
--------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [1490/1490] 87.3% 1490 88.1% 1484 46.3% 32.9% 42.6% 0.772 300.328 8084.8580
2 5.3861 - 4.2749 [1500/1500] 76.4% 1500 89.3% 1500 43.8% 30.6% 34.5% 0.761 425.498 1728.0907
3 4.2749 - 3.7345 [1499/1499] 80.1% 1499 91.6% 1499 42.5% 26.7% 34.5% 0.684 430.028 1556.6316
4 3.7345 - 3.3930 [1497/1497] 80.5% 1497 90.3% 1497 37.9% 27.2% 29.9% 0.846 481.795 600.5001
5 3.3930 - 3.1498 [1477/1477] 84.2% 1477 90.0% 1477 37.2% 26.4% 31.4% 0.838 477.825 269.5784
6 3.1498 - 2.9641 [1492/1492] 80.0% 1492 91.5% 1492 39.8% 28.6% 28.3% 0.866 511.386 165.9517
7 2.9641 - 2.8156 [1483/1483] 76.7% 1483 90.0% 1483 39.3% 28.7% 30.1% 0.865 470.331 102.0659
8 2.8156 - 2.6930 [1451/1451] 76.8% 1451 90.7% 1451 38.5% 28.2% 27.3% 0.883 492.758 88.6666
9 2.6930 - 2.5894 [1532/1532] 76.6% 1532 89.4% 1532 40.1% 29.3% 30.5% 0.879 452.831 52.0092
10 2.5894 - 2.5000 [1472/1472] 77.2% 1472 88.9% 1474 42.9% 31.4% 35.3% 0.801 393.866 52.6667

All [14893/14893] 84.7% 14893 88.6% 14889 41.6% 29.0% 39.8% 0.771 378.964 804.8
--------------------------------------------------------------------------------------------------------
</pre>

== Table of results ==
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives, better convergence test, Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.504
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape recycle model
| style="padding: 5px;"| Use mtz from the first cycle as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3973 (3700)
| style="padding: 5px;"| 93.3% (55.5%)
| style="padding: 5px;"| 93.6% (87.0%)
| style="padding: 5px;"| 0.509
|-
|}

2017 cxi merge tutorial

2017-02-17T16:23:18Z

Nicksauter: /* Fine tuning */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
</pre>

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. The data= parameter cannot take a *.pickle glob; give the directory instead.

Scale-up trial with nproc=60, no postrefinement.
Set the MTZ column flag to iobs (scaling.mtz_column_F=iobs).
<pre>
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
</pre>
<pre>
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
  File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
    return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'
</pre>

The traceback shows a failure to determine whether an image is even- or odd-numbered: after the recognized suffix was split off, the last character of the file-name stem was the "d" of "extracted" rather than a digit. Fixed by adding "_extracted.pickle" to the allowable suffixes in the code; it had to be put first.

Table of Scaling Results:

<pre>
---------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
---------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [809/809] 80.0% 809 75.2% 805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
2 5.3861 - 4.2749 [791/791] 54.9% 791 74.5% 791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
3 4.2749 - 3.7345 [781/781] 65.8% 781 81.6% 781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
4 3.7345 - 3.3930 [776/776] 63.9% 776 74.5% 776 49.3% 36.4% 48.6% 0.764 283.109 758.0388
5 3.3930 - 3.1498 [765/765] 67.1% 765 81.9% 765 48.4% 35.6% 43.4% 0.795 338.091 533.7650
6 3.1498 - 2.9641 [771/771] 58.6% 771 72.4% 771 49.3% 36.6% 50.7% 0.759 286.707 222.4718
7 2.9641 - 2.8156 [765/765] 56.0% 765 72.3% 765 48.5% 35.3% 46.7% 0.765 320.954 154.5299
8 2.8156 - 2.6930 [746/746] 63.0% 746 76.1% 746 46.4% 34.3% 42.6% 0.867 357.183 99.4430
9 2.6930 - 2.5894 [790/790] 52.1% 790 69.4% 790 50.4% 37.4% 47.5% 0.814 314.326 113.1264
10 2.5894 - 2.5000 [757/757] 54.9% 757 78.6% 757 52.4% 38.9% 44.4% 0.794 306.403 109.0768

All [7751/7751] 74.9% 7751 78.8% 7747 51.9% 36.9% 50.1% 0.680 266.538 1298.0
---------------------------------------------------------------------------------------------------------
</pre>

Of course we know the data do not scale, because P6 is a polar space group with an indexing ambiguity, and the lattices must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

This crashes with a Boost floating-point error. Suspend and kill the job:

^Z; kill %%

Using d_min=3.5 instead of 2.5: still crashes.

Using fewer processors, nproc=30 instead of 60 (increases problem size by 2**2=4): still crashes.

Try nproc=15.

The crash appears to be associated with the matplotlib plot, as it only occurs on mouse-over of the plot. Set the following before rerunning:

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. Lattice counts: h,k,l = 2503; h,-h-k,-l = 2485; total 4988.

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>

== Table of results ==
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives, better convergence test, Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.504
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape recycle model
| style="padding: 5px;"| Use mtz from the first cycle as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3973 (3700)
| style="padding: 5px;"| 93.3% (55.5%)
| style="padding: 5px;"| 93.6% (87.0%)
| style="padding: 5px;"| 0.509
|-
|}

2017 cxi merge tutorial

2017-02-17T16:16:02Z

Nicksauter: /* Table of results */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
</pre>

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. The data= parameter cannot take a *.pickle glob; give the directory instead.

Scale-up trial with nproc=60, no postrefinement.
Set the MTZ column flag to iobs (scaling.mtz_column_F=iobs).
<pre>
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
</pre>
<pre>
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
  File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
    return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'
</pre>

The traceback shows a failure to determine whether an image is even- or odd-numbered: after the recognized suffix was split off, the last character of the file-name stem was the "d" of "extracted" rather than a digit. Fixed by adding "_extracted.pickle" to the allowable suffixes in the code; it had to be put first.

Table of Scaling Results:

<pre>
---------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
---------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [809/809] 80.0% 809 75.2% 805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
2 5.3861 - 4.2749 [791/791] 54.9% 791 74.5% 791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
3 4.2749 - 3.7345 [781/781] 65.8% 781 81.6% 781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
4 3.7345 - 3.3930 [776/776] 63.9% 776 74.5% 776 49.3% 36.4% 48.6% 0.764 283.109 758.0388
5 3.3930 - 3.1498 [765/765] 67.1% 765 81.9% 765 48.4% 35.6% 43.4% 0.795 338.091 533.7650
6 3.1498 - 2.9641 [771/771] 58.6% 771 72.4% 771 49.3% 36.6% 50.7% 0.759 286.707 222.4718
7 2.9641 - 2.8156 [765/765] 56.0% 765 72.3% 765 48.5% 35.3% 46.7% 0.765 320.954 154.5299
8 2.8156 - 2.6930 [746/746] 63.0% 746 76.1% 746 46.4% 34.3% 42.6% 0.867 357.183 99.4430
9 2.6930 - 2.5894 [790/790] 52.1% 790 69.4% 790 50.4% 37.4% 47.5% 0.814 314.326 113.1264
10 2.5894 - 2.5000 [757/757] 54.9% 757 78.6% 757 52.4% 38.9% 44.4% 0.794 306.403 109.0768

All [7751/7751] 74.9% 7751 78.8% 7747 51.9% 36.9% 50.1% 0.680 266.538 1298.0
---------------------------------------------------------------------------------------------------------
</pre>

Of course we know the data do not scale, because P6 is a polar space group with an indexing ambiguity, and the lattices must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

This crashes with a Boost floating-point error. Suspend and kill the job:

^Z; kill %%

Using d_min=3.5 instead of 2.5: still crashes.

Using fewer processors, nproc=30 instead of 60 (increases problem size by 2**2=4): still crashes.

Try nproc=15.

The crash appears to be associated with the matplotlib plot, as it only occurs on mouse-over of the plot. Set the following before rerunning:

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. Lattice counts: h,k,l = 2503; h,-h-k,-l = 2485; total 4988.

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>

== Fine tuning ==

basic: postrefinement with rs; CC1/2 = 84.7%

trial1: rs2, unit weighting, Lorentzian lineshape; CC1/2 = 88.2%

trial2: Gaussian lineshape; CC1/2 = 90.9%

trial3: Gaussian lineshape, rs_hybrid; CC1/2 = 93.5%, but only 4059 files accepted

trial4: extend to 2.0 Angstrom; CC1/2 = 87.9% (but 97.8% in the lowest-resolution shell)

=== Table of results ===
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives better convergence test Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape
| style="padding: 5px;"| rs2: LBFGS LevMar to refine Rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.504
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid gentle weighting (|I|/sigma**2) gaussian lineshape recycle model
| style="padding: 5px;"| Use mtz from the first cycle as a scaling reference
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3973 (3700)
| style="padding: 5px;"| 93.3% (55.5%)
| style="padding: 5px;"| 93.6% (87.0%)
| style="padding: 5px;"| 0.509
|-
|}

2017 cxi merge tutorial

2017-02-17T16:11:26Z

Nicksauter: /* Table of results */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. A *.pickle glob cannot be used on the data= line; give the directory instead.
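The placeholder scheme in the command file above (FS, DMIN, NEG and TAG replaced by chained sed commands) can be sketched in Python. This is only an illustration of the substitution logic, using an abbreviated template with four of the script's parameters; the helper name expand is hypothetical.

```python
# Abbreviated template: four parameters taken from the command file.
template = "d_min=DMIN backend=FS include_negatives=NEG output.prefix=TAG"

# Same substitutions, in the same order, as the chained sed commands.
substitutions = [("FS", "Flex"), ("DMIN", "2.5"), ("NEG", "True"), ("TAG", "p6m")]


def expand(text, pairs):
    """Replace every occurrence of each placeholder, like sed 's,OLD,NEW,g'."""
    for placeholder, value in pairs:
        text = text.replace(placeholder, value)
    return text


effective_params = expand(template, substitutions)
print(effective_params)
```

Because each replacement is global, a placeholder must not occur as a substring of any other parameter name or value, otherwise it would be rewritten as well.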

Scale-up trial nproc=60, no postrefinement.
set the MTZ flag = jobs
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

Something is wrong with the even/odd file-number detection. Added "_extracted.pickle" to the allowed suffixes in the code; it had to be put first.
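The failure mode in the traceback can be reproduced outside cctbx. The sketch below is not the code in xfel/cxi/util.py; it is a hypothetical reimplementation that shows why the longer suffix has to be tried first. The example file names are invented.

```python
import os
import re


def is_odd_numbered(file_name, suffixes=("_extracted.pickle", ".pickle")):
    """Return True if the image number embedded in the file name is odd.

    Suffixes are tried in order, so "_extracted.pickle" must come before
    ".pickle"; otherwise the name is cut at ".pickle" and the character
    before the cut is the "d" of "extracted", not a digit.
    """
    base = os.path.basename(file_name)
    for suffix in suffixes:
        if base.endswith(suffix):
            stem = base[: -len(suffix)]
            match = re.search(r"(\d)$", stem)
            if match is None:
                raise ValueError("no trailing digit in %r" % base)
            return int(match.group(1)) % 2 == 1
    raise ValueError("unrecognized suffix in %r" % base)


# Hypothetical file names, for illustration only.
print(is_odd_numbered("/data/int-0059_extracted.pickle"))
print(is_odd_numbered("/data/int-0060.pickle"))
```

With the suffix order reversed, the first call raises ValueError, which mirrors the error shown above.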

Table of Scaling Results:

<pre>
--------------------------------------------------------------------------------------------------------------
                                        CC     N     CC     N      R      R      R  Scale    Scale       SpSig
Bin  Resolution Range  Completeness    int   int    iso   iso    int  split    iso    int      iso        Test
--------------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861  [809/809]     80.0%   809  75.2%   805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
  2   5.3861 - 4.2749  [791/791]     54.9%   791  74.5%   791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
  3   4.2749 - 3.7345  [781/781]     65.8%   781  81.6%   781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
  4   3.7345 - 3.3930  [776/776]     63.9%   776  74.5%   776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
  5   3.3930 - 3.1498  [765/765]     67.1%   765  81.9%   765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
  6   3.1498 - 2.9641  [771/771]     58.6%   771  72.4%   771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
  7   2.9641 - 2.8156  [765/765]     56.0%   765  72.3%   765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
  8   2.8156 - 2.6930  [746/746]     63.0%   746  76.1%   746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
  9   2.6930 - 2.5894  [790/790]     52.1%   790  69.4%   790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
 10   2.5894 - 2.5000  [757/757]     54.9%   757  78.6%   757  52.4%  38.9%  44.4%  0.794  306.403    109.0768

All                    [7751/7751]   74.9%  7751  78.8%  7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
--------------------------------------------------------------------------------------------------------------
</pre>

Of course, the data do not scale well because this is a polar space group with an indexing ambiguity; the lattices must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try using d_min 3.5 instead of 2.5--still crash

Try fewer processes: nproc=30 instead of 60 (this increases the per-process problem size by 2**2 = 4). Still crashes.

Try nproc=15
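The "2**2=4" estimate above can be extended to the nproc=15 trial. The sketch assumes, as the note implies, that the lattices are divided evenly among the processes and that the work per process grows as the square of its share; neither assumption is checked against the cxi.brehm_diederichs source. The helper name relative_cost is hypothetical.

```python
def relative_cost(nproc, reference_nproc=60):
    """Per-process cost relative to the reference run.

    Assumes an even split of lattices and quadratic scaling of the
    pairwise comparison within each process's share.
    """
    return (reference_nproc / nproc) ** 2


for nproc in (60, 30, 15):
    print(nproc, relative_cost(nproc))
```

Under these assumptions, nproc=15 makes each process's problem 16 times larger than at nproc=60.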

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. Lattices assigned to h,k,l: 2503; to h,-h-k,-l: 2485; total 4988.
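A quick consistency check on these counts: the two groups should add up to the 4988 files accepted in step 1, and a split close to 50/50 is what one would expect if each lattice was indexed in either setting with roughly equal probability. A minimal sketch of that arithmetic:

```python
kept = 2503        # lattices left as indexed
reindexed = 2485   # lattices assigned the alternative setting
total = kept + reindexed

fraction_reindexed = reindexed / total
print(total, round(100 * fraction_reindexed, 1))
```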

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
Bin Resolution Range Completeness % <asu multi> <obs multi> n_meas <I> <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>
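The bin limits in the table above are consistent with bins of equal reciprocal-space volume, that is, equal steps in 1/d**3 from no low-resolution limit (printed as -1.000) down to d_min=2.5. This is an inference from the printed numbers, not a statement about how cxi.merge is implemented. A minimal sketch that reproduces the limits, with bin_edges a hypothetical helper name:

```python
def bin_edges(d_min, n_bins):
    """High-resolution limit of each bin when every bin spans an equal
    volume of reciprocal space (equal steps in 1/d**3)."""
    return [d_min * (n_bins / k) ** (1.0 / 3.0) for k in range(1, n_bins + 1)]


edges = bin_edges(2.5, 10)
print(" ".join("%.3f" % d for d in edges))
```

Equal-volume bins hold roughly equal numbers of unique reflections, which matches the nearly constant counts (about 1450 to 1530) in the Completeness column.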

== Fine tuning ==

postrefine rs. cc1/2 = 84.7%

trial1 rs2 unit-weighting lorentzian lineshape 88.2%

trial 2 gaussian line shape 90.9%

trial 3 gaussian rs_hybrid 93.5% only 4059 files accepted

trial 4 extend to 2.0 angstrom 87.9% (but 97.8% on lowest shell)
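The percentages in these notes are CC1/2 values. CC1/2 is conventionally the Pearson correlation between intensities merged separately from two halves of the images; the is_odd_numbered traceback earlier on this page suggests that the halves here are the even- and odd-numbered files. A minimal sketch of the correlation itself, with a hypothetical helper pearson_cc and made-up intensities that serve only as an illustration:

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Illustrative merged intensities for the same four reflections in each half.
half_a = [100.0, 250.0, 80.0, 400.0]
half_b = [110.0, 230.0, 95.0, 390.0]
print("CC1/2 = %.1f%%" % (100 * pearson_cc(half_a, half_b)))
```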

=== Table of results ===
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Details
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;"| scale only
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;"| refine scale, B, rotx,roty
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;"| analytical derivatives better convergence test Flex database
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid unit weighting gaussian lineshape
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.504
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid unit weighting gaussian lineshape recycle model
| style="padding: 5px;"|
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3973 (3700)
| style="padding: 5px;"| 93.3% (55.5%)
| style="padding: 5px;"| 93.6% (87.0%)
| style="padding: 5px;"| 0.509
|-
|}
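The last column of the table refers to the L test, in which L = (I1 - I2)/(I1 + I2) is computed for pairs of unrelated reflections that lie close together in reciprocal space. For acentric data the expected mean |L| is 0.5 without twinning and 0.375 for a perfect twin, so values below 0.5 may point to lattices that are still mis-sorted. The sketch below uses simulated intensities only (exponentially distributed, as for ideal acentric data); it is not the cctbx implementation, and mean_abs_l is a hypothetical helper.

```python
import random


def mean_abs_l(pairs):
    """Mean of |I1 - I2| / (I1 + I2) over pairs of intensities."""
    values = [abs(i1 - i2) / (i1 + i2) for i1, i2 in pairs if i1 + i2 > 0]
    return sum(values) / len(values)


rng = random.Random(0)  # fixed seed so the simulation is repeatable


def untwinned():
    # Ideal acentric intensity: exponential distribution.
    return rng.expovariate(1.0)


def twinned():
    # Perfect twin: average of two independent acentric intensities.
    return 0.5 * (rng.expovariate(1.0) + rng.expovariate(1.0))


untwinned_pairs = [(untwinned(), untwinned()) for _ in range(20000)]
twinned_pairs = [(twinned(), twinned()) for _ in range(20000)]

print("untwinned <|L|> = %.3f" % mean_abs_l(untwinned_pairs))
print("twinned   <|L|> = %.3f" % mean_abs_l(twinned_pairs))
```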

2017 cxi merge tutorial

2017-02-17T16:03:42Z

Nicksauter: /* Table of results */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. A *.pickle glob cannot be used on the data= line; give the directory instead.

Scale-up trial nproc=60, no postrefinement.
set the MTZ flag = jobs
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

Something is wrong with the even/odd file-number detection. Added "_extracted.pickle" to the allowed suffixes in the code; it had to be put first.

Table of Scaling Results:

<pre>
--------------------------------------------------------------------------------------------------------------
                                        CC     N     CC     N      R      R      R  Scale    Scale       SpSig
Bin  Resolution Range  Completeness    int   int    iso   iso    int  split    iso    int      iso        Test
--------------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861  [809/809]     80.0%   809  75.2%   805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
  2   5.3861 - 4.2749  [791/791]     54.9%   791  74.5%   791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
  3   4.2749 - 3.7345  [781/781]     65.8%   781  81.6%   781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
  4   3.7345 - 3.3930  [776/776]     63.9%   776  74.5%   776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
  5   3.3930 - 3.1498  [765/765]     67.1%   765  81.9%   765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
  6   3.1498 - 2.9641  [771/771]     58.6%   771  72.4%   771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
  7   2.9641 - 2.8156  [765/765]     56.0%   765  72.3%   765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
  8   2.8156 - 2.6930  [746/746]     63.0%   746  76.1%   746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
  9   2.6930 - 2.5894  [790/790]     52.1%   790  69.4%   790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
 10   2.5894 - 2.5000  [757/757]     54.9%   757  78.6%   757  52.4%  38.9%  44.4%  0.794  306.403    109.0768

All                    [7751/7751]   74.9%  7751  78.8%  7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
--------------------------------------------------------------------------------------------------------------
</pre>

Of course, the data do not scale well because this is a polar space group with an indexing ambiguity; the lattices must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try using d_min 3.5 instead of 2.5--still crash

Try fewer processes: nproc=30 instead of 60 (this increases the per-process problem size by 2**2 = 4). Still crashes.

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. Lattices assigned to h,k,l: 2503; to h,-h-k,-l: 2485; total 4988.

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
Bin Resolution Range Completeness % <asu multi> <obs multi> n_meas <I> <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>

== Fine tuning ==

postrefine rs. cc1/2 = 84.7%

trial1 rs2 unit-weighting lorentzian lineshape 88.2%

trial 2 gaussian line shape 90.9%

trial 3 gaussian rs_hybrid 93.5% only 4059 files accepted

trial 4 extend to 2.0 angstrom 87.9% (but 97.8% on lowest shell)

=== Table of results ===
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid unit weighting gaussian lineshape
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.504
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid unit weighting gaussian lineshape recycle model
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3973 (3700)
| style="padding: 5px;"| 93.3% (55.5%)
| style="padding: 5px;"| 93.6% (87.0%)
| style="padding: 5px;"| 0.509
|-
|}

2017 cxi merge tutorial

2017-02-17T16:03:06Z

Nicksauter: /* Table of results */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. A *.pickle glob cannot be used on the data= line; give the directory instead.

Scale-up trial nproc=60, no postrefinement.
set the MTZ flag = jobs
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

Something is wrong with the even/odd file-number detection. Added "_extracted.pickle" to the allowed suffixes in the code; it had to be put first.

Table of Scaling Results:

<pre>
--------------------------------------------------------------------------------------------------------------
                                        CC     N     CC     N      R      R      R  Scale    Scale       SpSig
Bin  Resolution Range  Completeness    int   int    iso   iso    int  split    iso    int      iso        Test
--------------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861  [809/809]     80.0%   809  75.2%   805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
  2   5.3861 - 4.2749  [791/791]     54.9%   791  74.5%   791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
  3   4.2749 - 3.7345  [781/781]     65.8%   781  81.6%   781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
  4   3.7345 - 3.3930  [776/776]     63.9%   776  74.5%   776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
  5   3.3930 - 3.1498  [765/765]     67.1%   765  81.9%   765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
  6   3.1498 - 2.9641  [771/771]     58.6%   771  72.4%   771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
  7   2.9641 - 2.8156  [765/765]     56.0%   765  72.3%   765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
  8   2.8156 - 2.6930  [746/746]     63.0%   746  76.1%   746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
  9   2.6930 - 2.5894  [790/790]     52.1%   790  69.4%   790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
 10   2.5894 - 2.5000  [757/757]     54.9%   757  78.6%   757  52.4%  38.9%  44.4%  0.794  306.403    109.0768

All                    [7751/7751]   74.9%  7751  78.8%  7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
--------------------------------------------------------------------------------------------------------------
</pre>

Of course, the data do not scale well because this is a polar space group with an indexing ambiguity; the lattices must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try using d_min 3.5 instead of 2.5--still crash

Try fewer processes: nproc=30 instead of 60 (this increases the per-process problem size by 2**2 = 4). Still crashes.

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. Lattices assigned to h,k,l: 2503; to h,-h-k,-l: 2485; total 4988.

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
Bin Resolution Range Completeness % <asu multi> <obs multi> n_meas <I> <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>

== Fine tuning ==

postrefine rs. cc1/2 = 84.7%

trial1 rs2 unit-weighting lorentzian lineshape 88.2%

trial 2 gaussian line shape 90.9%

trial 3 gaussian rs_hybrid 93.5% only 4059 files accepted

trial 4 extend to 2.0 angstrom 87.9% (but 97.8% on lowest shell)

=== Table of results ===
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid unit weighting gaussian lineshape
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.504
|-
| style="padding: 10px;"| trial3 / cycle2
| style="padding: 5px;"| rs_hybrid unit weighting gaussian lineshape recycle model
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3973 (3700)
| style="padding: 5px;"| 93.3% (55.5%)
| style="padding: 5px;"| 93.6% (87.0%)
| style="padding: 5px;"| 0.509
|-
|}

2017 cxi merge tutorial

2017-02-17T16:02:29Z

Nicksauter: /* Table of results */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

Fetch the deposited model and structure factors to use as the reference:

$ phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
</pre>
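
The sed chain in the merge command file fills in the upper-case placeholders (FS, DMIN, NEG, TAG) before cxi.merge is called. As a minimal sketch of the same idea in Python (the helper name expand and the shortened template are illustrative assumptions, not part of cxi.merge):

```python
# Illustrative only: a shortened stand-in for the effective_params template.
template = "d_min=DMIN backend=FS include_negatives=NEG output.prefix=TAG"


def expand(template, substitutions):
    """Apply each (placeholder, value) pair in order, like the chained sed calls."""
    result = template
    for placeholder, value in substitutions:
        # str.replace substitutes every occurrence, matching sed's trailing "g" flag.
        result = result.replace(placeholder, value)
    return result


# Same values as the "set tag / set dmin / set neg" lines in the script.
substitutions = [("FS", "Flex"), ("DMIN", "2.5"), ("NEG", "True"), ("TAG", "p6m")]
expanded = expand(template, substitutions)
print(expanded)
```

Because the substitution is purely textual, a placeholder must not occur anywhere else in the parameter string (for example inside a path), or it would be rewritten too.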

Initial trial with nproc=1, just to see whether it runs. Two fixes were needed: the PDB reference had to be fixed, and the data= line cannot take a *.pickle wildcard (give the directory instead).

Scale-up trial with nproc=60, no postrefinement:
<pre>
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
</pre>
The counts are consistent: 4493 + 11 + 22 + 505 = 5031.

The MTZ column flag had to be set to iobs (scaling.mtz_column_F=iobs) in response to this message:
<pre>
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
</pre>

A second failure came from the even/odd file-number test:
<pre>
  File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
    return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'
</pre>
The code cannot determine whether these file names are even- or odd-numbered. Added "_extracted.pickle" in the code; it had to be put first.
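
The traceback suggests why ordering matters: if the generic ".pickle" suffix is tried first, the character before it is the "d" of "_extracted", not a digit. The following is a sketch of that reasoning only, not the actual cctbx code; the suffix tuple, the function body, and the file names are assumptions made for illustration.

```python
import os

# Assumed suffix list; the longer suffix is listed first on purpose.
SUFFIXES = ("_extracted.pickle", ".pickle")


def is_odd_numbered(file_name, suffixes=SUFFIXES):
    """Return True if the last digit before the first matching suffix is odd."""
    base = os.path.basename(file_name)
    for suffix in suffixes:
        if base.endswith(suffix):
            stem = base[: -len(suffix)]
            return int(stem[-1]) % 2 == 1
    raise ValueError("unrecognized file name: %s" % file_name)


# Hypothetical file names, for illustration only.
print(is_odd_numbered("/data/img_00059_extracted.pickle"))
print(is_odd_numbered("/data/img_00060.pickle"))

# With the generic suffix first, the "_extracted" tail survives and int("d") fails.
try:
    is_odd_numbered(
        "/data/img_00059_extracted.pickle",
        suffixes=(".pickle", "_extracted.pickle"),
    )
except ValueError as error:
    print("wrong order:", error)
```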

Table of Scaling Results:
<pre>
--------------------------------------------------------------------------------------------------------------
                                        CC     N     CC     N      R      R      R  Scale    Scale       SpSig
Bin  Resolution Range  Completeness    int   int    iso   iso    int  split    iso    int      iso        Test
--------------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861  [809/809]     80.0%   809  75.2%   805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
  2   5.3861 - 4.2749  [791/791]     54.9%   791  74.5%   791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
  3   4.2749 - 3.7345  [781/781]     65.8%   781  81.6%   781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
  4   3.7345 - 3.3930  [776/776]     63.9%   776  74.5%   776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
  5   3.3930 - 3.1498  [765/765]     67.1%   765  81.9%   765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
  6   3.1498 - 2.9641  [771/771]     58.6%   771  72.4%   771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
  7   2.9641 - 2.8156  [765/765]     56.0%   765  72.3%   765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
  8   2.8156 - 2.6930  [746/746]     63.0%   746  76.1%   746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
  9   2.6930 - 2.5894  [790/790]     52.1%   790  69.4%   790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
 10   2.5894 - 2.5000  [757/757]     54.9%   757  78.6%   757  52.4%  38.9%  44.4%  0.794  306.403    109.0768

All                    [7751/7751]   74.9%  7751  78.8%  7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
--------------------------------------------------------------------------------------------------------------
</pre>

As expected, the data do not scale well: P6 is a polar space group with an indexing ambiguity, so the lattices must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

The first attempt crashed with a Boost floating-point error; the job was stopped with:

^Z; kill %%

Attempts to get around the crash:
* d_min=3.5 instead of 2.5: still crashes
* fewer processes, nproc=30 instead of 60 (increases problem size by 2**2=4): still crashes
* nproc=15

The crash appears to be associated with the matplotlib plot, since it only happens on mouse-over of the plot. Set:

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. Lattices sorted as h,k,l: 2503; as h,-h-k,-l: 2485; total 4988.
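
To make the two indexing choices concrete, here is a small sketch that applies the alternate operator to a Miller index and checks the lattice tally. It reads the second operator as h,-h-k,-l (an assumption about the intended notation); the function name and the example index are hypothetical.

```python
def apply_alternate_setting(hkl):
    """Map a Miller index (h, k, l) to (h, -h-k, -l)."""
    h, k, l = hkl
    return (h, -h - k, -l)


original = (1, 2, 3)  # arbitrary example index
alternate = apply_alternate_setting(original)
restored = apply_alternate_setting(alternate)
print(original, "->", alternate, "->", restored)

# Lattice counts per indexing choice, as reported by the sorting step.
counts = {"h,k,l": 2503, "h,-h-k,-l": 2485}
total = sum(counts.values())
print("total lattices:", total)
```

Applying the operator twice returns the starting index, and the roughly even split between the two choices is what one would expect if each crystal lands in either setting at random.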

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I> <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>
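
As a quick consistency check on the table, the multiplicity columns can be reproduced as n_meas divided by the number of unique reflections. This sketch assumes the bracketed completeness counts give the unique reflections per bin; the numbers are copied from the output above, and the function name is illustrative.

```python
# (unique reflections, n_meas, reported multiplicity), copied from the table.
rows = {
    "bin 1": (1490, 152295, 102.21),
    "bin 2": (1500, 94141, 62.76),
    "bin 10": (1476, 46974, 31.83),
    "All": (14895, 731282, 49.10),
}


def multiplicity(n_unique, n_meas):
    """Average number of measurements per unique reflection."""
    return n_meas / n_unique


for label, (n_unique, n_meas, reported) in rows.items():
    computed = multiplicity(n_unique, n_meas)
    print("%-6s computed %.2f  reported %.2f" % (label, computed, reported))
```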

== Fine tuning ==

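Overall CC1/2 for each postrefinement trial:
* basic: postrefinement with rs, CC1/2 = 84.7%
* trial1: rs2, unit weighting, Lorentzian lineshape, CC1/2 = 88.2%
* trial2: Gaussian lineshape, CC1/2 = 90.9%
* trial3: Gaussian lineshape with rs_hybrid, CC1/2 = 93.5%, but only 4059 files accepted
* trial4: extend to 2.0 Angstrom, CC1/2 = 87.9% (but 97.8% on the lowest shell)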

=== Table of results ===
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| rs2 unit weighting gaussian lineshape
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4721 (4416)
| style="padding: 5px;"| 90.9% (69.6%)
| style="padding: 5px;"| 90.9% (89.1%)
| style="padding: 5px;"| 0.470
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| rs_hybrid unit weighting gaussian lineshape
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4059 (3783)
| style="padding: 5px;"| 93.5% (37.3%)
| style="padding: 5px;"| 95.4% (89.1%)
| style="padding: 5px;"| 0.504
|-
| style="padding: 10px;"| trial3noprime
| style="padding: 5px;"| rs_hybrid unit weighting gaussian lineshape recycle model
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 3973 (3700)
| style="padding: 5px;"| 93.3% (55.5%)
| style="padding: 5px;"| 93.6% (87.0%)
| style="padding: 5px;"| 0.509
|-
|}
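
For readers unfamiliar with the CC1/2 column: it is a Pearson correlation between intensities merged separately from two halves of the data. The sketch below assumes the halves come from odd- and even-numbered images and uses made-up intensities; it is not the cxi.merge implementation.

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up merged intensities for the same five reflections in each half data set.
half_odd = [100.0, 250.0, 40.0, 900.0, 15.0]
half_even = [110.0, 230.0, 55.0, 870.0, 20.0]

cc_half = pearson_cc(half_odd, half_even)
print("CC1/2 = %.1f%%" % (100 * cc_half))
```

CCiso is the same kind of correlation, computed against the reference (isomorphous) data set instead of the other half.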

2017 cxi merge tutorial

2017-02-17T15:49:56Z

Nicksauter: /* Table of results */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

Fetch the deposited model and structure factors to use as the reference:

$ phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
</pre>

Initial trial with nproc=1, just to see whether it runs. Two fixes were needed: the PDB reference had to be fixed, and the data= line cannot take a *.pickle wildcard (give the directory instead).

Scale-up trial with nproc=60, no postrefinement:
<pre>
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
</pre>
The counts are consistent: 4493 + 11 + 22 + 505 = 5031.

The MTZ column flag had to be set to iobs (scaling.mtz_column_F=iobs) in response to this message:
<pre>
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
</pre>

A second failure came from the even/odd file-number test:
<pre>
  File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
    return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'
</pre>
The code cannot determine whether these file names are even- or odd-numbered. Added "_extracted.pickle" in the code; it had to be put first.

Table of Scaling Results:
<pre>
--------------------------------------------------------------------------------------------------------------
                                        CC     N     CC     N      R      R      R  Scale    Scale       SpSig
Bin  Resolution Range  Completeness    int   int    iso   iso    int  split    iso    int      iso        Test
--------------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861  [809/809]     80.0%   809  75.2%   805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
  2   5.3861 - 4.2749  [791/791]     54.9%   791  74.5%   791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
  3   4.2749 - 3.7345  [781/781]     65.8%   781  81.6%   781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
  4   3.7345 - 3.3930  [776/776]     63.9%   776  74.5%   776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
  5   3.3930 - 3.1498  [765/765]     67.1%   765  81.9%   765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
  6   3.1498 - 2.9641  [771/771]     58.6%   771  72.4%   771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
  7   2.9641 - 2.8156  [765/765]     56.0%   765  72.3%   765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
  8   2.8156 - 2.6930  [746/746]     63.0%   746  76.1%   746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
  9   2.6930 - 2.5894  [790/790]     52.1%   790  69.4%   790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
 10   2.5894 - 2.5000  [757/757]     54.9%   757  78.6%   757  52.4%  38.9%  44.4%  0.794  306.403    109.0768

All                    [7751/7751]   74.9%  7751  78.8%  7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
--------------------------------------------------------------------------------------------------------------
</pre>

As expected, the data do not scale well: P6 is a polar space group with an indexing ambiguity, so the lattices must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

The first attempt crashed with a Boost floating-point error; the job was stopped with:

^Z; kill %%

Attempts to get around the crash:
* d_min=3.5 instead of 2.5: still crashes
* fewer processes, nproc=30 instead of 60 (increases problem size by 2**2=4): still crashes
* nproc=15

The crash appears to be associated with the matplotlib plot, since it only happens on mouse-over of the plot. Set:

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. Lattices sorted as h,k,l: 2503; as h,-h-k,-l: 2485; total 4988.

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I> <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>

== Fine tuning ==

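Overall CC1/2 for each postrefinement trial:
* basic: postrefinement with rs, CC1/2 = 84.7%
* trial1: rs2, unit weighting, Lorentzian lineshape, CC1/2 = 88.2%
* trial2: Gaussian lineshape, CC1/2 = 90.9%
* trial3: Gaussian lineshape with rs_hybrid, CC1/2 = 93.5%, but only 4059 files accepted
* trial4: extend to 2.0 Angstrom, CC1/2 = 87.9% (but 97.8% on the lowest shell)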

=== Table of results ===
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| nopost
| style="padding: 5px;"| no postrefinement
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4962 (4828)
| style="padding: 5px;"| 77.5% (66.2%)
| style="padding: 5px;"| 84.0% (85.8%)
| style="padding: 5px;"| 0.423

|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719 (4458)
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| Method
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| Barends, 2013 (SAD anomalous phasing)
| style="padding: 5px;"| [http://cxidb.org/id-22.html 22]
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| Method
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| Barends, 2013 (SAD anomalous phasing)
| style="padding: 5px;"| [http://cxidb.org/id-22.html 22]
|-
| style="padding: 10px;"| trial4
| style="padding: 5px;"| Method
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e157</code>
| style="padding: 5px;"| Hattne, 2014 (cctbx processing & weak Zn anomalous signal)
| style="padding: 5px;"| [http://cxidb.org/id-23.html 23]
|-
| style="padding: 5px;"| Method
| style="padding: 5px;"| Method
| style="padding: 5px;"| Method
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e350</code>
| style="padding: 5px;"| unpublished (illustrate CSPAD hi/lo gain settings)
|}

2017 cxi merge tutorial

2017-02-17T15:41:57Z

Nicksauter: /* Table of results */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

Fetch the deposited model and structure factors to use as the reference:

$ phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
</pre>

Initial trial with nproc=1, just to see whether it runs. Two fixes were needed: the PDB reference had to be fixed, and the data= line cannot take a *.pickle wildcard (give the directory instead).

Scale-up trial with nproc=60, no postrefinement:
<pre>
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
</pre>
The counts are consistent: 4493 + 11 + 22 + 505 = 5031.

The MTZ column flag had to be set to iobs (scaling.mtz_column_F=iobs) in response to this message:
<pre>
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
</pre>

A second failure came from the even/odd file-number test:
<pre>
  File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
    return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'
</pre>
The code cannot determine whether these file names are even- or odd-numbered. Added "_extracted.pickle" in the code; it had to be put first.

Table of Scaling Results:
<pre>
--------------------------------------------------------------------------------------------------------------
                                        CC     N     CC     N      R      R      R  Scale    Scale       SpSig
Bin  Resolution Range  Completeness    int   int    iso   iso    int  split    iso    int      iso        Test
--------------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861  [809/809]     80.0%   809  75.2%   805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
  2   5.3861 - 4.2749  [791/791]     54.9%   791  74.5%   791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
  3   4.2749 - 3.7345  [781/781]     65.8%   781  81.6%   781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
  4   3.7345 - 3.3930  [776/776]     63.9%   776  74.5%   776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
  5   3.3930 - 3.1498  [765/765]     67.1%   765  81.9%   765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
  6   3.1498 - 2.9641  [771/771]     58.6%   771  72.4%   771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
  7   2.9641 - 2.8156  [765/765]     56.0%   765  72.3%   765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
  8   2.8156 - 2.6930  [746/746]     63.0%   746  76.1%   746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
  9   2.6930 - 2.5894  [790/790]     52.1%   790  69.4%   790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
 10   2.5894 - 2.5000  [757/757]     54.9%   757  78.6%   757  52.4%  38.9%  44.4%  0.794  306.403    109.0768

All                    [7751/7751]   74.9%  7751  78.8%  7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
--------------------------------------------------------------------------------------------------------------
</pre>

As expected, the data do not scale well: P6 is a polar space group with an indexing ambiguity, so the lattices must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

The first attempt crashed with a Boost floating-point error; the job was stopped with:

^Z; kill %%

Attempts to get around the crash:
* d_min=3.5 instead of 2.5: still crashes
* fewer processes, nproc=30 instead of 60 (increases problem size by 2**2=4): still crashes
* nproc=15

The crash appears to be associated with the matplotlib plot, since it only happens on mouse-over of the plot. Set:

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. Lattices sorted as h,k,l: 2503; as h,-h-k,-l: 2485; total 4988.

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
<asu <obs
Bin Resolution Range Completeness % multi> multi> n_meas <I> <I/sig(I)>
----------------------------------------------------------------------------------------
1 -1.000 - 5.386 [1490/1490] 100.00 102.21 102.21 152295 103994 103.244
2 5.386 - 4.275 [1500/1500] 100.00 62.76 62.76 94141 128403 95.046
3 4.275 - 3.735 [1499/1499] 100.00 53.90 53.90 80795 143552 92.607
4 3.735 - 3.393 [1497/1497] 100.00 47.14 47.14 70571 112723 70.575
5 3.393 - 3.150 [1477/1477] 100.00 43.96 43.96 64928 76925 51.011
6 3.150 - 2.964 [1488/1488] 100.00 39.87 39.87 59330 57060 37.899
7 2.964 - 2.816 [1483/1483] 100.00 38.17 38.17 56611 44079 32.085
8 2.816 - 2.693 [1455/1455] 100.00 36.34 36.34 52874 37117 27.460
9 2.693 - 2.589 [1530/1530] 100.00 34.49 34.49 52763 30496 24.443
10 2.589 - 2.500 [1476/1476] 100.00 31.83 31.83 46974 27147 21.564

All [14895/14895] 100.00 49.10 49.10 731282 76275 55.681
----------------------------------------------------------------------------------------
</pre>

== Fine tuning ==

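Overall CC1/2 for each postrefinement trial:
* basic: postrefinement with rs, CC1/2 = 84.7%
* trial1: rs2, unit weighting, Lorentzian lineshape, CC1/2 = 88.2%
* trial2: Gaussian lineshape, CC1/2 = 90.9%
* trial3: Gaussian lineshape with rs_hybrid, CC1/2 = 93.5%, but only 4059 files accepted
* trial4: extend to 2.0 Angstrom, CC1/2 = 87.9% (but 97.8% on the lowest shell)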

=== Table of results ===
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
| style="padding: 5px;"| <|L|> test (0.5 perfect)
|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4942 (4650)
| style="padding: 5px;"| 84.7% (77.2%)
| style="padding: 5px;"| 88.6% (88.9%)
| style="padding: 5px;"| 0.455
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
| style="padding: 5px;"| 0.459
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| Method
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| Barends, 2013 (SAD anomalous phasing)
| style="padding: 5px;"| [http://cxidb.org/id-22.html 22]
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| Method
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| Barends, 2013 (SAD anomalous phasing)
| style="padding: 5px;"| [http://cxidb.org/id-22.html 22]
|-
| style="padding: 10px;"| trial4
| style="padding: 5px;"| Method
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e157</code>
| style="padding: 5px;"| Hattne, 2014 (cctbx processing & weak Zn anomalous signal)
| style="padding: 5px;"| [http://cxidb.org/id-23.html 23]
|-
| style="padding: 5px;"| Method
| style="padding: 5px;"| Method
| style="padding: 5px;"| Method
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e350</code>
| style="padding: 5px;"| unpublished (illustrate CSPAD hi/lo gain settings)
|}

2017 cxi merge tutorial

2017-02-17T15:29:38Z

Nicksauter: /* Table of results */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
</pre>
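The sed pipeline in the script fills in the FS, DMIN, NEG and TAG placeholders before cxi.merge is called. As a rough illustration only (written in Python rather than csh; the shortened template and the helper name <code>fill_template</code> are made up for this sketch and are not part of cctbx), the same chained substitution looks like this:

```python
# Abbreviated copy of the parameter template from the csh script above;
# only four parameters are kept so the sketch stays short.
TEMPLATE = "d_min=DMIN backend=FS include_negatives=NEG output.prefix=TAG"


def fill_template(template, substitutions):
    """Replace each placeholder with its value, one after another (like chained sed)."""
    result = template
    for placeholder, value in substitutions:
        result = result.replace(placeholder, str(value))
    return result


# Same order as the sed pipeline in the script: FS, DMIN, NEG, TAG.
effective = fill_template(
    TEMPLATE,
    [("FS", "Flex"), ("DMIN", 2.5), ("NEG", True), ("TAG", "p6m")],
)
print(effective)
```

Because every substitution is global, a placeholder string must not occur anywhere else in the parameter list, otherwise it would be replaced there too.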

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. Also, *.pickle cannot be used on the data= line; give the directory instead.

Scale-up trial with nproc=60, no postrefinement. Set the MTZ flag to iobs (scaling.mtz_column_F=iobs), as requested by the usage message below.

<pre>
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
</pre>

Errors encountered along the way:

<pre>
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
  File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
    return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'
</pre>

The traceback shows that the code cannot determine whether a file is even- or odd-numbered. Fixed by adding "_extracted.pickle" to the allowable file endings in the code; it had to be put first.
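To make the even/odd failure concrete, here is a minimal sketch. It is not the actual code in xfel/cxi/util.py; the suffix-list argument and the file name are assumptions for illustration. The character just before a plain ".pickle" ending is the 'd' of "extracted", which is what int() complains about, and the longer ending has to be tested first so that it wins.

```python
import os


def is_odd_numbered(file_name, suffixes):
    """Return True if the digit just before the first matching suffix is odd."""
    base = os.path.basename(file_name)
    for suffix in suffixes:
        if base.endswith(suffix):
            stem = base[: -len(suffix)]
            return int(stem[-1]) % 2 == 1
    raise ValueError("no recognized suffix in %r" % base)


# Hypothetical file name; only the "_extracted.pickle" ending matters here.
name = "/some/dir/image_0059_extracted.pickle"

# With only ".pickle" known, the character before the suffix is 'd'.
try:
    is_odd_numbered(name, [".pickle"])
    failed = False
except ValueError:
    failed = True

# With the longer suffix listed first, the digit '9' is found instead.
fixed = is_odd_numbered(name, ["_extracted.pickle", ".pickle"])
print(failed, fixed)
```

If ".pickle" were listed before "_extracted.pickle", it would match first and the same ValueError would come back, which is consistent with the note that the new ending had to be put first.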

Table of Scaling Results:

<pre>
--------------------------------------------------------------------------------------------------------------
                                     CC        N  CC        N  R      R      R      Scale  Scale         SpSig
Bin  Resolution Range  Completeness  int     int  iso     iso  int    split  iso    int    iso            Test
--------------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861  [809/809]     80.0%   809  75.2%   805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
  2   5.3861 - 4.2749  [791/791]     54.9%   791  74.5%   791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
  3   4.2749 - 3.7345  [781/781]     65.8%   781  81.6%   781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
  4   3.7345 - 3.3930  [776/776]     63.9%   776  74.5%   776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
  5   3.3930 - 3.1498  [765/765]     67.1%   765  81.9%   765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
  6   3.1498 - 2.9641  [771/771]     58.6%   771  72.4%   771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
  7   2.9641 - 2.8156  [765/765]     56.0%   765  72.3%   765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
  8   2.8156 - 2.6930  [746/746]     63.0%   746  76.1%   746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
  9   2.6930 - 2.5894  [790/790]     52.1%   790  69.4%   790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
 10   2.5894 - 2.5000  [757/757]     54.9%   757  78.6%   757  52.4%  38.9%  44.4%  0.794  306.403    109.0768

All                    [7751/7751]   74.9%  7751  78.8%  7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
--------------------------------------------------------------------------------------------------------------
</pre>

Of course, we know the data do not scale because this is a polar space group, and the data must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try using d_min 3.5 instead of 2.5--still crash

Try using fewer proc; use 30 instead of 60. (increases problem size by 2**2=4) --still crash
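The 2**2=4 figure can be read as follows, under the assumption (not stated in the notes) that the lattices are split evenly over the processes and that the work inside each process grows with the square of the number of lattices it holds. The function name is made up for this sketch.

```python
def relative_problem_size(nproc_before, nproc_after):
    """Ratio of per-process work after changing the number of processes.

    Assumes an even split of lattices over processes and work per process
    proportional to the square of the number of lattices it holds.
    """
    return (float(nproc_before) / nproc_after) ** 2


print(relative_problem_size(60, 30))  # halving nproc: factor 4
print(relative_problem_size(60, 15))  # quartering nproc: factor 16
```

Under the same assumption, the nproc=15 trial makes each per-process problem 16 times larger than with nproc=60.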

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. h,k,l=2503; h,-h-k,-l=2485; total 4988
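The two groups reported here differ by the reindexing operator h,-h-k,-l (reading the "-1" in the note as "-l", which is an assumption). A small sketch of what that operator does to a Miller index, and a check that the two groups add up to the 4988 accepted lattices; the function name is made up for illustration:

```python
def reindex(hkl):
    """Apply the operator h,-h-k,-l to one Miller index."""
    h, k, l = hkl
    return (h, -h - k, -l)


example = (1, 2, 3)
flipped = reindex(example)

# Applying the operator twice gives back the original index.
assert reindex(flipped) == example

# Lattice counts reported for the two indexing choices.
counts = {"h,k,l": 2503, "h,-h-k,-l": 2485}
total = sum(counts.values())
print(flipped, total)
```

Since the operator is its own inverse, lattices in the second group are brought into the first setting by applying it once to every observation.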

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
                                                <asu    <obs
Bin  Resolution Range  Completeness        %  multi>  multi>  n_meas     <I>  <I/sig(I)>
----------------------------------------------------------------------------------------
  1    -1.000 - 5.386  [1490/1490]    100.00  102.21  102.21  152295  103994     103.244
  2     5.386 - 4.275  [1500/1500]    100.00   62.76   62.76   94141  128403      95.046
  3     4.275 - 3.735  [1499/1499]    100.00   53.90   53.90   80795  143552      92.607
  4     3.735 - 3.393  [1497/1497]    100.00   47.14   47.14   70571  112723      70.575
  5     3.393 - 3.150  [1477/1477]    100.00   43.96   43.96   64928   76925      51.011
  6     3.150 - 2.964  [1488/1488]    100.00   39.87   39.87   59330   57060      37.899
  7     2.964 - 2.816  [1483/1483]    100.00   38.17   38.17   56611   44079      32.085
  8     2.816 - 2.693  [1455/1455]    100.00   36.34   36.34   52874   37117      27.460
  9     2.693 - 2.589  [1530/1530]    100.00   34.49   34.49   52763   30496      24.443
 10     2.589 - 2.500  [1476/1476]    100.00   31.83   31.83   46974   27147      21.564

All                    [14895/14895]  100.00   49.10   49.10  731282   76275      55.681
----------------------------------------------------------------------------------------
</pre>

== Fine tuning ==

postrefine rs. cc1/2 = 84.7%

trial1 rs2 unit-weighting lorentzian lineshape 88.2%

trial 2 gaussian line shape 90.9%

trial 3 gaussian rs_hybrid 93.5% only 4059 files accepted

trial 4 extend to 2.0 angstrom 87.9% (but 97.8% on lowest shell)

=== Table of results ===
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Method
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
| style="padding: 5px;"| CCiso (highest shell)
|-
| style="padding: 10px;"| basic
| style="padding: 5px;"| rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| Boutet, 2012 (initial report) Hattne, 2014 (cctbx reprocessing)
| style="padding: 5px;"| 84.7%
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;"| rs2 unit weighting lorentzian lineshape
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| 4719
| style="padding: 5px;"| 88.2% (74.8%)
| style="padding: 5px;"| 89.5% (89.1%)
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;"| Method
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| Barends, 2013 (SAD anomalous phasing)
| style="padding: 5px;"| [http://cxidb.org/id-22.html 22]
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;"| Method
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| Barends, 2013 (SAD anomalous phasing)
| style="padding: 5px;"| [http://cxidb.org/id-22.html 22]
|-
| style="padding: 10px;"| trial4
| style="padding: 5px;"| Method
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e157</code>
| style="padding: 5px;"| Hattne, 2014 (cctbx processing & weak Zn anomalous signal)
| style="padding: 5px;"| [http://cxidb.org/id-23.html 23]
|-
| style="padding: 5px;"| Method
| style="padding: 5px;"| Method
| style="padding: 5px;"| Method
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e350</code>
| style="padding: 5px;"| unpublished (illustrate CSPAD hi/lo gain settings)
|}

2017 cxi merge tutorial

2017-02-17T15:29:31Z

Nicksauter:

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
</pre>

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. Also, *.pickle cannot be used on the data= line; give the directory instead.

Scale-up trial with nproc=60, no postrefinement. Set the MTZ flag to iobs (scaling.mtz_column_F=iobs), as requested by the usage message below.

<pre>
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
</pre>

Errors encountered along the way:

<pre>
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
  File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
    return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'
</pre>

The traceback shows that the code cannot determine whether a file is even- or odd-numbered. Fixed by adding "_extracted.pickle" to the allowable file endings in the code; it had to be put first.

Table of Scaling Results:

<pre>
--------------------------------------------------------------------------------------------------------------
                                     CC        N  CC        N  R      R      R      Scale  Scale         SpSig
Bin  Resolution Range  Completeness  int     int  iso     iso  int    split  iso    int    iso            Test
--------------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861  [809/809]     80.0%   809  75.2%   805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
  2   5.3861 - 4.2749  [791/791]     54.9%   791  74.5%   791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
  3   4.2749 - 3.7345  [781/781]     65.8%   781  81.6%   781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
  4   3.7345 - 3.3930  [776/776]     63.9%   776  74.5%   776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
  5   3.3930 - 3.1498  [765/765]     67.1%   765  81.9%   765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
  6   3.1498 - 2.9641  [771/771]     58.6%   771  72.4%   771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
  7   2.9641 - 2.8156  [765/765]     56.0%   765  72.3%   765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
  8   2.8156 - 2.6930  [746/746]     63.0%   746  76.1%   746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
  9   2.6930 - 2.5894  [790/790]     52.1%   790  69.4%   790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
 10   2.5894 - 2.5000  [757/757]     54.9%   757  78.6%   757  52.4%  38.9%  44.4%  0.794  306.403    109.0768

All                    [7751/7751]   74.9%  7751  78.8%  7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
--------------------------------------------------------------------------------------------------------------
</pre>

Of course, we know the data do not scale because this is a polar space group, and the data must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try using d_min 3.5 instead of 2.5--still crash

Try using fewer proc; use 30 instead of 60. (increases problem size by 2**2=4) --still crash

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. h,k,l=2503; h,-h-k,-l=2485; total 4988

=== 3) Apply reindexing operators and merge ===

== cxi.merge program output ==
<pre>
----------------------------------------------------------------------------------------
                                                <asu    <obs
Bin  Resolution Range  Completeness        %  multi>  multi>  n_meas     <I>  <I/sig(I)>
----------------------------------------------------------------------------------------
  1    -1.000 - 5.386  [1490/1490]    100.00  102.21  102.21  152295  103994     103.244
  2     5.386 - 4.275  [1500/1500]    100.00   62.76   62.76   94141  128403      95.046
  3     4.275 - 3.735  [1499/1499]    100.00   53.90   53.90   80795  143552      92.607
  4     3.735 - 3.393  [1497/1497]    100.00   47.14   47.14   70571  112723      70.575
  5     3.393 - 3.150  [1477/1477]    100.00   43.96   43.96   64928   76925      51.011
  6     3.150 - 2.964  [1488/1488]    100.00   39.87   39.87   59330   57060      37.899
  7     2.964 - 2.816  [1483/1483]    100.00   38.17   38.17   56611   44079      32.085
  8     2.816 - 2.693  [1455/1455]    100.00   36.34   36.34   52874   37117      27.460
  9     2.693 - 2.589  [1530/1530]    100.00   34.49   34.49   52763   30496      24.443
 10     2.589 - 2.500  [1476/1476]    100.00   31.83   31.83   46974   27147      21.564

All                    [14895/14895]  100.00   49.10   49.10  731282   76275      55.681
----------------------------------------------------------------------------------------
</pre>

== Fine tuning ==

postrefine rs. cc1/2 = 84.7%

trial1 rs2 unit-weighting lorentzian lineshape 88.2%

trial 2 gaussian line shape 90.9%

trial 3 gaussian rs_hybrid 93.5% only 4059 files accepted

trial 4 extend to 2.0 angstrom 87.9% (but 97.8% on lowest shell)

=== Table of results ===
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
|-
| style="padding: 10px;"| basic rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| Boutet, 2012 (initial report) Hattne, 2014 (cctbx reprocessing)
| style="padding: 5px;"| 84.7%
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| Boutet, 2012 (initial report) Hattne, 2014 (cctbx reprocessing)
| style="padding: 5px;"| [http://cxidb.org/id-17.html 17]
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| Barends, 2013 (SAD anomalous phasing)
| style="padding: 5px;"| [http://cxidb.org/id-22.html 22]
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| Barends, 2013 (SAD anomalous phasing)
| style="padding: 5px;"| [http://cxidb.org/id-22.html 22]
|-
| style="padding: 10px;"| trial4
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e157</code>
| style="padding: 5px;"| Hattne, 2014 (cctbx processing & weak Zn anomalous signal)
| style="padding: 5px;"| [http://cxidb.org/id-23.html 23]
|-
| style="padding: 10px;"| '''[[LB67 Thermolysin]]'''
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e350</code>
| style="padding: 5px;"| unpublished (illustrate CSPAD hi/lo gain settings)
|}

2017 cxi merge tutorial

2017-02-17T15:21:26Z

Nicksauter: /* List of Examples */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
</pre>

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. Also, *.pickle cannot be used on the data= line; give the directory instead.

Scale-up trial with nproc=60, no postrefinement. Set the MTZ flag to iobs (scaling.mtz_column_F=iobs), as requested by the usage message below.

<pre>
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
</pre>

Errors encountered along the way:

<pre>
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
  File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
    return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'
</pre>

The traceback shows that the code cannot determine whether a file is even- or odd-numbered. Fixed by adding "_extracted.pickle" to the allowable file endings in the code; it had to be put first.

Table of Scaling Results:

<pre>
--------------------------------------------------------------------------------------------------------------
                                     CC        N  CC        N  R      R      R      Scale  Scale         SpSig
Bin  Resolution Range  Completeness  int     int  iso     iso  int    split  iso    int    iso            Test
--------------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861  [809/809]     80.0%   809  75.2%   805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
  2   5.3861 - 4.2749  [791/791]     54.9%   791  74.5%   791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
  3   4.2749 - 3.7345  [781/781]     65.8%   781  81.6%   781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
  4   3.7345 - 3.3930  [776/776]     63.9%   776  74.5%   776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
  5   3.3930 - 3.1498  [765/765]     67.1%   765  81.9%   765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
  6   3.1498 - 2.9641  [771/771]     58.6%   771  72.4%   771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
  7   2.9641 - 2.8156  [765/765]     56.0%   765  72.3%   765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
  8   2.8156 - 2.6930  [746/746]     63.0%   746  76.1%   746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
  9   2.6930 - 2.5894  [790/790]     52.1%   790  69.4%   790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
 10   2.5894 - 2.5000  [757/757]     54.9%   757  78.6%   757  52.4%  38.9%  44.4%  0.794  306.403    109.0768

All                    [7751/7751]   74.9%  7751  78.8%  7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
--------------------------------------------------------------------------------------------------------------
</pre>

Of course, we know the data do not scale because this is a polar space group, and the data must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try using d_min 3.5 instead of 2.5--still crash

Try using fewer proc; use 30 instead of 60. (increases problem size by 2**2=4) --still crash

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. h,k,l=2503; h,-h-k,-l=2485; total 4988

=== 3) Apply reindexing operators and merge ===

== Fine tuning ==

postrefine rs. cc1/2 = 84.7%

trial1 rs2 unit-weighting lorentzian lineshape 88.2%

trial 2 gaussian line shape 90.9%

trial 3 gaussian rs_hybrid 93.5% only 4059 files accepted

trial 4 extend to 2.0 angstrom 87.9% (but 97.8% on lowest shell)

=== Table of results ===
{| class="wikitable"
| style="padding: 5px;"| Tag
| style="padding: 5px;"| Resolution (Angstrom)
| style="padding: 5px;"| # files accepted
| style="padding: 5px;"| CC1/2 (highest shell)
|-
| style="padding: 10px;"| basic rs
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| Boutet, 2012 (initial report) Hattne, 2014 (cctbx reprocessing)
| style="padding: 5px;"| 84.7%
|-
| style="padding: 10px;"| trial1
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| Boutet, 2012 (initial report) Hattne, 2014 (cctbx reprocessing)
| style="padding: 5px;"| [http://cxidb.org/id-17.html 17]
|-
| style="padding: 10px;"| trial2
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| Barends, 2013 (SAD anomalous phasing)
| style="padding: 5px;"| [http://cxidb.org/id-22.html 22]
|-
| style="padding: 10px;"| trial3
| style="padding: 5px;" | 2.5
| style="padding: 5px;"| Barends, 2013 (SAD anomalous phasing)
| style="padding: 5px;"| [http://cxidb.org/id-22.html 22]
|-
| style="padding: 10px;"| trial4
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e157</code>
| style="padding: 5px;"| Hattne, 2014 (cctbx processing & weak Zn anomalous signal)
| style="padding: 5px;"| [http://cxidb.org/id-23.html 23]
|-
| style="padding: 10px;"| '''[[LB67 Thermolysin]]'''
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e350</code>
| style="padding: 5px;"| unpublished (illustrate CSPAD hi/lo gain settings)
|}

2017 cxi merge tutorial

2017-02-17T15:10:20Z

Nicksauter: /* Initial characterization */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
</pre>

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. Also, *.pickle cannot be used on the data= line; give the directory instead.

Scale-up trial with nproc=60, no postrefinement. Set the MTZ flag to iobs (scaling.mtz_column_F=iobs), as requested by the usage message below.

<pre>
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
</pre>

Errors encountered along the way:

<pre>
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
  File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
    return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'
</pre>

The traceback shows that the code cannot determine whether a file is even- or odd-numbered. Fixed by adding "_extracted.pickle" to the allowable file endings in the code; it had to be put first.

Table of Scaling Results:

<pre>
--------------------------------------------------------------------------------------------------------------
                                     CC        N  CC        N  R      R      R      Scale  Scale         SpSig
Bin  Resolution Range  Completeness  int     int  iso     iso  int    split  iso    int    iso            Test
--------------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861  [809/809]     80.0%   809  75.2%   805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
  2   5.3861 - 4.2749  [791/791]     54.9%   791  74.5%   791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
  3   4.2749 - 3.7345  [781/781]     65.8%   781  81.6%   781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
  4   3.7345 - 3.3930  [776/776]     63.9%   776  74.5%   776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
  5   3.3930 - 3.1498  [765/765]     67.1%   765  81.9%   765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
  6   3.1498 - 2.9641  [771/771]     58.6%   771  72.4%   771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
  7   2.9641 - 2.8156  [765/765]     56.0%   765  72.3%   765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
  8   2.8156 - 2.6930  [746/746]     63.0%   746  76.1%   746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
  9   2.6930 - 2.5894  [790/790]     52.1%   790  69.4%   790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
 10   2.5894 - 2.5000  [757/757]     54.9%   757  78.6%   757  52.4%  38.9%  44.4%  0.794  306.403    109.0768

All                    [7751/7751]   74.9%  7751  78.8%  7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
--------------------------------------------------------------------------------------------------------------
</pre>

As expected, the data do not scale well: P6 is a polar space group with an indexing ambiguity, so the lattices must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try using d_min 3.5 instead of 2.5--still crash

Try using fewer proc; use 30 instead of 60. (increases problem size by 2**2=4) --still crash

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. Lattices assigned to each setting: h,k,l = 2503; h,-h-k,-l = 2485; total 4988.

=== 3) Apply reindexing operators and merge ===

== Fine tuning ==

* Postrefinement with rs: CC1/2 = 84.7%
* Trial 1: rs2, unit weighting, Lorentzian line shape: 88.2%
* Trial 2: Gaussian line shape: 90.9%
* Trial 3: Gaussian line shape, rs_hybrid: 93.5%, but only 4059 files accepted
* Trial 4: extend to 2.0 angstrom: 87.9% (but 97.8% on lowest shell)

== List of Examples ==
{| class="wikitable"
| style="padding: 10px;"| <big>Dataset</big>
| style="padding: 5px;"| File Paths
| style="padding: 5px;"| Reference & purpose
| style="padding: 5px;"| CXIDB
|-
| style="padding: 10px;"| '''L220 Lysozyme'''
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e60</code>
| style="padding: 5px;"| Boutet, 2012 (initial report) Hattne, 2014 (cctbx reprocessing)
| style="padding: 5px;"| [http://cxidb.org/id-17.html 17]
|-
| style="padding: 10px;"| '''[[Gd-Lysozyme]]'''
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e239 /reg/d/psdm/cxi/cxi84914/xtc/e240</code>
| style="padding: 5px;"| Barends, 2013 (SAD anomalous phasing)
| style="padding: 5px;"| [http://cxidb.org/id-22.html 22]
|-
| style="padding: 10px;"| '''[[Gd-Lysozyme-psana]]''' Tutorial revised for psana migration
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e239 /reg/d/psdm/cxi/cxi84914/xtc/e240</code>
| style="padding: 5px;"| Barends, 2013 (SAD anomalous phasing)
| style="padding: 5px;"| [http://cxidb.org/id-22.html 22]
|-
| style="padding: 10px;"| '''[[L498 Thermolysin]]'''
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e157</code>
| style="padding: 5px;"| Hattne, 2014 (cctbx processing & weak Zn anomalous signal)
| style="padding: 5px;"| [http://cxidb.org/id-23.html 23]
|-
| style="padding: 10px;"| '''[[LB67 Thermolysin]]'''
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e350</code>
| style="padding: 5px;"| unpublished (illustrate CSPAD hi/lo gain settings)
|}

2017 Tutorials

2017-02-17T15:02:00Z

Nicksauter: /* Feb 16th */

= Feb 16th =

9:00am: Session 1

Nick Sauter: Welcome and "Trials and tribulations merging still image data"

Aaron Brewster: "Metrology and non-isomorphism: hidden challenges in still image data reduction"

Axel Brunger: "Data processing of XFEL data from a limited number of crystals"

10:20am: Break

10:40am: Session 2

Art Lyubimov: "IOTA: Integration optimization, triage and analysis tool for XFEL data processing"

Monarin Uervirojnangkoorn: "Up and Running with Prime"

Jacques-Philippe Colletier: "Mosquito larvicide BinAB revealed by de novo phasing with an X-ray laser"

James Holton: "What if? Using at-scale image simulations to optimize data processing algorithms"

Noon: Working Lunch - Roundtable discussion of data processing challenges

1:00pm: Session 3

Graeme Winter and Richard Gildea: "DIALS - new methods for processing X-ray diffraction data"

James Parkhurst: "Robust background modelling in the presence of outliers in DIALS"

Jan Kern: TBD

Franklin Fuller: "Drop-by-Drop Transient Serial Crystallography of Metalloenzymes at an X-ray free electron laser"

Iris Young: "Room temperature studies of the oxygen-evolving complex of photosystem II using an X-ray free electron laser (XFEL)"

2:40pm: Break

3:00pm: Session 4

Aina Cohen: TBD

Rahel Woldeyes: "Using X-ray Free Electron lasers to visualize solvent in the M2 proton channel"

Danny Axford: "Highly efficient serial data collection from high-density fixed targets"

Christoph Mueller-Dieckmann: "Serial Synchrotron Crystallography at the ESRF using a high viscosity extruder"

Allen Orville: TBD

= Feb 17th =

== 9am: Tutorials 1 ==
* Aaron Brewster and Iris Young: cctbx.xfel

Break: 10:00 am

== 10:15 am: Tutorials 2 ==
* Aaron Brewster and James Parkhurst: dials.stills_process
* Art Lyubimov: IOTA
* Monarin Uervirojnangkoorn: [[2017_prime_tutorial | PRIME]]
* Nick Sauter: [[2017_cxi_merge_tutorial | cxi.merge]]

== 12:15pm-2:30pm: Round table discussion. ==
Topics will include:
* Is the current software meeting needs?
* What are essential/timely avenues most useful for near-term development (first half of 2017)?
* Where should we be going next (longer term future)?
* Time resolved experiments?
* Synchrotron serial crystallography?

== 2:30-4pm: Breakout sessions: ==
Users work with developers and instructors on their own data, with hands-on walkthroughs and data analysis.

2017 cxi merge tutorial

2017-02-10T23:37:30Z

Nicksauter: /* Fine tuning */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.
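
The same count can be scripted, which is convenient when comparing several directories. This is a minimal sketch using only the Python standard library; the helper name <code>count_pickles</code> is made up for illustration and is not part of cctbx.

```python
import glob
import os


def count_pickles(directory):
    """Return the number of *.pickle files directly inside `directory`."""
    return len(glob.glob(os.path.join(directory, "*.pickle")))
```

Calling <code>count_pickles(...)</code> on each of the two directories listed above should reproduce the figure obtained with ls.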

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.
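
To find every unreadable file rather than stopping at the first one, each pickle can be loaded inside a try/except. The sketch below is an assumption about how one might do this, not what cxi.print_pickle does internally; <code>find_unreadable_pickles</code> is a hypothetical helper, and loading real integration pickles presumably requires running it in an environment where cctbx is importable.

```python
import glob
import os
import pickle


def find_unreadable_pickles(directory):
    """Return (path, error) pairs for *.pickle files that fail to load."""
    bad = []
    for path in sorted(glob.glob(os.path.join(directory, "*.pickle"))):
        try:
            with open(path, "rb") as handle:
                pickle.load(handle)
        except Exception as error:
            # Truncated or corrupted files can raise several exception
            # types (EOFError, UnpicklingError, ...), so catch broadly.
            bad.append((path, repr(error)))
    return bad
```

An empty return value means every file in the directory could be read.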

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120
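
One way to turn "cluster around" into a number is a column-wise median over the per-image unit cells. The sketch below assumes the six cell parameters have already been parsed into tuples (for example from the grep output); the three cells shown are made-up illustrative values, not data from this experiment.

```python
from statistics import median


def median_cell(cells):
    """Column-wise median of (a, b, c, alpha, beta, gamma) tuples."""
    return tuple(median(column) for column in zip(*cells))


# Made-up example cells, for illustration only.
example_cells = [
    (91.2, 91.2, 45.8, 90.0, 90.0, 120.0),
    (91.4, 91.4, 45.9, 90.0, 90.0, 120.0),
    (91.7, 91.7, 46.1, 90.0, 90.0, 120.0),
]
print(median_cell(example_cells))
```

The median is used here instead of the mean so that a few outlier cells do not shift the estimate.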

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}
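
The sed chain in the script fills the placeholders FS, DMIN, NEG and TAG in the parameter string before it is passed to cxi.merge. For readers less familiar with csh, here is a sketch of the same substitution in Python; <code>fill_template</code> is a hypothetical helper and the template is a shortened version of the one above.

```python
def fill_template(template, substitutions):
    """Replace each placeholder with its value, in the order given."""
    for placeholder, value in substitutions.items():
        template = template.replace(placeholder, str(value))
    return template


template = "d_min=DMIN backend=FS include_negatives=NEG output.prefix=TAG"
eff = fill_template(
    template, {"FS": "Flex", "DMIN": 2.5, "NEG": True, "TAG": "p6m"}
)
print(eff)
```

As with <code>sed -e "s,FS,Flex,g"</code>, this is plain substring replacement: any other occurrence of a placeholder string, for instance inside a file path, would be replaced as well.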

Initial trial: nproc=1, just to see if it runs. Had to fix the PDB reference. The data= line cannot take a *.pickle wildcard; give the directory instead.

Scale-up trial nproc=60, no postrefinement.
Set the MTZ column flag to iobs (scaling.mtz_column_F=iobs).
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

The code that decides whether a file is even- or odd-numbered fails on these file names. Fixed by adding "_extracted.pickle" to the allowable suffixes in the code; it had to be put first.
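
The traceback shows why order matters: if the generic ".pickle" suffix is tried first, the character in front of it is the "d" of "extracted", not a digit. The following is a sketch of the idea only, not the actual code in xfel/cxi/util.py; the suffix list and the example file names are assumptions.

```python
import os

# Hypothetical suffix list; the more specific suffix must come first.
SUFFIXES = ("_extracted.pickle", ".pickle")


def is_odd_numbered(file_name, suffixes=SUFFIXES):
    """True if the last digit before the recognized suffix is odd."""
    base = os.path.basename(file_name)
    for suffix in suffixes:
        if base.endswith(suffix):
            stem = base[: -len(suffix)]
            return int(stem[-1]) % 2 == 1
    raise ValueError("unrecognized file name: %s" % file_name)
```

With the specific suffix first, a name such as <code>int-0059_extracted.pickle</code> is classified by its final digit; with only ".pickle" in the list, the same name raises the ValueError seen above.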

Table of Scaling Results:

<pre>
------------------------------------------------------------------------------------------------------------------------
                                         CC      N      CC      N       R       R       R   Scale     Scale        SpSig
Bin  Resolution Range  Completeness     int    int     iso    iso     int   split     iso     int       iso         Test
------------------------------------------------------------------------------------------------------------------------
  1  -1.0000 - 5.3861     [809/809]   80.0%    809   75.2%    805   61.0%   40.1%   52.9%   0.551   214.059   12489.8850
  2   5.3861 - 4.2749     [791/791]   54.9%    791   74.5%    791   53.0%   38.8%   49.7%   0.693   270.307    1785.4625
  3   4.2749 - 3.7345     [781/781]   65.8%    781   81.6%    781   46.5%   33.6%   40.7%   0.762   337.287    1149.4218
  4   3.7345 - 3.3930     [776/776]   63.9%    776   74.5%    776   49.3%   36.4%   48.6%   0.764   283.109     758.0388
  5   3.3930 - 3.1498     [765/765]   67.1%    765   81.9%    765   48.4%   35.6%   43.4%   0.795   338.091     533.7650
  6   3.1498 - 2.9641     [771/771]   58.6%    771   72.4%    771   49.3%   36.6%   50.7%   0.759   286.707     222.4718
  7   2.9641 - 2.8156     [765/765]   56.0%    765   72.3%    765   48.5%   35.3%   46.7%   0.765   320.954     154.5299
  8   2.8156 - 2.6930     [746/746]   63.0%    746   76.1%    746   46.4%   34.3%   42.6%   0.867   357.183      99.4430
  9   2.6930 - 2.5894     [790/790]   52.1%    790   69.4%    790   50.4%   37.4%   47.5%   0.814   314.326     113.1264
 10   2.5894 - 2.5000     [757/757]   54.9%    757   78.6%    757   52.4%   38.9%   44.4%   0.794   306.403     109.0768

All                     [7751/7751]   74.9%   7751   78.8%   7747   51.9%   36.9%   50.1%   0.680   266.538       1298.0
------------------------------------------------------------------------------------------------------------------------
</pre>

As expected, the data do not scale well: P6 is a polar space group with an indexing ambiguity, so the lattices must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.
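
As a quick sanity check, the tallies reported on this page can be compared directly. The sketch below only does arithmetic on numbers copied from the logs above (the first scale-up trial and this step); the dictionary keys are shortened labels, not program output.

```python
TOTAL_FILES = 5031

# Counts copied from the first scale-up trial log.
first_trial = {
    "accepted": 4493,
    "wrong Bravais group": 0,
    "unit cell outliers": 11,
    "low signal": 22,
    "poor correlation (min_corr)": 505,
    "file errors or no reindex matrix": 0,
}


def acceptance_fraction(accepted, total=TOTAL_FILES):
    """Fraction of integration files that were accepted."""
    return accepted / total


# The categories should account for every integration file.
assert sum(first_trial.values()) == TOTAL_FILES

print("first trial: {:.1%}".format(acceptance_fraction(first_trial["accepted"])))
print("step 1:      {:.1%}".format(acceptance_fraction(4988)))
```

In step 1 only 5031 - 4988 = 43 files are rejected, compared with 538 in the first trial.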

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try d_min=3.5 instead of 2.5: still crashes.

Try fewer processes, nproc=30 instead of 60 (halving nproc doubles the lattices per process, so the problem size per process grows by 2**2=4): still crashes.

Try nproc=15
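
The 2**2=4 estimate for halving nproc can be checked with a few lines of arithmetic. This sketch rests on two assumptions that are not confirmed by the program itself: the 4988 lattices are split evenly across processes, and the work per process grows with the number of lattice pairs it has to compare.

```python
def lattices_per_process(n_lattices, nproc):
    """Ceiling of n_lattices / nproc (largest share any process gets)."""
    return -(-n_lattices // nproc)


def pair_count(n):
    """Number of distinct pairs among n lattices."""
    return n * (n - 1) // 2


n_lattices = 4988
for nproc in (60, 30, 15):
    share = lattices_per_process(n_lattices, nproc)
    print(nproc, share, pair_count(share))
```

Under these assumptions each halving of nproc roughly quadruples the number of pairs handled by one process.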

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. Lattices assigned to each setting: h,k,l = 2503; h,-h-k,-l = 2485; total 4988.
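
To make the second setting concrete, here is a sketch of what the reindexing operator does to a single Miller index. It reads the operator as h,-h-k,-l (taking the final term to be the Miller index l, which is an assumption); the function name is made up for illustration and this is not the cctbx implementation.

```python
def reindex(hkl):
    """Apply the operator h,-h-k,-l to one Miller index (h, k, l)."""
    h, k, l = hkl
    return (h, -h - k, -l)


original = (1, 2, 3)
once = reindex(original)
twice = reindex(once)
print(original, once, twice)
```

Applying the operator twice returns the original index, so each lattice needs it applied either once or not at all to reach a common setting.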

=== 3) Apply reindexing operators and merge ===

== Fine tuning ==

* Postrefinement with rs: CC1/2 = 84.7%
* Trial 1: rs2, unit weighting, Lorentzian line shape: 88.2%
* Trial 2: Gaussian line shape: 90.9%
* Trial 3: Gaussian line shape, rs_hybrid: 93.5%, but only 4059 files accepted
* Trial 4: extend to 2.0 angstrom: 87.9% (but 97.8% on lowest shell)

== List of Examples ==
{| class="wikitable"
| style="padding: 10px;"| <big>Dataset</big>
| style="padding: 5px;"| File Paths
| style="padding: 5px;"| Reference & purpose
| style="padding: 5px;"| CXIDB
|-
| style="padding: 10px;"| '''L220 Lysozyme'''
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e60</code>
| style="padding: 5px;"| Boutet, 2012 (initial report) Hattne, 2014 (cctbx reprocessing)
| style="padding: 5px;"| [http://cxidb.org/id-17.html 17]
|-
| style="padding: 10px;"| '''[[Gd-Lysozyme]]'''
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e239 /reg/d/psdm/cxi/cxi84914/xtc/e240</code>
| style="padding: 5px;"| Barends, 2013 (SAD anomalous phasing)
| style="padding: 5px;"| [http://cxidb.org/id-22.html 22]
|-
| style="padding: 10px;"| '''[[Gd-Lysozyme-psana]]''' Tutorial revised for psana migration
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e239 /reg/d/psdm/cxi/cxi84914/xtc/e240</code>
| style="padding: 5px;"| Barends, 2013 (SAD anomalous phasing)
| style="padding: 5px;"| [http://cxidb.org/id-22.html 22]
|-
| style="padding: 10px;"| '''[[L498 Thermolysin]]'''
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e157</code>
| style="padding: 5px;"| Hattne, 2014 (cctbx processing & weak Zn anomalous signal)
| style="padding: 5px;"| [http://cxidb.org/id-23.html 23]
|-
| style="padding: 10px;"| '''[[LB67 Thermolysin]]'''
| style="padding: 5px;" | <code>/reg/d/psdm/cxi/cxi84914/xtc/e350</code>
| style="padding: 5px;"| unpublished (illustrate CSPAD hi/lo gain settings)
|}

2017 cxi merge tutorial

2017-02-09T22:34:18Z

Nicksauter: /* Fine tuning */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}

Initial trial nproc=1 just to see if it runs. Had to fix PDB reference. Can't use *.pickle on the data= line

Scale-up trial nproc=60, no postrefinement.
Set the MTZ column flag to iobs (scaling.mtz_column_F=iobs).
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

Something is wrong in the ability to determine even/odd numbered-ness. Added "_extracted.pickle" in the code; had to put it first.

Table of Scaling Results:

---------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
---------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [809/809] 80.0% 809 75.2% 805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
2 5.3861 - 4.2749 [791/791] 54.9% 791 74.5% 791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
3 4.2749 - 3.7345 [781/781] 65.8% 781 81.6% 781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
4 3.7345 - 3.3930 [776/776] 63.9% 776 74.5% 776 49.3% 36.4% 48.6% 0.764 283.109 758.0388
5 3.3930 - 3.1498 [765/765] 67.1% 765 81.9% 765 48.4% 35.6% 43.4% 0.795 338.091 533.7650
6 3.1498 - 2.9641 [771/771] 58.6% 771 72.4% 771 49.3% 36.6% 50.7% 0.759 286.707 222.4718
7 2.9641 - 2.8156 [765/765] 56.0% 765 72.3% 765 48.5% 35.3% 46.7% 0.765 320.954 154.5299
8 2.8156 - 2.6930 [746/746] 63.0% 746 76.1% 746 46.4% 34.3% 42.6% 0.867 357.183 99.4430
9 2.6930 - 2.5894 [790/790] 52.1% 790 69.4% 790 50.4% 37.4% 47.5% 0.814 314.326 113.1264
10 2.5894 - 2.5000 [757/757] 54.9% 757 78.6% 757 52.4% 38.9% 44.4% 0.794 306.403 109.0768

All [7751/7751] 74.9% 7751 78.8% 7747 51.9% 36.9% 50.1% 0.680 266.538 1298.0
---------------------------------------------------------------------------------------------------------

Of course we know the data do not scale because this is a polar space group, and data must be sorted by Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try using d_min 3.5 instead of 2.5--still crash

Try using fewer proc; use 30 instead of 60. (increases problem size by 2**2=4) --still crash

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. Lattices assigned to each setting: h,k,l = 2503; h,-h-k,-l = 2485; total 4988.

=== 3) Apply reindexing operators and merge ===

== Fine tuning ==

* Postrefinement with rs: CC1/2 = 84.7%
* Trial 1: rs2, unit weighting, Lorentzian line shape: 88.2%
* Trial 2: Gaussian line shape: 90.9%
* Trial 3: Gaussian line shape, rs_hybrid: 93.5%, but only 4059 files accepted
* Trial 4: extend to 2.0 angstrom: 87.9% (but 97.8% on lowest shell)

2017 cxi merge tutorial

2017-02-09T22:33:50Z

Nicksauter:

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}

Initial trial nproc=1 just to see if it runs. Had to fix PDB reference. Can't use *.pickle on the data= line

Scale-up trial nproc=60, no postrefinement.
Set the MTZ column flag to iobs (scaling.mtz_column_F=iobs).
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

Something is wrong in the ability to determine even/odd numbered-ness. Added "_extracted.pickle" in the code; had to put it first.

Table of Scaling Results:

---------------------------------------------------------------------------------------------------------
CC N CC N R R R Scale Scale SpSig
Bin Resolution Range Completeness int int iso iso int split iso int iso Test
---------------------------------------------------------------------------------------------------------
1 -1.0000 - 5.3861 [809/809] 80.0% 809 75.2% 805 61.0% 40.1% 52.9% 0.551 214.059 12489.8850
2 5.3861 - 4.2749 [791/791] 54.9% 791 74.5% 791 53.0% 38.8% 49.7% 0.693 270.307 1785.4625
3 4.2749 - 3.7345 [781/781] 65.8% 781 81.6% 781 46.5% 33.6% 40.7% 0.762 337.287 1149.4218
4 3.7345 - 3.3930 [776/776] 63.9% 776 74.5% 776 49.3% 36.4% 48.6% 0.764 283.109 758.0388
5 3.3930 - 3.1498 [765/765] 67.1% 765 81.9% 765 48.4% 35.6% 43.4% 0.795 338.091 533.7650
6 3.1498 - 2.9641 [771/771] 58.6% 771 72.4% 771 49.3% 36.6% 50.7% 0.759 286.707 222.4718
7 2.9641 - 2.8156 [765/765] 56.0% 765 72.3% 765 48.5% 35.3% 46.7% 0.765 320.954 154.5299
8 2.8156 - 2.6930 [746/746] 63.0% 746 76.1% 746 46.4% 34.3% 42.6% 0.867 357.183 99.4430
9 2.6930 - 2.5894 [790/790] 52.1% 790 69.4% 790 50.4% 37.4% 47.5% 0.814 314.326 113.1264
10 2.5894 - 2.5000 [757/757] 54.9% 757 78.6% 757 52.4% 38.9% 44.4% 0.794 306.403 109.0768

All [7751/7751] 74.9% 7751 78.8% 7747 51.9% 36.9% 50.1% 0.680 266.538 1298.0
---------------------------------------------------------------------------------------------------------

Of course we know the data do not scale because this is a polar space group, and data must be sorted by Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try using d_min 3.5 instead of 2.5--still crash

Try using fewer proc; use 30 instead of 60. (increases problem size by 2**2=4) --still crash

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. Lattices assigned to each setting: h,k,l = 2503; h,-h-k,-l = 2485; total 4988.

=== 3) Apply reindexing operators and merge ===

== Fine tuning ==

* Postrefinement with rs: CC1/2 = 84.7%
* Trial 1: rs2, unit weighting, Lorentzian line shape: 88.2%
* Trial 2: Gaussian line shape: 90.9%
* Trial 3: Gaussian line shape, rs_hybrid: 93.5%, but only 4059 files accepted
* Trial 4: extend to 2.0 angstrom: 87.9% (but 97.8% on lowest shell)

2017 cxi merge tutorial

2017-02-09T20:57:23Z

Nicksauter: /* 2) Sort the lattices */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. Also, *.pickle cannot be used on the data= line; give the directory instead.

Scale-up trial with nproc=60, no postrefinement.
Set the MTZ column flag scaling.mtz_column_F=iobs.
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

The code that decides whether a file is even- or odd-numbered fails on these file names. Added "_extracted.pickle" to the allowable suffixes in the code; it had to be put first.
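The failure in the traceback above can be reproduced in isolation. The following is a minimal sketch, not the actual cctbx code: the suffix list, the function body, and the file names are assumptions made for illustration. It shows why the longer suffix has to be tried before the plain ".pickle" one.

```python
import os

# Hypothetical suffix list; the longer "_extracted.pickle" must precede ".pickle",
# otherwise the stem of "..._extracted.pickle" ends in the letter 'd'.
ALLOWABLE_SUFFIXES = ["_extracted.pickle", ".pickle"]

def is_odd_numbered(file_name, suffixes=ALLOWABLE_SUFFIXES):
    """Return True if the last digit before the recognized suffix is odd."""
    base = os.path.basename(file_name)
    for suffix in suffixes:
        if base.endswith(suffix):
            stem = base[: -len(suffix)]
            # int() raises ValueError if the last character is not a digit
            return int(stem[-1]) % 2 == 1
    raise ValueError("unrecognized file name: %s" % file_name)

# Hypothetical file names, for illustration only
print(is_odd_numbered("/some/dir/shot_0059_extracted.pickle"))
print(is_odd_numbered("shot_0060.pickle"))
```

With the order reversed (".pickle" first), the first file name gives the same "invalid literal for int()" error as in the traceback.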

Table of Scaling Results:

 --------------------------------------------------------------------------------------------------------------
                                         CC     N     CC     N      R      R      R  Scale    Scale       SpSig
 Bin  Resolution Range  Completeness    int   int    iso   iso    int  split    iso    int      iso        Test
 --------------------------------------------------------------------------------------------------------------
   1  -1.0000 - 5.3861  [809/809]     80.0%   809  75.2%   805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
   2   5.3861 - 4.2749  [791/791]     54.9%   791  74.5%   791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
   3   4.2749 - 3.7345  [781/781]     65.8%   781  81.6%   781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
   4   3.7345 - 3.3930  [776/776]     63.9%   776  74.5%   776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
   5   3.3930 - 3.1498  [765/765]     67.1%   765  81.9%   765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
   6   3.1498 - 2.9641  [771/771]     58.6%   771  72.4%   771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
   7   2.9641 - 2.8156  [765/765]     56.0%   765  72.3%   765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
   8   2.8156 - 2.6930  [746/746]     63.0%   746  76.1%   746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
   9   2.6930 - 2.5894  [790/790]     52.1%   790  69.4%   790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
  10   2.5894 - 2.5000  [757/757]     54.9%   757  78.6%   757  52.4%  38.9%  44.4%  0.794  306.403    109.0768
 
 All                    [7751/7751]   74.9%  7751  78.8%  7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
 --------------------------------------------------------------------------------------------------------------

Of course we know the data do not scale because this is a polar space group, and the data must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try using d_min 3.5 instead of 2.5--still crash

Try using fewer proc; use 30 instead of 60. (increases problem size by 2**2=4) --still crash

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. h,k,l=2503 h,-h-k,-l=2485 total 4988
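To make the two indexing choices concrete, here is a small sketch of the alternate setting applied to a single Miller index, reading the final term of the operator as -l. The function name and the example index are made up for illustration; this is not how cxi.brehm_diederichs applies the operator internally.

```python
def reindex_alternate(hkl):
    """Apply the alternate indexing choice h,-h-k,-l to one Miller index."""
    h, k, l = hkl
    return (h, -h - k, -l)

example = (3, 1, 2)                       # arbitrary illustrative index
flipped = reindex_alternate(example)
print(flipped)
# Applying the operator twice returns the original index
print(reindex_alternate(flipped) == example)

# The two sorted groups account for every accepted lattice
print(2503 + 2485 == 4988)
```

Because the operator is its own inverse, each lattice only needs a single yes/no decision: keep h,k,l or reindex.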

2017 cxi merge tutorial

2017-02-09T20:52:21Z

Nicksauter: /* 1) Generate a database of observations */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120
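One way to put a number on "cluster around" is a per-parameter median over the unit cells pulled out with grep. The sketch below assumes the cells have already been parsed into tuples; the three cells listed are invented for illustration and are not taken from the dataset.

```python
from statistics import median

# Invented illustrative unit cells (a, b, c, alpha, beta, gamma); NOT real data
cells = [
    (91.2, 91.2, 45.8, 90.0, 90.0, 120.0),
    (91.4, 91.4, 45.9, 90.0, 90.0, 120.0),
    (91.7, 91.7, 46.1, 90.0, 90.0, 120.0),
]

def median_cell(cells):
    """Median of each unit cell parameter, taken column by column."""
    return tuple(median(column) for column in zip(*cells))

print(median_cell(cells))
```

The median is less sensitive than the mean to the occasional outlier cell, which is why it is used here.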

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. Also, *.pickle cannot be used on the data= line; give the directory instead.

Scale-up trial with nproc=60, no postrefinement.
Set the MTZ column flag scaling.mtz_column_F=iobs.
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

The code that decides whether a file is even- or odd-numbered fails on these file names. Added "_extracted.pickle" to the allowable suffixes in the code; it had to be put first.
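The acceptance tally reported for the scale-up trial above can be cross-checked with a few lines of Python. The numbers are copied from the log; the dictionary keys are just labels chosen here.

```python
# Counts copied from the scale-up trial log
tally = {
    "accepted": 4493,
    "wrong Bravais group": 0,
    "unit cell outliers": 11,
    "low signal": 22,
    "poor correlation (min_corr)": 505,
    "file errors or no reindex matrix": 0,
}
total_files = 5031

accounted = sum(tally.values())
print(accounted == total_files)            # every file is accounted for

rejected_fraction = 1.0 - tally["accepted"] / total_files
print("%.1f%% rejected" % (100 * rejected_fraction))
```

Nearly all of the rejections come from the min_corr filter, as would be expected when roughly half of the lattices are indexed in the other setting and correlate poorly with the reference.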

Table of Scaling Results:

 --------------------------------------------------------------------------------------------------------------
                                         CC     N     CC     N      R      R      R  Scale    Scale       SpSig
 Bin  Resolution Range  Completeness    int   int    iso   iso    int  split    iso    int      iso        Test
 --------------------------------------------------------------------------------------------------------------
   1  -1.0000 - 5.3861  [809/809]     80.0%   809  75.2%   805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
   2   5.3861 - 4.2749  [791/791]     54.9%   791  74.5%   791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
   3   4.2749 - 3.7345  [781/781]     65.8%   781  81.6%   781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
   4   3.7345 - 3.3930  [776/776]     63.9%   776  74.5%   776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
   5   3.3930 - 3.1498  [765/765]     67.1%   765  81.9%   765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
   6   3.1498 - 2.9641  [771/771]     58.6%   771  72.4%   771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
   7   2.9641 - 2.8156  [765/765]     56.0%   765  72.3%   765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
   8   2.8156 - 2.6930  [746/746]     63.0%   746  76.1%   746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
   9   2.6930 - 2.5894  [790/790]     52.1%   790  69.4%   790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
  10   2.5894 - 2.5000  [757/757]     54.9%   757  78.6%   757  52.4%  38.9%  44.4%  0.794  306.403    109.0768
 
 All                    [7751/7751]   74.9%  7751  78.8%  7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
 --------------------------------------------------------------------------------------------------------------

Of course we know the data do not scale because this is a polar space group, and the data must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark1 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
raw_data.sdfac_auto=False \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

This yields 4988 of 5031 integration files accepted.
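The sed pipeline in step1.csh fills the placeholders DMIN, NEG, TAG and FS in the parameter template before the arguments are handed to cxi.merge. A rough Python equivalent is sketched below for readers less familiar with csh; the function name is made up, and the template is shortened to four of the parameters.

```python
def fill_template(template, substitutions):
    """Replace each placeholder everywhere in the template, as sed 's,X,Y,g' does,
    then split on whitespace into individual command-line arguments."""
    result = template
    for placeholder, value in substitutions:
        result = result.replace(placeholder, str(value))
    return result.split()

# Shortened version of the effective_params template from step1.csh
template = "d_min=DMIN backend=FS include_negatives=NEG output.prefix=TAG"

args = fill_template(
    template,
    [("FS", "Flex"), ("DMIN", 2.5), ("NEG", False), ("TAG", "p6m")],
)
print(args)
```

As with the sed version, the replacement is purely textual, so a placeholder string such as FS must not occur inside any other parameter name or path.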

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try using d_min 3.5 instead of 2.5--still crash

Try using fewer proc; use 30 instead of 60. (increases problem size by 2**2=4) --still crash

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. h,k,l=2492 h,-h-k,-l=2506 total 4998

2017 cxi merge tutorial

2017-02-09T20:41:54Z

Nicksauter: /* 1) Generate a database of observations */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. Also, *.pickle cannot be used on the data= line; give the directory instead.

Scale-up trial with nproc=60, no postrefinement.
Set the MTZ column flag scaling.mtz_column_F=iobs.
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

The code that decides whether a file is even- or odd-numbered fails on these file names. Added "_extracted.pickle" to the allowable suffixes in the code; it had to be put first.

Table of Scaling Results:

 --------------------------------------------------------------------------------------------------------------
                                         CC     N     CC     N      R      R      R  Scale    Scale       SpSig
 Bin  Resolution Range  Completeness    int   int    iso   iso    int  split    iso    int      iso        Test
 --------------------------------------------------------------------------------------------------------------
   1  -1.0000 - 5.3861  [809/809]     80.0%   809  75.2%   805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
   2   5.3861 - 4.2749  [791/791]     54.9%   791  74.5%   791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
   3   4.2749 - 3.7345  [781/781]     65.8%   781  81.6%   781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
   4   3.7345 - 3.3930  [776/776]     63.9%   776  74.5%   776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
   5   3.3930 - 3.1498  [765/765]     67.1%   765  81.9%   765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
   6   3.1498 - 2.9641  [771/771]     58.6%   771  72.4%   771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
   7   2.9641 - 2.8156  [765/765]     56.0%   765  72.3%   765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
   8   2.8156 - 2.6930  [746/746]     63.0%   746  76.1%   746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
   9   2.6930 - 2.5894  [790/790]     52.1%   790  69.4%   790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
  10   2.5894 - 2.5000  [757/757]     54.9%   757  78.6%   757  52.4%  38.9%  44.4%  0.794  306.403    109.0768
 
 All                    [7751/7751]   74.9%  7751  78.8%  7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
 --------------------------------------------------------------------------------------------------------------

Of course we know the data do not scale because this is a polar space group, and the data must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=iobs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=False \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try using d_min 3.5 instead of 2.5--still crash

Try using fewer proc; use 30 instead of 60. (increases problem size by 2**2=4) --still crash
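The 2**2=4 estimate can be illustrated with a short calculation. This sketch rests on two assumptions that are not spelled out in these notes: that the lattices are divided evenly among the processes, and that the work per process grows with the number of lattice pairs it has to compare. The group size of 83 is only an example.

```python
def n_pairs(n):
    """Number of distinct pairs among n lattices."""
    return n * (n - 1) // 2

# Example group size only; halving nproc doubles the lattices per process
group_at_60 = 83
group_at_30 = 2 * group_at_60

ratio = n_pairs(group_at_30) / n_pairs(group_at_60)
print("pairs per process grow by a factor of %.2f" % ratio)
```

Under these assumptions the ratio comes out close to 4, in line with the estimate in the note above.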

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. h,k,l=2492 h,-h-k,-l=2506 total 4998

2017 cxi merge tutorial

2017-02-09T20:41:38Z

Nicksauter: /* 2) Sort the lattices */

This is an updated, worked example of data merging using cxi.merge, for presentation at the Feb 17, 2017 Berkeley Lab Serial Crystallography Workshop. Previous documentation sets are [[Merging | here]] and [[Advanced Merging | here]].

== Initial characterization ==
In this example, we are given integrated still-shot data collected by Danny Axford at Diamond, for P6 myoglobin, PDB code [http://www.rcsb.org/pdb/explore/explore.do?structureId=5M3S 5M3S].

* /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted # cctbx-style integration pickles
* /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted # same data, with per-image resolution cutoff during integration

Unix ls reveals 5031 *.pickle files in each directory.

Immediately there is a problem:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/sig_filter/split_reintegrated/extracted/*.pickle

...fails on image 0059 with a traceback; it looks like the file is corrupted.

So focus on the data without integration resolution cutoff:

$ cxi.print_pickle /net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle

Some conclusions with the aid of grep:
* all integration pickles have space group P6 (good)
* distance and beam center are fixed throughout the integrated dataset
* Unit cells are variable but do seem to cluster around 91.4 91.4 45.9 90 90 120

phenix.fetch_pdb --mtz 5m3s

Merge command file:
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted/*.pickle \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=1 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=i-obs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=True \
postrefinement.algorithm=rs \
output.prefix=TAG"
set tag = p6m
set dmin = 2.5
set neg = True
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
exit
cxi.xmerge ${eff}

Initial trial with nproc=1, just to see if it runs. Had to fix the PDB reference. Also, *.pickle cannot be used on the data= line; give the directory instead.

Scale-up trial with nproc=60, no postrefinement.
Set the MTZ column flag scaling.mtz_column_F=iobs.
4493 of 5031 integration files were accepted
0 rejected due to wrong Bravais group
11 rejected for unit cell outliers
22 rejected for low signal
505 rejected due to up-front poor correlation under min_corr parameter
0 rejected for file errors or no reindex matrix
Usage: 5m3s.mtz does not contain any observations labelled [fobs, imean, i-obs]. Please set scaling.mtz_column_F to one of [iobs].
File "/net/viper/raid1/sauter/proj-e/modules/cctbx_project/xfel/cxi/util.py", line 13, in is_odd_numbered
return int(os.path.basename(file_name).split(allowable)[0][-1])%2==1
ValueError: invalid literal for int() with base 10: 'd'

The code that decides whether a file is even- or odd-numbered fails on these file names. Added "_extracted.pickle" to the allowable suffixes in the code; it had to be put first.

Table of Scaling Results:

 --------------------------------------------------------------------------------------------------------------
                                         CC     N     CC     N      R      R      R  Scale    Scale       SpSig
 Bin  Resolution Range  Completeness    int   int    iso   iso    int  split    iso    int      iso        Test
 --------------------------------------------------------------------------------------------------------------
   1  -1.0000 - 5.3861  [809/809]     80.0%   809  75.2%   805  61.0%  40.1%  52.9%  0.551  214.059  12489.8850
   2   5.3861 - 4.2749  [791/791]     54.9%   791  74.5%   791  53.0%  38.8%  49.7%  0.693  270.307   1785.4625
   3   4.2749 - 3.7345  [781/781]     65.8%   781  81.6%   781  46.5%  33.6%  40.7%  0.762  337.287   1149.4218
   4   3.7345 - 3.3930  [776/776]     63.9%   776  74.5%   776  49.3%  36.4%  48.6%  0.764  283.109    758.0388
   5   3.3930 - 3.1498  [765/765]     67.1%   765  81.9%   765  48.4%  35.6%  43.4%  0.795  338.091    533.7650
   6   3.1498 - 2.9641  [771/771]     58.6%   771  72.4%   771  49.3%  36.6%  50.7%  0.759  286.707    222.4718
   7   2.9641 - 2.8156  [765/765]     56.0%   765  72.3%   765  48.5%  35.3%  46.7%  0.765  320.954    154.5299
   8   2.8156 - 2.6930  [746/746]     63.0%   746  76.1%   746  46.4%  34.3%  42.6%  0.867  357.183     99.4430
   9   2.6930 - 2.5894  [790/790]     52.1%   790  69.4%   790  50.4%  37.4%  47.5%  0.814  314.326    113.1264
  10   2.5894 - 2.5000  [757/757]     54.9%   757  78.6%   757  52.4%  38.9%  44.4%  0.794  306.403    109.0768
 
 All                    [7751/7751]   74.9%  7751  78.8%  7747  51.9%  36.9%  50.1%  0.680  266.538      1298.0
 --------------------------------------------------------------------------------------------------------------

Of course we know the data do not scale because this is a polar space group, and the data must first be sorted by the Brehm/Diederichs method.

== Breaking the indexing ambiguity ==

Take note of our detailed instructions on [[Resolving an Indexing Ambiguity]]. Do this in three steps:

=== 1) Generate a database of observations ===

step1.csh:

<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
data=/net/dials/raid1/aaron/zurich0038/jr_006_batches/split_reintegrated/extracted \
output.n_bins=10 \
pixel_size=0.172 \
backend=FS \
nproc=60 \
model=5m3s.pdb \
merge_anomalous=True \
plot_single_index_histograms=False \
scaling.algorithm=mark0 \
raw_data.sdfac_auto=False \
scaling.mtz_file=5m3s.mtz \
scaling.show_plots=False \
scaling.log_cutoff=None \
scaling.mtz_column_F=iobs \
scaling.report_ML=True \
set_average_unit_cell=True \
rescale_with_average_cell=False \
significance_filter.apply=True \
significance_filter.min_ct=30 \
significance_filter.sigma=0.2 \
include_negatives=NEG \
postrefinement.enable=False \
output.prefix=TAG"

set tag = p6m
set dmin = 2.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.merge ${eff}
</pre>

=== 2) Sort the lattices ===

step2.csh:
<pre>
#!/bin/csh -f

set effective_params = "d_min=DMIN \
pixel_size=0.172 \
target_unit_cell=91.4,91.4,45.9,90,90,120 \
target_space_group=P6 \
backend=FS \
nproc=60 \
merge_anomalous=True \
output.prefix=TAG"

set tag = p6m
set dmin = 3.5
set neg = False
set eff = `echo $effective_params|sed -e "s,FS,Flex,g"|sed -e "s,DMIN,$dmin,g"|sed -e "s,NEG,$neg,g"|sed -e "s,TAG,$tag,g"`

cxi.brehm_diederichs ${eff}
</pre>

BOOST crash--floating point error

^Z; kill %%

Try using d_min 3.5 instead of 2.5--still crash

Try using fewer proc; use 30 instead of 60. (increases problem size by 2**2=4) --still crash

Try nproc=15

It looks like the crash is associated with the matplotlib plot as I only experience it when I mouse-over the plot.

setenv BOOST_ADAPTBX_FPE_DEFAULT 1

14 plots total. h,k,l=2492 h,-h-k,-l=2506 total 4998