Difference between revisions of "Overview"

From cctbx_xfel
(Phil)
 
(One intermediate revision by one other user not shown)
Line 1: Line 1:
Cctbx.xfel at LCLS is built on five systems: the CS-PAD detector, pyana, LCLS’s queuing system, phil, and fundamentally, cctbx.
+
LCLS provides several frameworks for data analysis, and also provides methods for translating collected data to HDF5, a general and portable container format for storage of large amounts of numerical data.  ''cctbx.xfel'' extends the LCLS analysis packages with a set of analysis modules.  In particular, the ''cctbx.xfel'' analysis modules are run through ''pyana'', the Python implementation of LCLS's analysis framework.  Since analysis proceeds directly from the raw data, no intermediate conversion is necessary, and it can be done while an experiment is running.
  
== The CS-PAD detector ==
+
''cctbx.xfel'' at LCLS is built on five systems: the CSPAD detector, pyana, LCLS’s queuing system, phil, and, fundamentally, cctbx.
The LCLS at full capacity operates at 120 Hz. The incident photon packets are delivered in ~40 femtosecond wide pulses, each containing ~10^15 photons.  This high repetition rate and compact beam delivery time necessitated the construction of a new detector, where the work of reading out and streaming recorded data at these high speeds is accomplished through the use of 64 sensors, arranged in a quadrangular pattern around a central hole (in the place of a beam stop).  Each of the 4 quadrants, containing 16 of the sensors, is adjustable on rails radially away from the central hole to adjust the size of this hole.  Indexing, predicting spot locations using a crystal orientation matrix, and integrating reflection intensities requires precise knowledge of the location of these sensors in three-dimensional space.  For this reason, a portion of this tutorial describes the calibration and refinement of the tile metrology.
+
  
More detailed information about the CS-PAD detector is available here: [http://www-public.slac.stanford.edu/sciDoc/docMeta.aspx?slacPubNumber=SLAC-PUB-15284]
+
== The CSPAD detector ==
 +
[[File:750.png|200px|thumb|right|Example CSPAD image]]
 +
The LCLS at full capacity operates at 120 Hz.  The incident photon packets are delivered in ≈40 femtosecond wide pulses, each containing ≈10<sup>12</sup> photons.  This high repetition rate and compact beam delivery time necessitated the construction of a new detector<ref>[http://www-public.slac.stanford.edu/sciDoc/docMeta.aspx?slacPubNumber=SLAC-PUB-15284 Hart, P. The Cornell-SLAC Pixel Array Detector at LCLS. <i>SLAC Scientific Documents</i> (2012).]</ref>, where the work of reading out and streaming recorded data at these high speeds is accomplished through the use of 64 sensors, arranged in a quadrangular pattern around a central hole (in the place of a beam stop).  Each of the 4 quadrants, containing 16 of the sensors, is adjustable on rails radially away from the central hole to adjust the size of this hole.  Indexing, predicting spot locations using a crystal orientation matrix, and integrating reflection intensities require precise knowledge of the location of these sensors in three-dimensional space.  For this reason, a portion of this tutorial describes the calibration and refinement of the tile metrology.
  
== Pyana ==
 
  
The LCLS Data Acquisition Systems stream the terabytes of diffraction data collected from the CS-PAD detector to container files in XTC format.  XTC is a linear, non-random access file format, where individual images can be recorded rapidly by the file system as they are collected.  The programmatic interface to interact with these files at LCLS is psana and pyana.  Psana is a C++ interface.  Cctbx.xfel uses the python-based pyana.
+
== ''psana'' ==
  
Pyana is driven by configuration files to process frames individually, and is designed with computational parallelization in mind.  As each image is independent, processing of each image can be done by separate computer cores.  Cctbx.xfel uses pyana and pyana's config files to read and process image files stored in XTC format.  The user specifies how each image is to be processed in the configuration file, and then passes the config file and the path to the XTC streams of interest to cctbx.xfel, which calls pyana and submits the job to the queuing system.
+
The LCLS data acquisition systems stream the terabytes of diffraction data collected from the CSPAD detector to container files in XTC format.  XTC is a linear, sequential-access file format, where individual images can be recorded rapidly by the file system as they are collected.  The programmatic interface to interact with these files at LCLS is ''psana'', a C++/Python-based interface.
  
For example, if the user wanted to filter an XTC stream for hits, index the hits and then integrate images which successfully indexed, the user would supply a configuration file which specified cctbx.xfel modules that did these tasks, provide options to these modules, and submit the job.  Specific details are in the tutorials.
+
''psana'' is driven by configuration files to process frames individually, and is designed with computational parallelization in mind.  As each image is independent, processing of each image can be done by separate computer cores.  ''cctbx.xfel'' uses ''psana'' and ''psana'''s configuration files to read and process image files stored in XTC format.  The user specifies how each image is to be processed in the configuration file, and then passes the configuration file and the path to the XTC streams of interest to ''cctbx.xfel'', which calls ''psana'' and submits the job to the queuing system.
  
During processing, hits are extracted from the XTC stream and written to a separate file for each individual image.  At the moment these separate files are in pickle format, a serialization format native to the Python programming language.  However, by the end of 2013, cctbx.xfel will be exclusively using CBF and HDF5 formats to output results.
+
For example, if the user wanted to filter an XTC stream for hits, index the hits and then integrate images which successfully indexed, the user would supply a configuration file which specified ''cctbx.xfel'' modules that did these tasks, provide options to these modules, and submit the job.  Specific details are in the tutorials.
  
More information about pyana: [http://www-public.slac.stanford.edu/sciDoc/docMeta.aspx?slacPubNumber=SLAC-PUB-15284]
+
During processing, hits are extracted from the XTC stream and written to a separate file for each individual image.  At the moment these separate files are in pickle format, a serialization format native to the Python programming language.  However, by the end of 2014, ''cctbx.xfel'' will be exclusively using CBF and HDF5 formats to output results.
  
== LCLS Queuing System ==
+
More information about psana: [https://confluence.slac.stanford.edu/display/PSDM/psana+-+Python+Script+Analysis+Manual]
  
 +
== LCLS queuing system ==
  
 +
SLAC maintains several computing clusters available to its users for processing data.  While detailed knowledge of their workings isn't required for ''cctbx.xfel'' operation, an overview of these systems is provided here: [https://confluence.slac.stanford.edu/display/PCDS/Computing].  Specific commands for submitting ''cctbx.xfel'' jobs to the cluster are given in the tutorials.
  
== Phil ==
+
General instructions for submitting batch jobs can be found here: [https://confluence.slac.stanford.edu/display/PCDS/Submitting+Batch+Jobs].  Of note are these commands:
 +
* bsub: used to submit jobs to the queuing system
 +
* bjobs: used to list the jobs being run by the current user
 +
* bkill: used to stop a job that is running
  
While pyana is configured using its own .cfg files, cctbx.xfel itself is driven using Python-based hierarchical interchange language (phil) files, the same format that drives Labelit and Phenix (though  Phenix calls them .eff files).  The format is intuitive and allows easy specification of per-processing run parameters.
+
All of these commands have extensive man pages available at LCLS.
  
== Cctbx ==
+
== ''Phil'' ==
 +
 
 +
While ''psana'' is configured using its own configuration files, ''cctbx.xfel'' itself is driven using Python-based hierarchical interchange language (''phil'') files, the same format that drives Labelit and ''PHENIX'' (though ''PHENIX'' calls them .eff files).  The format is intuitive and allows easy specification of per-processing run parameters.
 +
 
 +
A user's ''psana'' configuration file will have an entry called xfel_target.  This entry provides the name of a phil file that contains ''cctbx.xfel'' configuration settings.  These settings include things such as thresholds for determining hits (number of spots on an image, spot brightness cutoff, etc.), unit cell targets for indexing, resolution cutoffs, and so forth.
 +
 
 +
Technical information regarding ''phil'': [http://cctbx.sourceforge.net/libtbx_phil.html]
 +
 
 +
Specific ''phil'' files used in this tutorial: ''[[phil]]''
 +
 
 +
== ''cctbx'' ==
  
 
The computational crystallographic toolbox is a foundational set of Python and C++ modules that allow abstraction of the crystallographic experiment.    Under continual development, the toolbox provides interfaces for working with crystal models, reflection data, and much more.
 
The computational crystallographic toolbox is a foundational set of Python and C++ modules that allow abstraction of the crystallographic experiment.    Under continual development, the toolbox provides interfaces for working with crystal models, reflection data, and much more.
 +
 +
Introduction: [http://cctbx.sourceforge.net/current/]
 +
Homepage: [http://cctbx.sourceforge.net/]
 +
 +
== References ==
 +
<references/>

Latest revision as of 00:25, 12 December 2014

LCLS provides several frameworks for data analysis, and also provides methods for translating collected data to HDF5, a general and portable container format for storage of large amounts of numerical data. cctbx.xfel extends the LCLS analysis packages with a set of analysis modules. In particular, the cctbx.xfel analysis modules are run through pyana, the Python implementation of LCLS's analysis framework. Since analysis proceeds directly from the raw data, no intermediate conversion is necessary, and it can be done while an experiment is running.

cctbx.xfel at LCLS is built on five systems: the CSPAD detector, pyana, LCLS’s queuing system, phil, and, fundamentally, cctbx.

The CSPAD detector

Example CSPAD image

The LCLS at full capacity operates at 120 Hz. The incident photon packets are delivered in ≈40 femtosecond wide pulses, each containing ≈10<sup>12</sup> photons. This high repetition rate and compact beam delivery time necessitated the construction of a new detector<ref>Hart, P. The Cornell-SLAC Pixel Array Detector at LCLS. SLAC Scientific Documents (2012).</ref>, where the work of reading out and streaming recorded data at these high speeds is accomplished through the use of 64 sensors, arranged in a quadrangular pattern around a central hole (in the place of a beam stop). Each of the 4 quadrants, containing 16 of the sensors, is adjustable on rails radially away from the central hole to adjust the size of this hole. Indexing, predicting spot locations using a crystal orientation matrix, and integrating reflection intensities require precise knowledge of the location of these sensors in three-dimensional space. For this reason, a portion of this tutorial describes the calibration and refinement of the tile metrology.
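
To illustrate why the sensor positions have to be tracked, here is a minimal sketch of how a radial shift of one quadrant moves all 16 of its sensors together. The `Sensor` class, the starting coordinates, and the quadrant directions are hypothetical placeholders; they do not reflect the real CSPAD layout or any cctbx.xfel API.

```python
import math
from dataclasses import dataclass

SENSORS_PER_QUADRANT = 16
N_QUADRANTS = 4

# Hypothetical unit vectors pointing from the central hole toward each quadrant.
_D = 1.0 / math.sqrt(2.0)
QUADRANT_DIRECTIONS = [(-_D, _D), (_D, _D), (_D, -_D), (-_D, -_D)]


@dataclass
class Sensor:
    index: int   # 0..63
    x_mm: float  # sensor center in the detector plane
    y_mm: float

    @property
    def quadrant(self) -> int:
        return self.index // SENSORS_PER_QUADRANT


def shift_quadrant(sensors, quadrant, distance_mm):
    """Return new sensors with one quadrant translated radially by distance_mm."""
    dx, dy = QUADRANT_DIRECTIONS[quadrant]
    moved = []
    for s in sensors:
        if s.quadrant == quadrant:
            moved.append(Sensor(s.index, s.x_mm + dx * distance_mm,
                                s.y_mm + dy * distance_mm))
        else:
            moved.append(s)
    return moved


# Placeholder layout: every sensor starts 50 mm out along its quadrant direction.
sensors = [
    Sensor(i,
           50.0 * QUADRANT_DIRECTIONS[i // SENSORS_PER_QUADRANT][0],
           50.0 * QUADRANT_DIRECTIONS[i // SENSORS_PER_QUADRANT][1])
    for i in range(N_QUADRANTS * SENSORS_PER_QUADRANT)
]

# Open the central hole by moving quadrant 0 outward by 5 mm.
opened = shift_quadrant(sensors, quadrant=0, distance_mm=5.0)
```

Any such shift changes where every pixel of the affected sensors sits relative to the beam, which is why spot prediction depends on refined metrology rather than nominal positions.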


psana

The LCLS data acquisition systems stream the terabytes of diffraction data collected from the CSPAD detector to container files in XTC format. XTC is a linear, sequential-access file format, where individual images can be recorded rapidly by the file system as they are collected. The programmatic interface to interact with these files at LCLS is psana, a C++/Python-based interface.

psana is driven by configuration files to process frames individually, and is designed with computational parallelization in mind. As each image is independent, processing of each image can be done by separate computer cores. cctbx.xfel uses psana and psana's configuration files to read and process image files stored in XTC format. The user specifies how each image is to be processed in the configuration file, and then passes the configuration file and the path to the XTC streams of interest to cctbx.xfel, which calls psana and submits the job to the queuing system.
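
The independence of frames can be sketched as follows. This is not how psana distributes work; it is a minimal illustration in which a thread pool stands in for separate cores, frames are plain nested lists, and the hit criterion (`threshold`, `min_bright`) is a hypothetical placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def count_bright_pixels(frame, threshold):
    """Count pixels at or above threshold in a frame given as a list of rows."""
    return sum(1 for row in frame for value in row if value >= threshold)


def is_hit(frame, threshold=100, min_bright=3):
    """Hypothetical hit criterion: enough bright pixels in the frame."""
    return count_bright_pixels(frame, threshold) >= min_bright


def find_hits(frames, max_workers=4):
    """Evaluate frames independently and return the indices of the hits."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each frame is judged on its own; no worker needs another's result.
        flags = list(pool.map(is_hit, frames))
    return [i for i, flag in enumerate(flags) if flag]


frames = [
    [[0, 5, 0], [2, 0, 1]],        # blank shot
    [[150, 5, 300], [2, 120, 1]],  # three bright pixels
    [[150, 0, 0], [0, 0, 0]],      # one bright pixel only
]
hit_indices = find_hits(frames)
```

Because `is_hit` reads only its own frame, the same function could be handed to separate processes or cluster nodes without any change to its logic.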

For example, if the user wanted to filter an XTC stream for hits, index the hits and then integrate images which successfully indexed, the user would supply a configuration file which specified cctbx.xfel modules that did these tasks, provide options to these modules, and submit the job. Specific details are in the tutorials.

During processing, hits are extracted from the XTC stream and written to a separate file for each individual image. At the moment these separate files are in pickle format, a serialization format native to the Python programming language. However, by the end of 2014, cctbx.xfel will be exclusively using CBF and HDF5 formats to output results.
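
As a rough illustration of the one-file-per-image pickle output, the sketch below uses only Python's standard `pickle` module. The file naming scheme and the dictionary keys are hypothetical and are not the layout that cctbx.xfel actually writes.

```python
import os
import pickle
import tempfile


def write_hit(output_dir, image_id, data):
    """Write one image's data to its own pickle file and return the path."""
    path = os.path.join(output_dir, "hit-%s.pickle" % image_id)
    with open(path, "wb") as handle:
        pickle.dump(data, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def read_hit(path):
    """Load one image's data back from its pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


output_dir = tempfile.mkdtemp()
# Placeholder record: keys and values are illustrative only.
record = {"TIMESTAMP": "2014-12-12T00:25:00Z", "DATA": [[0, 150], [300, 2]]}
path = write_hit(output_dir, "0001", record)
restored = read_hit(path)
```

Pickle files can execute arbitrary code when loaded, so they should only be read from trusted sources; this is one practical argument for portable formats such as CBF and HDF5.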

More information about psana: [https://confluence.slac.stanford.edu/display/PSDM/psana+-+Python+Script+Analysis+Manual]

LCLS queuing system

SLAC maintains several computing clusters available to its users for processing data. While detailed knowledge of their workings isn't required for cctbx.xfel operation, an overview of these systems is provided here: [https://confluence.slac.stanford.edu/display/PCDS/Computing]. Specific commands for submitting cctbx.xfel jobs to the cluster are given in the tutorials.

General instructions for submitting batch jobs can be found here: [https://confluence.slac.stanford.edu/display/PCDS/Submitting+Batch+Jobs]. Of note are these commands:

  • bsub: used to submit jobs to the queuing system
  • bjobs: used to list the jobs being run by the current user
  • bkill: used to stop a job that is running

All of these commands have extensive man pages available at LCLS.
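
A submission through bsub can be scripted along the following lines. The sketch only assembles the command; the queue name, log file, and processing command are hypothetical placeholders, and the exact commands for cctbx.xfel jobs are the ones given in the tutorials. The `-q` (queue) and `-o` (output file) options are standard bsub options.

```python
import shlex


def build_bsub_command(queue, log_file, job_command):
    """Return the argument list for submitting job_command through bsub."""
    return ["bsub", "-q", queue, "-o", log_file] + shlex.split(job_command)


# Hypothetical queue name, log path, and processing command.
command = build_bsub_command(
    queue="example_queue",
    log_file="run0001.log",
    job_command="my_processing_command -c run0001.cfg",
)

# Shell-safe rendering of the same command, useful for logging.
command_line = " ".join(shlex.quote(part) for part in command)

# To actually submit on a machine with bsub installed:
#   import subprocess
#   subprocess.run(command, check=True)
```

Building the argument list first and submitting it separately makes it easy to inspect the command before anything reaches the queue. Once submitted, bjobs and bkill can be used to monitor or stop the job.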

Phil

While psana is configured using its own configuration files, cctbx.xfel itself is driven using Python-based hierarchical interchange language (phil) files, the same format that drives Labelit and PHENIX (though PHENIX calls them .eff files). The format is intuitive and allows easy specification of per-processing run parameters.

A user's psana configuration file will have an entry called xfel_target. This entry provides the name of a phil file that contains cctbx.xfel configuration settings. These settings include things such as thresholds for determining hits (number of spots on an image, spot brightness cutoff, etc.), unit cell targets for indexing, resolution cutoffs, and so forth.
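
The relationship between the two files can be sketched with Python's standard `configparser`, assuming the psana configuration file follows an INI-style layout. Only the `xfel_target` entry name comes from the text above; the module name, the phil filename, and every phil parameter name below are hypothetical placeholders, not real cctbx.xfel settings.

```python
import configparser
import io

# Hypothetical phil text: scope and parameter names are placeholders.
PHIL_TEXT = """\
hit_finding {
  min_spots = 16
}
indexing {
  target_unit_cell = 78 78 37 90 90 90
}
"""

# Build a configuration in which a (hypothetical) module section
# points at the phil file through its xfel_target entry.
config = configparser.ConfigParser()
config.optionxform = str  # keep option names case-sensitive
config["psana"] = {"modules": "example_pkg.example_module"}
config["example_pkg.example_module"] = {"xfel_target": "example.phil"}

buffer = io.StringIO()
config.write(buffer)
config_text = buffer.getvalue()

# Read the configuration back and look up the phil file it refers to.
parsed = configparser.ConfigParser()
parsed.optionxform = str
parsed.read_string(config_text)
phil_filename = parsed["example_pkg.example_module"]["xfel_target"]

# The two files a user would keep side by side for one processing run.
files = {"example.cfg": config_text, phil_filename: PHIL_TEXT}
```

Keeping the crystallographic settings in a separate phil file means the same psana configuration can be reused across runs while only the phil parameters change.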

Technical information regarding phil: [http://cctbx.sourceforge.net/libtbx_phil.html]

Specific phil files used in this tutorial: phil

cctbx

The computational crystallographic toolbox is a foundational set of Python and C++ modules that allow abstraction of the crystallographic experiment. Under continual development, the toolbox provides interfaces for working with crystal models, reflection data, and much more.

Introduction: [http://cctbx.sourceforge.net/current/]

Homepage: [http://cctbx.sourceforge.net/]

References

<references/>