Overview


LCLS provides several frameworks for data analysis, as well as methods for translating collected data to HDF5, a general and portable container format for storing large amounts of numerical data. cctbx.xfel extends the LCLS analysis packages with a set of analysis modules. In particular, the cctbx.xfel analysis modules are run through psana, LCLS's analysis framework. Because analysis proceeds directly from the raw data, no intermediate conversion is necessary, and processing can be done while an experiment is running.

cctbx.xfel at LCLS is built on five systems: the CSPAD detector, psana, LCLS’s queuing system, phil, and fundamentally, cctbx.

The CSPAD detector

Example CSPAD image

The LCLS at full capacity operates at 120 Hz. The incident photons are delivered in pulses ≈40 femtoseconds wide, each containing ≈10^12 photons. This high repetition rate and short beam delivery time necessitated the construction of a new detector<ref>Hart, P. The Cornell-SLAC Pixel Array Detector at LCLS. SLAC Scientific Documents (2012).</ref>, which reads out and streams recorded data at these speeds using 64 sensors arranged in a quadrangular pattern around a central hole (in place of a beam stop). Each of the 4 quadrants contains 16 of the sensors and can be moved on rails radially away from the central hole to adjust the size of this hole. Indexing, predicting spot locations using a crystal orientation matrix, and integrating reflection intensities all require precise knowledge of the location of these sensors in three-dimensional space. For this reason, a portion of this tutorial describes the calibration and refinement of the tile metrology.
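The 64-sensor layout described above can be addressed with simple integer arithmetic. The sketch below is purely illustrative: the numbering scheme (sensors counted 0 to 63, quadrant by quadrant) and the function name locate_sensor are assumptions made for this example, not the convention used by LCLS or cctbx.xfel.

```python
# Illustrative only: the flat numbering convention is an assumption.
N_QUADRANTS = 4
SENSORS_PER_QUADRANT = 16


def locate_sensor(flat_index):
    """Map a flat sensor index (0..63) to (quadrant, sensor within quadrant)."""
    n_sensors = N_QUADRANTS * SENSORS_PER_QUADRANT
    if not 0 <= flat_index < n_sensors:
        raise ValueError("sensor index out of range: %d" % flat_index)
    return divmod(flat_index, SENSORS_PER_QUADRANT)


print(locate_sensor(0))   # (0, 0)
print(locate_sensor(37))  # (2, 5)
```

A real metrology description would also carry a position and rotation for each sensor; this sketch covers only the bookkeeping of which sensor belongs to which quadrant.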


psana

The LCLS data acquisition systems stream the terabytes of diffraction data collected from the CSPAD detector to container files in XTC format. XTC is a linear, sequential-access file format, where individual images can be recorded rapidly by the file system as they are collected. The programmatic interface to interact with these files at LCLS is psana, a C++/Python-based interface.

psana is driven by configuration files to process frames individually, and is designed with computational parallelization in mind. Because each image is independent, the images can be processed on separate computer cores. cctbx.xfel uses psana and psana's configuration files to read and process images stored in XTC format. The user specifies how each image is to be processed in the configuration file, and then passes the configuration file and the path to the XTC streams of interest to cctbx.xfel, which calls psana and submits the job to the queuing system.

For example, if the user wanted to filter an XTC stream for hits, index the hits and then integrate images which successfully indexed, the user would supply a configuration file which specified cctbx.xfel modules that did these tasks, provide options to these modules, and submit the job. Specific details are in the tutorials.
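The filter, index, integrate sequence described above can be pictured as a chain of per-frame steps. The following is a toy sketch in plain Python, not cctbx.xfel code: the function names, the dictionary layout of a frame, and the spot-count threshold are all hypothetical stand-ins for the real modules and their options.

```python
# Toy stand-ins for hit finding, indexing and integration.
# All names and thresholds are hypothetical; they are not cctbx.xfel modules.


def is_hit(frame, min_spots=16):
    """Treat a frame as a hit if it has at least min_spots candidate spots."""
    return len(frame["spots"]) >= min_spots


def index(frame):
    """Pretend indexing: succeeds only if the frame is flagged indexable."""
    return frame.get("indexable", False)


def integrate(frame):
    """Pretend integration: sum the spot intensities."""
    return sum(frame["spots"])


def process(frames):
    """Run each frame through the chain, keeping only fully processed ones."""
    results = {}
    for frame in frames:  # each frame is handled independently
        if not is_hit(frame):
            continue  # not a hit: skip
        if not index(frame):
            continue  # hit, but indexing failed: skip
        results[frame["id"]] = integrate(frame)
    return results


frames = [
    {"id": "shot-1", "spots": [1.0] * 20, "indexable": True},
    {"id": "shot-2", "spots": [1.0] * 3, "indexable": True},
    {"id": "shot-3", "spots": [2.0] * 18, "indexable": False},
]
print(process(frames))  # {'shot-1': 20.0}
```

Because no step depends on any other frame, the loop body could be distributed over many cores without changing the result.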

During processing, hits are extracted from the XTC stream and written to a separate file for each individual image. At the moment these files are written in pickle format, a serialization format native to the Python programming language. However, by the end of 2014, cctbx.xfel will be exclusively using CBF and HDF5 formats to output results.
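For readers unfamiliar with pickle, here is a minimal sketch of writing one record per image and reading it back with Python's standard pickle module. The dictionary keys and the file name are hypothetical; the actual contents of the files written by cctbx.xfel may differ.

```python
import os
import pickle
import tempfile

# Hypothetical per-image record; the real keys used by cctbx.xfel may differ.
image = {
    "timestamp": "example-shot-0001",
    "data": [[0, 1], [2, 3]],
    "distance_mm": 100.0,
}

# One file per image, written to a scratch directory for this example.
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "example-shot-0001.pickle")

with open(path, "wb") as f:
    pickle.dump(image, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == image)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.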

More information about psana: [1]

LCLS queuing system

SLAC maintains several computing clusters available to its users for processing data. While detailed knowledge of their workings isn't required for cctbx.xfel operation, an overview of these systems is provided here: [2]. Specific commands for submitting cctbx.xfel jobs to the cluster are given in the tutorials.

General instructions for submitting batch jobs can be found here: [3]. Of note are these commands:

  • bsub: used to submit jobs to the queuing system
  • bjobs: used to list the jobs being run by the current user
  • bkill: used to stop a job that is running

All of these commands have extensive man pages available at LCLS.
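As a rough sketch of how these three commands fit together, the snippet below builds their argument lists in Python without running them. The queue name, log file, script name, and job ID are placeholders invented for this example; the actual commands to use for cctbx.xfel jobs are given in the tutorials. The -q (queue) and -o (output file) options are standard bsub options.

```python
import shlex


def bsub_command(queue, log_file, command):
    """Build a bsub argument list: -q selects the queue, -o the output log."""
    return ["bsub", "-q", queue, "-o", log_file] + shlex.split(command)


# Placeholder values: none of these names refer to a real queue or script.
submit = bsub_command("example_queue", "job.log", "my_processing_script.sh run42")
list_jobs = ["bjobs"]         # list the current user's jobs
kill_job = ["bkill", "12345"]  # stop the job with this (made-up) job ID

for cmd in (submit, list_jobs, kill_job):
    print(" ".join(cmd))
```

On a host where the queuing system is installed, any of these lists could be passed to subprocess.run to execute the command.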

Phil

While psana is configured using its own configuration files, cctbx.xfel itself is driven using Python-based hierarchical interchange language (phil) files, the same format that drives Labelit and PHENIX (though PHENIX calls them .eff files). The format is intuitive and allows easy specification of per-processing run parameters.

A user's psana configuration file will have an entry called xfel_target. This entry provides the name of a phil file that contains the cctbx.xfel configuration settings. These settings include things such as thresholds for determining hits (number of spots on an image, spot brightness cutoff, etc.), unit cell targets for indexing, resolution cutoffs, and so forth.
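To make the relationship between the two files concrete, the sketch below reads an xfel_target entry out of a small INI-style configuration using Python's standard configparser module. Only the entry name xfel_target comes from the text above; the section layout, the module name, and the phil file name are assumptions made for illustration, and real configuration files are shown in the tutorials.

```python
import configparser

# Hypothetical configuration text. Only the key "xfel_target" is taken from
# the documentation; the section and module names are made up.
EXAMPLE_CFG = """
[psana]
modules = example_package.example_module

[example_package.example_module]
xfel_target = example_settings.phil
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE_CFG)

# Look up which module is configured, then find its phil file.
module_name = parser["psana"]["modules"]
phil_path = parser[module_name]["xfel_target"]

print(phil_path)  # example_settings.phil
```

The point of the indirection is that the psana file says which modules run, while the phil file it names holds the crystallographic settings for those modules.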

Technical information regarding phil: [4]

Specific phil files used in this tutorial: phil

cctbx

The Computational Crystallography Toolbox (cctbx) is a foundational set of Python and C++ modules that allow abstraction of the crystallographic experiment. Under continual development, the toolbox provides interfaces for working with crystal models, reflection data, and much more.

Introduction: [5] Homepage: [6]

References

<references/>