File formats

From cctbx_xfel
Revision as of 12:22, 21 July 2017 by Aaron

This page is under construction

Introduction

cctbx.xfel's current native file formats, image pickles and integration pickles, are mainly intermediate formats useful for debugging and software development. These binary pickle files are serialized Python dictionaries, designed to be read by software rather than by people. The raw data from an experiment at LCLS are stored in xtc streams, which we process in memory without writing images to disk at all; image pickles are written only on request, for diagnostic use with the cctbx image viewer. Integration pickles contain integrated intensities together with the crystal's unit cell and orientation parameters, and are read by the merging programs cxi.merge and prime.postrefine.

In an effort to provide more human-readable data files and to conform to international standards, we have worked with representatives of ImageCIF to represent the 64-tile segmented CSPAD data in CBF format, the same format used for a wide variety of detectors. These CSPAD CBFs are designed to be read by other crystallographic software packages.

Format specifications for these data files are provided below.

Image pickles

cctbx.xfel image pickles are a binary file format containing pixel data and image metadata. Under the hood these are serialized Python dictionaries, but for general users it is sufficient to know that the data consist of name/value pairs. The contents of an image pickle can be inspected with this command:

 cxi.print_pickle image.pickle

For example, here is the output for an image of a thermolysin diffraction pattern collected in December 2014:

 Printing contents of idx-20141201073601249.pickle
 Detector format version: CXI 10.1
 DISTANCE 119.002
 DETECTOR_ADDRESS CxiDs1-0|Cspad-0
 SIZE1 1765
 SIZE2 1765
 TIMESTAMP 2014-12-01T07:36Z01.249
 CCD_IMAGE_SATURATION 90000
 SATURATED_VALUE 90000
 64 active areas, first one:  [715, 439, 909, 624]
 PIXEL_SIZE 0.11
 BEAM_CENTER_Y 97.075
 MIN_TRUSTED_VALUE -2000
 WAVELENGTH 1.75124427107 , converted to eV: 7079.77687909
 xtal_target None
 BEAM_CENTER_X 97.075
 SEQUENCE_NUMBER 0
 DATA len=3115225 max=106065.000000 min=-104526.000000 dimensions=(1765, 1765)
 cxi_versioned_extract()::cxi_version: CXI 10.1
 64 translated active areas, first one:  [725, 449, 917, 632]

Each line here is a name/value pair of data or metadata for the image. LABELIT, DIALS and cctbx.image_viewer can all process these files directly. Here is a description of each name/value pair:

  • Detector format version: CXI 10.1. Technically this value is not stored in the image pickle. cctbx.xfel uses the detector format version internally for data collected up to LCLS run 11, as a way of looking up tile corrections stored in the software; it is determined from the detector address and the event timestamp. For data collected after run 11, this information should be stored in the phil file used to process the data. All known detector format versions can be displayed with the command cxi.detector_format_versions.
  • DISTANCE: detector distance (mm)
  • DETECTOR_ADDRESS: string identifying the detector in the XTC stream. Together with the event timestamp, it serves as a unique identifier for the data.
  • SIZE1 and SIZE2: image dimensions (pixels)
  • TIMESTAMP: unique time stamp for this event in the form year-month-dayThour:minuteZsecond.millisecond
  • CCD_IMAGE_SATURATION and SATURATED_VALUE: overload value
  • PIXEL_SIZE: pixel size in mm
  • BEAM_CENTER_X, BEAM_CENTER_Y: beam center, specified as positive mm offsets from the origin of the detector (upper-left corner)
  • MIN_TRUSTED_VALUE: underload value; pixel values below it are not trusted
  • WAVELENGTH: wavelength as recorded in the XTC stream (angstroms)
  • DATA: pixel data. cxi.print_pickle shows a few summary statistics (number of pixels, maximum, minimum and dimensions).
  • xtal_target and SEQUENCE_NUMBER: unused
  • 64 active areas: coordinates specifying where each of the 64 tiles lies in the image, given as quartets of numbers [x1,y1,x2,y2], one quartet per tile. Only the first active area is shown. For non-CSPAD image pickles there is only one active area, covering the entire image.
  • 64 translated active areas: if corrections were available for this image, the first corrected active area is shown here.
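Several of these fields can be turned into derived quantities with a few lines of arithmetic. The sketch below is illustrative only and is not taken from cctbx.xfel: it copies the values from the example output above into a plain dictionary, converts the wavelength to eV using hc ≈ 12398.42 eV·Å, converts the beam center from mm to pixels, computes the pixel extent of an [x1,y1,x2,y2] active area, and parses the TIMESTAMP string. The function names are hypothetical, and the timestamp parser assumes the three digits after the decimal point are milliseconds, as the example value ".249" suggests.

```python
import re
from datetime import datetime

# Values copied from the example cxi.print_pickle output above.
header = {
    "DISTANCE": 119.002,            # mm
    "SIZE1": 1765,                  # pixels
    "SIZE2": 1765,                  # pixels
    "PIXEL_SIZE": 0.11,             # mm
    "BEAM_CENTER_X": 97.075,        # mm from the upper-left corner
    "BEAM_CENTER_Y": 97.075,        # mm from the upper-left corner
    "WAVELENGTH": 1.75124427107,    # angstroms
    "TIMESTAMP": "2014-12-01T07:36Z01.249",
}
first_active_area = [715, 439, 909, 624]

# h*c expressed in eV * angstrom (from the exact CODATA 2018 SI constants).
HC_EV_ANGSTROM = 12398.419843320026


def wavelength_to_ev(wavelength_angstrom):
    """Photon energy in eV for a wavelength in angstroms."""
    return HC_EV_ANGSTROM / wavelength_angstrom


def beam_center_pixels(h):
    """Beam center converted from mm offsets to pixel offsets."""
    return (h["BEAM_CENTER_X"] / h["PIXEL_SIZE"],
            h["BEAM_CENTER_Y"] / h["PIXEL_SIZE"])


def active_area_extent(area):
    """Width and height, in pixels, of an [x1, y1, x2, y2] active area."""
    x1, y1, x2, y2 = area
    return x2 - x1, y2 - y1


# year-month-dayThour:minuteZsecond.fraction, fraction assumed to be 3 digits (ms)
_TIMESTAMP = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})Z(\d{2})\.(\d{3})")


def parse_timestamp(text):
    """Parse a TIMESTAMP string into a (naive) datetime object."""
    match = _TIMESTAMP.fullmatch(text)
    if match is None:
        raise ValueError("unrecognized timestamp: %r" % text)
    year, month, day, hour, minute, second, ms = map(int, match.groups())
    return datetime(year, month, day, hour, minute, second, ms * 1000)


print("energy (eV):", wavelength_to_ev(header["WAVELENGTH"]))
print("beam center (pixels):", beam_center_pixels(header))
print("first tile extent (pixels):", active_area_extent(first_active_area))
print("event time:", parse_timestamp(header["TIMESTAMP"]))
```

With the example values, the beam center comes out at about 882.5 pixels on both axes, the middle of the 1765 x 1765 image, and the computed energy agrees with the value reported by cxi.print_pickle to within about 0.001 eV; the small difference presumably reflects a slightly different value of hc.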

A note on coordinate systems

Internally, our software uses the ImageCIF coordinate system, with one exception: because no goniometer is typically available at XFELs, Z points from the sample toward the source (antiparallel to the beam), Y points against gravity, and X completes a right-handed coordinate system. In image pickles, the origin of the detector pixel readout is assumed to be at the upper left of the image, so the beam center is specified as positive offsets from that point. In the example above, assuming the sample is at the laboratory origin, the vector pointing to the beam center on the detector in the ImageCIF coordinate system would be (0, 0, -119.002). A vector from the laboratory origin to the pixel readout origin (upper left of the detector) would be (-97.075, 97.075, -119.002). dxtbx.print_header is a useful tool for getting these vectors from any file format understood by cctbx.xfel. For more information see Parkhurst et al, 2014.
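The two vectors in the example follow directly from DISTANCE and the beam center. The sketch below reproduces them and extends the idea to an arbitrary position on the image. It is a simplified model, not dxtbx's detector model: it assumes a flat detector perpendicular to the beam at Z = -DISTANCE, with image x (columns) running along +X and image y (rows, counted downward) running along -Y, which is consistent with the example vectors. The helper names are hypothetical, and because the text does not say whether pixel coordinates refer to pixel corners or centers, positions are simply measured from the readout origin.

```python
def readout_origin(distance, beam_center_x, beam_center_y):
    """Lab-frame vector (mm) from the sample to the upper-left readout origin.

    Assumes a detector perpendicular to the beam at Z = -distance, with
    image x along +X and image y (rows, downward) along -Y.
    """
    # From the beam center, the upper-left corner lies to the left (-X) and up (+Y).
    return (-beam_center_x, beam_center_y, -distance)


def pixel_to_lab(distance, beam_center_x, beam_center_y, pixel_size, x, y):
    """Lab-frame vector (mm) to image position (x, y), in pixels from the origin."""
    ox, oy, oz = readout_origin(distance, beam_center_x, beam_center_y)
    # Moving right across the image is +X; moving down the image is -Y.
    return (ox + x * pixel_size, oy - y * pixel_size, oz)


# Example values from the image pickle above (all lengths in mm).
origin = readout_origin(119.002, 97.075, 97.075)
beam = pixel_to_lab(119.002, 97.075, 97.075, 0.11, 97.075 / 0.11, 97.075 / 0.11)
print("readout origin:", origin)
print("beam center:", beam)
```

For real data, dxtbx.print_header remains the authoritative way to obtain these vectors, since it accounts for tile layouts and detector orientations that this flat-panel sketch ignores.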

Creating image pickles from xtc streams

The program cctbx.xfel.xtc_dump can be used to convert xtc streams to image pickle format. Here we provide an example for the Rayonix detector that is often found at XPP and MFX. The psana module mod_image_dict is used to read the data and generate the images (example: mod_image_dict.cfg). After setting up a release directory as described in Setup:

 cd ~/myrelease # or wherever your release is
 sit_setup
 cctbx.xfel.xtc_dump experiment=mfxm9316 run_num=14 address="MfxEndstation-0|Rayonix-0" file_format=pickle cfg=mod_image_dict.cfg 

This will fill your working folder with image pickles. You can change your output directory with output.output_dir and you can limit the number of images saved with dispatch.max_events.

This approach also works for CSPAD images, but you must add calib_dir to your config file, as described in the indexing tutorials.
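When several runs need to be dumped, the xtc_dump call can be scripted. The sketch below is a hypothetical convenience wrapper, not part of cctbx.xfel: it only assembles the same parameters used in the Rayonix example above (experiment, run_num, address, file_format, cfg, output.output_dir, dispatch.max_events), and the per-run folder naming "rNNNN" is an invented convention. It assumes sit_setup has already been run in the shell that launches Python. By default it only prints the commands (dry run).

```python
import os
import shlex
import subprocess


def xtc_dump_command(experiment, run_num, address, cfg,
                     output_dir=None, max_events=None):
    """Argument list for one cctbx.xfel.xtc_dump call writing image pickles."""
    cmd = [
        "cctbx.xfel.xtc_dump",
        "experiment=%s" % experiment,
        "run_num=%d" % run_num,
        "address=%s" % address,
        "file_format=pickle",
        "cfg=%s" % cfg,
    ]
    if output_dir is not None:
        cmd.append("output.output_dir=%s" % output_dir)
    if max_events is not None:
        cmd.append("dispatch.max_events=%d" % max_events)
    return cmd


def dump_runs(experiment, runs, address, cfg, output_root,
              max_events=None, dry_run=True):
    """Dump several runs, one (hypothetical) output folder per run.

    With dry_run=True the commands are only printed, shell-quoted so they
    can be pasted into a terminal where sit_setup has already been run.
    """
    commands = []
    for run in runs:
        out_dir = os.path.join(output_root, "r%04d" % run)
        cmd = xtc_dump_command(experiment, run, address, cfg,
                               output_dir=out_dir, max_events=max_events)
        commands.append(cmd)
        print(" ".join(shlex.quote(arg) for arg in cmd))
        if not dry_run:
            # Create the folder ourselves in case xtc_dump does not.
            os.makedirs(out_dir, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands
```

For example, `dump_runs("mfxm9316", [14, 15], "MfxEndstation-0|Rayonix-0", "mod_image_dict.cfg", "pickles", max_events=10, dry_run=False)` would dump ten images from each of two runs. Passing the arguments as a list to subprocess avoids the shell interpreting the "|" in the detector address, which is why the original command line quotes it.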

Integration pickles

Integration pickles are the output of cctbx.xfel indexing and integration. They are like image pickles in that they are serialized python dictionaries and can be inspected with cxi.print_pickle. They are the primary input to cxi.merge and prime.run.
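Because both image pickles and integration pickles are pickled Python dictionaries, their keys can also be listed from a short Python script. The sketch below is not a replacement for cxi.print_pickle, which also derives values such as the energy in eV and the translated active areas; the helper names are hypothetical. It assumes Python 3 and that it is run with a Python in which cctbx is importable, since values such as the DATA pixel array are cctbx objects that can only be unpickled with cctbx available. Pickles written by an older Python 2 build may need `encoding="latin1"`.

```python
import pickle
import sys


def load_pickle_dict(path, encoding="ASCII"):
    """Return the dictionary stored in an image or integration pickle."""
    with open(path, "rb") as f:
        return pickle.load(f, encoding=encoding)


def summarize(value, limit=60):
    """Short printable form of one value; long values are summarized."""
    text = repr(value)
    if len(text) <= limit:
        return text
    # Large values (for example pixel arrays) are reduced to type and length.
    try:
        return "%s len=%d" % (type(value).__name__, len(value))
    except TypeError:
        return text[:limit] + "..."


def print_pickle_dict(path):
    """Print every name/value pair, sorted by name."""
    data = load_pickle_dict(path)
    for key in sorted(data, key=str):
        print(key, summarize(data[key]))


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print_pickle_dict(name)
```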

CSPAD CBFs

CSPAD CBFs are used when processing CSPAD images with the DIALS backend of cctbx.xfel. The complete specification for CSPAD CBFs is laid out in "XFEL Detectors and ImageCIF", Computational Crystallography Newsletter 5, 19-24 (Reprint). Documentation for using DIALS to index and integrate CSPAD data can be found in "Processing XFEL data with cctbx.xfel and DIALS", Computational Crystallography Newsletter 7, 32-53 (Reprint).

Creating CSPAD CBFs from XTC streams

cctbx.xfel.xtc_dump is useful for converting XTC streams to CBF files as needed. Use -c to get a listing of all options, and -c -a 2 to get a full listing with help strings. Example command:

 cd ~/myrelease # or wherever your release is
 sit_setup
 cctbx.xfel.xtc_dump dispatch.max_events=1 input.experiment=xpptut15 input.address=cspad \
   input.run_num=54 format.file_format=cbf output.output_dir=xpptut15/out \
   format.cbf.detz_offset=100 input.override_energy=7000

Here, one image is created from experiment xpptut15, run 54. You can display the image with cctbx.image_viewer xpptut15/out/*.cbf. Note the pink gaps between the tiles, which result from the segmented nature of the CSPAD and are preserved in the CBF file.

xpptut15 contains data from XPP's CSPAD collected without X-rays, so format.cbf.detz_offset=100 and input.override_energy=7000 supply placeholder values in this command.

Converting from SLAC's metrology to CBF

The tile positions of the hierarchical CSPAD detector are specified by SLAC in a geometry file in the calib folder of each experiment. Typically this file is named something like 0-end.data. If desired, this file can be converted into just the human-readable header portion of a CSPAD CBF using the cctbx.xfel command cxi.slaccalib2cbfheader. Example:

 cxi.slaccalib2cbfheader metrology_file=0-end.data out=tmp.cbf

This CBF header can be converted back to SLAC format using cxi.cbfheader2slaccalib:

 cxi.cbfheader2slaccalib cbf_header=tmp.cbf out_metrology_file=tmp.data
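One way to sanity-check the two commands is a round trip: convert 0-end.data to a CBF header, convert it back, and compare the numbers. The sketch below is a hypothetical helper, not a cctbx.xfel tool. It assumes the SLAC geometry file is plain text with "#" comments and whitespace-separated fields, that non-numeric tokens (such as segment names) can be ignored, and that the regenerated file lists its values in the same order as the original; the tolerance is an arbitrary guess, since the round trip may reformat numbers.

```python
import subprocess


def numeric_tokens(path):
    """All numbers in a text file, ignoring '#' comments and non-numeric words."""
    values = []
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0]
            for token in line.split():
                try:
                    values.append(float(token))
                except ValueError:
                    pass  # skip segment names and other labels
    return values


def same_numbers(a, b, tol=1e-3):
    """True if both lists have the same length and agree element-wise within tol."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


def metrology_round_trip(metrology="0-end.data", header="tmp.cbf",
                         back="tmp.data", tol=1e-3):
    """Run both conversions, then compare original and regenerated numbers."""
    subprocess.run(["cxi.slaccalib2cbfheader",
                    "metrology_file=%s" % metrology, "out=%s" % header],
                   check=True)
    subprocess.run(["cxi.cbfheader2slaccalib",
                    "cbf_header=%s" % header, "out_metrology_file=%s" % back],
                   check=True)
    return same_numbers(numeric_tokens(metrology), numeric_tokens(back), tol)
```

A False result does not necessarily mean a conversion is wrong; it may only indicate that the regenerated file orders or rounds its values differently, in which case the two files should be compared by eye.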

Displaying metrology files

cxi.display_metrology <filename> can be used to show a plot of tile positions. It accepts SLAC metrology files, CSPAD CBFs and image pickles.