[[File:IOTA Run Screen.png|thumb|left|IOTA run-time statistics display screen]]
Revision as of 22:56, 23 August 2016
IOTA: integration optimization, triage and analysis
IOTA is a user-friendly front end for the cctbx.xfel and DIALS suites of serial diffraction data processing programs. It consists of three main modules:
- Raw image import, conversion, pre-processing and triage
- Image indexing and integration using cctbx.xfel (with optimization of spot-finding parameters) or DIALS (this is currently in the process of being adapted for diffraction stills)
- Analysis of the integrated dataset
Please note that IOTA is a front-end for (currently) two pieces of data processing software: cctbx.xfel and DIALS. Therefore, the preferred construction for citation should be something like "diffraction data were processed with IOTA [1] using data reduction algorithms implemented in cctbx.xfel [2] (or DIALS [3])".
[1] IOTA: integration optimization, triage and analysis tool for the processing of XFEL diffraction images. Lyubimov AY, Uervirojnangkoorn M, Zeldin OB, Brewster AS, Murray TD, Sauter NK, Berger JM, Weis WI, Brunger AT. J Appl Crystallogr. 2016 May 11;49(Pt 3):1057-1064
[2] Accurate macromolecular structures using minimal measurements from X-ray free-electron lasers. Hattne J, Echols N, Tran R, Kern J, Gildea RJ, Brewster AS, Alonso-Mori R, Glöckner C, Hellmich J, Laksmono H, Sierra RG, Lassalle-Kaiser B, Lampe A, Han G, Gul S, DiFiore D, Milathianaki D, Fry AR, Miahnahri A, White WE, Schafer DW, Seibert MM, Koglin JE, Sokaras D, Weng TC, Sellberg J, Latimer MJ, Glatzel P, Zwart PH, Grosse-Kunstleve RW, Bogan MJ, Messerschmidt M, Williams GJ, Boutet S, Messinger J, Zouni A, Yano J, Bergmann U, Yachandra VK, Adams PD, Sauter NK. Nat Methods. 2014 May;11(5):545-8.
[3] Diffraction-geometry refinement in the DIALS framework. Waterman DG, Winter G, Gildea RJ, Parkhurst JM, Brewster AS, Sauter NK, Evans G. Acta Crystallogr D Struct Biol. 2016 Apr;72(Pt 4):558-75.
Running IOTA: GUI Mode
The most user-friendly way to run IOTA is in GUI mode. This starts up simply by issuing
iota
on the command line. First you will see the main input screen, where you can enter basic information such as the input and output folders (the current folder is automatically designated as output, but this can be changed). This version accepts only a single path (which must be a folder) as input; a folder with image-containing subfolders will also work.
Here, you can also turn on the "random subset" option (which causes IOTA to process only a random subset of the input images) and select the number of processors for multiprocessing. Currently, two multiprocessing modes are available (via the "Preferences" toolbar button): "multiprocessing" uses multiple cores on your local machine, while "lsf" submits jobs to an LSF queue. The queue can be selected from a drop-down list or, if it is not on the list, a queue name can be supplied by the user.
The toolbar also contains buttons for running the IOTA job ("Run"), converting input raw images into image pickles ("Convert Images") or writing out a text file with absolute paths for images (")
Running IOTA: Auto Mode
The simplest way to run IOTA is in Auto Mode. To do so, simply issue:
iota.run /path/to/image/files/
The path may contain a tree of folders in any configuration. IOTA will then carry out a conversion step if the source folder contains raw diffraction images. The converted image pickles will be saved in the current folder under the subfolder "converted_pickles". Inside that folder, converted pickles will be saved separately for each IOTA run, under subfolders named "001", "002", "003", etc. Alternatively, once raw images have been successfully converted to image pickles, IOTA can be pointed to the image pickles instead, e.g.:
iota.run ./converted_pickles/001/
If a text file with a list of images exists, IOTA can also accept that file as input (IOTA creates the input list automatically and saves it under ./integration/###/input_images.lst):
iota.run ./integration/001/input_images.lst
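An input list can also be prepared by hand. The following is a minimal Python sketch, not part of IOTA, that walks a folder tree and writes the absolute path of every image it finds to a text file. It assumes the list holds one path per line and that images are recognized by file extension; the function name write_image_list and the default ".pickle" filter are hypothetical choices made for illustration.

```python
import os


def write_image_list(root, list_path, extensions=(".pickle",)):
    """Walk `root` recursively and write one absolute image path per line.

    Assumption: the list file holds one path per line; `extensions` is a
    hypothetical filter and should be adjusted to the image format in use.
    """
    paths = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            if name.lower().endswith(extensions):
                paths.append(os.path.abspath(os.path.join(folder, name)))
    paths.sort()  # stable order makes repeated runs comparable
    with open(list_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return paths
```

The resulting file could then be passed to iota.run in the same way as the automatically generated input_images.lst.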
Once running, IOTA will display a program logo, some information about the configuration of the run and a progress bar for each major step, e.g.:
-bash-4.1$ iota.run converted_pickles/003/

     IIIIII            OOOOOOO        TTTTTTTTTT          A
       II             O       O           TT             A A
       II             O       O           TT            A   A
>------INTEGRATION----OPTIMIZATION--------TRIAGE-------ANALYSIS------------>
       II             O       O           TT          A       A
       II             O       O           TT         A         A
     IIIIII            OOOOOOO            TT        A           A
v1.0.012 with CCTBX.XFEL

Tuesday, May 03, 2016. 10:37 AM

IOTA will run in AUTO mode using /net/cci-filer2/raid1/home/art/iota/converted_pickles/001:

Reading files from data folder -- DONE.....................................0.00s
IMPORTING IMAGES: 23% [ ---- ] [ =========> ]
IOTA will automatically create two script files: iota.param (which contains settings for running IOTA) and target.phil (a cctbx.xfel target file), which can be modified by a user to fine-tune various settings. The output will be collected in the folder named "integration", which will contain subfolders for each integration run, titled "001", "002", "003", etc. Each run generates a folder named "final" with the final integrated pickles as well as individual cctbx.xfel logs for each image. Furthermore, lists of files that have been successfully integrated (integrated.lst), failed integration (not_integrated.lst), etc. can be found there. Finally, a pre-populated script for PRIME (prime.phil) can be found there as well. (Currently, the user must manually edit prime.phil to specify the number of residues - "n_residues" - in order to run PRIME successfully.)
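A quick way to gauge a run is to compare the two list files mentioned above. The Python sketch below is an illustration only, not part of IOTA: it assumes that integrated.lst and not_integrated.lst sit in the folder passed to the function, that each holds one image path per line, and that together they cover all processed images. The function names are hypothetical.

```python
import os


def read_list(path):
    """Return the non-empty lines of a list file, or [] if it is missing."""
    if not os.path.isfile(path):
        return []
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def integration_summary(list_folder):
    """Count integrated and failed images in one run.

    Assumption: `list_folder` contains integrated.lst and
    not_integrated.lst, one image path per line.
    """
    integrated = read_list(os.path.join(list_folder, "integrated.lst"))
    failed = read_list(os.path.join(list_folder, "not_integrated.lst"))
    total = len(integrated) + len(failed)
    rate = float(len(integrated)) / total if total else 0.0
    return {"integrated": len(integrated), "failed": len(failed), "rate": rate}
```

For example, integration_summary("integration/001") would report the counts for the first run, provided the lists are stored there.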
Running IOTA: Target Files
IOTA itself is a front-end to the data processing programs cctbx.xfel and DIALS. These programs require their own set of parameters, distinct from the IOTA parameters. They are kept in so-called "target" files: text files containing parameters encoded in the Python-based hierarchical interchange language (PHIL). When run in AUTO mode, IOTA generates an appropriate target file for cctbx.xfel or DIALS using defaults deemed reasonable for most serial crystallography projects. (NOTE: since the DIALS stills indexer remains a work in progress, those defaults may not work very well.) These default target files can also serve as a starting point for users to modify the settings as they see fit. Users can also provide their own target file (perhaps generated during a previous data processing attempt). To do so, edit the IOTA settings to specify the target file:
cctbx { target = "cctbx.phil" }
or use a command-line argument
iota.run /path/to/image/files/ cctbx.target=user_params.phil
Running IOTA: Script Mode
IOTA can be run using a script file, e.g.:
iota.run script.param
The script contains settings in PHIL format, e.g.:
description = "IOTA run #5, with some modified settings"
input = "/path/to/raw/images/"
output = "/path/to/iota/output/"
cctbx
{
  target = "cctbx.phil"
  grid_search
  {
    type = None *brute_force smart
    area_median = 5
    area_range = 2
    height_median = 4
    height_range = 2
    sig_height_search = False
  }
  selection
  {
    select_only
    {
      flag_on = False
      grid_search_path = None
    }
    min_sigma = 5
    select_by = *epv mosaicity
    prefilter
    {
      flag_on = False
      target_pointgroup = None
      target_unit_cell = None
      target_uc_tolerance = None
      min_reflections = None
      min_resolution = None
    }
  }
}
. . .
The script can be auto-generated (with an accompanying target.phil file with some default cctbx.xfel settings) via a "dry run" by issuing
iota.run -d
The same "-d" command-line option also prints the full IOTA script file, with help statements, to the terminal (the full script is also included at the end of this page).
Additionally, IOTA settings can be modified by command-line statements, e.g.:
iota.run script.param cctbx.grid_search.type=smart cctbx.grid_search.area_median=7
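To make the grid_search settings more concrete, the Python sketch below enumerates the spot area and spot height combinations implied by a median and a plus/minus range. It is an illustration, not IOTA's actual search code: it assumes a unit step and inclusive bounds, and the function name brute_force_grid is hypothetical.

```python
from itertools import product


def brute_force_grid(area_median=5, area_range=2, height_median=4, height_range=2):
    """List every (spot area, spot height) pair within median +/- range.

    Assumptions: unit step and inclusive bounds; the real grid search
    may sample the parameter space differently.
    """
    areas = range(area_median - area_range, area_median + area_range + 1)
    heights = range(height_median - height_range, height_median + height_range + 1)
    return list(product(areas, heights))
```

Under these assumptions the default settings give a 5 x 5 grid of 25 combinations, and setting area_median=7 shifts the area axis to 5 through 9 without changing the grid size.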
Running IOTA: Single-Image Mode
IOTA can accept a single image as input:
iota.run images/img_00001.pickle
Alternatively, IOTA can be run in bare-bones "single-image mode"
iota.single_image images/img_00001.pickle
These options are best for testing purposes.
Running IOTA: Command-line Options
In addition to a command script, IOTA runs can be modified by command-line options:
-h, --help            show help message and exit
--version             Prints version info
-l, --list            Output a file (input_images.lst) with input image paths and exit
-c, --convert         Convert raw images to pickles and exit
-d, --default         Generate default iota.param and target.phil files and exit
-p PREFIX, --prefix PREFIX
                      Specify custom prefix for converted pickles (e.g. -p user)
-s, --select          Selection only, no grid search
-r RANDOM             Run IOTA with a random subset of images, e.g. "-r 5"
-n NPROC              Specify a number of cores for a multiprocessor run
--mpi [MPI]           Specify stage of process - for MPI only
--analyze [ANALYZE]   Use for analysis only; specify run number or folder with results
These options can be shown by issuing:
iota.run -h
Perhaps the most useful of these are the -r and -n options, as they allow the user to adjust an IOTA run in Auto Mode on the fly. Alternatively, both of these settings can be changed within the script file.
All of the options in the script can be introduced as command-line statements by using a "compressed" PHIL format. Thus:
cctbx
{
  grid_search
  {
    type = None *brute_force smart
  }
}
translates into
iota.run script.param cctbx.grid_search.type=brute_force
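The mapping from nested scopes to "compressed" statements is simply a matter of joining scope names with dots. The Python sketch below illustrates this with a plain nested dictionary standing in for the PHIL hierarchy; it does not use the PHIL parser, and the function name to_command_line is hypothetical.

```python
def to_command_line(settings, prefix=""):
    """Flatten a nested dict into dotted key=value command-line statements.

    Assumption: `settings` is a plain dict that mirrors the PHIL scopes;
    keys are emitted in sorted order so the result is deterministic.
    """
    args = []
    for key, value in sorted(settings.items()):
        path = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            # Descend into the nested scope, carrying the dotted path along
            args.extend(to_command_line(value, path))
        else:
            args.append("{}={}".format(path, value))
    return args


overrides = {"cctbx": {"grid_search": {"type": "smart", "area_median": 7}}}
statements = to_command_line(overrides)
```

Here statements holds "cctbx.grid_search.area_median=7" and "cctbx.grid_search.type=smart", which can be appended to an iota.run command.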
IOTA Output
Due to IOTA's flexibility, several types of output co-exist and can be somewhat disconnected from one another. It helps to think of them as belonging to three separate stages of the process: pre-processing, grid-search / integration, and post-processing / analysis.
In pre-processing, raw images are read in and converted to Python pickles. These are saved under the "converted_pickles" folder in the format <prefix>_<run_no>_<#####>.pickle; each cycle of pre-processing is assigned a run number (e.g. "001", "002", "003", etc.). Pre-processing is only triggered if a) the read-in image is not already pickled or b) the image has to be modified in some way (e.g. override beamXY coordinates, change detector distance, etc.). Thus, if converted and modified pickles are submitted to IOTA, the "converted_pickles" folder will not be created. The purpose of this is to allow the user to experiment with image modification, then subsequently select the converted pickles that best fit the user's needs.
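The naming scheme for converted pickles can be expressed in a few lines of Python. This sketch is an illustration only: it assumes the run number is zero-padded to three digits (as in "001") and the image index to five digits (as suggested by "#####"). Both function names are hypothetical.

```python
import re

# Assumed layout: <prefix>_<3-digit run number>_<5-digit index>.pickle
_NAME_PATTERN = re.compile(r"^(?P<prefix>.+)_(?P<run_no>\d{3})_(?P<index>\d{5})\.pickle$")


def converted_pickle_name(prefix, run_no, index):
    """Build a filename following the assumed zero-padded layout."""
    return "{}_{:03d}_{:05d}.pickle".format(prefix, run_no, index)


def parse_converted_pickle_name(filename):
    """Split a filename into (prefix, run_no, index), or None if it does not match."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("prefix"), int(match.group("run_no")), int(match.group("index"))
```

Parsing names this way makes it easy to group converted pickles by run number when comparing the results of different image modifications.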
The output of the other two steps (grid-search / integration and post-processing / analysis) can be found under the "integration" folder. The grid-search results are saved to the "integration/###/image_objects" folder in the format <filename>.int. These are pickled dictionaries which contain all the information about the individual images (without the pixel values or integrated intensities), such as raw image filename, converted pickle filename, the details of the grid search, etc. These can be used for some of the advanced operations, such as experimentation with the selection process without repeating the grid search.
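Since the .int files are described as pickled dictionaries, they can in principle be inspected with Python's standard pickle module. The sketch below is a hedged illustration: the dictionary keys are not documented on this page, so the code only lists whatever keys are present. Files written by an older Python version, or files that reference cctbx classes, may need to be opened inside the matching cctbx environment. As with any pickle, only load files from a trusted source. The function names are hypothetical.

```python
import pickle


def load_image_object(path):
    """Load one pickled image object (assumed to be a dictionary)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def describe(image_object):
    """Return the sorted key names, without assuming which keys exist."""
    return sorted(str(key) for key in image_object)
```

A typical use would be to call describe(load_image_object(path)) on a single file from "integration/###/image_objects" to see what information was stored for that image.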
The integrated pickles are collected under "integration/###/final" folder, in the format int_<filename>.pickle. Only successfully integrated images are saved this way. For each of the input images, however, a log of cctbx.xfel or DIALS output is saved in the same folder, in the format <filename>.log. This log documents each integration attempt from the grid-search with the final integration attempt at the end (for cctbx.xfel) or the linear indexing/integration output (for DIALS) and can be used for troubleshooting.
If the user chooses to output any charts (e.g. grid-search heatmap, beam center plot, image visualization, etc.), these will be found under "integration/###/visualization" folder.
Finally, the "integration/###" folder itself contains text files with lists of images, e.g. input images, all integrated images, all images that failed integration, major clusters from the unit cell-clustering module, etc. The main logfile (iota.log) is also found here, as is the default input file for PRIME (prime.phil).
IOTA Full Script
The following is a full IOTA script file with help lines:
description = Integration Optimization, Transfer and Analysis (IOTA)
  .help = "Run description (optional)."
input = None
  .help = "Can be a tree with folders"
output = None
  .help = "Base output directory, current directory in command-line, can be set in GUI"
image_conversion
  .help = "Parameters for raw image conversion to pickle format"
{
  rename_pickle_prefix = Auto
    .help = "Set to None to keep original image filenames and directory tree"
  convert_only = False
    .help = "Set to True (or use -c option) to convert and exit"
  square_mode = None *pad crop
    .help = "Method to generate square image"
  beamstop = 0
    .help = "Beamstop shadow threshold, zero to skip"
  distance = 0
    .help = "Alternate crystal-to-detector distance (set to zero to leave the same)"
  beam_center
    .help = "Alternate beam center coordinates (set to zero to leave the same)"
  {
    x = 0
    y = 0
  }
}
image_triage
  .help = "Check if images have diffraction using basic spotfinding (-t option)"
{
  type = None *simple grid_search
    .help = "Set to None to attempt integrating all images"
  min_Bragg_peaks = 10
    .help = "Minimum number of Bragg peaks to establish diffraction"
  grid_search
    .help = "Parameters for the grid search."
  {
    area_min = 6
      .help = "Minimal spot area."
    area_max = 24
      .help = "Maximal spot area."
    height_min = 2
      .help = "Minimal spot height."
    height_max = 20
      .help = "Maximal spot height."
    step_size = 4
      .help = "Grid search step size"
  }
}
cctbx
  .help = "Options for CCTBX-based image processing"
{
  target = None
    .help = "Target (.phil) file with integration parameters"
  grid_search
    .help = "Parameters for the grid search."
  {
    type = None *brute_force smart
      .help = "Set to None to only use median spotfinding parameters"
    area_median = 5
      .help = "Median spot area."
    area_range = 2
      .help = "Plus/minus range for spot area."
    height_median = 4
      .help = "Median spot height."
    height_range = 2
      .help = "Plus/minus range for spot height."
    sig_height_search = False
      .help = "Set to true to scan signal height in addition to spot height"
  }
  selection
    .help = "Parameters for integration result selection"
  {
    select_only
      .help = "set to True to re-do selection with previous"
    {
      flag_on = False
        .help = "set to True to bypass grid search and just run selection"
      grid_search_path = None
        .help = "leave as None to use grid search results from previous run"
    }
    min_sigma = 5
      .help = "minimum I/sigma(I) cutoff for strong spots"
    select_by = *epv mosaicity
      .help = "Use mosaicity or Ewald proximal volume for optimal parameter selection"
    prefilter
      .help = "Used to throw out integration results that do not fit user-defined unit cell information"
    {
      flag_on = False
        .help = "Set to True to activate prefilter"
      target_pointgroup = None
        .help = "Target point group, e.g. P4"
      target_unit_cell = None
        .help = "In format of a, b, c, alpha, beta, gamma , e.g. 79.4, 79.4, 38.1, 90.0, 90.0, 90.0"
      target_uc_tolerance = None
        .help = "Maximum allowed unit cell deviation from target"
      min_reflections = None
        .help = "Minimum integrated reflections per image"
      min_resolution = None
        .help = "Minimum resolution for accepted images"
    }
  }
}
dials
  .help = "This option is not yet ready for general use!"
{
  target = None
    .help = "Target (.phil) file with integration parameters for DIALS"
  min_spot_size = 6
    .help = "Minimal spot size"
  global_threshold = 0
    .help = "Global threshold"
}
analysis
  .help = "Analysis / visualization options."
{
  run_clustering = False
    .help = "Set to True to turn on hierarchical clustering of unit cells"
  cluster_threshold = 5000
    .help = "threshold value for unit cell clustering"
  viz = *None integration cv_vectors
    .help = "Set to cv_vectors to visualize accuracy of CV vectors"
  charts = False
    .help = "If True, outputs PDF files w/ charts of mosaicity, rmsd, etc."
}
advanced
  .help = "Advanced, debugging and experimental options."
{
  integrate_with = *cctbx dials
    .help = "Choose image processing software package"
  estimate_gain = False
    .help = "Estimates detector gain (sometimes helps indexing)"
  debug = False
    .help = "Used for various debugging purposes."
  experimental = False
    .help = "Set to true to run the experimental section of codes"
  random_sample
    .help = "Use a randomized subset of images (or -r <number> option)"
  {
    flag_on = False
      .help = "Set to run grid search on a random set of images."
    number = 0
      .help = "Number of random samples. Set to zero to select 10% of input."
  }
}
n_processors = 32
  .help = "No. of processing units"
mp_method = *multiprocessing mpi
  .help = "Multiprocessing method"
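The random_sample settings in the script follow a simple rule: a positive number requests that many images, while zero selects 10% of the input. The Python sketch below illustrates that rule; it is not IOTA's own code. It assumes the 10% figure is rounded down with a minimum of one image, and the function name and the seed argument are hypothetical additions.

```python
import random


def random_subset(images, number=0, seed=None):
    """Pick a random subset of images following the random_sample rule.

    Assumptions: number=0 means 10% of the input, rounded down with a
    minimum of one image; `seed` is a hypothetical extra for repeatability.
    """
    images = list(images)
    if not images:
        return []
    if number <= 0:
        number = max(1, len(images) // 10)
    number = min(number, len(images))  # never ask for more than exist
    rng = random.Random(seed)
    return rng.sample(images, number)
```

Fixing the seed gives the same subset on every call, which can be useful when comparing settings on a small test set.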