IOTA: Difference between revisions
= ''IOTA'': '''i'''ntegration '''o'''ptimization, '''t'''riage and '''a'''nalysis =
''IOTA'' is a user-friendly front end for <code>dials.stills_process</code>, a serial diffraction data processing program. ''IOTA'' comprises three main modules:
# Raw image import, pre-processing and triage
# Image indexing, lattice model refinement, and integration using <code>dials.stills_process</code>
# Analysis of the integrated dataset
''IOTA'' can be run as a GUI or from the command line; scripts can be used for both, interchangeably. The GUI has the advantage of displaying useful statistics; it can also be run in "monitor mode" during live data collection, in which the program waits for new images to be written into the specified input folder. The command-line mode is useful if the program is run remotely on servers that do not support graphics.
Please note that ''IOTA'' is a front end for other processing software. Therefore, the preferred construction for citation should be something like "data were processed with ''IOTA'' [1] using serial diffraction data reduction algorithms implemented in ''DIALS'' [2, 3]".
[1] [http://www.ncbi.nlm.nih.gov/pubmed/27275148 ''IOTA'': integration optimization, triage and analysis tool for the processing of XFEL diffraction images.] Lyubimov AY, Uervirojnangkoorn M, Zeldin OB, Brewster AS, Murray TD, Sauter NK, Berger JM, Weis WI, Brunger AT. J Appl Crystallogr. 2016 May 11;49(Pt 3):1057-1064
[2] [https://www.ncbi.nlm.nih.gov/pubmed/29533234 DIALS: implementation and evaluation of a new integration package.] Winter, G., Waterman, D. G., Parkhurst, J. M., Brewster, A. S., Gildea, R. J., Gerstel, M., Fuentes-Montero, L., Vollmar, M., Michels-Clark, T., Young, I. D., Sauter, N. K. & Evans, G. (2018). Acta Cryst. D74, 85-97.
[3] [https://www.ncbi.nlm.nih.gov/pubmed/30198898 Improving signal strength in serial crystallography with DIALS geometry refinement.] Brewster, A. S., Waterman, D. G., Parkhurst, J. M., Gildea, R. J., Young, I. D., O'Riordan, L. J., Yano, J., Winter, G., Evans, G. & Sauter, N. K. (2018). Acta Cryst. D74, 877-894.
= ''IOTA'' GUI =
 iota -h
== Main Window ==
[[File:Iota_main_screen_08022019.png|thumb|right|''IOTA'' main input screen]]
First you will see the main input screen, which allows you to enter basic information, such as the output folder and the project description. The input file list has "Add Folder" and "Add File" buttons, which allow you to input multiple sources of data: individual diffraction images, folders with diffraction images (or subfolders, etc.), and text files containing lists of paths to diffraction images (absolute paths work best here). As each item is added, a line is generated showing the number of images therein, as well as "actions" that can be taken. The entry can be deleted from the list (middle button), or the images can be viewed using an image viewer (left button, with a diffraction icon). ''IOTA'' launches the image viewer appropriate to the backend selected from the "Integrate with" dropdown menu; thus either the ''DIALS'' or the ''cctbx'' image viewer will open.
As entries are added, the total number of read-in images is reported in the lower right corner. Once all the inputs are read in, the user can customize their ''IOTA'' run by changing the various preferences and options.
=== GUI Preferences ===
[[File:iota_proc_settings_screen_08022019.png|thumb|right|''IOTA'' Settings dialog]]
The Preferences toolbar button opens a dialog in which the user can adjust settings for the ''IOTA'' GUI, among them the choice of the multiprocessing method, monitor mode options, etc.
Currently, three queueing modes are available in the Preferences dialog: 'mpi' submits jobs using the MPI protocol, 'lsf' submits jobs to an LSF queue (in use at LCLS), and 'torq' refers to the queue set up at SSRL's processing servers (this one is under construction). The queues can be selected from a drop-down list or, if not found on the list, a queue name can be supplied by the user. If a different queueing protocol is present or if a custom submission script is desired, custom submit and kill commands can be entered into the <code>Submit Command</code> and <code>Kill Command</code> fields.
=== Processing Options ===
The main screen contains two settings buttons: <code>IOTA Settings</code>, which opens a dialog with processing options that apply to the ''IOTA'' front end, and <code>Backend Settings</code>, which opens a pop-up menu that leads to dialogs with options for spotfinding, indexing, lattice refinement, integration, etc. for the processing backend (in this case <code>dials.stills_process</code>). There is some overlap between ''IOTA'' and backend options, which will probably diminish in the future. It is unlikely that any modification of the backend settings is necessary. We have tried to supply reasonable default presets for <code>dials.stills_process</code>; however, optimization of data processing algorithms is still a work in progress and advanced users may find it necessary to adjust some of the more obscure parameters. At the moment, we encourage users to contact ''IOTA'' developers with any questions.
=== Analysis Options ===
[[File:iota_run_screen_08022019.png|thumb|right|''IOTA'' run-time statistics display screen]]
The analysis dialog allows you to output various charts summarizing ''IOTA'' output as well as individual image integration results. Most of the charts are a remnant of the older, command-line version of ''IOTA''; they have been superseded by the charts shown in the run-time GUI. However, users who desire an in-depth, image-by-image look into their data processing can turn on these features. They include: 1. Integration predictions overlaid on diffraction images; 2. Plots of lattice model shifts during refinement; 3. Mosaicity "trumpet" plots. Most of these are generated by the ''cctbx.xfel'' and ''DIALS'' backends. WARNING: The generation of these plots may slow down your ''IOTA'' run!
Also note: the unit cell clustering option is off by default, as the module seems to conflict with some installations of the ''cctbx'' suite of software. If the clustering option is left off, clustering can still be initiated after the processing run has concluded (see below).
== Run Statistics and Analysis ==
[[File:iota_analysis_screen_08022019.png|thumb|right|''IOTA'' end-run analysis display screen]]
Once ''IOTA'' is running, a run-time processing window will appear with two tabs: a Log tab that displays ''iota.log'' as it is updated in real time, and a Charts tab, which displays several useful graphs: resolution vs. frame, number of strong (I / sigI > threshold) spots per frame, a histogram of unit cell parameters, a plot of indices with measurements, and a bar chart breaking down indexing / integration success for the full dataset. The processing window also allows the user to turn on the "Monitor Mode", in which ''IOTA'' continuously checks whether any new diffraction images have been added to the input folder (or subfolders therein); this mode is useful when running ''IOTA'' concurrently with data collection.
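The folder check that "Monitor Mode" performs can be pictured with a short Python sketch. This is only an illustration of the idea, not ''IOTA'''s implementation: the function name <code>find_new_images</code>, the <code>seen</code> set, and the default <code>.cbf</code> extension filter are all assumptions made for the example.

```python
from pathlib import Path


def find_new_images(folder, seen, extensions=(".cbf",)):
    """Return image files under `folder` (recursively) not reported before.

    `seen` is a set that the caller keeps between calls; it is updated in place.
    The extension filter is an assumption made for this sketch.
    """
    new_images = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix in extensions and path not in seen:
            seen.add(path)
            new_images.append(path)
    return new_images
```

Calling the function repeatedly (for example, once every few seconds) with the same <code>seen</code> set would yield only the images written since the previous call, including those in subfolders.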
Once running, ''IOTA'' will display a program logo, some information about the configuration of the run and a progress bar for each major step, e.g.:
 % iota.run hewl_Br_data
 IIIIII OOOOOOO TTTTTTTTTT A
 II O O TT A A
 II O O TT A A
 >------INTEGRATION----OPTIMIZATION--------TRIAGE-------ANALYSIS------------>
 II O O TT A A
 II O O TT A A
 IIIIII OOOOOOO TT A A v1.4.006
 Interpreting input -- DONE.................................................0.30s
 Initializing run parameters -- DONE........................................0.01s
 PROCESSING: 12% [ - ] [ ====> ] est: 3819s
''IOTA'' will put all of the output into the folder named <code>integration</code>, which will contain subfolders for each integration run, titled "001", "002", "003", etc. Each run generates a folder named "final" with the final integrated pickles as well as individual ''cctbx.xfel'' logs for each image. Furthermore, lists of files that have been successfully integrated (integrated.lst), failed integration (not_integrated.lst), etc. can be found there. The ''IOTA'' script for this run (<code>iota_r#.param</code>), a backend script (<code>target.phil</code>), and a pre-populated script for ''PRIME'' (prime.phil) can be found there as well. These can be modified by the user for future runs and/or read into the ''IOTA'' or ''PRIME'' GUIs. (GUI-based runs generate exactly the same output, and their scripts can be run in command-line mode.)
== Target Files ==
''IOTA'' itself is a front end to the data processing program <code>dials.stills_process</code>. This program requires its own set of parameters, distinct from ''IOTA'' parameters, which are located in so-called "target" files: text files containing parameters encoded in the Python-based hierarchical interchange language (PHIL). When run in AUTO mode, ''IOTA'' generates an appropriate target file using defaults deemed reasonable for most serial crystallography projects. This default target file can also serve as a starting point for the user to modify those settings as they see fit. The user has the option to provide their own target file (perhaps generated during a previous data processing attempt). The user can edit the ''IOTA'' settings to specify the target file:
 cctbx {
   target = "user_params.phil"
 }
The script contains settings in PHIL format, e.g.:
 description = "IOTA parameters auto-generated on Friday, Aug 02, 2019. 04:40 PM"
 input = "/Users/art/Science/iota_tutorial/hewl_Br_data"
 output = "/Users/art/Science/iota_tutorial"
 data_selection {
   image_triage {
     flag_on = True
     minimum_Bragg_peaks = 10
     strong_sigma = 5
   }
   image_range {
     flag_on = False
     range = None
   }
   random_sample {
     flag_on = False
     number = 0
   }
 }
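One way to read the <code>image_triage</code> settings is as a simple threshold rule. The Python sketch below is an assumption-laden illustration and not ''IOTA'''s actual triage code: it assumes an image is kept when the number of spots with I/sigma above <code>strong_sigma</code> is at least <code>minimum_Bragg_peaks</code>. The function name and the plain list of per-spot I/sigma values are hypothetical.

```python
def passes_triage(spot_i_over_sigma, minimum_bragg_peaks=10, strong_sigma=5):
    """Assumed triage rule: keep the image if enough strong spots were found.

    `spot_i_over_sigma` is a hypothetical list of I/sigma values, one per spot.
    """
    strong_spots = [value for value in spot_i_over_sigma if value > strong_sigma]
    return len(strong_spots) >= minimum_bragg_peaks


# Example inputs: four spots (one strong) versus eleven spots (ten strong).
weak_image = [1.2, 3.4, 6.0, 2.2]
strong_image = [5.5, 7.1, 9.3, 12.0, 6.6, 5.2, 8.8, 10.4, 6.1, 7.7, 3.0]
```

With the default values shown in the script, <code>weak_image</code> would be rejected and <code>strong_image</code> would be passed on to indexing.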
Additionally, IOTA settings can be modified by command-line statements, e.g.:
 iota.run script.param data_selection.image_range.flag_on=True data_selection.image_range.range=1-100
== Single-Image Mode ==
In addition to a command script, IOTA runs can be modified by command-line options:
 -h, --help            show this help message and exit
 --version             Prints version info of IOTA
 -d, --default         Generate default settings files and stop
 --ha14                Run IOTA with old HA14 backend
 --random RANDOM       Size of randomized subset, e.g. "--random 10"
 --range [RANGE]       Range of images, e.g."--range 1-5,25,200-250"
 -n NPROC, --nproc NPROC
                       Specify a number of cores for a multiprocessor run"
 --analyze [ANALYZE]   Use for analysis only; specify run number or folder
 --tmp TMP             Path to temp folder
 --silent              Run IOTA in silent mode
These options can be shown by issuing:
 iota.run -h
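To make the <code>--range</code> syntax concrete, here is a small Python sketch that expands a string such as <code>1-5,25,200-250</code> into individual image numbers. It is not ''IOTA'''s own parser; the function name is hypothetical, and the sketch assumes that ranges are inclusive at both ends.

```python
def parse_image_range(text):
    """Expand a range string such as "1-5,25,200-250" into a sorted list of ints.

    Assumes inclusive ranges; this mirrors the option's syntax only.
    """
    numbers = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(piece) for piece in part.split("-", 1))
            numbers.update(range(start, end + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)
```

Under these assumptions, the example string from the help text selects 57 images: five from the first range, one single image, and 51 from the last range.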
All of the options in the script can be introduced as command-line statements by using a "compressed" PHIL format. Thus:
 cctbx_xfel {
   target_space_group = P422
 }
translates into
 iota.run script.param cctbx_xfel.target_space_group=P422
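The translation from nested PHIL scopes to "compressed" dotted statements can be sketched in Python. The helper below is hypothetical and works on a plain nested dictionary rather than on real PHIL objects; it only illustrates how scope names are joined with dots.

```python
def phil_overrides(settings, prefix=""):
    """Flatten a nested dict into "scope.subscope.name=value" strings."""
    arguments = []
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into the nested scope, carrying the dotted path along.
            arguments.extend(phil_overrides(value, path))
        else:
            arguments.append(f"{path}={value}")
    return arguments


# Build the command line from the example above.
command = ["iota.run", "script.param"] + phil_overrides(
    {"cctbx_xfel": {"target_space_group": "P422"}}
)
```

The resulting <code>command</code> list holds the same three tokens as the command line shown above.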
= Output =
Due to ''IOTA's'' flexibility, there are several types of output that coexist and can be somewhat disconnected from one another. It helps to think of them as three separate stages of the process: pre-processing, integration, and post-processing / analysis.
In pre-processing, raw images are read in and converted to Python pickles. These are saved under the <code>converted_pickles</code> folder in the format <prefix>_<run_no>_<#####>.pickle; each cycle of pre-processing is assigned a run number (e.g. "001", "002", "003", etc.). Pre-processing is only triggered if a) the read-in image is not already pickled or b) the image has to be modified in some way (e.g. override beamXY coordinates, change detector distance, etc.). Thus, if converted and modified pickles are submitted to IOTA, the "converted_pickles" folder will not be created. The purpose of this is to allow the user to experiment with image modification, then subsequently select the converted pickles that best fit the user's needs. (<b>NOTE:</b> conversion to Python pickle format is only necessary for the deprecated "HA14" processing backend. The new <code>dials.stills_process</code> program can read any image format directly and doesn't require format conversion. Likewise, spotfinding parameter grid search is only applicable to the "HA14" backend and is not used with <code>dials.stills_process</code>.)
The output of the other two steps (integration and post-processing / analysis) can be found under the <code>integration</code> folder. It mostly contains files that IOTA needs to keep track of the processed data, including a JSON dictionary file <code>proc.info</code> that contains all the information about the current processing run. The integrated pickles are collected under the <code>integration/###/final</code> folder, in the format int_<filename>.pickle. Only successfully integrated images are saved this way. For each of the input images, however, a log of the backend output is saved in the <code>logs</code> folder, in the format <filename>.log. This file is not for the faint of heart, as it contains the raw output of <code>dials.stills_process</code>; advanced users can use it for troubleshooting.
If the user chooses to output any charts (e.g. beam center plot, image visualization, etc.), these will be found under the <code>integration/###/visualization</code> folder.
Finally, the <code>integration/###</code> folder itself contains text files with lists of images, e.g. input images, all integrated images, all images that failed integration, major clusters from the unit cell-clustering module, etc. The main logfile (<code>iota.log</code>) is also found here, as are the script files for IOTA (<code>iota_r#.param</code>) and the backend (<code>target.phil</code>), and the default input file for ''PRIME'' (prime.phil).
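The list files in a run folder lend themselves to quick scripted checks. The Python sketch below counts the entries in <code>integrated.lst</code> and <code>not_integrated.lst</code> and reports a success rate. It assumes that both files sit directly in the run folder and contain one image path per line, which has not been verified against ''IOTA'''s actual file layout; the function names are hypothetical.

```python
from pathlib import Path


def count_entries(list_file):
    """Count non-empty lines in a list file; a missing file counts as zero."""
    path = Path(list_file)
    if not path.is_file():
        return 0
    with path.open() as handle:
        return sum(1 for line in handle if line.strip())


def integration_summary(run_dir):
    """Summarize a run folder such as integration/001 (assumed file layout)."""
    run_dir = Path(run_dir)
    integrated = count_entries(run_dir / "integrated.lst")
    failed = count_entries(run_dir / "not_integrated.lst")
    total = integrated + failed
    rate = integrated / total if total else 0.0
    return {"integrated": integrated, "not_integrated": failed, "success_rate": rate}
```

For example, <code>integration_summary("integration/001")</code> would return the two counts and their ratio for the first run, provided the files exist where this sketch expects them. Images rejected before integration (for example by triage) may be listed elsewhere and would not be counted here.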
= ''IOTA'' Tutorial =
== Sample Dataset: Good Ol' Lysozyme ==
For the purposes of this tutorial we will use serial diffraction data collected from good ol' hen egg-white lysozyme (HEWL) crystals using synchrotron radiation. I know, I know, you're tired of lysozyme, but before we tackle the Grand Problems of Crystallography (tm), we'll use a good, reliable dataset to learn how to use this software. Onward!
== Obtaining the sample data and setting up the environment ==
1. Make sure ''cctbx.xfel'' is installed and available on your system. The software is available with the latest ''Phenix'' distribution. As ''cctbx.xfel'' is under active and vigorous development, make sure you install the latest ''Phenix'' nightly build to obtain the latest version of ''cctbx.xfel''.
2. Create a new directory (e.g. <code>iota_tutorial</code>) in your user space, to ensure that you'll have read-write permissions. Go into that folder. From now on, all the files will be written there.
3. Download the [http://smb.slac.stanford.edu/~templates/sample_serial_data/HEWL_synch_serial.tar.gz compressed tarball containing diffraction images].
4. Once the download is complete (it might take a while), create a subfolder in your <code>iota_tutorial</code> folder called <code>images</code>, move the tarball there and issue:
 gunzip HEWL_synch_serial.tar.gz
and then
 tar -xvf HEWL_synch_serial.tar
which will extract 324 diffraction images. Make sure to then delete <code>HEWL_synch_serial.tar</code>.
== Using ''IOTA'' GUI ==
=== Reading in the data and making a beamstop shadow mask ===
1. In your terminal window, make sure you're in the folder where you want to run ''IOTA'' and issue
 iota
to open the main input window. It should look like this:
[[File:Iota_tutorial_1_08022019.png|480px|Tutorial Fig. 1: ''IOTA'' main input screen]]
NOTE: All the images in this tutorial have been generated by running ''IOTA'' on Mac OS X 10.14.5. On other systems, the graphic elements of the UI may look somewhat different, but the general look and feel, as well as functionality, should be the same.
2. Our data are located entirely in a single <code>images</code> folder. You can read in the images via the input controls under the input window, either by entering the full path of the data folder or files into the "Input Path" text box (you can use wild-cards here, e.g. <code>/folder/path/*.cbf</code>) or by clicking the "Browse..." button. Click the "Browse..." button now. A pop-up menu will appear; select "Browse folders..." This will open a standard folder dialog.
3. Select the <code>images</code> folder. The absolute path will now appear in the input window, with an associated image count (which should be 324 in this case) on the left, and three "action" buttons (image viewer, delete item, information) on the right:
[[File:Iota_tutorial_2_08022019.png|480px|Tutorial Fig. 2: Reading in diffraction images]]
3a. OPTIONAL: Exit ''IOTA'' and issue
 iota ./images/
Neat, huh? ''IOTA'' has several command-line options that can be useful as shortcuts. Use the <code>-h</code> option to see which other options are available. The list should grow as we add more options in the near future.
4. Now we will make a mask for the beamstop shadow using the [https://dials.github.io/documentation/programs/dials_image_viewer.html ''DIALS'' image viewer].
4a. Locate the "action" buttons to the right of the image folder item; click the button that looks like a tiny diffraction image. This will launch a dialog asking you to select how many images from the dataset you'd like to display in the image viewer.
[[File:Iota_tutorial_3_08022019.png|480px|Tutorial Fig. 3: Launching the image viewer]]
4b. Select <code>First 1 image</code> and press OK. The ''DIALS'' image viewer will launch next (in some cases, especially if you're trying to display hundreds of images, it may take a while to load) and present the image you've selected.
4c. Locate the <code>Actions</code> submenu in the main menu, and select <code>Show Mask Tool</code>. This will open up the mask tool dialog, which will allow you to draw the beamstop shadow mask. I used a circle and a separate polygon to circumscribe the beamstop shadow in the image, like this:
[[File:Iota_tutorial_4_05172018.png|480px|Tutorial Fig. 4: Making beamstop shadow mask in ''DIALS'' image viewer]]
4d. Click on "Save Mask", which will save a <code>mask.pickle</code> file in your <code>iota_tutorial</code> directory. You can close the ''DIALS'' image viewer now so as not to clutter your screen.
5. In order to use the mask, we have to read it in. Click on the <code>IOTA Settings...</code> button, which will open the IOTA Settings dialog. You will see a lot of options, but we're only going to import the mask. Under <code>Beamstop Mask</code> click on the <code>(...)</code> button, select the <code>mask.pickle</code> file and click OK. The full path to the mask file should now show up in the dialog, like this:
[[File:Iota_tutorial_5_08022019.png|480px|Tutorial Fig. 5: Import Options with beamstop mask file specified]]
=== Running with defaults ===
6. The data processing backend for IOTA is <code>dials.stills_process</code> [http://viper.lbl.gov/cctbx.xfel/index.php/2017_dials.stills_process], which is implemented in ''cctbx.xfel''. It is possible to run an older backend (here referenced as "HA14", Fig. 6). The option to select the processing backend is found in the IOTA Settings dialog under <code>Advanced</code>.
[[File:Iota_tutorial_6_08022019.png|480px|Tutorial Fig. 6: Select backend]]
Keep the backend at <code>cctbx_xfel</code> for the purposes of this tutorial, as the ''HA14''-based backend is deprecated and will probably crash, anyway.
7. A practice that has worked really well, especially for new data with unknown crystal parameters, is to run ''IOTA'' with defaults at first, and then re-run it using information gleaned from the analysis of the first run. To run IOTA, click on the big green <code>RUN IOTA</code> button in the lower right corner. A new window should open, which will display processing results as they arrive.
7a. OPTIONAL: Before starting ''IOTA'', specify the number of cores you want to use for the processing run. The processing parallelizes very well, given that each diffraction image is indexed and integrated independently of the others. Thus, the more cores you use, the faster the process will go. By default, ''IOTA'' sets the number of cores to 3/4 of those available; thus, if you're running this on a MacBook Pro with a 2.9GHz Intel Core i7 CPU, you will have access to 8 cores and the default value will be 6. To adjust the number of cores as desired, click on the <code>Preferences</code> toolbar button, then change the value under <code>No. Processors</code>.
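The 3/4 rule for the default core count can be written out as a short Python sketch. The function name is hypothetical, and rounding down with a minimum of one core is an assumption; the text only states the 3/4 fraction and the 8-core example.

```python
import multiprocessing


def default_nproc(available=None):
    """Return 3/4 of the available cores, rounded down, but never less than 1."""
    if available is None:
        available = multiprocessing.cpu_count()
    return max(1, int(available * 0.75))
```

With 8 available cores this returns 6, matching the example above.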
[[File:Iota_tutorial_7_08022019.png|480px|Tutorial Fig. 7: Run Window]]
Top to bottom (and left to right as appropriate), the charts are: 1. Resolution per frame, 2. Number of strong (I/sigma > 5) spots per frame, 3. Wilson B-factor histogram, 4. Chart of indices with measurements (with color designating redundancy), 5. Processing summary (color-coded for different outcomes).
8. Try clicking on individual points on the resolution / strong spots charts: a filename should appear in a text control above the charts, and clicking the diffraction pattern button should open that image in the <code>dials.image_viewer</code>. Does the integration result for that image correlate with what you see?
9. Try double-clicking on the orange "failed indexing" bar on the summary chart. A dialog should pop up prompting you to select images for viewing; select <code>all images</code> and inspect them. Does it make sense why they were not indexed?
10. When the run is complete, the Analysis tab will be added to the Log and Charts tabs. It contains a summary of the run and allows the user to open a few useful charts. At this point we are pretending that we do not know the crystal parameters of the system we're studying, and ''IOTA'' provides us with some preliminary information:
[[File:Iota_tutorial_8_08022019.png|480px|Tutorial Fig. 8: Analysis of completed processing run]]
11. While ''IOTA'' reports the consensus Bravais lattice determination of P422, it may not necessarily be correct, and the unit cell parameters are averages, which may be skewed: ''DIALS'' indexes each image separately, which causes the lattice model to be determined with relatively poor accuracy and precision; not all the images would be indexed in the tetragonal lattice, nor would the unit cell parameters be the same (or even close!) for each one. Therefore, we need to obtain more granular information by using unit cell clustering. To do so, click the <code>Run Clustering</code> button. (Post-processing clustering can also be run automatically after processing: turn it on in the <code>Analysis</code> box in the IOTA Settings dialog.) If the dataset contains many images, this step may take a while. For enormous datasets (10,000+ images), it's recommended to select a smaller subset for this analysis. The results will be displayed in the same window:
[[File:Iota_tutorial_9_08022019.png|480px|Tutorial Fig. 9: Unit cell clustering]]
Note that not all images were indexed in the tetragonal lattice! Some are orthorhombic. Others are tetragonal, but with a different set of unit cell parameters. The majority of the indexed images, however, are consistent with the known parameters of the tetragonal hen egg-white lysozyme. Which is a relief.
=== Re-processing with new information ===
12. Go ahead and close ''IOTA''. Don't worry, the information here is not gone! For starters, you can peruse the file <code>./iota_tutorial/integration/001/iota.log</code> for image-by-image information on processing and its results. Also, for the brave and foolish, the <code>./iota_tutorial/integration/001/logs/</code> folder will contain detailed ''DIALS'' logs for every image. But for casual users, the GUI offers something even easier: you can "recover" previous runs by clicking the <code>Recover</code> toolbar button and selecting the run you're interested in. So open ''IOTA'' again, click on the <code>Recover</code> button and select the single run:
[[File:Iota_tutorial_10_08022019.png|480px|Tutorial Fig. 10: Recovering the previous run]]
NOTE: if you don't want to open the full run, but only want to recover its settings, you can select "settings only" in the drop-down menu in the Recovery dialog. Otherwise, "everything" is the default, and it will open the processing window, with all the charts and (if the run was completed) with the Analysis tab. If the run was aborted or terminated for any other reason, the results so far will be displayed and you'll have the option to continue the processing where it left off.
13. Now that we've recovered our run, we actually don't need the processing window, but make sure you write down the Bravais lattice and unit cell information: you will need them.
14. Click the <code>IOTA Settings</code> button. In the IOTA Settings dialog find the <code>Processing Options</code> box; under <code>Target Space Group</code> supply the space group (here, let's use the Bravais lattice P422 we determined in the previous run); put unit cell parameters (79.2 79.2 38.1 90 90 90) into the <code>Target Unit Cell</code> text box:
[[File:Iota_tutorial_11_08022019.png|480px|Tutorial Fig. 11: Customizing the processing options]]
15. Close the IOTA Settings dialog and click <code>RUN IOTA</code> again. A new processing window should open, displaying processing results obtained using the new information on our dataset. Does this look obviously different from the previous run?
16. The graph of indices with measured reflections shows many gaps, which would imply that completeness is lacking in this dataset. However, by default the chart is displayed in P1 symmetry. If you check the <code>Space Group</code> checkbox, the Bravais lattice that you supplied in the settings will be applied (P422 in our case, P1 if no lattice information was provided):
[[File:Iota_tutorial_12_05182018.png|480px|Tutorial Fig. 12: Index chart with and without space group information]]
17. That looks better, doesn't it? In fact, you can check out the completeness in ''any'' space group imaginable. Try setting the space group to P 43 21 2, which is the correct space group for tetragonal lysozyme. The completeness should look even better.
18. And now we're done! The information on the Analysis tab should show significant improvement over the last run. The clustering should return a single cluster now. And you are ready for the next step, which is to run [[cctbx.prime]] to scale, post-refine, and merge the processed data.
Latest revision as of 00:25, 3 August 2019
IOTA: integration optimization, triage and analysis
IOTA is a user-friendly front end for dials.stills_process, a serial diffraction data processing program. IOTA comprises three main modules:
- Raw image import, pre-processing and triage
- Image indexing, lattice model refinement, and integration using dials.stills_process
- Analysis of the integrated dataset
IOTA can be run as a GUI or from the command-line; scripts can be used for both, interchangeably. The GUI has the advantage of displaying useful statistics; it can also be run in "monitor mode" during live data collection, during which the program will wait for new images to be written into the specified input folder. The command-line mode is useful if the program is run remotely on servers that do not, for some reason, support graphics.
Please note that IOTA is a front-end for other processing software. Therefore, the preferred construction for citation should be something like "data were processed with IOTA [1] using serial diffraction data reduction algorithms implemented in DIALS [2, 3]".
[1] IOTA: integration optimization, triage and analysis tool for the processing of XFEL diffraction images. Lyubimov AY, Uervirojnangkoorn M, Zeldin OB, Brewster AS, Murray TD, Sauter NK, Berger JM, Weis WI, Brunger AT. J Appl Crystallogr. 2016 May 11;49(Pt 3):1057-1064
[2] DIALS: implementation and evaluation of a new integration package. Winter, G., Waterman, D. G., Parkhurst, J. M., Brewster, A. S., Gildea, R. J., Gerstel, M., Fuentes-Montero, L., Vollmar, M., Michels-Clark, T., Young, I. D., Sauter, N. K. & Evans, G. (2018). Acta Cryst. D74, 85-97.
[3] Improving signal strength in serial crystallography with DIALS geometry refinement. Brewster, A. S., Waterman, D. G., Parkhurst, J. M., Gildea, R. J., Young, I. D., O'Riordan, L. J., Yano, J., Winter, G., Evans, G. & Sauter, N. K. (2018). Acta Cryst. D74, 877-894.
IOTA GUI
The most user-friendly way to run IOTA is in GUI mode. This starts up simply by issuing
iota
As a shortcut, IOTA GUI can be launched with an existing script supplied as a command-line argument, like this
iota iota.param
If that is done, the elements of IOTA GUI will be populated with the parameters specified in the script. Other command-line arguments can supply the path to the data folder or file, turn on monitor mode, and set the number of processors for a multiprocessing run. New options are added all the time; check which options are available by issuing
iota -h
Main Window
First you will see the main input screen, which allows you to enter basic information, such as the output folder and the project description. The input file list has "Add Folder" and "Add File" buttons, which allow you to input multiple sources of data: individual diffraction images, folders with diffraction images (or subfolders, etc.), and text files containing lists of paths to diffraction images (absolute paths work best here). As each item is added, a line is generated showing the number of images therein, as well as "actions" that can be taken. The entry can be deleted from the list (middle button), or the images can be viewed using an image viewer (left button, with a diffraction icon). IOTA launches the image viewer appropriate to the backend selected from the "Integrate with" dropdown menu; thus either the DIALS or the cctbx image viewer will open.
As entries are added, a total number of read-in images is reported in the lower right corner. Once all the inputs are read in, the user can customize their IOTA run by changing the various preferences and options.
Settings
GUI Preferences
The Preferences toolbar button opens a dialog which allows the user to configure the IOTA GUI, including the choice of the multiprocessing method, monitor mode options, etc.
Currently, three queueing modes are available (by clicking on the "Preferences" toolbar button): 'mpi' allows you to submit jobs using the MPI protocol, 'lsf' will allow you to submit jobs to an LSF queue (in use at LCLS), while 'torq' refers to the queue set up at SSRL's processing servers (this one is under construction). The queues can be selected from a drop-down list or, if not found on the list, a queue name can be supplied by the user. If a different queueing protocol is present or if a custom submission script is desired, custom submit and kill commands can be entered into the Submit Command and Kill Command fields.
Processing Options
The main screen contains two settings buttons: IOTA Settings, which opens a dialog with processing options that apply to the IOTA front end, and Backend Settings, which opens a pop-up menu that leads to dialogs with options for spotfinding, indexing, lattice refinement, integration, etc. for the processing backend (in this case dials.stills_process). There is some overlap between IOTA and backend options, which will probably diminish in the future. It is unlikely that any modification of the backend settings is necessary. We have tried to supply reasonable default presets for dials.stills_process; however, optimization of data processing algorithms is still a work in progress and advanced users may find it necessary to play with some of the more obscure parameters. At the moment, we encourage users to contact IOTA developers with any questions.
Analysis Options
The analysis dialog allows you to output various charts summarizing IOTA output as well as individual image integration results. Most of the charts are a remnant of the older, command-line version of IOTA; they have been superseded by the charts shown in the run-time GUI. However, users who desire an in-depth, image-by-image look into their data processing can turn on these features. They include: 1. Integration predictions overlaid on diffraction images; 2. Plots of lattice model shifts during refinement; 3. Mosaicity "trumpet" plots. Most of these are generated by the cctbx.xfel and DIALS backends. WARNING: The generation of these plots may slow down your IOTA run!
Also note: the unit cell clustering option is off by default, as the module seems to conflict with some installations of the cctbx suite of software. If the user doesn't turn the clustering option on, it can be initiated after the processing run is concluded (see below).
Run Statistics and Analysis
Once IOTA is running, a run-time processing window will appear with two tabs: a Log tab that will display iota.log as it is updated in real time, and the Charts tab, which will display several useful graphs: resolution vs. frame, number of strong (I / sigI > threshold) spots per frame, a histogram of unit cell parameters, a plot of indices with measurements, and a bar chart breaking down indexing / integration success for the full dataset. The processing window will also allow the user to turn on the "Monitor Mode", in which IOTA will continuously check if any new diffraction images have been added to the input folder (or subfolders therein); this is a useful mode to use when running IOTA concurrently with data collection.
The log text is searchable, allowing the user to see the log entry for any specific image. Several of the charts are clickable: the resolution / number of spots charts allow the user to click on any individual point on the scatter plot, learn the associated filename, and launch DIALS image viewer to view the image; the plot of indices can be clicked to view a h=0, k=0, or l=0 slice; a double-click on any segment of the run summary plot will allow the user to view all or some of the images associated with that particular group (e.g. if the user double-clicks on the 'Failed Indexing' fraction, they can then view all or a portion of the images that could not be indexed).
When the run finishes, a new Analysis tab will appear in the processing window. There, the pertinent summary of the run is displayed, along with buttons that will display several useful charts: a heatmap of the spot-finding results (if the cctbx.xfel backend was used), resolution histograms and beam XYZ charts. The user can run unit cell clustering from this window (results will be displayed in the table) with different options, if desired. The user can also choose to run PRIME from this window, in which case the PRIME GUI will launch with the parameters pertinent to this run filled in (e.g. input / output folders, resolution limits, pixel size, unit cell, etc.).
IOTA in Command Line
Auto Mode
The simplest way to run IOTA is in Auto Mode. To do so, simply issue:
iota.run /path/to/image/files/
Alternatively, if a text file with a list of images exists, IOTA can accept that file as input:
iota.run input_images.lst
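A list file like the one above can be produced in many ways. The following is a minimal Python sketch, not part of IOTA itself; the function name, the default extensions, and the one-path-per-line layout are assumptions made for illustration. It writes absolute paths, which the Main Window section notes work best.

```python
import os


def write_image_list(image_dir, list_path, extensions=(".cbf", ".pickle")):
    """Write absolute paths of image files found under image_dir to list_path.

    The extensions and the one-path-per-line layout are assumptions;
    adjust them to match your own data.
    """
    wanted = tuple(ext.lower() for ext in extensions)
    paths = []
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            if name.lower().endswith(wanted):
                paths.append(os.path.abspath(os.path.join(root, name)))
    paths.sort()
    with open(list_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return len(paths)
```

For example, write_image_list("images", "input_images.lst") returns the number of paths written, which can be compared with the image count IOTA reports.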
Once running, IOTA will display a program logo, some information about the configuration of the run and a progress bar for each major step, e.g.:
% iota.run hewl_Br_data

     IIIIII            OOOOOOO        TTTTTTTTTT          A
       II             O       O           TT             A A
       II             O       O           TT            A   A
>------INTEGRATION----OPTIMIZATION--------TRIAGE-------ANALYSIS------------>
       II             O       O           TT          A       A
       II             O       O           TT         A         A
     IIIIII            OOOOOOO            TT        A           A   v1.4.006

Interpreting input -- DONE.................................................0.30s
Initializing run parameters -- DONE........................................0.01s
PROCESSING: 12% [ - ] [ ====> ] est: 3819s
IOTA will put all of the output into the folder named integration, which will contain subfolders for each integration run, titled "001", "002", "003", etc. Each run generates a folder named "final" with the final integrated pickles as well as individual cctbx.xfel logs for each image. Furthermore, lists of files that have been successfully integrated (integrated.lst), failed integration (not_integrated.lst), etc. can be found there. The IOTA script for this run (iota_r#.param), a backend script (target.phil), and a pre-populated script for PRIME (prime.phil) can be found there as well. These can be modified by the user for future runs and/or read into the IOTA or PRIME GUIs. (GUI-based runs also generate the same exact output, and can be run as scripts in command-line mode.)
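The list files mentioned above lend themselves to a quick tally after a run. The sketch below is an illustration only and not an IOTA utility: the function names are hypothetical, and it assumes that integrated.lst and not_integrated.lst sit directly in the run folder with one path per line.

```python
import os


def read_list(path):
    """Return the non-empty lines of a list file, or [] if the file is missing."""
    if not os.path.isfile(path):
        return []
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def integration_summary(run_dir):
    """Count entries in integrated.lst and not_integrated.lst under run_dir."""
    integrated = read_list(os.path.join(run_dir, "integrated.lst"))
    failed = read_list(os.path.join(run_dir, "not_integrated.lst"))
    counted = len(integrated) + len(failed)
    return {
        "integrated": len(integrated),
        "not_integrated": len(failed),
        # Fraction over these two lists only; other outcome lists are ignored.
        "fraction_integrated": len(integrated) / counted if counted else 0.0,
    }
```

Calling integration_summary("integration/001") would then give a rough success rate for the first run, to be cross-checked against the summary in iota.log.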
Target Files
IOTA itself is a front-end to the data processing program dials.stills_process. This program requires its own set of parameters, distinct from IOTA parameters, which are located in so-called "target" files: text files containing parameters encoded in Python-based hierarchical interchange language or PHIL. When run in AUTO mode, IOTA generates an appropriate target file using defaults deemed reasonable for most serial crystallography projects. This default target file can also serve as a starting point for the user to modify those settings as they see fit. The user has the option to provide their own target file (perhaps generated during a previous data processing attempt). The user can edit the IOTA settings to specify the target file
cctbx_xfel {
  target = "user_params.phil"
}
or use a command-line argument
iota.run /path/to/image/files/ cctbx_xfel.target=user_params.phil
Script Mode
IOTA can be run using a script file, e.g.:
iota.run script.param
The script contains settings in PHIL format, e.g.:
description = "IOTA parameters auto-generated on Friday, Aug 02, 2019. 04:40 PM"
input = "/Users/art/Science/iota_tutorial/hewl_Br_data"
output = "/Users/art/Science/iota_tutorial"
data_selection {
  image_triage {
    flag_on = True
    minimum_Bragg_peaks = 10
    strong_sigma = 5
  }
  image_range {
    flag_on = False
    range = None
  }
  random_sample {
    flag_on = False
    number = 0
  }
}
. . .
The script can be auto-generated (with an accompanying target.phil file with some default cctbx.xfel settings) via a "dry run" by issuing
iota.run -d
The same "-d" command-line option will print to terminal the full IOTA script file with help statements.
Additionally, IOTA settings can be modified by command-line statements, e.g.:
iota.run script.param data_selection.image_range.flag_on=True data_selection.image_range.range=1-100
Single-Image Mode
IOTA can accept a single image as input:
iota.run images/img_00001.pickle
Alternatively, IOTA can be run in bare-bones "single-image mode"
iota.single_image images/img_00001.pickle
These options are best for testing purposes.
Command-line Options
In addition to a command script, IOTA runs can be modified by command-line options:
-h, --help               show this help message and exit
--version                Prints version info of IOTA
-d, --default            Generate default settings files and stop
--ha14                   Run IOTA with old HA14 backend
--random RANDOM          Size of randomized subset, e.g. "--random 10"
--range [RANGE]          Range of images, e.g."--range 1-5,25,200-250"
-n NPROC, --nproc NPROC  Specify a number of cores for a multiprocessor run"
--analyze [ANALYZE]      Use for analysis only; specify run number or folder
--tmp TMP                Path to temp folder
--silent                 Run IOTA in silent mode
These options can be shown by issuing:
iota.run -h
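The --range option takes a compact specification such as "1-5,25,200-250". As a rough illustration of what such a string selects, here is a small Python sketch; it is not IOTA's own parser, the function name is hypothetical, and it assumes that ranges are inclusive at both ends.

```python
def expand_image_range(spec):
    """Expand a specification like '1-5,25,200-250' into sorted image numbers.

    Assumes inclusive ranges; IOTA's own handling may differ in detail.
    """
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(value) for value in part.split("-", 1))
            if start > end:
                raise ValueError("range start exceeds end: " + part)
            numbers.update(range(start, end + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)
```

Under these assumptions, the example from the help text selects 5 + 1 + 51 = 57 images.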
All of the options in the script can be introduced as command-line statements by using a "compressed" PHIL format. Thus:
cctbx_xfel {
  target_space_group = P422
}
translates into
iota.run script.param cctbx_xfel.target_space_group=P422
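The translation from a nested PHIL block to the "compressed" form is mechanical: scope names are joined with dots and followed by the assignment. The Python sketch below illustrates the idea for simple blocks only. It is not the PHIL parser that ships with cctbx, the function name is hypothetical, and it assumes one assignment per line and no braces inside quoted values.

```python
def phil_to_dotted(text):
    """Flatten a simple PHIL block into 'scope.name=value' strings."""
    # Put a line break after every "{" and around every "}" so that
    # one-line blocks are handled the same way as multi-line blocks.
    normalized = text.replace("{", "{\n").replace("}", "\n}\n")
    scopes, assignments = [], []
    for raw in normalized.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("{"):
            scopes.append(line[:-1].strip())  # entering a scope
        elif line == "}":
            scopes.pop()  # leaving the innermost scope
        elif "=" in line:
            name, value = (part.strip() for part in line.split("=", 1))
            assignments.append(".".join(scopes + [name]) + "=" + value)
    return assignments
```

Values that contain spaces would additionally need quoting on the shell command line.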
Output
Due to IOTA's flexibility, there are several types of output that co-exist simultaneously and can be somewhat disconnected from one another. It helps to think of them as three separate stages of the process: pre-processing, integration, and post-processing / analysis.
In pre-processing, raw images are read in and converted to Python pickles. These are saved under the converted_pickles folder in the format <prefix>_<run_no>_<#####>.pickle; each cycle of pre-processing is assigned a run number (e.g. "001", "002", "003", etc.). Pre-processing is only triggered if a) the read-in image is not already pickled or b) the image has to be modified in some way (e.g. override beamXY coordinates, change detector distance, etc.). Thus, if converted and modified pickles are submitted to IOTA, the "converted_pickles" folder will not be created. The purpose of this is to allow the user to experiment with image modification, then subsequently select the converted pickles that best fit the user's needs. (NOTE: conversion to Python pickle format is only necessary for the deprecated "HA14" processing backend. The new dials.stills_process program can read any image format directly and doesn't require format conversion. Likewise, spotfinding parameter grid search is only applicable to the "HA14" backend and is not used with dials.stills_process.)
The output of the other two steps (integration and post-processing / analysis) can be found under the integration folder. It mostly contains a lot of files that are necessary for IOTA to keep track of the processed data, including a JSON dictionary file proc.info that contains all the information about the current processing run. The integrated pickles are collected under the integration/###/final folder, in the format int_<filename>.pickle. Only successfully integrated images are saved this way. For each of the input images, however, a log of the backend output is saved in the logs folder, in the format <filename>.log. This file is not for the faint of heart, as it contains the raw output of dials.stills_process and can be used for troubleshooting by advanced users.
If the user chooses to output any charts (e.g. beam center plot, image visualization, etc.), these will be found under the integration/###/visualization folder.
Finally, the integration/### folder itself contains text files with lists of images, e.g. input images, all integrated images, all images that failed integration, major clusters from the unit cell-clustering module, etc. The main logfile (iota.log) is also found here, as are the script files for IOTA (iota_r#.param) and the backend (target.phil), and the default input file for PRIME (prime.phil).
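Because every run gets its own numbered subfolder, a script often needs to find the most recent one first. The following Python sketch shows one way to do that; the function names are hypothetical, and it assumes run folders are named with exactly three digits, as in the "001", "002", "003" examples above.

```python
import os
import re


def latest_run_dir(integration_dir):
    """Return the highest-numbered run folder under integration_dir, or None."""
    runs = [
        name
        for name in os.listdir(integration_dir)
        if re.fullmatch(r"\d{3}", name)
        and os.path.isdir(os.path.join(integration_dir, name))
    ]
    if not runs:
        return None
    return os.path.join(integration_dir, max(runs, key=int))


def run_outputs(run_dir):
    """Map the outputs described in this section to their expected paths."""
    return {
        "log": os.path.join(run_dir, "iota.log"),
        "final": os.path.join(run_dir, "final"),
        "logs": os.path.join(run_dir, "logs"),
        "visualization": os.path.join(run_dir, "visualization"),
        "target": os.path.join(run_dir, "target.phil"),
        "prime": os.path.join(run_dir, "prime.phil"),
    }
```

The paths returned by run_outputs are only expectations based on the description here; check that each one exists before reading it, since folders such as visualization are created only when the corresponding output is requested.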
IOTA Tutorial
Sample Dataset: Good Ol' Lysozyme
For the purposes of this tutorial we will use serial diffraction data collected from good ol' hen egg-white lysozyme (HEWL) crystals using synchrotron radiation. I know, I know, you're tired of lysozyme, but before we tackle the Grand Problems of Crystallography (tm), we'll use a good, reliable dataset to learn how to use this software. Onward!
Obtaining the sample data and setting up the environment
1. Make sure that cctbx.xfel is installed and available on your system. The software is available with the latest Phenix distribution. As cctbx.xfel is under active and vigorous development, make sure you install the latest Phenix nightly build to obtain the latest version of cctbx.xfel.
2. Create a new directory (e.g. iota_tutorial) in your user space, to ensure that you'll have read-write permissions. Go into that folder. From now on, all the files will be written there.
3. Download the compressed tarball containing diffraction images.
4. Once the download is complete (it might take a while), create a subfolder in your iota_tutorial folder called images, move the tarball there and issue:
gunzip HEWL_synch_serial.tar.gz
and then
tar -xvf HEWL_synch_serial.tar
which will extract 324 diffraction images. Make sure to then delete HEWL_synch_serial.tar.
Using IOTA GUI
Reading in the data and making a beamstop shadow mask
1. In your terminal window, make sure you're in the folder where you want to run IOTA and issue
iota
to open the main input window. It should look like this:
NOTE: All the images in this tutorial have been generated by running IOTA on Mac OS X 10.14.5. On other systems, the graphic elements of the UI may look somewhat different, but the general look and feel, as well as functionality, should be the same.
2. Our data are located entirely in a single images folder. You can read in the images via the input controls under the input window; this can be done either by entering in the full path of the data folder or files into the "Input Path" text box (you can use wild-cards here, e.g. /folder/path/*.cbf) or by clicking the "Browse..." button. Click the "Browse..." button now. A pop-up menu will appear; select "Browse folders..." This will open a standard folder dialog.
3. Select the images folder. The absolute path will now appear in the input window, with an associated image count (which should be 324 in this case) on the left, and three "action" buttons (image viewer, delete item, information) on the right:
3a. OPTIONAL: Exit IOTA and issue
iota ./images/
Neat, huh? IOTA has several command-line options that can be useful as shortcuts. Use the -h option to see which other options are available. The list should be growing as we add more options in the near future.
4. Now we will make a mask for the beamstop shadow using the DIALS image viewer.
4a. Locate the "action" buttons to the right of the image folder item; click the button that looks like a tiny diffraction image. This will launch a dialog asking you to select how many images from the dataset you'd like to display in the image viewer.
4b. Select First 1 image and press OK. The DIALS image viewer will launch next (in some cases, especially if you're trying to display hundreds of images, it may take a while to load) and present the image you've selected.
4c. Locate the Actions submenu in the main menu, and select Show Mask Tool. This will open up the mask tool dialog, which will allow you to draw the beamstop shadow mask. I used a circle and a separate polygon to circumscribe the beamstop shadow in the image, like this:
4d. Click on "Save Mask", which will save a mask.pickle file in your iota_tutorial directory. You can close the DIALS image viewer now so as not to clutter your screen.
5. In order to use the mask, we have to read it in. Click on the IOTA Settings... button, which will open the IOTA Settings dialog. You will see a lot of options, but we're only going to import the mask. Under Beamstop Mask click on the (...) button, select the mask.pickle file and click OK. The full path to the mask file should now show up in the dialog, like this:
Running with defaults
6. The data processing backend for IOTA is dials.stills_process, which is implemented in cctbx.xfel. It is possible to run an older backend (here referenced as "HA14", Fig. 6). The option to select the processing backend is found in the IOTA Settings dialog under Advanced. Keep the backend at cctbx_xfel for the purposes of this tutorial, as the HA14-based backend is deprecated and will probably crash anyway.
7. A practice that has worked really well, especially for new data with unknown crystal parameters, is to run IOTA with defaults at first, and then re-run it using information gleaned from the analysis of the first run. To run IOTA, click on the big green RUN IOTA button in the lower right corner. A new window should open, which will display processing results as they arrive.
7a. OPTIONAL: Before starting IOTA, specify the number of cores you want to use for the processing run. The processing is very parallelizable, given that each diffraction image will be indexed and integrated independently of the others. Thus, the more cores you use, the faster the process will go. By default, IOTA sets the number of cores to 3/4 of those available; thus, if you're running this on a MacBook Pro with a 2.9GHz Intel Core i7 CPU, you will have access to 8 cores and the default value will be 6. To adjust the number of cores as desired, click on the Preferences toolbar button, then change the value under No. Processors.
Top to bottom (and left to right as appropriate), the charts are: 1. Resolution per frame, 2. Number of strong (I/sigma > 5) spots per frame, 3. Wilson B-factor histogram, 4. Chart of indices with measurements (with color designating redundancy), 5. Processing summary (color-coded for different outcomes).
8. Try clicking on individual points on the resolution / strong spots charts: a filename should appear in a text control above the charts, and clicking the diffraction pattern button should open that image in the dials.image_viewer. Does the integration result for that image correlate with what you see?
9. Try double-clicking on the orange "failed indexing" bar on the summary chart. A dialog should pop up prompting you to select images for viewing; select all images and inspect them. Does it make sense why they were not indexed?
10. When the run is complete, the Analysis tab will be added to the Log and Charts tabs. It contains a bunch of useful information and allows the user to pop open a few useful charts. At this point we are pretending that we do not know the crystal parameters of the system we're studying, and IOTA provides us with some preliminary information:
11. While IOTA reports the consensus Bravais lattice determination of P422, it may not necessarily be correct, and the unit cell parameters are averages, which may be skewed: DIALS indexes each image separately, which causes the lattice model to be determined with relatively poor accuracy and precision; not all the images would be indexed in the tetragonal lattice, nor would the unit cell parameters be the same (or even close!) for each one. Therefore, we need to obtain more granular information by using unit cell clustering. To do so, click the Run Clustering button. (Post-processing clustering can also be run automatically after processing: turn it on in the Analysis box in the IOTA Settings dialog.) If the dataset contains many images, this step may take a while. For enormous datasets (10,000+ images), it's recommended to select a smaller subset for this analysis. The results will be displayed in the same window:
Note that not all images were indexed in the tetragonal lattice! Some are orthorhombic. Others are tetragonal, but with a different set of unit cell parameters. The majority of the indexed images, however, are consistent with the known parameters of the tetragonal hen egg-white lysozyme. Which is a relief.
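Step 7a above mentions that the default number of cores is 3/4 of those available. As a rough sketch of that arithmetic (not IOTA's actual code; the function name and the choice to round down are assumptions), the default could be computed like this:

```python
import multiprocessing


def default_nproc(available=None):
    """Return 3/4 of the available cores, rounded down, but never fewer than 1."""
    if available is None:
        available = multiprocessing.cpu_count()
    return max(1, int(available * 3 / 4))
```

With 8 available cores this gives 6, matching the example in step 7a.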
Re-processing with new information
12. Go ahead and close IOTA. Don't worry, the information here is not gone! For starters, you can peruse the file ./iota_tutorial/integration/001/iota.log for image-by-image information on processing and its results. Also, for the brave and foolish, the ./iota_tutorial/integration/001/logs/ folder will contain detailed DIALS logs for every image. But for casual users, the GUI offers something even easier: you can "recover" previous runs by clicking the Recover toolbar button and selecting the run you're interested in. So open IOTA again, click on the Recover button and select the single run:
NOTE: if you don't want to open the full run, but only want to recover its settings, you can select "settings only" in the drop-down menu in the Recovery Dialog. Otherwise, "everything" is the default, and it will open the processing window, with all the charts and (if the run was completed) with the analysis tab. If the run was aborted or terminated for any other reason, the results so far will be displayed and you'll have the option to continue the processing where it left off.
13. Now that we've recovered our run, we actually don't need the processing window - but make sure you take down the Bravais lattice and unit cell information: you will need them.
14. Click the IOTA Settings button. In the IOTA Settings dialog find the Processing Options box; under Target Space Group supply the space group (here, let's use the Bravais lattice P422 we determined in the previous run); put unit cell parameters (79.2 79.2 38.1 90 90 90) into the Target Unit Cell text box:
15. Close the IOTA Settings dialog and click RUN IOTA again. A new processing window should open, displaying processing results obtained using the new information on our dataset. Does this look obviously different from the previous run?
16. The graph of indices with measured reflections shows many gaps, which would imply that completeness is lacking in this dataset. However, by default the chart is displayed in P1 symmetry. If you check the Space Group checkbox, the Bravais lattice that you supplied in the settings will be applied (P422 in our case, P1 if no lattice information was provided):
Tutorial Fig. 12: Index chart with and without space group information
17. That looks better, doesn't it? In fact, you can check out the completeness in any space group imaginable. Try setting the space group to P 43 21 2, which is the correct space group for tetragonal lysozyme. The completeness should look even better.
18. And now we're done! The information on the Analysis Tab should show significant improvement from the last run. The clustering should return a single cluster now. And you are ready for the next step, which is to run cctbx.prime to scale, post-refine, and merge the processed data.