Processing Blog: Difference between revisions

From cctbx_xfel
Jump to navigation Jump to search
(No difference)

Revision as of 21:03, 9 May 2014

Data Processing Blog

Reindexing operator needed to identify C-axis in an orthorhombic setting (6 May 2014 NKS)

The user brought us data from an orthorhombic space group. The synchrotron structure had been solved and published with this unit cell:

space group P 21 21 21 unit cell 107.5 75.0 51.3 90 90 90

However, the program cxi.merge fails to merge any XFEL data when the synchrotron-solved PDB file is used as the "model" for isomorphous scaling. Knowing that the axes of an orthorhombic group can be permuted, we immediately suspected a mismatch between the XFEL unit cell and the PDB unit cell. To inspect the XFEL integration results, we use the command:

cxi.print_pickle [integration_result_file.pickle]

All of the integration pickle files had this unit cell:

Laue group P 2 2 2 unit cell 50.1 73.6 103.8 90 90 90

Therefore we added the following item to the phil-parameters for cxi.merge:

data_reindex_op=l,k,h

This reindexing had the desired effect of aligning the new data with the reference model.

A tricky case of merging resolution limits

Merging some data today where the default per-frame resolution filter settings were cutting things a little too harshly, and discarding a lot more data than expected (based on manual inspection of a maximum projection pattern). So, how to change this? CXTBX.XFEL performs a per-frame resolution test at two stages: during integration, and during merging. Here, we are going to discuss editing the during merging part of things. There are two phil parameters you can play with for this: significance_filter.sigma (default 0.5), and significance_filter.max_ct (default 50). The sigma option will remove resolution bins such that all accepted bins have Mean(I/sigma) >= sigma, and the max_ct option will set the bin sizes so that the number of bins is maximized, subject to mean bin population <= max_ct.

A quick inspection of the terminal output from cxi.merge showed that in many cases, the Mean(I/sigma) was jumping up and down a lot, for example:

Step 4. Filter on global resolution and map to asu

 Bin  Resolution Range  Compl.         <I>     <I/sig(I)>
   1 22.4216 -  8.1030 [127/5229]       646.3       3.8
   2  8.1030 -  6.4828 [ 45/5157]       428.9       1.7
   3  6.4828 -  5.6785 [103/5257]        89.3       0.7
   4  5.6785 -  5.1663 [107/5182]        62.2       0.4
   5  5.1663 -  4.7999 [101/5201]       121.2       0.5
   6  4.7999 -  4.5193 [ 78/5189]        93.9       0.3
   7  4.5193 -  4.2946 [ 83/5191]        36.2       0.2
   8  4.2946 -  4.1089 [ 71/5156]       197.4       0.7
   9  4.1089 -  3.9516 [ 67/5199]        43.5       0.3
  10  3.9516 -  3.8159 [ 74/5170]        83.3       0.4
  11  3.8159 -  3.6971 [110/5194]        12.7       0.2
  12  3.6971 -  3.5919 [ 76/5227]        16.9       0.2

Using the defaults, this file would cut off at 5.7 A, but I would like it to go to 4.8, since it looks like that is a legitimate 0.5 in the I/sigI column. Changing the bin size should average this out. I therefore re-ran the merging with the following in my phil file:

significance_filter{ 
    sigma=0.5
    max_ct=100
}

This helped a bit, and the per-image cuttoffs were looking more sensible. I then tweaked the sigma down to 0.4, since it decreased gradually to around this point before going mental, and got even better results.

When doing this, how do you know when you are just adding junk to your file? If you observe lots of variation in your intensities per image per bin, then it's fairly safe to increase max_ct. This has the effect of smoothing things out, but may also underestimate your resolution if the bins become too large. Reducing sigma, on the other hand, can easily introduce junk data. The only real way to test for this is to see if your statistics from cxi.xmerge, in particular CC1/2, get better.

--Zeldin (talk) 23:19, 17 April 2014 (UTC)

A Traceback error in April

15 April 2014. A user writes:

I am seeing a Traceback error during integration with pyana:

/cctbx_project/myrelease/cxi_data/present_image_0000001

Beam center is not immediately clear; rigorously retesting 9 solutions
Beam x 96.6 y 97.1, initial score 9; refined rmsd: None
Beam x 96.8 y 97.5, initial score 8; refined rmsd: None
Beam x 96.6 y 97.0, initial score 7; refined rmsd: None
Beam x 96.6 y 97.2, initial score 7; refined rmsd: None
Beam x 96.7 y 97.4, initial score 6; refined rmsd: None
Beam x 96.6 y 97.3, initial score 6; refined rmsd: None
Beam x 96.9 y 97.0, initial score 6; refined rmsd: None
Beam x 96.9 y 97.1, initial score 6; refined rmsd: None
Process Process-2:

Traceback (most recent call last):
  File "/usr/local/crys_test/psdm/sw/external/python/2.7.2/x86_64-rhel6-gcc44-opt/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/crys_test/psdm/sw/external/python/2.7.2/x86_64-rhel6-gcc44-opt/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/crys/psdm/sw/releases/ana-current/arch/x86_64-rhel6-gcc44-opt/python/pyana/pyanamod.py", line 126, in _proc
    for req in proto.getRequest():
  File "/usr/local/crys/psdm/sw/releases/ana-current/arch/x86_64-rhel6-gcc44-opt/python/pyana/mp_proto.py", line 121, in getRequest
    opcode = self._conn.recv_bytes()
EOFError

Reply:

  • It's not clear what's causing the EOFError. However, we should at least check for memory leaks that would cause Pyana processing to draw to a standstill. The diagnostic is to use the "top" command in the Unix shell. Observe whether or not the subprocess memory use continually increases as more images are processed. Correct behavior is for the memory use to reach a stationary plateau. We recently fixed a memory leak bug (March 2014), but it is worth checking this to make sure.
  • You should generally not get a "rigorously retesting 9 solutions" message for XFEL indexing. Generally for XFELs we know the beam position accurately so there is no need to search for the direct beam. Your phil file should include beam_search_scope=0.5 to avoid a large search radius. However, your log indicates a narrow search, likely indicating that spot finder sees numerous spots that are unphysically close. Many things could be done to safeguard you from this:
    • Careful use of detector masks, as described in Preparatory steps on the tutorial wiki . This would avoid picking spots on compromised areas of the detector. Aaron is this correct?
    • Use distl.minimum_spot_area=1 instead of =0. This would avoid picking random local-maxima pixels as spots, but there is a tradeoff.
    • Set the phil file resolution cutoff (3 different places) to a value close to where you actually intend to cut off the merging. This avoids picking spots for indexing that aren't real spots.
  • I notice you are using integration.detector_gain=7.5. However, we've been having a lot of trouble with setting this to anything else but 1.0. Until we can sort out the code, I recommend using gain=1.0 during processing, and then in the cxi.merge phil use significance_filter.sigma=0.5. This combination should help retain as many reflections as possible.


Low resolution reflections missing

A user was indexing using cxi.index and finding many low resolution spots not being predicted. Here, cxi.index refuses to predict spots that are overlapped. Setting integration.background_factor=1 and integration.guard_width_sq=3 can force a better outcome.


Indexing images with fewer spots

A user was having a hard time with images that had clear diffraction patterns with between 20 to 40 spots but that didn't index with cxi.index. Here's how we resolved the problem.

Add the following to the indexing phil file:

1) target cell and known setting. These parameters provide constraints on the still-image indexing procedure algorithm that finds basis vectors without the benefit of additional images with rotational data. See the phil section. Example for lysozyme:

target_cell = 38 74 74 90 90 90
known_setting = 9

2) Add distl_minimum_number_spots_for_indexing=20 and model_refinement_minimum_N=10 to the phil file. These parameters set lower limits on the number of spots needed for indexing and crystal model refinement. For patterns with fewer spots, it disables two checks for enough spots to proceed.

3) distl.detector_tiling. This is a legacy parameter that shouldn't be needed, but is required for these weakly diffracting patterns. This is a bug that will be fixed soon. It should be equal to your image dimensions, for example: distl.detector_tiling=0,0,1024,1024