Processing Blog

From cctbx_xfel
Revision as of 23:19, 17 April 2014 by Zeldin (Talk)

Jump to: navigation, search

Data Processing Blog

A tricky case of merging resolution limits

Merging some data today where the default per-frame resolution filter settings were cutting things a little too harshly, and discarding a lot more data than expected (based on manual inspection of a maximum projection pattern). So, how to change this? CXTBX.XFEL performs a per-frame resolution test at two stages: during integration, and during merging. Here, we are going to discuss editing the during merging part of things. There are two phil parameters you can play with for this: significance_filter.sigma (default 0.5), and significance_filter.max_ct (default 50). The sigma option will remove resolution bins such that all accepted bins have Mean(I/sigma) >= sigma, and the max_ct option will set the bin sizes so that the number of bins is maximized, subject to mean bin population <= max_ct.

A quick inspection of the terminal output from cxi.merge showed that in many cases, the Mean(I/sigma) was jumping up and down a lot, for example:

Step 4. Filter on global resolution and map to asu

 Bin  Resolution Range  Compl.         <I>     <I/sig(I)>
   1 22.4216 -  8.1030 [127/5229]       646.3       3.8
   2  8.1030 -  6.4828 [ 45/5157]       428.9       1.7
   3  6.4828 -  5.6785 [103/5257]        89.3       0.7
   4  5.6785 -  5.1663 [107/5182]        62.2       0.4
   5  5.1663 -  4.7999 [101/5201]       121.2       0.5
   6  4.7999 -  4.5193 [ 78/5189]        93.9       0.3
   7  4.5193 -  4.2946 [ 83/5191]        36.2       0.2
   8  4.2946 -  4.1089 [ 71/5156]       197.4       0.7
   9  4.1089 -  3.9516 [ 67/5199]        43.5       0.3
  10  3.9516 -  3.8159 [ 74/5170]        83.3       0.4
  11  3.8159 -  3.6971 [110/5194]        12.7       0.2
  12  3.6971 -  3.5919 [ 76/5227]        16.9       0.2

Using the defaults, this file would cut off at 5.7 A, but I would like it to go to 4.8, since it looks like that is a legitimate 0.5 in the I/sigI column. Changing the bin size should average this out. I therefore re-ran the merging with the following in my phil file:

significance_filter{ 
    sigma=0.5
    max_ct=100
}

This helped a bit, and the per-image cuttoffs were looking more sensible. I then tweaked the sigma down to 0.4, since it decreased gradually to around this point before going mental, and got even better results.

When doing this, how do you know when you are just adding junk to your file? If you observe lots of variation in your intensities per image per bin, then it's fairly safe to increase max_ct. This has the effect of smoothing things out, but may also underestimate your resolution if the bins become too large. Reducing sigma, on the other hand, can easily introduce junk data. The only real way to test for this is to see if your statistics from cxi.xmerge, in particular CC1/2, get better.

--Zeldin (talk) 23:19, 17 April 2014 (UTC)

A Traceback error in April

15 April 2014. A user writes:

I am seeing a Traceback error during integration with pyana:

/cctbx_project/myrelease/cxi_data/present_image_0000001

Beam center is not immediately clear; rigorously retesting 9 solutions
Beam x 96.6 y 97.1, initial score 9; refined rmsd: None
Beam x 96.8 y 97.5, initial score 8; refined rmsd: None
Beam x 96.6 y 97.0, initial score 7; refined rmsd: None
Beam x 96.6 y 97.2, initial score 7; refined rmsd: None
Beam x 96.7 y 97.4, initial score 6; refined rmsd: None
Beam x 96.6 y 97.3, initial score 6; refined rmsd: None
Beam x 96.9 y 97.0, initial score 6; refined rmsd: None
Beam x 96.9 y 97.1, initial score 6; refined rmsd: None
Process Process-2:

Traceback (most recent call last):
  File "/usr/local/crys_test/psdm/sw/external/python/2.7.2/x86_64-rhel6-gcc44-opt/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/crys_test/psdm/sw/external/python/2.7.2/x86_64-rhel6-gcc44-opt/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/crys/psdm/sw/releases/ana-current/arch/x86_64-rhel6-gcc44-opt/python/pyana/pyanamod.py", line 126, in _proc
    for req in proto.getRequest():
  File "/usr/local/crys/psdm/sw/releases/ana-current/arch/x86_64-rhel6-gcc44-opt/python/pyana/mp_proto.py", line 121, in getRequest
    opcode = self._conn.recv_bytes()
EOFError

Reply:

  • It's not clear what's causing the EOFError. However, we should at least check for memory leaks that would cause Pyana processing to draw to a standstill. The diagnostic is to use the "top" command in the Unix shell. Observe whether or not the subprocess memory use continually increases as more images are processed. Correct behavior is for the memory use to reach a stationary plateau. We recently fixed a memory leak bug (March 2014), but it is worth checking this to make sure.
  • You should generally not get a "rigorously retesting 9 solutions" message for XFEL indexing. Generally for XFELs we know the beam position accurately so there is no need to search for the direct beam. Your phil file should include beam_search_scope=0.5 to avoid a large search radius. However, your log indicates a narrow search, likely indicating that spot finder sees numerous spots that are unphysically close. Many things could be done to safeguard you from this:
    • Careful use of detector masks, as described in Preparatory steps on the tutorial wiki . This would avoid picking spots on compromised areas of the detector. Aaron is this correct?
    • Use distl.minimum_spot_area=1 instead of =0. This would avoid picking random local-maxima pixels as spots, but there is a tradeoff.
    • Set the phil file resolution cutoff (3 different places) to a value close to where you actually intend to cut off the merging. This avoids picking spots for indexing that aren't real spots.
  • I notice you are using integration.detector_gain=7.5. However, we've been having a lot of trouble with setting this to anything else but 1.0. Until we can sort out the code, I recommend using gain=1.0 during processing, and then in the cxi.merge phil use significance_filter.sigma=0.5. This combination should help retain as many reflections as possible.