Automated data processing in JBluIce (short summary)
Data processing jobs in JBluIce are started as follows:
- The JBluIce data processing pipeline is named GMCAproc.
Three GMCAproc XDS jobs (gmcaproc), processing 25%, 50% and 100% of
the images, are started when the user clicks the "Collect" button
(the 25% or 50% job may be skipped depending on the data collection time).
These jobs index diffraction using the first half of the data,
integrate it in P1, and then scale in the space group suggested by the
POINTLESS program (data is saved in the "scale-1" subdirectory).
For the 100% data, additional cycles of reintegration and scaling are
performed (see notes on the XDS optimization at
https://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Optimisation).
The resulting data is saved in the "scale-2" directory (labeled as
"gmcaproc-refi" in the JBluIce Analysis Tab). The optimization step
can, but does not always, produce better results than the single pass.
- At the end of the data collection, three pipelines are
started to process all data (fast_dp by Diamond, autoPROC by Global
Phasing Ltd*, and DIALS via Xia2). If the user clicks "Pause" during data
collection, additional fast_dp jobs will be run (gmcaproc will run
instead if there are multiple sweeps in a single run). If enough data
have been collected, DIALS and autoPROC will also start.
- By default, processing will occur in a directory named
"/process", at the same level as the directory "/collect" (if no
collect directory exists, then "/process" will be created in the
directory containing images). The user may also choose to run all the
processing in a single directory tree "/process" under the root data
directory named 23ID..., or disable auto processing altogether via the
menu Tools/Options/Options.
- The user can reprocess data using the reprocessing button. A
small set of user-defined parameters can be set for reprocessing.
Reprocessing also supports processing of multiple runs of a single crystal
and multiple crystal datasets.
- The processing results in JBluIce are periodically updated and
saved automatically. They are written to three files
in the data collection root directory 23ID..._cbf (these files
are automatically backed up by the GMCA rsync script).
- File Summary-of-Processing-Results.html stores all
data processing results. This file contains all the information users
need to review the processing results when they get home, including file
locations. It can be opened in a web browser (the
relative links in it will only work if the original directory structure
is kept). Please note that additional columns (e.g. truncate, working
directory) may be hidden; they can be shown using the top left
"Show/Hide Columns" button.
- File Summary-of-Screening-Results.html stores all
indexing and strategy results (from the Screening Tab or Collect Tab
0).
- File datasets.txt stores a text summary of datasets
collected.
- If you are collecting many partial data sets, you may
turn off the auto processing and launch the SPring-8 KAMO pipeline
after one or several data files are collected (the KAMO button is located
at the bottom of the Analysis Tab). The KAMO process will monitor the
data collected and process each data set after collection in the
"23ID..._cbf/process" directory. You may merge the crystals
periodically as data accumulate (a background process monitors the
merging requests, so there is no need to run merging scripts from the
command line).
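
Two of the rules in the list above, the default location of the
"process" directory and the 25%/50%/100% partial jobs, can be sketched
in a few lines of Python. This is only an illustration and not the
JBluIce implementation: the function names and example paths are
hypothetical, "collect" is assumed to be matched as a path component of
the image directory, and each partial job is assumed to start at image 1
with its image count rounded up.

```python
from math import ceil
from pathlib import Path


def process_dir_for(image_dir: str) -> Path:
    """Return the default processing directory for a directory of images.

    Assumption: "collect" is matched as a path component of image_dir,
    and the component nearest to the images is used.
    """
    path = Path(image_dir)
    # Walk from the image directory up towards the filesystem root.
    for candidate in (path, *path.parents):
        if candidate.name == "collect":
            # "process" sits at the same level as "collect".
            return candidate.parent / "process"
    # No "collect" directory: "process" goes inside the image directory.
    return path / "process"


def partial_ranges(n_images: int, fractions=(0.25, 0.5, 1.0)):
    """Yield (fraction, first_image, last_image) for each partial job.

    Assumption: every job starts at image 1 and the count is rounded up.
    """
    for fraction in fractions:
        yield fraction, 1, max(1, ceil(n_images * fraction))
```

With these assumptions, process_dir_for("/data/user/collect/xtal1")
gives /data/user/process, while an image directory without a "collect"
component gets a "process" subdirectory of its own.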
*autoPROC software by Global Phasing Ltd is available only to
non-commercial users and to commercial users with an autoPROC license.
For other users the autoPROC pipeline will not be invoked.
Contact your host before your beamtime starts if you are a commercial
user with an autoPROC license at your institution.