Automated data processing in JBluIce/PyBluIce (short summary)

Data processing jobs in JBluIce/PyBluIce are started as follows:

The JBluIce/PyBluIce data processing pipeline is named GMCAproc. Three GMCAproc XDS jobs (gmcaproc), processing 25%, 50% and 100% of the images, are started when the user clicks the "Collect" button (the 25% or 50% job may be skipped depending on the data collection time). These jobs index the diffraction pattern using the first half of the data, integrate it in P1, and then scale in the space group suggested by the POINTLESS program (data is saved in the "scale-1" subdirectory). For the 100% data, additional cycles of reintegration and scaling are performed (see the notes on XDS optimization at https://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Optimisation). The resulting data is saved in the "scale-2" directory (labeled as "gmcaproc-refi" in the JBluIce/PyBluIce Analysis Tab). The optimization step may produce better results than the single pass, but does not always do so.
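As an illustration of the 25%, 50% and 100% subsets, the image ranges could be computed as in the sketch below. This is not GMCAproc code: the function names, the assumption that every subset starts at the first image, and the rounding down are all hypothetical. DATA_RANGE is the standard XDS.INP keyword for an image range; the input files GMCAproc actually writes are not shown here.

```python
def subset_ranges(first_image, n_images, fractions=(0.25, 0.5, 1.0)):
    """Return (first, last) image numbers for each fraction of the sweep.

    Assumption: each subset starts at the first image of the run and the
    image count is rounded down (but is never less than one image).
    """
    ranges = []
    for fraction in fractions:
        count = max(1, int(n_images * fraction))
        ranges.append((first_image, first_image + count - 1))
    return ranges


def data_range_line(first, last):
    """Format an image range as an XDS.INP DATA_RANGE line."""
    return f"DATA_RANGE= {first} {last}"


if __name__ == "__main__":
    # Hypothetical run: 360 images, numbered from 1.
    for first, last in subset_ranges(1, 360):
        print(data_range_line(first, last))
```

For a 360-image run this gives the ranges 1-90, 1-180 and 1-360, which matches the idea that the partial jobs give early feedback while collection is still in progress.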
At the end of the data collection, three pipelines are started to process all data (fast_dp by Diamond, autoPROC by Global Phasing Ltd^*, and DIALS via Xia2). If the user clicks "Pause" during data collection, additional fast_dp jobs will be run (gmcaproc will run instead if there are multiple sweeps in a single run). If enough data were collected, DIALS and autoPROC will also start.
By default, processing will occur in a directory named "/process", at the same level as the directory "/collect" (if no collect directory exists, then "/process" will be created in the directory containing the images). The user may also choose to run all the processing in a single directory tree "/process" under the root data directory named 23ID..., or to disable the auto processing altogether in the menu Tools/Options/Options.
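The default location rule can be sketched as follows. This is only an interpretation of the rule as described above, not the JBluIce/PyBluIce implementation: the function name and the example paths are hypothetical, and the sketch returns only the top-level "process" directory without guessing how subdirectories are laid out beneath it.

```python
from pathlib import PurePosixPath


def default_process_dir(image_dir):
    """Guess the top-level processing directory for a directory of images.

    Assumption: if the image path contains a "collect" component, the
    "process" directory is a sibling of it; otherwise "process" is
    created inside the image directory itself.
    """
    path = PurePosixPath(image_dir)
    parts = path.parts
    if "collect" in parts:
        # Use the last "collect" component in case the name occurs twice.
        index = len(parts) - 1 - parts[::-1].index("collect")
        return PurePosixPath(*parts[:index]) / "process"
    return path / "process"


if __name__ == "__main__":
    # Hypothetical directory names, for illustration only.
    print(default_process_dir("/data/user/collect/xtal1"))
    print(default_process_dir("/data/user/images"))
```

With these example inputs the sketch returns /data/user/process and /data/user/images/process respectively.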
The user can reprocess data using the reprocessing button. A small set of user-defined parameters can be set for reprocessing. Reprocessing also supports multiple runs of a single crystal and multiple-crystal datasets.
The processing results in JBluIce/PyBluIce are updated and saved automatically at regular intervals. They are stored in three files in the data collection root directory 23ID..._cbf (these files are automatically backed up by the GMCA rsync script).
1. File Summary-of-Processing-Results.html stores all data processing results. This file contains all the information users need to review the processing results when they get home, including file locations. It can be opened in a web browser (the relative links in it will only work if the original directory structure is kept). Please note that additional columns (e.g. truncate, working directory) may be hidden; they can be unhidden using the "Show/Hide Columns" button at the top left.
2. File Summary-of-Screening-Results.html stores all indexing and strategy results (from the Screening Tab or Collect Tab 0).
3. File datasets.txt stores a text summary of datasets collected.
Users collecting many partial data sets may turn off the auto processing and launch the SPring-8 KAMO pipeline after one or several data sets are collected (the KAMO button is located at the bottom of the Analysis Tab). The KAMO process will monitor the collected data and process each data set after collection in the "23ID..._cbf/process" directory. The user may merge the crystals periodically as data accumulate (a background process monitors the merging requests, so there is no need to run merging scripts from the command line).

For more information please check:

^* The autoPROC software by Global Phasing Ltd is available only to non-commercial users and to commercial users with an autoPROC license. For all other users the autoPROC pipeline will not be invoked. If you are a commercial user and your institution has an autoPROC license, contact your host before your beamtime starts.