Argonne National Laboratory

GM/CA @ APS

Remote Data Backups with Globus Online

Department of Energy Office of Science
GM/CA @ APS Sponsors:
National Institute of General Medical Sciences (NIGMS) and National Cancer Institute (NCI) of the National Institutes of Health (NIH)
 

 

Please also check our Globus Video Guide!

 

Globus Online for GM/CA @ APS Users

Globus Online is a free service sponsored by DOE, NIH, NSF, Argonne, and the University of Chicago (see the list of sponsors). It addresses the challenges researchers face in moving, sharing, and archiving large volumes of data among distributed sites. With Globus Online, you hand off data movement tasks to a hosted service that manages the entire operation: it monitors performance and errors, retries failed transfers, corrects problems automatically whenever possible, and reports status to keep you informed, so that you can focus on your research. Our tests show that Globus Online is 2x faster than scp and rsync. This makes a big difference, reducing data transfer times from, e.g., 6 hours to 3 hours. In addition, transfers can be started and monitored from any place with an Internet connection, e.g. from the ANL Guest House, an airport, or home.
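
To make the effect of the 2x speedup concrete, here is a minimal sketch of the arithmetic. The dataset size and the baseline throughput are hypothetical values chosen only so that the result matches the 6-hour versus 3-hour example; they are not measurements from GM/CA, and the function name is ours.

```python
def transfer_hours(size_gb, rate_mb_per_s):
    """Estimated transfer time in hours, using decimal units (1 GB = 1000 MB)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate_mb_per_s must be positive")
    seconds = size_gb * 1000 / rate_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    size_gb = 2160        # hypothetical dataset size, GB
    baseline_rate = 100   # hypothetical scp/rsync throughput, MB/s
    speedup = 2           # the 2x factor quoted in the text

    print(f"scp/rsync: {transfer_hours(size_gb, baseline_rate):.1f} h")
    print(f"Globus:    {transfer_hours(size_gb, baseline_rate * speedup):.1f} h")
```

Substituting your own dataset size and a throughput you have actually observed gives a rough idea of how long a transfer may take.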

Transferring Data Between GM/CA @ APS and Your Machine

  1. Log in to Globus. There are at least three ways to do it:
    • If your institution has an association with Globus, you can log in to Globus using your existing organizational login. Navigate to https://auth.globus.org, choose your institution from the drop-down list, and click "Continue". This brings up your institution's login page, where you enter your institutional username and password. Your institution will then authenticate you with Globus.
       
    • If you have a Google account, you can use it too. Navigate to https://auth.globus.org and click on "Sign in with Google". Then follow the prompts.
       
    • You can create a personal Globus ID account (a one-time step) at https://www.globusid.org/create. Then navigate to https://auth.globus.org, choose "Globus ID" from the drop-down list, and click "Continue".
       

     
  2. Once you log in, you will see the File Manager interface. Choose the two-panel option at the top: one panel (normally the left) will show your home directory at GMCA and the other will show a directory on your computer.

     
  3. If the globusconnect application is already installed on the computer that will be receiving data, and that computer already has an encryption key registered with Globus under your account, skip to Step 4. Otherwise, install globusconnect on the computer and obtain a unique computer Setup Key from Globus. Click in either of the two "Collection" fields to reveal the globusconnect download link:

    The above link brings you to the page where you can generate your personal Setup Key (if needed) and download the Globus client for MacOS, Linux, and Windows:

    • Name your data-receiving computer as a new Globus Endpoint. Then generate the computer Setup Key by pressing the "Generate Setup Key" button and copy the key to the clipboard. You will need to provide this key to the Globus client after the installation.
    • Download the one-click Globus Connect Setup for the operating system of the data-receiving computer. The application is available for MacOS, Linux, and Windows. Note where you saved the installer (e.g. on the Desktop or in the "Downloads" folder).
    • Unpack and install Globus Connect by running the downloaded Setup application. When Setup asks for a setup key, paste the previously copied Setup Key from the clipboard. More detailed instructions can be found on the Globus website.

     
  4. Start the globusconnect application on the data-receiving computer. For example, on Linux, starting globusconnect is as simple as typing "./globusconnect" in the directory where it is installed:

    On Windows, the application can be started via the Start -> Programs -> Globus Connect menu. Normally no administrative privileges are required.
    NOTE: The globusconnect client must be started each time you want to transfer files to or from your data-receiving computer. This application makes the computer visible among the available endpoints on the Globus Online web page. The client has a graphical user interface.

     
  5. On the Globus Online web page, start typing gmca in the left "Collection" field to display the GMCA endpoints. Then select either the
          gmca23idb#gridftp 
    or
          gmca23idd#gridftp 
    
    endpoint depending on the beamline where your data is collected:

    The other two GMCA endpoints, with "dmz" in their names, reside on a so-called DMZ network specifically tuned for Globus. They may provide higher transfer speeds, but require an extra setup step by your host. If you plan to use one of the DMZ endpoints, please contact your host in advance and ask them to add your remote IP range to the routing table for blXws8 (X=1 for 23IDD and X=2 for 23IDB) using one of the following commands:

              add_dmz_route <IP>
              add_dmz_route <CIDR>	  
          
     
  6. Enter your GMCA account credentials as prompted to access data at GMCA. Please note that, to avoid interference with other users, the account normally remains active for only 2 days after your beamtime. If you need an extension for an unforeseen reason, please contact your host.

     
  7. Once the credentials are accepted, you should see your folders at GMCA:

     
  8. Click on the right "Collection" pane and choose your local endpoint on the "Your Collections" tab:

     
  9. After that, you should see a listing of your local directory in the right pane.

     
  10. Select a file or directory and click on the highlighted "arrow button" to initiate the transfer:

     
  11. To watch the file transfer progress or cancel a transfer, choose "View Transfers" from the drop-down menu in the top bar of the Globus Online web page.
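
NOTE: For the DMZ endpoints mentioned in step 5, your host needs your remote IP range, either as a single IP or in CIDR notation. If your local network administrator gave you only a first and a last address, the sketch below shows one way to convert that range to CIDR blocks using Python's standard ipaddress module. The helper names and the 192.0.2.x addresses (a range reserved for documentation) are illustrative assumptions; the add_dmz_route command itself is run by your host, not by you, so the script only prints the lines you could send.

```python
import ipaddress


def range_to_cidrs(first_ip, last_ip):
    """Return the smallest list of CIDR blocks covering first_ip..last_ip."""
    first = ipaddress.ip_address(first_ip)
    last = ipaddress.ip_address(last_ip)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


def cidr_bounds(cidr):
    """Return the first and last address of a CIDR block as strings."""
    # strict=False accepts a host address such as 192.0.2.17/24
    net = ipaddress.ip_network(cidr, strict=False)
    return str(net[0]), str(net[-1])


if __name__ == "__main__":
    # Hypothetical address range; replace with the range of your own site.
    for cidr in range_to_cidrs("192.0.2.0", "192.0.2.255"):
        print(f"add_dmz_route {cidr}")
```

A range that does not align with a single block is split into several CIDR blocks, each of which would need its own route.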

     

NOTE: In its present form, Globus Online offers a directory synchronization option in the "Transfer Files" drop-down box, but no continuous synchronization. Although the lack of continuous synchronization is an inconvenience, it is outweighed by the speed, which has been tested to be at least 2x faster than traditional scp or rsync (rsync typically runs over ssh, the same transport that scp uses). We recommend using Globus Online for transferring large amounts of data and possibly scp/rsync for post-transfer synchronization.
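
As an illustration of the post-transfer synchronization mentioned above, the following minimal sketch assembles an rsync command line in Python. The user name, host name, and directory paths are hypothetical placeholders, not actual GM/CA values; ask your host for the correct ones. Only standard rsync options are used (-a, -v, --dry-run, -e), and the dry-run default only lists what would be copied without changing anything.

```python
import shlex


def build_rsync_command(remote_user, remote_host, remote_dir, local_dir, dry_run=True):
    """Return an rsync argument list that pulls remote_dir into local_dir."""
    cmd = ["rsync", "-av"]
    if dry_run:
        cmd.append("--dry-run")  # list what would be copied, change nothing
    # A trailing slash on the source copies the directory contents,
    # not the directory itself.
    source = f"{remote_user}@{remote_host}:{remote_dir.rstrip('/')}/"
    cmd += ["-e", "ssh", source, local_dir]
    return cmd


if __name__ == "__main__":
    # Hypothetical account, host, and paths.
    cmd = build_rsync_command("user", "gmca-host.example.org",
                              "/data/run1", "/home/user/run1")
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Running with dry_run=True first lets you check the list of files before committing to the real synchronization.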

Additional Learning Resources

  • Video guide: "Remote data transfers with Globus GridFTP", presented by Raj Kettimuthu, MCS. This 12-minute video, courtesy of one of the core Globus developers, guides you through the steps of setting up Globus transfers with GMCA servers.
  • In-depth video guide: "Globus for Research Data Management" by Rachana Ananthakrishnan, Globus, presented at the Argonne Training Program on Extreme-Scale Computing, Summer 2015 (53 minutes).

GM/CA @ APS is an Office of Science User Facility operated for the U.S. Department of Energy Office of Science by Argonne National Laboratory
