IT Connect

Information technology tools and resources at the UW

Using Your Data

The University of Washington’s scalable scientific computing cyberinfrastructure offers robust capabilities for managing, accessing, transferring and archiving data. The following provides an overview with links to more information.

Transferring Your Data

Fast data transfer using UW’s High Speed Research Network

Hyak, UW’s shared high-performance computing cluster, and lolo, its on-site large-scale storage, are connected to the UW High Speed Research Network (HSRN) and Science DMZ, which are designed for high-throughput, high-volume data transfers to points outside the UW.

This network provides a 100 Gbps (gigabits per second) backbone for researchers moving data between compute and storage resources in the campus data centers, as well as a 100 Gbps uplink to Internet2 and the Energy Sciences Network (ESnet). Researchers with compute and storage resources or laboratory instruments elsewhere on campus have access to up to 30 Gbps of bandwidth to Hyak and lolo, as well as to off-campus destinations.
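To put these link speeds in perspective, the following sketch estimates the best-case time to move a data set over a link of a given speed. It reads the figures above as gigabits per second, assumes decimal units (1 TB = 10^12 bytes), and ignores protocol overhead, disk speed and contention, so real transfers will take longer. The function name and the 100 TB example size are illustrative and not part of any UW-IT tool.

```python
def ideal_transfer_seconds(size_terabytes: float, link_gbps: float) -> float:
    """Best-case transfer time in seconds, ignoring all overhead."""
    if link_gbps <= 0:
        raise ValueError("link speed must be positive")
    size_bits = size_terabytes * 1e12 * 8  # decimal terabytes to bits
    return size_bits / (link_gbps * 1e9)   # Gbps to bits per second


if __name__ == "__main__":
    # Compare the backbone rate with the rate available elsewhere on campus.
    for gbps in (100, 30):
        hours = ideal_transfer_seconds(100, gbps) / 3600
        print(f"100 TB at {gbps} Gbps: about {hours:.1f} hours")
```

Under these idealized assumptions, a 100 TB data set needs at least about 2.2 hours at 100 Gbps and about 7.4 hours at 30 Gbps.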

[Figure: Campus CCNIE plan diagram, August 26, 2014]

How to transfer data

Transfers between Hyak and lolo are as simple as copying files between directories. For transfers elsewhere, UW-IT provides tools (such as bbcp and GridFTP) that have achieved measured throughput of about 750 MB/s to sites such as the Texas Advanced Computing Center in Austin.
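Since a transfer between Hyak and lolo amounts to copying files between directories, it can be scripted like any local copy. The sketch below copies one file and confirms the result with a SHA-256 checksum. The helper names are our own, and the paths in the usage comment are hypothetical placeholders, not actual Hyak or lolo directories.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, dest_dir: Path) -> Path:
    """Copy source into dest_dir and raise if the checksums differ."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = Path(shutil.copy2(source, dest_dir))  # copy2 keeps timestamps
    if sha256_of(source) != sha256_of(copied):
        raise IOError(f"checksum mismatch for {copied}")
    return copied


# Hypothetical usage; substitute your group's real directories:
# copy_and_verify(Path("/path/on/hyak/results.h5"), Path("/path/on/lolo/project"))
```

Reading both files back to compute checksums doubles the I/O, so for very large files it may be preferable to verify only a sample or to rely on the transfer tool's own integrity checks.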

Archiving Your Data

Easy, affordable data archiving using lolo Archive file service

The lolo Archive file service provides UW researchers with an affordable, convenient means to archive their data for long-term safekeeping. The lolo Archive is directly accessible from Hyak.

How to archive data

From campus, files may be uploaded via most popular ssh-based data transfer tools (including sftp, as well as GridFTP and bbcp) at rates up to 400 MB/s. Single-file downloads of large files (100 GB or larger) proceed at 100 MB/s.

Uploaded files are cached on disk before being backed up to tape in a data center near Spokane. Once backed up, files are migrated to a second set of tapes in a Seattle data center. Recalls are as simple as browsing the file system and opening a file.

Sharing Your Data

Easy and direct data sharing via lolo Collaboration file services

The lolo Collaboration file service is a disk-based file system that is directly accessible from Hyak, available as a Microsoft Windows file share from campus, and reachable from anywhere via ssh-based tools, bbcp and GridFTP.

All that group members need to share data with peers on or off campus is a UW NetID. Sponsored NetIDs are simple to request for off-campus collaborators.

Processing Your Data: A Complete Solution

Big Data isn’t much use to anyone without Big Compute to grind through it all. Processing a 100 TB data set on your laptop, or even on a department-scale system, is impractical. Moreover, even if large data sets can be stored in your laboratory, they are still not of much use without very high-bandwidth connections to the CPUs, a very large number of CPUs to perform the analysis, and the whole software stack needed to manage the calculations. Together, Hyak, lolo and the HSRN are a complete solution for many Big Data challenges.

Visualizing Your Data

Hyak offers powerful graphics processing capabilities and supported software packages for visualizing data.

  • Supported software packages
    Software supported on Hyak includes VisIt and ParaView, two packages that allow researchers to apply the power of dozens or hundreds of CPU cores to the challenge of visualizing Big Data.
  • Graphics processing capabilities
    Hyak includes more than 40 Nvidia Tesla GPUs (graphics processing units) alongside its more than 10,000 conventional CPU cores, a huge help for researchers who want to leverage the power of graphics accelerators and programming abstractions such as CUDA.

Find out more

For details on using Hyak, see the Hyak users documentation.