Cloud computing primer
What is cloud computing?
Cloud computing—also known as the public cloud, the commercial cloud, or just “the cloud”—is a massive computing resource accessed via the internet. Cloud computing has five components:
- Compute—Virtual machines ranging from small to extremely powerful
- Storage—Effectively infinite
- Networking—Connectivity both local and with the internet
- Data management—Databases and related structure
- Services—Special features such as streaming data analytics
Microsoft, Amazon Web Services, and Google are among the largest cloud providers. They offer computing, storage and networking hardware as a service over the web, a model known as “Infrastructure as a Service” (IaaS). On top of those basic services they also provide higher-level “Platform as a Service” (PaaS) offerings, which range from operating systems and networking tools to databases and many other add-on services. All these resources are maintained in secure data centers in multiple geographic locations. Customers access resources online, buying as little or as much as they need.
The UW works closely with public cloud vendors to provide research teams with cost-effective cloud access and to meet federal compliance regulations (HIPAA, FISMA, FERPA). Read on for more information on the benefits of cloud computing for research, including cost, security, usability and scale.
How the cloud applies to research computing
Cloud computing scales down or (way) up to support research computing without the hassles of operating and maintaining your own computing environment. You can implement processing pipelines, build and test modifications, create websites and web applications, implement databases, and securely manage your research computing environment. You can collaborate and easily share your data with others. Specifically, you can:
- Archive data cheaply with extremely high reliability
- Build out familiar data systems (Postgres, MySQL, Spark)
- Create automated secure data feeds that extract from / load to your data system
- Maintain data repositories for other researchers to access
- Manage access to your cloud resources across your research team, and with collaborators around the globe
- Learn about capabilities and features of the cloud that may be new to you
- Run MATLAB, R, Python or any other programming environment
- Create an accurate estimate of what your use of the cloud will cost
- Easily pull your data and software out of the cloud if you so choose, or move to a different cloud
The cloud is paid for as a utility
You pay for what you use and turn resources off when you are done. At the UW, cloud computing is not subject to indirect costs.
Two important points to keep in mind about costs:
- Cloud vendors such as Amazon and Microsoft are promoting cloud adoption by making research grants available. This means that you can easily apply for and obtain a year of cloud credit to explore using the cloud with no financial risk
- An hour of compute time on a moderately powerful computer and a gigabyte-month of storage each cost about three cents, a starting point for thinking about cost
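As a rough illustration of the three-cent starting point, the following Python sketch estimates a monthly bill. The rates are the rule-of-thumb figures given above, not current vendor prices, and the function name and the example workload are hypothetical.

```python
# Rule-of-thumb rates from the text above (assumptions, not vendor price quotes).
COMPUTE_USD_PER_HOUR = 0.03      # one moderately powerful machine for one hour
STORAGE_USD_PER_GB_MONTH = 0.03  # one gigabyte stored for one month


def estimate_monthly_cost(machine_hours: float, stored_gb: float) -> float:
    """Return a rough monthly cost in US dollars (hypothetical helper)."""
    compute = machine_hours * COMPUTE_USD_PER_HOUR
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    return round(compute + storage, 2)


# Hypothetical workload: one machine running 8 hours a day for 22 working days,
# plus 500 GB of stored data.
print(estimate_monthly_cost(machine_hours=8 * 22, stored_gb=500))
```

Under these assumptions the example workload comes to roughly twenty dollars a month, most of it storage. Real prices vary by vendor, machine type and region, so treat the result as a first guess rather than a quote.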
The cloud is secure with appropriate steps
The cloud is physically and programmatically secure, provided you take the appropriate precautions. As part of our cloud consulting service we will help you develop, document and follow the necessary steps to secure your data. We provide a growing set of guidelines and procedures for cloud security, including compliance with HIPAA regulations (the UW has a Business Associate Agreement, or BAA, with its cloud providers), and we are available to help you determine what you need.
Cloud computing allows you to scale your needs
Four elements make up the ‘cloud scale’ argument—how cloud computing resources scale to a given computational task. They include:
- Machine types: Cloud vendors have many types of computers available to match your work: memory-intensive, compute-intensive, general purpose, small-scale, moderate, powerful, GPU-based and so on
- Machine quantity: A parallelizable compute task can be scaled to multiple machines (hundreds or thousands) in order to complete the task in a short amount of time; because you pay per machine-hour, this costs roughly the same as using fewer machines for more time
- Start-up latency: There is no wait for cloud resources to become available
- Obstruction: Others are not blocked from doing their computational work when you are using the public cloud
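The machine-quantity point can be checked with a little arithmetic. The sketch below assumes a perfectly parallel job, billing proportional to machine-hours with no minimum charge or start-up overhead, and the three-cent hourly rate used elsewhere on this page; the function name and the 1,000 machine-hour job are hypothetical.

```python
def run_profile(total_machine_hours: float, machines: int,
                usd_per_machine_hour: float = 0.03):
    """Return (wall_clock_hours, total_cost_usd) for a perfectly parallel job.

    Assumes the work divides evenly across machines with no overhead.
    """
    wall_clock_hours = total_machine_hours / machines
    total_cost = machines * wall_clock_hours * usd_per_machine_hour
    return wall_clock_hours, total_cost


# Hypothetical job needing 1,000 machine-hours in total.
for machines in (10, 100, 1000):
    hours, cost = run_profile(1000, machines)
    print(f"{machines:>5} machines: {hours:6.1f} h wall clock, ${cost:.2f}")
```

The wall-clock time drops as machines are added while the total cost stays the same. In practice, coordination overhead and billing granularity mean the cost is only approximately constant.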
Taming the learning curve: Helping you succeed
Like any new technology, using the cloud involves a learning curve. The key to adopting the cloud is making time to learn how to use it effectively. To help you evaluate whether that time investment will pay off in the long term, UW-IT Research Computing provides:
- Consultation
- Tutorials
- Links to excellent resources
- Dedicated training courses
- Direct contact with vendors
We work closely with the UW eScience Institute to support your data science needs. Contact us to get help.
Our goal is to help you overcome obstacles and make your path more about science and less about purchase orders, hardware and cooling, and operating system patches. We believe that the cloud can be a powerful tool to help you along that path.
Each cloud migration is unique. Sometimes migration is quite straightforward, but where additional help and training are needed, we are here to support you.
How the cloud compares to alternatives
UW-IT provides help and support for cloud computing and for managed services based on computing resources owned and operated by the University. Explore your options at Research Computing.
Cloud computing Q&A
Q: Is the cloud secure?
A: Yes. Physical resources reside in secure facilities, with independent auditing practices followed by the cloud vendors. Data are encrypted and host companies do not have access to those encryption keys. You (the account holder) manage access through user identities and access policies.
Q: How much does the cloud cost?
A: We start with the ‘three cents’ rule of thumb: a modest machine with one CPU costs about three cents per hour, and storing one gigabyte of data for one month costs about three cents. We provide deep-dive consulting and a set of tutorials to help you produce much more accurate cost estimates. We can also help you write a proposal to a funding agency that includes cloud computing resources, which are not subject to indirect costs at the UW.
Q: Is the cloud powerful?
A: Yes, which is why so many researchers rely on the cloud in this era of Big Data. Individual cloud machines come in many state-of-the-art flavors: GPU-intensive, compute-intensive, memory-intensive, low network latency, general purpose and so on. But there’s also a double-win for intensive science computation in the cloud: You do not have to wait for resources to become available, and, if you can parallelize your work, you can spin up large (or very large) clusters to finish your tasks quickly.
Example: Suppose you have a highly parallel task that takes 40 hours on 20 nodes with 16 cores each. If you are sharing computing resources with other research groups, you may need to wait 24 hours, or maybe weeks, to use them, so your wall-clock time for one processing run becomes 64 hours or much longer. In the cloud, you can distribute your work among 2,000 comparable machines, start immediately and run your work to completion in about 30 minutes. The cloud has many strengths, and scalability is one of the biggest.
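The arithmetic behind this example can be sketched in a few lines of Python. The sketch assumes that each cloud machine is comparable to one of the 20 nodes and that the work divides evenly with no start-up overhead; the function name is hypothetical.

```python
def wall_clock_hours(node_hours: float, machines: int,
                     wait_hours: float = 0.0) -> float:
    """Queue wait plus run time for an evenly divisible parallel job."""
    return wait_hours + node_hours / machines


# 40 hours on 20 nodes is 800 node-hours of work.
NODE_HOURS = 40 * 20

shared = wall_clock_hours(NODE_HOURS, machines=20, wait_hours=24)
cloud = wall_clock_hours(NODE_HOURS, machines=2000)

print(f"Shared cluster: {shared:.0f} hours")
print(f"Cloud: {cloud * 60:.0f} minutes")
```

Under these idealized assumptions the cloud run takes 24 minutes, which is consistent with a figure of about half an hour once machine start-up and data staging are allowed for.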
Q: How do I get started?
A: Visit our getting started page. Also, read the questions below about funding your research migration to the cloud. Visit us by appointment or during office hours at the eScience Institute’s Data Science Studio or contact the UW-IT Research Computing team.
Q: How do I fund cloud computing?
A: You can receive one year of initial support through research computing credits by filling out a one-page application and coordinating your efforts with us. A research credit account good for one year will enable you to explore cloud computing at low financial risk. Once you have determined that the cloud works for you, various funding agencies can help pay for your continued use. We can help you build a proposal budget for cost-effective use of the cloud in your research.
Q: Do NSF and NIH fund cloud computing?
A: Yes, absolutely. Both the National Science Foundation (NSF) and the National Institutes of Health (NIH) fund cloud computing. We can help you explore these options.
Q: Can I work with Protected Health Information (PHI) on the cloud?
A: Yes. Because HIPAA compliance is such an important topic, we encourage you to consult with us and connect with other research teams in working out implementation details.
Q: Can I put a database in the cloud?
A: Yes, in two ways. You can allocate a virtual machine in the cloud, install your favorite database on it, and operate it as if you were operating a database server that you own. You can also simply pay for a database-as-a-service and dispense with worrying about the underlying machine, operating system or installation of a database management system. Both options have supporting arguments and we can help you decide which path is best for you.
Q: Do you offer any training?
A: UW-IT regularly holds training days and workshops that are coordinated with public cloud vendors, particularly Amazon Web Services (AWS) and Microsoft. We also welcome Google’s participation in the UW cloud computing arena and hope to welcome other vendors. Visit our training calendar.
Q: Once I get started, am I locked in with a particular vendor? Am I locked into the cloud forever?
A: No, you are not locked in. We show you technologies such as Docker containers that can be moved between cloud environments as you see fit. We can also help you with the details of data archival and relocation and with comparison of cloud vendor services.
Because cloud-based technology so closely mirrors traditional computing, you will find that the flexibility of computing translates well to the cloud, including the flexibility to pack up and move on when you so desire.