Information technology tools and resources at the UW
If you are new to SQL, these simple examples will help you begin thinking about ways to query your data. This tutorial will take you through the following operations:
- Finding a dataset
- Creating three different queries of increasing complexity
- Joining data from two datasets
We have also provided simple procedural how-tos to get you started:
Note: We strongly recommend using the Microsoft Edge browser rather than Internet Explorer 11. Microsoft Edge provides superior performance for both logging in and uploading files to SQLShare.
We’ll use a simple dataset to begin working with queries. FirstLinePoetTitle.csv contains the first lines of poems, names of poets, and in some cases, a title.
First we’ll find the dataset in SQLShare.
I. Find the dataset
- To find FirstLinePoetTitle.csv, in the left sidebar, click All. In the search field, enter “poet.”
- To open the dataset, click FirstLinePoetTitle.csv. We want to save the original as is and create a new dataset. To do so, click Derive.
Now we have a new dataset that we can query and we have kept our original dataset intact.
II. Create a new query
We will use our new dataset to create a new query that extracts all the first lines that contain the word “thee.” That query reads as follows:
WHERE FirstLine LIKE ‘%thee’
About this query:
- SELECT tells SQLShare which data you want to pull. For this query, we are pulling from the column titled FirstLine. When you write your query, make sure to use the exact spelling of the name of the relevant column.
- FROM tells SQLShare which dataset you’re pulling data from.
- WHERE and LIKE qualify the type of data. LIKE ‘%text’ is the SQL way to say: says, pull only the first lines that contain the word “and.” (You would get different results if you wrote LIKE ‘%and%’ or LIKE ‘and%’. Extra credit: Experiment with these variations and see what happens.)
- You can remove everything from the first SELECT command through the UNION command. Those lines aren’t needed for creating and running new queries.
Create a new query:
- Select and copy the following commands:
WHERE “FirstLine” LIKE ‘%thee%’
- Under New Query, select all the text and paste the commands you just copied.
- To run your new query, click Run Query.
- To save your new dataset, click Save Dataset. This will save the dataset under your name in SQLShare. To find it again later, in the left sidebar, click Yours.
III. Create a second query
The query we just created gives us the first lines that contain the word “and,” but we don’t know which poets wrote those lines. That’s not a particularly helpful or interesting set of data.
To create a more interesting dataset, we’ll add another column name to the SELECT command. For example, to see the title in addition to the first line, I’ll add Title to the SELECT command: SELECT “FirstLine”, “Title”.
To narrow the query a bit further, we’ll specify one particular poet whose first lines we want to see. To do this, we exchange ‘%Yeats’ for ‘%and’ in the WHERE command line.
The original dataset does not include titles in every row. In this new query, we want to see only those first lines that have a title associated. So we’ll add a “not null” line to the query. All together, we’re saying in SQL, “show us all the first lines and titles of poems by Yeats that contain titles in our dataset.”
We will begin by opening the query we just saved.
Match first lines and titles of poems by Yeats:
- In the left sidebar, click Yours.
- Select and copy the following commands to the space under SQL QUERY.
SELECT “FirstLine”, “Title”
WHERE “Poet” LIKE ‘%Yeats’
AND “Title” is not null
- Repeat steps 2-4 in the above procedure. The results appear as follows:
IV. Count instances of kinds of data
The data in FirstLinePoetTitle.csv shows multiple first lines by the same poet. Say we want to know how many poems are represented by each poet. To do this, we introduce the COUNT command. And we indicate how we want the data organized (“group by Poet”).
In SQL syntax, the query appears as follows:
SELECT “Poet”, count(*) FROM sqldemo.”FirstLinePoetTitle.csv” group by “Poet”
Select that string, copy it and paste it into the area below SQL QUERY, as you have done in the previous procedures.
The results should appear as follows (this is just a partial view of the entire dataset):