Introduction to filtering (WHERE)
In the SELECT section of the tutorial, you learned how to pick columns from a table using the select function. By running the different queries, you were presented with the columns you selected and all the rows of data from the raw data set. Next, you’ll learn how to section this data by applying certain filters. For example: how many customers have made more than one purchase, or how many customers have unpaid invoices. There are different methods to filter rows in a data set, but filtering, in general, is used all the time when browsing through data or performing your analysis.
It’s important to keep in mind that as a principle, one row of data belongs together when working in SQL. If you’re used to working in Excel, you can delete one cell and move the remaining cells up or to the side. That’s not the way to work in SQL — here you create a filter based on certain values in cells, which determines whether the whole row will be included in the table or not.
The WHERE clause in SQL is a very simple way to filter out rows in your data set based on the content of it. The WHERE statement comes after the two other clauses learned earlier, SELECT and FROM, and it works more or less in plain English. Say you want to see all your orders, but only where the shipping address is to Australia. In that case, your query could look like this:
1select
2 *
3from
4 {{raw.e_commerce_sample.webshop_order}}
5where
6 shipping_address_country = 'Australia'
As you see in the example above, the query is written out over several lines, but this has no influence on the result you’ll get. You could write it all in one line and get the exact same result, so the different lines are only added to make the query easier to read for other people. Try to write the query yourself. Grab the data from the webshop_order table in the Weld Editor. You should get a result that looks like this:
Note that 'Australia' is put in a single quotation mark (') which specifies the beginning and end of a non-numerical string of data. When specifying conditions in your queries that are non-numerical, these should always be encapsulated by single quotation marks.
Numerical conditions can be typed directly in your query without quotation marks. To test this out, you need to use a table containing numerical values. Apart from the Customer and Order table you’ve used so far, the sample e-commerce data set also contains an Order_line table, which includes all the lines with prices and quantities of each order. These are numerical values. Say you want to find all lines in the orders where the price is exactly 10. You could write this condition without quotation marks in a query like this:
1select
2 *
3from
4 {{raw.e_commerce_sample.webshop_order_line}}
5where
6 price = 10