GROUP BY and ORDER BY
In the last section you learned how to make aggregations across the entire table, which enables you to get insights about the whole data sample. So you've now gone from looking at all rows in a table to being able to summarize all of these into one row. But sometimes there will be situations where you'll need to summarize the results on a level in between looking at all the rows in a table and only looking at one. For example, you might want to see the sum of your sales for each month, the sales for each order, or the average sales for each customer.
To do this, you want to use the aggregate functions on certain defined groupings. You can use the GROUP BY function to do this. For example, to find how many orders each customer has made:
See the query here:
1select
2 customer_id
3 , count(id) as number_of_orders
4from
5 {{raw.e_commerce_sample.webshop_order}}
6group by
7 customer_id
This is how the GROUP BY function works. You write your aggregate function as you would otherwise, and then add the column(s) you want your resulting table to be grouped by. For example, if you’d like to count how many orders you get per year, a query to investigate this could look like this:
1select
2 count(id) as orders_received
3 , extract(
4 year
5 from
6 (parse_date('%d / %m / %Y', received_at))
7 ) as recived_at_year
8from
9 {{raw.e_commerce_sample.webshop_order}}
10group by
11 recived_at_year
As you see, you can use the count of the id column in the Order table to get the number of orders in the same way as you did before, but on top of this, you also need to create a new column to extract only the year from the received_at date column. After the FROM statement, you also add a GROUP BY statement to define how you’d like to put your aggregations into groups. In this case, you’d like these aggregations to be grouped by the new column you just made, received_at_year. If you try it on your own, the result would look like this:
As you see, the result looks a little weird with a lot of orders in 2021 and few in 2020 and 2022. You can take a deeper look by adding another level of granularity:
1select
2 count(id) as orders_received
3 , extract(
4 month
5 from
6 (parse_date('%d / %m / %Y', received_at))
7 ) as recived_at_month
8 , extract(
9 year
10 from
11 (parse_date('%d / %m / %Y', received_at))
12 ) as recived_at_year
13from
14 {{raw.e_commerce_sample.webshop_order}}
15group by
16 recived_at_year
17 , recived_at_month
This time, add one more column extracting the month from the received_at date column, and group by this column. The result will look like this:
Now the 12th month in 2020 and the 1st month in 2022 are the only months included, so the result seems to make sense. And this is how the GROUP BY function works — defining which column(s) you’d like your aggregations to be grouped by.
The result of the grouped count of orders is a bit messy, and it would be nice to have the results ordered chronologically. You can use the ORDER BY function to define how you'd like the results ordered. This function is written similarly to the GROUP BY function and is simply added after the GROUP BY function. For example:
1select
2 count(id) as orders_received
3 , extract(
4 month
5 from
6 (parse_date('%d / %m / %Y', received_at))
7 ) as recived_at_month
8 , extract(
9 year
10 from
11 (parse_date('%d / %m / %Y', received_at))
12 ) as recived_at_year
13from
14 {{raw.e_commerce_sample.webshop_order}}
15group by
16 recived_at_year
17 , recived_at_month
18order by
19 recived_at_year
20 , recived_at_month
To order the results by both year and month, add both to the ORDER BY statement. If you type this in the Weld Editor on your own, your result should look like this:
You can define whether the ORDER BY function should order the value in ascending or descending order by adding ASC or DESC to the end of the statement. If you want to reverse the order in the query above, you could change the ORDER BY statement to recived_at_year desc, recived_at_month desc.
The ORDER BY function statement not only works for aggregated data, it can also define the order of all rows in a table. If you want to play around with this, you can go back to some of the queries you wrote earlier.
Exercise 9: Find the total revenue for each order, and order it from largest to smallest
As noted at the end of the last exercise, finding the highest value of the Order lines doesn’t make much sense, since each order can have several lines. Now with the GROUP BY function in your toolbox, you can instead try to find the largest order on your webshop. All the data you’ll need is in your order_line table. As each order has a unique order id that’s included in the order_line table, you can group it by these values. Order your table from the largest to the smallest revenue per order.
If you run your query, it should give a result like this:
As you can see, the order with the highest value is the order with 48vZj27vQz as the id.
In SQL there are often many means to an end, so you’ll usually be able to get the required result by writing different variations of the same query. A simple way to solve the exercise above would be to write a query like this:
1select
2 order_id
3 , sum(quantity * price) as total_revenue_per_order
4from
5 {{raw.e_commerce_sample.webshop_order_line}}
6group by
7 order_id
8order by
9 total_revenue_per_order desc
You might remember from looking at the raw data in this table that the orders are actually in different currencies. To convert the values into one currency, you’d need to add another step in your analysis. This way, you could find the real revenue for the e-commerce business per month. To do this, you’d need to know the conversion rates between all the currencies in the table. From there, you could calculate it in the editor. If you want to get a deeper look into the sales in the sample e-commerce data, feel free to push yourself with this exercise.