CrateDB Fundamentals

Menu

Hands-on: Time Series Data

CrateDB Academy: CrateDB Fundamentals

 

This exercise requires a CrateDB cluster with the course dataset loaded.  If you didn't create your cluster earlier in the course, complete these steps before going further.


Several tables in the course data set contain time series data. These are:

  • three_eleven_calls
  • beach_weather
  • taxi_rides

Open your CrateDB cluster’s console, and we’ll try running some queries using this data.

This query uses CrateDB’s min_by function. min_by returns the value of the field specified in its first parameter from the record where the value of the field specified in its second parameter is lowest. Here, we’re asking CrateDB to tell us what the minimum temperature recorded at each weather station was, and when that occurred.

Run this query in your CrateDB console:

SELECT station,
   min(airtemp) as min_temp,
   min_by(ts, airtemp) AS time_of_min_temp
FROM beach_weather
GROUP BY station;

You’ll see that winter can get very cold in Chicago!

Downsampling is a way of reducing the number of data points in a data set. This query reduces the hourly observations at one of the weather stations to return the average temperature over four week periods. Run this query and browse through the output, paying particular attention to the date values:

SELECT
date_bin('4 weeks'::INTERVAL, ts, 0) AS date,
   AVG(airtemp) as avg_air_temp
FROM
   beach_weather
WHERE
   station = 'Foster'
GROUP BY
   date
ORDER BY
   date DESC;

Notice how you can spot the seasonal variations in temperature more easily now.

Try out some time series queries of your own, referring to the Time Series Data section in the CrateDB documentation for inspiration.