CrateDB Fundamentals

Menu

Working with Time Series Data

CrateDB Academy: CrateDB Fundamentals

 

Time series data is everywhere and can be found across a multitude of industries. These include smart manufacturing, transport, logistics and the energy sector. The use of time series data for real time monitoring and forecasting is crucial to obtaining meaningful insights and to make data-driven decisions from huge and never ending data sets.

In this video, you'll see why CrateDB is an ideal time series database. Widely recognised as one of the leading time series databases, CrateDB offers unparalleled capabilities to process very large data workloads. Its Postgres compatibility and SQL native interface make CrateDB easy to integrate without having to learn new query languages or adopt new database driver libraries. Thanks to it's shared nothing distributed architecture and horizontal scalability, CrateDB can handle large volumes of data from sensors, IoT gateways, ERP systems and so forth, ingesting and indexing data quickly. Columnar storage helps to enable fast aggregations, ensuring that data is available to query in near real time.

CrateDB has built in functions to support time series use cases such as downsampling and interpolation. Thanks to its flexible data model, CrateDB can store other types of data such as relational, document, geospatial, BLOBs or vectors in the same database as your time series data, all of which can be combined in the same SQL query.

Let's briefly explore these concepts using our City of Chicago data set. The Chicago Park District maintains weather sensors at three weather stations along the Lake Michigan Waterfront. These sensors capture data about the temperature, humidity, and barometric pressure, as well as wind speeds and direction. Data is captured at one hour intervals.

This is a SQL statement that creates a table to store this information. The 'ts' field is our timestamp for this time series data. The sample data set contains over 150,000 records taken from three weather stations over a number of years. Unusually for the United States, temperatures in this data set are in degrees Celsius.

Let's perform some simple aggregations with this data. First, let's look at the average temperature at each weather station. When I run this query, you'll see that I get an average temperature for each of the three stations.

Now let's see what the highest temperature recorded at each weather station was and the time at which that occurred. This query uses CrateDB's MAX_BY function which takes two parameters. The 1st is the field to return. The 2nd is the field that we're looking for the maximum value of. So here we're going to return the timestamp at which the maximum air temperature was observed. Let's run the query.

We see for each station the maximum temperature observed and the time at which that event occurred. Large data sets with many data points can be challenging for humans to visualise. This graph shows all of the hourly temperature readings from the Foster weather station. There are 64,266 observations. Downsampling is a method of decreasing the number of data points in a data set. Our weather data observations are recorded at hourly intervals over several years. By downsampling this data, we more easily understand trends across time periods.

Here we've downsampled the data into four week periods, showing the minimum temperature recorded in each. With fewer data points, 105 in this case, it's much easier to get an idea of what's going on. CrateDB's DATE_BIN function is designed to help with this. Let's see how it works.

This query makes use of CrateDB's DATE_BIN function to bucket or bin the weather data into four week periods, effectively downsampling it. It returns the lowest temperature from the foster weather stations for each. Let's go ahead and run it.

As we can see, Chicago winters get pretty cold with a minimum air temperature of -22.78 Celsius in December. Consult the CrateDB documentation to learn more about downsampling with the DATE_BIN function. And see how CrateDB supports data interpolation with LEAD and LAG functions.

In this short video, we only scratch the surface of CrateDB's time series capabilities. If you'd like to take a deep dive into time series data with CrateDB, I recommend taking our course Advanced Time Series at the CrateDB Academy. And don't forget to try some queries of your own using your CrateDB cluster and the Chicago sample data set.