CrateDB Fundamentals

Menu

What is CrateDB?

CrateDB Academy: CrateDB Fundamentals

 

This video provides a high level overview of CrateDB. You'll see what we mean by a multi-model database, how multi-model data can be queried with SQL, and we'll look at flexible data schemas and high level architecture.

What is CrateDB? CrateDB is a hyper-fast open source multi-model database combining the simplicity of SQL and flexible schemas with the scalability of a distributed architecture. CrateDB is a multi-model database. Its strength lies in the efficient handling of multiple data models within the same database and even within the same record in a table. This allows you to use one database for all structured, semi-structured and unstructured data. For example, tabular data, JSON or documents, time series, vectors, geospatial, full text and binary objects. You can create hybrid data models comprised of structured and semi-structured data all in one place.

CrateDB eliminates the need to manage and synchronise multiple database technologies and learn different languages by offering unified access via the well known SQL language. All data models are accessible via SQL, allowing for complex queries, full text and vector search. Additionally, CrateDB implements the PostgreSQL wire protocol, ensuring compatibility with the wide ecosystem of tools, libraries and drivers.

Here's an example SQL query that demonstrates the multi-model power of CrateDB. Many U.S. cities have a citizens hotline for reporting issues such as potholes in the road, street lights that don't work, and other issues. New issues are usually reported by calling 311.

As part of our data set, we have records for 311 calls made in the city of Chicago. In this SELECT query, we're asking CrateDB to tell us the four most common types of 311 call reported in an area around Chicago's O'Hare Airport. Here, locationdetails is a column of type OBJECT containing JSON data. Within that JSON data, the field location contains the latitude and longitude for the area that the issue was reported in. We're asking CrateDB to consider only calls whose location falls within the provided geopolygon representing the area around O'Hare Airport, and we want to get a count of each different type of call made from that area, returning only the four top call types. The most common call type by far is to report noise from passing aircraft. This is perhaps not surprising for an area near one of the world's busiest airports.

JSON data can be stored in an OBJECT field in a table in CrateDB. Object fields created with the default dynamic policy are used to store arbitrary JSON. Here, we're storing a JSON object in the locationdetails field of our 311 calls table. This object can contain any structure. In this example, we have an object containing a nested object and a couple of arrays. To ensure the best query performance, CrateDB indexes these objects automatically for us.

CrateDB can also allow flexibility in table schemas. Let's create a table to store details of Chicago's taxi drivers. Here I want to store the name and licence number as TEXT, as the licence number may contain alphanumerics, and I want a BOOLEAN field for whether or not the driver is a first aider. Note that I'm telling CrateDB to use a dynamic column policy here. This tells CrateDB that we might add or update data in ways that can change the table schema. As a side effect, this allows us to easily evolve our schema as our data changes over time.

Let's insert a record with values corresponding to the columns we specified when creating the table. I've inserted a record for myself with licence number ABC956 and a true flag for being a first aider. Inserting this second driver, we've referenced an additional column penaltypoints. This insert succeeds as we've defined the drivers table to have a dynamic column policy.

Here we've asked CrateDB to show us the CREATE TABLE statement for the taxi drivers table. Note that the table now has an extra column penaltypoints. CrateDB has inferred the type BIGINT for this column based on the data provided when our second driver was inserted. Records added before the table schema was changed will have a null value for the penaltypoints column. If you know your data schema, CrateDB's STRICT option for objects and tables enables the database to enforce it for you.

In this version of the 311 table, objects must match the declared structure to be inserted. Similarly, our table schema can be enforced using the STRICT column policy when creating the table.

Finally, let's talk about CrateDB's architecture. CrateDB is a distributed database with a shared nothing architecture. Data is stored on multiple nodes and automatically balanced across nodes in a cluster. It scales horizontally and provides high availability with self-healing characteristics. CrateDB's distributed SQL engine enables queries, aggregations, joins and full text searches with the power of Lucene to be performed efficiently at speed. CrateDB can be deployed in the cloud, at the edge, on premise or as a hybrid deployment.