CrateDB Fundamentals

Menu

Importing and Exporting Data

CrateDB Academy: CrateDB Fundamentals

 

There are several ways to bulk load data into CrateDB. In this video I will walk you through the different options and will look at a few ways to create records with the INSERT statement. We will also briefly consider the options available for bulk data export from CrateDB. CrateDB accepts bulk data imports from a variety of sources. You can use a local file, import data from a URL, or connect to the popular Amazon S3 or Azure Storage Container Services. Let's explore these, beginning with the CrateDB Cloud interface, then with the COPY FROM SQL statement. Finally, we'll learn how to populate a table in CrateDB by selectively copying data from an existing table.

Before importing data, we first need to create a table. This SQL statement creates a table to hold details of Chicago's taxi fleet, and this file contains the fleet data in CSV format. Moving to the Import tab, we can see our import options. We're going to import from a file, but let's first look at some of the other options available to us. We can choose a URL or an S3 bucket, or we can import from Azure. In the case of Azure and S3, we can provide the appropriate secrets as necessary. Returning to the File tab, I'm going to drag and drop the file onto this area and you'll see that CrateDB's interface has inferred the file type. We can select other supported file types, and we can also choose whether or not the file is Gzip compressed.

Next, we need to provide a table name to import the data into. In our case it's taxis. We're all ready to go. So we'll click the import button here and the import job has started. Now we can see it succeeded. Let's move to the console and change this query for one that selects a few records in the table to demonstrate that they were in fact created. So SELECT * FROM taxis LIMIT 10. When we execute this, we can see that we got the first 10 taxis back and the data was in fact imported.

The COPY FROM statement is another way to import data into CrateDB. This method supports the file types and locations that we saw when importing a file previously. Let's use COPY FROM to populate our Chicago 311 calls table using a Gzip JSON file. It's a GitHub URL as the data source. First, I need to create the 311 calls table with this statement here. Then, run COPY FROM.

What we see here is that I'm copying from a GitHub URL and specifying that the compression is Gzip. I'm asking CrateDB to return a summary of the results, so let's execute that. And here we see the results summary. We've successfully imported 174,000 and 92 records. Let's look at these by running a simple SELECT query. SELECT * FROM three_eleven_calls LIMIT 10. Let's run that query, and you'll see that we have the data in the table. Tables in CrateDB can be populated with data from other tables.

Let's use our taxis table as the data source to create a new table containing a subset of the taxi data. First, I'll create a second table with the same schema as the taxis table. Now, let's populate that by selecting only the records from the taxis table where the vehicle make is Toyota and the model is Camry. This query populated the new table with 1098 records, and we can see that they're all Toyota Camrys with a simple SELECT query. As expected, we have 1098 rows, each of which contains data for a Toyota Camry taxi.

New rows can of course be inserted into tables with the INSERT statement. I'll show you three different methods. First, there's a simple INSERT for a single row. We're adding a new taxi to the taxis table. Here's an INSERT statement that uses the UNNEST function to insert multiple rows at once. UNNEST produces rows from arrays.

This statement inserts rows for a black 2024 Ford Explorer and a yellow 2022 Toyota Sienna. Let's run it now, and then let's select the resulting rows. So SELECT * FROM taxis WHERE vehicleid > 7000. And we can see here that we've inserted those two rows. This method offers dramatic performance improvements over the corresponding number of individual inserts, and the query only needs to be passed and planned once and requires less network traffic.

Finally, multiple rows can also be inserted with multiple value expressions like so. This method is generally slower than using UNNEST, as parsing the SQL is more expensive. However, this query is more readable than the UNNEST equivalent.

Sometimes you'll want to export data from CrateDB back to a file. Using the export option in CrateDB Cloud, I'll export the Toyota Camry taxi table that we created earlier. We input the table name here, and let's choose JSON from the available formats here. There's no need to compress the file, as there aren't too many records in this table. Pressing export starts the job in the cloud. Once the export has finished, we can download the file from the cloud. Here's our data in JSON lines format.

If you're running CrateDB locally, including Docker, you should check out the documentation for the COPY TO command. This allows you to export data from a locally running CrateDB cluster to files on your file system.

In this video, we covered the various ways of importing data into CrateDB. We also touched on the different ways that records can be added to tables with the INSERT statement. And we finished by learning how to export data from CrateDB.