CockroachDB’s IMPORT INTO command can ingest data from PostgreSQL, but it’s not a simple lift-and-shift; it’s more like a guided translation where CockroachDB rewrites your PostgreSQL schema and data into its own distributed, transactional format.

Let’s see this in action. Imagine we have a PostgreSQL table and we want to bring it into CockroachDB.

PostgreSQL users table:

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO users (username, email) VALUES
('alice', 'alice@example.com'),
('bob', 'bob@example.com');

Now, let’s prepare for the import into CockroachDB. We’ll export the data from PostgreSQL in CSV format.

1. Export from PostgreSQL:

psql -d mydb -c "\COPY users TO 'users.csv' WITH CSV HEADER"

This creates users.csv:

user_id,username,email,created_at
1,alice,alice@example.com,2023-10-27 10:00:00+00
2,bob,bob@example.com,2023-10-27 10:00:00+00

2. Create a compatible schema in CockroachDB:

CockroachDB doesn’t have SERIAL types directly. Instead, we use DEFAULT unique_rowid() for auto-incrementing primary keys. TIMESTAMP WITH TIME ZONE is supported.

CREATE TABLE users (
    user_id BIGINT PRIMARY KEY DEFAULT unique_rowid(),
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now()
);

3. Import into CockroachDB:

We’ll use the IMPORT INTO command, pointing it to our CSV file.

IMPORT INTO users (user_id, username, email, created_at)
FROM CSV 'users.csv'
WITH
  skip_header = true,
  -- If you don't provide user_id in the CSV,
  -- CockroachDB will generate them using unique_rowid()
  -- but since we have it, we specify it.
  column_names = 'user_id,username,email,created_at';

If users.csv was in S3, you’d use FROM 's3://your-bucket/users.csv' WITH ... and provide AWS credentials.

The IMPORT INTO command’s magic:

When IMPORT INTO runs, it doesn’t just shove bytes around. It parses the CSV, validates against the target table’s schema, and then writes the data in CockroachDB’s native, distributed key-value format. For primary keys, if they are unique_rowid() or default_int(), CockroachDB manages allocation. For other types, it ensures uniqueness and integrity. It also handles the distribution of data across nodes, creating ranges and splitting them as necessary for optimal performance and scalability.

The core problem IMPORT INTO solves is bridging the gap between PostgreSQL’s relational storage model and CockroachDB’s distributed, MVCC-based storage. It translates SQL types, handles primary key generation, and ensures transactional consistency during the import process, even for very large datasets.

One of the most subtle aspects of IMPORT INTO is how it handles data type conversions. While TIMESTAMP WITH TIME ZONE maps directly, SERIAL needs the DEFAULT unique_rowid() translation. If you have complex types like JSONB in PostgreSQL, you’ll need to pre-process them into a string format that CockroachDB can then parse into its own JSONB type, or adjust your import strategy. IMPORT INTO expects the data in the CSV to match the target table’s schema expectations, not the source.

The next challenge you’ll likely face is optimizing query performance on the newly imported data, especially concerning secondary indexes and table partitioning.

Want structured learning?

Take the full Cockroachdb course →