CockroachDB’s distributed nature means it doesn’t have a single "lock" in the traditional RDBMS sense. "Online" schema changes here are about avoiding user-visible latency spikes or transaction aborts during schema modifications.
Let’s see it in action. Imagine we have a users table:
CREATE TABLE users (
    id UUID PRIMARY KEY,
    username STRING UNIQUE,
    email STRING,
    created_at TIMESTAMPTZ DEFAULT now()
);
Now, we want to add a last_login timestamp column. In many databases, this would lock the table for the duration of the operation, potentially affecting live application traffic. In CockroachDB, we can do this:
ALTER TABLE users ADD COLUMN last_login TIMESTAMPTZ;
While this command runs, your application can continue to read and write to the users table without interruption. No explicit locking mechanism is blocking INSERT, UPDATE, or DELETE operations on rows, nor are SELECT queries affected by the schema change itself. The new column will appear as NULL for existing rows until explicitly updated.
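As a minimal sketch of what this looks like from the application side, the statements below could be run from a second session. The sample row values are made up for illustration; gen_random_uuid() and now() are built-in functions. The INSERT does not reference the new column, so it can run while the schema change is still in progress. The SELECT and UPDATE assume the ALTER TABLE has already finished and last_login is visible.

```sql
-- Hypothetical sample row; safe to run while the schema change is in flight
-- because it does not mention the new column.
INSERT INTO users (id, username, email)
VALUES (gen_random_uuid(), 'alice', 'alice@example.com');

-- After the ALTER TABLE completes, existing rows report NULL for last_login.
SELECT username, last_login FROM users;

-- Populate the new column for one user, for example on their next login.
UPDATE users SET last_login = now() WHERE username = 'alice';
```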
The core problem this solves is enabling database evolution without impacting application availability. Traditional locking-based schema changes create maintenance windows and introduce risk. CockroachDB’s approach allows for continuous deployments and agile development cycles.
Internally, CockroachDB handles schema changes as a series of distributed, asynchronous operations. When you execute an ALTER TABLE statement, the database doesn’t immediately update all data files or metadata across all nodes. Instead, it initiates a multi-phase process:
- Schema Change Request: The initial ALTER TABLE command is sent to the cluster.
- Schema Versioning: The schema change is assigned a new schema version. All nodes in the cluster will eventually adopt this new version.
- Metadata Update: The schema information is updated in the cluster’s distributed key-value store (using Raft for consensus). This update is very fast and doesn’t block data operations.
- Staged Rollout: The actual schema modification (e.g., adding a column) is then applied to the data in a staged, background process. This involves updating the internal representation of table data and ensuring that new reads and writes respect the updated schema. Importantly, older versions of the schema can still be read by transactions that started before the change, ensuring consistency.
- Garbage Collection: Once all nodes have acknowledged the new schema and all in-flight transactions have completed, the old schema artifacts are garbage collected.
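One way to watch the versioning step from the outside is to look at the table's descriptor metadata. The query below is a sketch that assumes the crdb_internal.tables virtual table exposes version, state, and mod_time columns. The crdb_internal schema is internal and not a stable interface, so column names may differ between releases.

```sql
-- Inspect the descriptor version of the users table in the current database.
-- Assumption: crdb_internal.tables has these columns in your CockroachDB version.
SELECT name, version, state, mod_time
FROM crdb_internal.tables
WHERE database_name = current_database()
  AND name = 'users';
```

Running this before and after the ALTER TABLE should typically show a higher version afterwards, since each phase of the change publishes a new descriptor version.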
The levers you control are primarily the ALTER TABLE statements themselves. CockroachDB handles the complexity of the distributed rollout. You can monitor the progress of schema changes using SHOW JOBS.
For example, after running ALTER TABLE users ADD COLUMN last_login TIMESTAMPTZ;, you can check its status:
SHOW JOBS;
This will show a job with job_type SCHEMA CHANGE (or NEW SCHEMA CHANGE on versions that use the declarative schema changer) and a status that progresses from running to succeeded.
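On a busy cluster the full job list is noisy, so it can help to filter it. The following sketch wraps SHOW JOBS in a common table expression, which recent CockroachDB versions support. Older versions may need the bracketed form, SELECT ... FROM [SHOW JOBS], instead.

```sql
-- List only schema change jobs, newest first.
WITH jobs AS (SHOW JOBS)
SELECT job_id, job_type, description, status, fraction_completed
FROM jobs
WHERE job_type IN ('SCHEMA CHANGE', 'NEW SCHEMA CHANGE')
ORDER BY created DESC;
```

The fraction_completed column gives a rough progress indicator for long-running changes, and the description column shows the statement that started the job.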
The one thing most people don’t realize is that even though the schema change appears "instantaneous" from an application perspective, the underlying data modification and metadata propagation are a sophisticated, multi-stage, distributed process. CockroachDB uses a combination of schema versions and background workers to keep data accessible and consistent throughout the entire lifecycle of the schema alteration. Transactions can continue to operate using the "old" schema version until they are ready to adopt the new one, which prevents cascading aborts and long waits.
The next concept to explore is how to efficiently backfill data for newly added columns without impacting performance.