In CockroachDB, an UPDATE or DELETE doesn’t actually erase data; it only makes the old version of a row invisible to new transactions. Physically removing those old versions is the job of MVCC garbage collection, which runs later.

Let’s watch a simple UPDATE statement and see what happens under the hood.

-- Initial state
CREATE TABLE users (id INT PRIMARY KEY, name STRING);
INSERT INTO users VALUES (1, 'Alice');

Now, let’s update Alice’s name.

UPDATE users SET name = 'Alicia' WHERE id = 1;

If you could inspect the underlying key-value data right now, you’d see two versions of the row for id = 1. The first version has name = 'Alice', and the second, newer version has name = 'Alicia'. The UPDATE statement didn’t erase the old version; it wrote a new one with a later timestamp. This is Multi-Version Concurrency Control (MVCC) in action: transactions reading at an earlier timestamp can still see the "Alice" version of the row, while new transactions see "Alicia".
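
One way to glimpse both versions from plain SQL is a historical read. The sketch below assumes the UPDATE committed within the last ten seconds and the INSERT more than ten seconds ago; the '-10s' offset is an illustrative choice, and crdb_internal_mvcc_timestamp is a hidden system column that may not exist on older releases.

```sql
-- Latest version of the row, plus the MVCC timestamp at which it was written.
SELECT id, name, crdb_internal_mvcc_timestamp FROM users WHERE id = 1;

-- Historical read: the row as it stood 10 seconds ago.
-- This shows the 'Alice' version only if the UPDATE committed less than
-- 10 seconds ago and the INSERT more than 10 seconds ago.
SELECT id, name FROM users AS OF SYSTEM TIME '-10s' WHERE id = 1;
```

If the offset reaches back to before the table existed, the second query returns an error rather than an empty result, so adjust the offset to your own timing.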

So, how does that old "Alice" row eventually get cleaned up? This is where garbage collection (GC) comes in. CockroachDB periodically scans for data that is no longer needed. What counts as "no longer needed"? A version that has been superseded by a newer write or a deletion, and that has stayed superseded for longer than a configured retention window.

Imagine a transaction that started before you ran the UPDATE statement. It might still need to read the "Alice" version of the row. If GC removed that version immediately, the transaction would fail or return inconsistent results. To leave room for such readers, CockroachDB keeps old versions for a retention window instead of removing them as soon as they are overwritten. The window is time-based rather than tied to individual open transactions, so a read at a timestamp older than the window fails with an error instead of silently returning wrong data.

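The key mechanism is the gc.ttlseconds setting. It is a zone configuration variable, so it can be set for the whole cluster or overridden for a specific database, table, or index, and it dictates how long overwritten versions of data are retained. The default was 25 hours (90000 seconds) in older releases; newer releases (v23.1 and later) default to 4 hours (14400 seconds), so check your own cluster. With a 25-hour TTL, a row version is kept for at least 25 hours after a newer version replaces it.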

Here’s how to check the current setting:

SHOW ZONE CONFIGURATION FROM RANGE default;

The output includes gc.ttlseconds = 90000 (25 hours in seconds) on a cluster that still uses the older default. To reduce the retention to, say, 1 hour (3600 seconds), you would run:

ALTER RANGE default CONFIGURE ZONE USING gc.ttlseconds = 3600;
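
Zone configurations can also be scoped more narrowly than the cluster-wide default. As a sketch, assuming you want the shorter retention only on the users table from the example above (the 3600-second value is just the illustrative figure used in this article):

```sql
-- Override the GC TTL for a single table instead of the whole cluster.
ALTER TABLE users CONFIGURE ZONE USING gc.ttlseconds = 3600;

-- Confirm which zone configuration now applies to the table.
SHOW ZONE CONFIGURATION FROM TABLE users;
```

Keep in mind that a shorter TTL also shortens how far back historical reads on that table can reach.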

When the GC process runs (it’s a background process), it compares version timestamps against a cutoff of roughly current_timestamp - gc.ttlseconds. For a given row, every version that was overwritten or deleted before that cutoff is eligible for removal; the latest live version is always kept. It’s a conservative approach: the TTL is a minimum retention time, so data can outlive it but is never removed before it.
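
The TTL half of that comparison is plain timestamp arithmetic. The query below is only a rough illustration using the wall clock and an assumed TTL of 90000 seconds; it is not how CockroachDB computes its internal threshold, and GC runs per range on its own schedule, so treat the result as an approximation.

```sql
-- Approximate point in time before which superseded versions become
-- eligible for GC, assuming gc.ttlseconds = 90000 (25 hours).
SELECT now() - INTERVAL '90000 seconds' AS approx_gc_cutoff;
```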

The actual deletion isn’t immediate either. GC removes an eligible version logically, and the disk space is reclaimed later, when the storage engine’s background compactions rewrite the affected files. You won’t see disk usage drop the moment GC runs; it’s a two-step process.

A common misconception is that gc.ttlseconds applies to the creation time of the old version. That’s not quite right. It applies to the time that version became stale, meaning the point at which a newer version of the same row was written. GC then waits for gc.ttlseconds beyond that point so that readers of the old version have time to finish. For example, with a 25-hour TTL, if "Alice" was written on Monday at 09:00 and overwritten on Wednesday at 09:00, the "Alice" version becomes eligible for GC on Thursday at 10:00, not on Tuesday at 10:00.

The most counterintuitive part of this whole process is that even after you’ve deleted a row or updated it to a new value, the old data physically remains on disk for a significant period. This is a deliberate design choice to support ACID transactions, distributed consistency, and features like point-in-time restores, which rely on having access to historical data. The storage isn’t truly freed until the garbage collection process has completed its cycles and the underlying storage engine has performed its compaction.

The next thing you’ll likely run into is how gc.ttlseconds interacts with very long-running transactions and historical reads whose timestamps fall outside the retention window.
