CouchDB can generate UUIDs for documents, but they are not guaranteed to be globally unique unless you understand the strategy behind them.
Let’s see CouchDB generate a UUID for a new document.
```javascript
// Using Node.js with the nano client
const nano = require('nano')('http://admin:password@localhost:5984');
const db = nano.db.use('my_database');

async function createDocument() {
  // No _id field here: CouchDB assigns one when the document is inserted
  const doc = {
    name: "Example Document",
    value: 123
  };

  try {
    const response = await db.insert(doc);
    console.log("Document created with ID:", response.id);
    return response.id;
  } catch (err) {
    console.error("Error creating document:", err);
  }
}

createDocument();
```
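When you run this, CouchDB assigns the `_id` itself when a document arrives without one. Omit the field rather than setting it to null, since CouchDB expects document IDs to be strings. The format of the generated ID depends on the `algorithm` option in the `[uuids]` section of the server configuration: the documented choices are `random`, `sequential` (the shipped default), `utc_random`, and `utc_id`. None of them is a strict RFC 4122 version 1 (time-based) UUID, but the time-based ones follow the same idea as version 1, which is why that layout is a useful reference.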
How CouchDB Generates UUIDs
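The RFC 4122 version 1 layout is the classic recipe for a time-based identifier, and CouchDB’s time-based algorithms (`utc_random` and `utc_id`) borrow its general idea rather than its exact bit layout. Version 1 combines three fields, each aiming at uniqueness with specific trade-offs that depend on configuration and environment.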
- Timestamp: The primary component is a 60-bit timestamp counting the number of 100-nanosecond intervals since midnight UTC on October 15, 1582, the date of the Gregorian calendar reform. Embedding the time is what lets an ID carry creation order, which can be beneficial for certain types of queries or data ordering.
- Clock Sequence: To handle cases where the clock might jump backward (e.g., due to system clock adjustments), a 14-bit clock sequence is included. This sequence is incremented when a clock discontinuity is detected, ensuring uniqueness even if the timestamp resets.
- MAC Address: The final component is a 48-bit node identifier, typically derived from the MAC address of the machine generating the ID. This is the crucial part for ensuring uniqueness across different nodes in a distributed system. CouchDB’s `utc_id` algorithm fills the same role with a configured suffix rather than a MAC address.
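To make the timestamp field concrete, here is a minimal sketch of the arithmetic behind a version 1 timestamp. It assumes the RFC 4122 definition (100-nanosecond intervals since October 15, 1582 UTC); the function name `v1Timestamp` is illustrative and not part of CouchDB or any library.

```javascript
// Sketch: compute the 60-bit RFC 4122 version 1 timestamp for a given instant.
// v1Timestamp is a hypothetical helper name used only for this illustration.

// Gregorian epoch (1582-10-15T00:00:00Z) relative to the Unix epoch, in ms.
// Month index 9 is October; the result is a large negative number.
const GREGORIAN_EPOCH_MS = BigInt(Date.UTC(1582, 9, 15));

function v1Timestamp(unixMs) {
  // unixMs must be an integer number of milliseconds since the Unix epoch.
  // One millisecond contains 10,000 intervals of 100 ns.
  const intervals = (BigInt(unixMs) - GREGORIAN_EPOCH_MS) * 10000n;
  // Keep only the low 60 bits, which is all the timestamp field can hold.
  return intervals & ((1n << 60n) - 1n);
}

const ts = v1Timestamp(Date.now());
// 60 bits render as 15 hex digits.
console.log(ts.toString(16).padStart(15, '0'));
```

BigInt is used because the interval count exceeds the range in which JavaScript numbers represent integers exactly.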
Choosing Your Strategy
CouchDB offers flexibility here. The generator is chosen with the `algorithm` option in the `[uuids]` section of your `local.ini` configuration file, and the key question is what keeps IDs apart when several nodes generate them at once.
- Default (`sequential`): Each ID is a 26-hex-character random prefix followed by a 6-hex-character counter that advances in random steps. When the counter overflows, a new prefix is drawn. IDs from one node therefore increase monotonically for long runs, and uniqueness across nodes rests on the random prefix.
- Random (`random` or `utc_random`): `random` produces 128 random bits. `utc_random` puts a microsecond timestamp in the first 14 hex characters and fills the remaining 18 with random data, which suits environments with no stable host identity, such as some containerized deployments or short-lived virtual machines. Two instances could in principle produce the same ID, though the probability is astronomically low.
- Explicit node suffix (`utc_id`): For ultimate control, `utc_id` appends a fixed string to the 14-character timestamp. The string is set with the `utc_id_suffix` option in the same `[uuids]` section, and every replicating node should get a different one. This is useful for testing or in highly controlled environments.
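The shape of the two time-based formats can be sketched in a few lines. This is an illustration of the layout described in the CouchDB documentation for `utc_random` and `utc_id` (a 14-hex-character microsecond timestamp followed by random data or a fixed suffix), not CouchDB's actual implementation; all function names and the `_node_a` / `_node_b` suffixes are made up for the example.

```javascript
const { randomBytes } = require('crypto');

// Current time as microseconds since the Unix epoch.
// Date.now() only has millisecond resolution, so the last three digits are 0.
function nowMicros() {
  return BigInt(Date.now()) * 1000n;
}

// 14 hex characters of timestamp, zero-padded on the left.
function timePrefix(micros) {
  return micros.toString(16).padStart(14, '0');
}

// utc_random style: timestamp prefix + 18 random hex characters (9 bytes).
function utcRandomStyleId(micros = nowMicros()) {
  return timePrefix(micros) + randomBytes(9).toString('hex');
}

// utc_id style: timestamp prefix + a fixed per-node suffix.
function utcIdStyleId(suffix, micros = nowMicros()) {
  return timePrefix(micros) + suffix;
}

// Two nodes generating an ID in the same microsecond still get different IDs
// as long as their suffixes differ.
const t = nowMicros();
console.log(utcIdStyleId('_node_a', t));
console.log(utcIdStyleId('_node_b', t));
console.log(utcRandomStyleId(t));
```

The sketch also shows why the suffix matters: with identical timestamps, it is the only part of a `utc_id`-style ID that can differ between nodes.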
The Surprise: Timestamp Order and Its Downside
Because time-based and sequential IDs increase over time, they are mostly sortable chronologically. This sounds great for performance, as documents created around the same time sit next to each other in the database’s by-ID index. However, it can also concentrate work: under very high write throughput on a single CouchDB node, every new document lands at the same edge of that index instead of spreading out, which is the pattern usually called hotspotting. Whether this becomes a real bottleneck depends on the workload. CouchDB’s storage files are append-only regardless of the ID scheme, and fully random IDs carry their own cost because they scatter index updates and tend to enlarge the file, so measure before switching.
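The chronological sortability is easy to see with a small sketch. It assumes IDs that begin with a fixed-width, 14-hex-character microsecond timestamp (the layout the CouchDB documentation describes for `utc_random` and `utc_id`); the helper names and the four-character tails are invented for the example.

```javascript
// Build a time-prefixed ID: 14 hex chars of microseconds, then a filler tail.
function timePrefixedId(micros, tail) {
  return BigInt(micros).toString(16).padStart(14, '0') + tail;
}

// Recover the creation time from the 14-character prefix.
function creationTime(id) {
  const micros = BigInt('0x' + id.slice(0, 14));
  return new Date(Number(micros / 1000n));
}

const base = 1577836800000000n; // 2020-01-01T00:00:00Z in microseconds
const ids = [
  timePrefixedId(base + 2000000n, 'cccc'), // two seconds later
  timePrefixedId(base, 'ffff'),            // earliest
  timePrefixedId(base + 1000000n, 'aaaa'), // one second later
];

// A plain string sort is enough, because the prefix is fixed-width hex.
const sorted = [...ids].sort();
console.log(sorted.map((id) => creationTime(id).toISOString()));
```

The tails are deliberately out of order to show that the timestamp prefix, not the trailing characters, decides the sort position.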
The per-node component is what prevents collisions when multiple CouchDB nodes generate IDs concurrently. With `utc_id`, for example, two nodes that generate an ID in the exact same microsecond still produce different IDs, provided each node has its own `utc_id_suffix`.
If you’re experiencing write performance issues on a high-throughput CouchDB cluster, consider using an ID generation strategy that doesn’t rely solely on sequential timestamps. CouchDB does offer a built-in random option (`algorithm = random` in the `[uuids]` section), but it is a node-wide setting rather than a per-database or per-application one. For finer control, you can pre-generate IDs with a different strategy in your application code before inserting documents, by calling `db.insert({ _id: generate_random_uuid(), ... })`, where `generate_random_uuid` stands for whatever generator you choose.
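One way to fill in that placeholder is Node's built-in `crypto.randomUUID()`, which returns a random (version 4) UUID and is available in recent Node.js releases. The sketch below is an assumption about how you might wire it up, not a CouchDB or nano feature; `withRandomId` is a hypothetical helper.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical helper: return a copy of the fields with a client-generated
// random UUID as _id. Any _id already present in fields is overwritten.
function withRandomId(fields) {
  return { ...fields, _id: randomUUID() };
}

const doc = withRandomId({ name: 'Example Document', value: 123 });
console.log(doc._id);

// With a nano database handle such as the `db` used earlier, the insert
// would then be:
//   await db.insert(doc);
```

Because the ID is chosen before the request is sent, a retried insert reuses the same `_id` instead of creating a second document.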
The next challenge you’ll likely face is managing document conflicts when using custom UUID generation strategies in a distributed setup.