CockroachDB’s TLS certificate rotation is designed to be seamless, but the real magic isn’t in the rotation itself; it’s in making sure every node trusts the new certificate authority before any node starts presenting a certificate signed by it, and well before the old certificates expire. That ordering is what makes zero downtime possible.

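Let’s see it in action. The stock cockroach CLI has no single cert rotate subcommand: rotation is done with cockroach cert create-ca, create-node, and create-client, followed by a SIGHUP to each running node. Treat the following as illustrative output from a hypothetical wrapper that chains those steps: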

INFO: Initiating certificate rotation.
INFO: Generating new CA certificate.
INFO: Signing new node certificates.
INFO: Distributing new CA certificate to all nodes.
INFO: Verifying new CA certificate distribution.
INFO: Signing new client certificates.
INFO: Distributing new node certificates.
INFO: Verifying new node certificate distribution.
INFO: Rotation complete. New certificates will be active after a brief interval.

This output, while brief, signifies a carefully ordered rollout. When you rotate a CA, you don’t just replace certificates on one node and hope for the best. A new Certificate Authority (CA) certificate is generated and then used to sign new node and client certificates. The critical part is ensuring that all nodes trust the new CA before any node presents a certificate signed by it.

Here’s the mental model:

  1. New CA Generation: A new CA certificate is generated. This is the root of trust for the next cycle of certificates.
  2. Node Certificate Signing: All existing nodes are issued new certificates signed by this new CA.
  3. CA Distribution: The new CA certificate is distributed to all nodes in the cluster. This is a multi-phase process. Initially, nodes might still be operating under the old CA, but they are being updated with the new one.
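  4. Agreement on New CA: Every node must load the new CA certificate before the rollout continues. This is not a Raft operation: certificates are files in each node’s certs directory, and a running node picks up changes when it receives SIGHUP. In CockroachDB’s documented procedure, ca.crt becomes a combined file holding the new CA certificate followed by the old one, so a node trusts both CAs at once. This is the "silent" part of the operation. Nodes start trusting the new CA even though they haven’t yet switched to using their new certificates.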
  5. Node Certificate Deployment: The new node certificates, signed by the new CA, are then distributed.
  6. Client Certificate Deployment: Similarly, new client certificates are generated and distributed.
  7. Gradual Switchover: As nodes receive their new certificates and the cluster has reached consensus on the new CA, connections begin to be established using the new credentials. The old certificates remain valid until their expiry, providing a buffer.
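The buffer in step 7 can be made concrete with a little date arithmetic. The sketch below is a minimal illustration, not part of CockroachDB: the function name overlap_buffer, the dates, and the seven-day rollout estimate are all hypothetical. It checks that the old and new certificates are valid at the same time and reports how much slack remains once the rollout is done.

```python
from datetime import datetime, timedelta, timezone


def overlap_buffer(old_not_after, new_not_before, rollout_duration):
    """Return the slack left once the rollout finishes.

    Raises ValueError if the old certificate expires before the new one
    becomes valid, because then there is no window in which both work.
    """
    overlap = old_not_after - new_not_before
    if overlap <= timedelta(0):
        raise ValueError("old certificate expires before the new one is valid")
    return overlap - rollout_duration


# Hypothetical validity dates and rollout estimate, for illustration only.
old_not_after = datetime(2025, 6, 30, tzinfo=timezone.utc)
new_not_before = datetime(2025, 6, 1, tzinfo=timezone.utc)

slack = overlap_buffer(old_not_after, new_not_before, timedelta(days=7))
print(slack)  # 22 days, 0:00:00
```

A negative result would mean the rollout is expected to finish after the old certificates expire, so the rotation should start earlier.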

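The specific levers you control are the cockroach cert subcommands (create-ca, create-node, create-client), the contents of each node’s certs directory, and the SIGHUP signal that tells a running node to reload its certificates. You don’t typically need to touch other node configuration or restart nodes for this. Copying the files to each node is up to you or your deployment tooling; validation happens in the TLS handshake.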
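As a rough sketch of those levers, the helper below assembles the commands for the two stages using the subcommands the cockroach CLI documents (create-ca, create-node, create-client). The function names, paths, hostnames, and user are hypothetical, and the flags shown (--certs-dir, --ca-key, --overwrite) should be checked against cockroach cert --help for your version. Nothing is executed here; the plan is only printed.

```python
import os
import shlex
import signal


def rotation_plan(certs_dir, ca_key, node_hosts, client_user):
    """Build (ca_stage, leaf_stage) command lists without running them.

    Assumption: the old CA key has already been moved aside, so create-ca
    writes a fresh key and --overwrite produces a combined ca.crt.
    """
    common = [f"--certs-dir={certs_dir}", f"--ca-key={ca_key}", "--overwrite"]
    ca_stage = [["cockroach", "cert", "create-ca", *common]]
    leaf_stage = [
        ["cockroach", "cert", "create-node", *node_hosts, *common],
        ["cockroach", "cert", "create-client", client_user, *common],
    ]
    return ca_stage, leaf_stage


def reload_certs(pid):
    """Ask a running node to re-read its certs directory (Unix only)."""
    os.kill(pid, signal.SIGHUP)


# Hypothetical paths, hosts, and user, for illustration only.
ca_stage, leaf_stage = rotation_plan(
    "certs", "my-safe-directory/ca.key", ["node1.example.com", "localhost"], "root"
)
for cmd in ca_stage + leaf_stage:
    print(shlex.join(cmd))
```

The split into two lists is deliberate: the CA stage has to be copied to every node and reloaded before anything in the leaf stage is deployed.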

What most people don’t realize is that a safe rotation doesn’t immediately start using the new certificates. The new CA is first distributed to, and loaded by, every node and client; a majority is not enough, because any peer that lacks the new CA will reject certificates signed by it. Only then are the new node and client certificates deployed. This two-stage approach prevents a node from presenting a new certificate signed by a CA that its peers don’t yet trust, which would cause TLS handshake failures and could cut that node off from the rest of the cluster.
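The ordering argument can be checked with a toy model. This sketch is not CockroachDB code: Node, can_handshake, and the CA labels are hypothetical stand-ins, and a "handshake" is reduced to the rule that each side must trust the issuer of the other side’s certificate.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    trusted_cas: set   # CA labels this node accepts
    cert_issuer: str   # CA that signed the certificate this node presents


def can_handshake(a, b):
    """Mutual TLS works only if each side trusts the other's issuer."""
    return a.cert_issuer in b.trusted_cas and b.cert_issuer in a.trusted_cas


def broken_pairs(nodes):
    """List every pair of nodes that can no longer connect."""
    return [
        (a.name, b.name)
        for i, a in enumerate(nodes)
        for b in nodes[i + 1:]
        if not can_handshake(a, b)
    ]


def make_cluster():
    return [Node(f"n{i}", {"old-ca"}, "old-ca") for i in range(1, 4)]


# Premature: n1 switches to a new-CA certificate before its peers trust new-ca.
cluster = make_cluster()
cluster[0].trusted_cas.add("new-ca")
cluster[0].cert_issuer = "new-ca"
premature = broken_pairs(cluster)
print(premature)  # [('n1', 'n2'), ('n1', 'n3')]

# Staged: every node trusts both CAs first, then certificates roll one by one.
cluster = make_cluster()
for node in cluster:
    node.trusted_cas.add("new-ca")
staged = []
for node in cluster:
    node.cert_issuer = "new-ca"
    staged.append(broken_pairs(cluster))
print(staged)  # [[], [], []]
```

In the staged run no pair is ever broken, even while the cluster is serving a mix of old and new certificates.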

The next concept you’ll likely encounter is managing certificate revocation and understanding the implications of early revocation within a distributed system.
