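CockroachDB is failing because its nodes’ clocks disagree by more than the cluster tolerates. CockroachDB relies on loosely synchronized clocks to order transactions consistently, so a node whose clock drifts too far from its peers shuts itself down.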

Cause 1: NTP Daemon Not Running

Diagnosis: Check the status of the chronyd or ntpd service on each node.

sudo systemctl status chronyd
# or
sudo systemctl status ntpd

Fix: If the service is inactive, start and enable it.

sudo systemctl start chronyd
sudo systemctl enable chronyd
# or
sudo systemctl start ntpd
sudo systemctl enable ntpd

This ensures that the system’s clock is actively synchronized with external NTP servers.
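
Because hosts differ in which time daemon they run, it can help to check for all the common ones in one pass. The sketch below is a minimal helper under stated assumptions: the function name active_time_service is hypothetical, and the unit names it tries (chronyd, chrony, ntpd, ntp, systemd-timesyncd) are the usual ones but may differ on your distribution.

```shell
# Print the first active time-sync unit and return 0,
# or print nothing and return 1 if none of them is active.
active_time_service() {
    for unit in chronyd chrony ntpd ntp systemd-timesyncd; do
        if systemctl is-active --quiet "$unit" 2>/dev/null; then
            echo "$unit"
            return 0
        fi
    done
    return 1
}
```

Running `active_time_service || echo "no time daemon is active"` on each node gives a quick yes/no answer before you dig into configuration.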

Cause 2: NTP Server Misconfiguration

Diagnosis: Inspect the NTP client configuration file for incorrect or unreachable server entries. For chronyd, this is /etc/chrony/chrony.conf on Debian and Ubuntu or /etc/chrony.conf on RHEL-family systems. For ntpd, it is /etc/ntp.conf.

Example chrony.conf snippet to check:

# Look for lines like this, ensure servers are reachable and valid
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst

Fix: Replace invalid or unreachable servers with known good ones, ideally geographically diverse. Ensure your firewall allows outbound UDP traffic on port 123.

# Example modification in /etc/chrony/chrony.conf
server ntp.ubuntu.com iburst
# "pool" uses several addresses behind the name; "server" would use only one
pool pool.ntp.org iburst

After editing, restart the NTP service:

sudo systemctl restart chronyd

This forces the client to use reliable time sources, improving synchronization accuracy.
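
To review which sources are actually configured, a small filter can pull them out of the config file. This is a sketch, not a full parser: the function name list_ntp_sources is hypothetical, and it assumes each source is declared on its own line with a server or pool directive.

```shell
# Read a chrony.conf or ntp.conf on stdin and print the host of every
# active "server" or "pool" line. Commented-out lines are skipped because
# their first field is "#" or starts with "#".
list_ntp_sources() {
    awk '$1 == "server" || $1 == "pool" { print $2 }'
}

# Example use (path assumed; adjust for your distribution):
#   list_ntp_sources < /etc/chrony/chrony.conf | while read -r host; do
#       getent hosts "$host" >/dev/null || echo "cannot resolve: $host"
#   done
```

The commented loop only checks DNS resolution. A name that resolves can still be unreachable over NTP, so treat it as a first filter rather than proof that the source works.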

Cause 3: Firewall Blocking NTP Traffic

Diagnosis: Verify that UDP port 123 (NTP) is open for outbound connections from your CockroachDB nodes to your NTP servers. Use tcpdump on a node to see if NTP packets are leaving.

sudo tcpdump -i any udp port 123 -n

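If requests leave and replies come back, the firewall is not the issue. If requests leave but no replies arrive, something between the node and the NTP servers is dropping the traffic. If no requests appear at all, check that the daemon is running (Cause 1) and that no local rule is dropping the packets before they leave.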
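
Outbound packets alone do not show that replies make it back, so it can help to count both directions. The sketch below assumes tcpdump's usual one-line NTP summary, where requests contain "NTPv4, Client" and replies contain "NTPv4, Server". The function name ntp_capture_summary is hypothetical, and the exact line format can vary between tcpdump versions.

```shell
# Read tcpdump text output on stdin and count NTP requests and replies.
ntp_capture_summary() {
    awk '/NTPv[0-9], Client/ { req++ }
         /NTPv[0-9], Server/ { rep++ }
         END { printf "requests=%d replies=%d\n", req, rep }'
}

# Example use: capture for 60 seconds, then summarize.
#   sudo timeout 60 tcpdump -i any -n -l udp port 123 2>/dev/null | ntp_capture_summary
```

A result such as requests=12 replies=0 points to traffic being dropped somewhere on the path, while requests=0 suggests the daemon is not sending at all.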

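Fix: Configure your firewall (e.g., iptables, firewalld, or cloud provider security groups) to allow outbound UDP traffic on port 123 and the matching replies. firewalld does not filter outbound traffic by default, so the command below is needed only when the node must also accept inbound NTP requests, for example when it serves time to other nodes: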

sudo firewall-cmd --add-service=ntp --permanent
sudo firewall-cmd --reload

For iptables:

sudo iptables -A OUTPUT -p udp --dport 123 -j ACCEPT
# Save rules if necessary, e.g., with iptables-persistent

Allowing NTP traffic ensures that time synchronization packets can reach the external servers and return.

Cause 4: Insufficient NTP Server Reachability/Quorum

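Diagnosis: Check the output of chronyc sources or ntpq -p for the status of your configured NTP servers. In chronyc sources, look for sources marked ^* (currently selected) or ^+ (acceptable and combined with the selected source). In ntpq -p, the equivalents are * and +. If most sources are marked x (falseticker, rejected) or ? (unreachable or unusable), you have a problem.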

chronyc sources

Example of bad output:

210 Number of sources = 4
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? 192.168.1.1                   2   6     0     -     +0ns[   +0ns] +/-    0ns
^? 192.168.1.2                   2   6     0     -     +0ns[   +0ns] +/-    0ns
^x 1.pool.ntp.org                2   6     3     6   -250ms[ -250ms] +/-   20ms
^x 2.pool.ntp.org                2   6     3     8   -300ms[ -300ms] +/-   25ms
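
Counting the usable sources makes this check easy to script. The following is a minimal sketch that assumes the standard chronyc sources layout, where each source line begins with a mode character (^, = or #) followed by a state character. The function name count_usable_sources is hypothetical.

```shell
# Read `chronyc sources` output on stdin and print how many sources are
# in state * (selected) or + (combined with the selected source).
count_usable_sources() {
    awk '/^[=#^][*+]/ { n++ } END { print n + 0 }'
}

# Example use:
#   chronyc sources | count_usable_sources
```

For the bad output above, the count is 0. On a healthy node with several configured servers you would expect at least 1, and usually more.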

Fix: Configure at least 3-4 NTP servers. Use a mix of local (if available and reliable) and geographically diverse public NTP servers from pools like pool.ntp.org. Ensure your NTP client is configured to allow a reasonable number of sources.

# In /etc/chrony/chrony.conf
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
server 3.pool.ntp.org iburst
# A local clock fallback is usually not recommended for critical sync.
# Note that "server 127.127.1.0" is ntpd syntax; the chrony equivalent is:
# local stratum 10

Restart the NTP service after changes. Having more reliable time sources increases the chance of achieving a stable and accurate time synchronization.

Cause 5: NTP Daemon Over-Reliance on Specific Servers

Diagnosis: Examine the chronyc sources or ntpq -p output. A single source marked ^* is normal, since only one source is selected at a time. The risk is when it is the only usable source: look for one ^* entry with every other source marked ? or x, so that losing it leaves nothing to fall back on.

Fix: The primary fix is ensuring multiple good sources, as in Cause 4. Beyond that, chronyd offers directives that affect source selection: maxdistance sets the largest root distance a source may have and still be selectable, and minsources sets how many selectable sources are required before the clock is updated. Separately, makestep lets chronyd correct a large offset at startup by stepping the clock instead of slewing it slowly. In chrony.conf, consider adding:

# Step the clock if the offset exceeds 10 seconds, but only during the first 3 updates
makestep 10 3

Restart the NTP service. With makestep, a node that boots with a badly wrong clock is corrected in its first few updates rather than staying skewed while chronyd slews gradually.

Cause 6: System Clock Drift Exceeding CockroachDB’s Tolerance

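Diagnosis: CockroachDB tolerates a bounded clock offset between nodes, set by --max-offset (500ms by default). A node shuts itself down when its clock differs from at least half of the other nodes by more than 80% of that maximum, which is 400ms with the default. Check the actual clock difference between nodes using date.

# On node 1
date +%s.%N
# On node 2
date +%s.%N
# Calculate the difference; run both commands as close together as possible

If the difference consistently approaches 400ms, NTP is not keeping the clocks close enough.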
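
The subtraction can be scripted so the readings are compared the same way every time. This is a sketch under stated assumptions: the function names skew_ms and skew_ok are hypothetical, and the default threshold of 400ms is 80% of CockroachDB's default 500ms --max-offset, so pass a different value if your cluster sets its own.

```shell
# Absolute difference between two `date +%s.%N` readings,
# rounded to whole milliseconds.
skew_ms() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        d = (a - b) * 1000
        if (d < 0) d = -d
        printf "%.0f\n", d
    }'
}

# Succeed when the skew is below the threshold in ms (third argument,
# default 400).
skew_ok() {
    [ "$(skew_ms "$1" "$2")" -lt "${3:-400}" ]
}

# Example use (node1 and node2 are placeholder hostnames):
#   skew_ok "$(ssh node1 date +%s.%N)" "$(ssh node2 date +%s.%N)" || echo "skew too high"
```

Collecting the readings over ssh adds the connection latency to the measured difference, so treat the result as an upper bound rather than an exact offset.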

Fix: Ensure your NTP client is configured to synchronize frequently. For chronyd, the default poll interval is usually acceptable, but you can explicitly set it lower for critical systems if needed.

# In /etc/chrony/chrony.conf
# minpoll and maxpoll are powers of two, in seconds. chrony's defaults are
# minpoll 6 (64s) and maxpoll 10 (1024s).
# Lowering minpoll to 4 lets chronyd poll as often as every 16s.
# Adjust these carefully: polling too often can overload servers or cause instability.
# Consider tuning only if drift is persistent.
# server ntp.example.com iburst minpoll 4 maxpoll 10

The most common fix is ensuring robust NTP server configuration (Causes 2-4) rather than aggressive client polling. A well-synced system clock will remain within CockroachDB’s acceptable skew limits.

The next error you’ll likely encounter after fixing clock skew is related to transaction retries due to contention, as the system now has a consistent view of time and can more accurately detect concurrent operations.
