By default, CockroachDB doesn’t cap the number of client connections; when it struggles under connection load, your application is usually the one drowning it.
Let’s say you’ve got a busy web application backed by CockroachDB, and users are starting to complain about slow response times or outright errors. You check your CockroachDB metrics: CPU isn’t maxed out and disk I/O looks okay, but the open-connection count (the `sql.conns` metric) is climbing steadily and never seems to drop. This is the classic sign that your application is opening connections and not closing them properly, or opening far more than CockroachDB can handle efficiently. CockroachDB is designed to handle thousands of concurrent connections, so if you’re hitting a wall, the bottleneck is almost certainly on the client side.
Here’s how to diagnose and tune your application’s connection behavior to keep CockroachDB happy.
The "Too Many Connections" Symptoms
The most obvious symptom is increased latency for SQL queries. CockroachDB has to manage the overhead of each connection, even if it’s idle. When the number of connections becomes excessive, the database spends more time managing those connections than executing queries. You might also see errors like:
- `connection refused` (less common with CockroachDB unless the node is truly out of resources, but possible).
- `pq: the database system is overloaded` (as surfaced by the `pq` driver, often in Go applications).
- SQLSTATE `53300` (`too_many_connections`). PostgreSQL words this as `remaining connection slots are reserved for non-replication superuser connections`. CockroachDB is PostgreSQL-compatible and can return the same code, typically when an operator has configured a per-node connection cap (for example, the `server.max_connections_per_gateway` cluster setting) and a node has reached it.
- Application-specific errors indicating connection timeouts or pool exhaustion.
Common Causes and Fixes
The root cause is almost always in your application’s connection management.
- Unclosed Connections (Connection Leaks)
  - Diagnosis: This is the most frequent culprit. Your application code opens a connection to the database, performs an operation, and then fails to return it to the connection pool or explicitly close it. Over time, the number of open connections grows without bound. Monitor your application’s connection pool metrics (if it has one) and CockroachDB’s `sql.conns` metric. A `sql.conns` value that climbs steadily and never drops is a dead giveaway.
  - Fix: Implement robust connection handling in your application. If you’re using a connection pool (highly recommended), make sure every connection you check out is handed back once your `Query()` or `Exec()` calls are done: `conn.Release()` for a `pgxpool` connection, or `conn.Close()` for a `database/sql` `*sql.Conn`. Use `defer` in Go (or `try`/`finally` in Python) so this also happens on error paths.
    - Go (`database/sql`):
      ```go
      db, err := sql.Open("postgres", "...")
      if err != nil { /* handle error */ }
      defer db.Close() // Close the whole *pool* on application exit

      conn, err := db.Conn(ctx)
      if err != nil { /* handle error */ }
      defer conn.Close() // THIS is the crucial part: it returns the connection to the pool
      // ... use conn ...
      ```
    - Go (`pgx`):
      ```go
      // pgx.Connect opens a single, unpooled connection.
      conn, err := pgx.Connect(ctx, "...")
      if err != nil { /* handle error */ }
      defer conn.Close(ctx) // Closes the connection; with pgxpool, Acquire a connection and defer conn.Release() instead
      // ... use conn ...
      ```
    - Python (psycopg2):
      ```python
      conn = psycopg2.connect(...)
      try:
          cur = conn.cursor()
          try:
              cur.execute("SELECT 1")  # ... use cur ...
          finally:
              cur.close()
      finally:
          conn.close()  # Closes the connection; with a psycopg2 pool, call putconn(conn) instead
      ```
  - Why it works: Explicitly closing or releasing connections tells the database (or the pool) that the connection is no longer in use, which frees resources and stops the `sql.conns` count from growing indefinitely.
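One way to make leaks hard to write in the first place is to hide acquire and release behind a context manager. The sketch below is a minimal, single-threaded illustration using only the standard library; `TinyPool` and `borrowed` are hypothetical names standing in for your driver’s real pool, not part of any CockroachDB or driver API.

```python
import queue
from contextlib import contextmanager


class TinyPool:
    """Hypothetical stand-in for a real driver pool; hands out placeholder strings."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")
        self.in_use = 0  # how many connections are currently checked out

    def acquire(self, timeout=1.0):
        conn = self._free.get(timeout=timeout)  # raises queue.Empty if exhausted
        self.in_use += 1
        return conn

    def release(self, conn):
        self.in_use -= 1
        self._free.put(conn)


@contextmanager
def borrowed(pool):
    """Check a connection out and guarantee it goes back, even on error."""
    conn = pool.acquire()
    try:
        yield conn
    finally:
        pool.release(conn)


pool = TinyPool(size=2)

try:
    with borrowed(pool) as conn:
        raise RuntimeError("query failed")  # simulate an error mid-request
except RuntimeError:
    pass

print(pool.in_use)  # 0: the failed request still returned its connection
```

Exporting a counter like `in_use` from your real pool gives you an application-side number to compare against what CockroachDB reports.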
- Connection Pool Size Too Large (or Too Small)
  - Diagnosis: Most applications use a connection pool to reuse database connections. If the pool’s maximum size is excessively large, your application may open more connections than CockroachDB can manage efficiently, even when they are released correctly. Conversely, a pool that’s too small leads to connection acquisition timeouts when requests outpace available connections. Check your application’s connection pool configuration.
  - Fix: Adjust the `max_open_conns` (or equivalent) setting in your application’s connection pool configuration. Size it against CockroachDB’s capacity rather than your application servers’ alone: Cockroach Labs’ connection-pooling guidance suggests starting at roughly four active connections per vCPU in the cluster, counted across all application instances, and tuning from there. For a typical web application, per-pool values between 50 and 200 are common, but this is highly workload-dependent.
    - Go (`database/sql`):
      ```go
      db.SetMaxOpenConns(100) // Example: cap the pool at 100 open connections
      db.SetMaxIdleConns(10)  // Keep up to 10 idle connections for reuse
      ```
    - Python (SQLAlchemy):
      ```python
      from sqlalchemy import create_engine

      engine = create_engine(
          "postgresql://user:password@host:port/dbname",
          pool_size=100,    # Connections the pool keeps open
          max_overflow=20,  # Extra connections allowed beyond pool_size (hard cap: 120)
          pool_timeout=30,  # Seconds to wait for a free connection before raising
      )
      ```
  - Why it works: This limits the total number of concurrent connections your application can establish, so it cannot overwhelm CockroachDB by sheer volume. When all connections are busy, requests queue inside the application’s pool instead of creating new connections that add overhead on the database.
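Because every application instance has its own pool, the number that matters to CockroachDB is the sum across instances. The arithmetic can be sketched as below; the function name is hypothetical, and the default multiplier of 4 connections per vCPU is an assumed starting point to tune from, not a hard rule.

```python
def pool_size_per_instance(cluster_vcpus, app_instances, conns_per_vcpu=4):
    """Split a cluster-wide connection budget evenly across application instances.

    conns_per_vcpu is an assumed starting multiplier; adjust it for your workload.
    """
    if cluster_vcpus < 1 or app_instances < 1:
        raise ValueError("cluster_vcpus and app_instances must be positive")
    budget = cluster_vcpus * conns_per_vcpu  # total connections the cluster should see
    return max(1, budget // app_instances)   # never size a pool to zero


# Example: 3 nodes x 8 vCPUs = 24 vCPUs, so the budget is 96 connections.
print(pool_size_per_instance(cluster_vcpus=24, app_instances=6))  # 16
```

Recompute this whenever the instance count changes: a pool size that was safe at 6 instances puts twice the load on the cluster at 12.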
- Inefficient Query Patterns Leading to Long-Running Transactions
  - Diagnosis: While not directly a connection limit issue, a long-running transaction occupies a connection for its entire duration. If your application opens a transaction, performs a series of slow operations (e.g., multiple sequential queries, complex application logic, waiting for external services), and only then commits or rolls back, that connection is unavailable to everyone else the whole time. Watch CockroachDB’s transaction metrics, such as `sql.txns.open` and `sql.txn.latency` (exact metric names can differ between versions, so check the metrics list for yours), and look for statements with high execution latency.
  - Fix: Break long-running transactions into smaller units. Optimize slow queries. If external service calls are involved, run them outside the database transaction or use asynchronous patterns. Make sure transactions are committed or rolled back promptly.
  - Why it works: Shorter transactions free up connections much faster. When each transaction holds its connection for a long time, the pool needs more connections to serve the same request rate, which turns slow transactions into connection pressure.
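The “do slow work outside the transaction” advice can be sketched as follows. This uses the standard library’s `sqlite3` purely as a runnable stand-in for a CockroachDB connection; `fetch_exchange_rate` and the `orders` table are hypothetical, and the sleep simulates a slow external service.

```python
import sqlite3
import time


def fetch_exchange_rate():
    """Hypothetical slow external call, simulated with a short sleep."""
    time.sleep(0.05)
    return 1.08


conn = sqlite3.connect(":memory:")  # stand-in for a real database connection
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_eur REAL, amount_usd REAL)"
)

# 1. Do the slow, non-database work first, with no transaction open.
rate = fetch_exchange_rate()

# 2. Open the transaction only for the writes, and time how long it is held.
started = time.perf_counter()
with conn:  # commits on success, rolls back if the block raises
    conn.execute(
        "INSERT INTO orders (amount_eur, amount_usd) VALUES (?, ?)",
        (100.0, 100.0 * rate),
    )
held_for = time.perf_counter() - started

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(f"rows: {row_count}, transaction held for {held_for * 1000:.2f} ms")
conn.close()
```

Had the call to `fetch_exchange_rate()` been made inside the `with conn:` block, the transaction (and its connection) would have been held for at least the full 50 ms of the simulated external call.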
- Driver-Specific Connection Management Defaults
  - Diagnosis: Different database drivers and ORMs have different default behaviors for connection pooling and management. Some have aggressive defaults that open many connections quickly; others do not prune idle connections aggressively. Review the documentation for the specific driver or ORM you are using.
  - Fix: Consult your driver’s documentation (e.g., `pgx`, `pq`, `node-postgres`, SQLAlchemy, Hibernate) and explicitly configure pool size, idle connection timeouts, and connection validation strategies to match your workload and CockroachDB’s capabilities.
  - Why it works: Tailoring the driver’s behavior to your specific needs ensures it doesn’t run with a suboptimal default that could lead to excessive connections.
- Application Restart/Scaling Events
  - Diagnosis: When an application scales up (new instances start) or restarts, each new instance creates its own connection pool and opens its own set of connections. If these events happen frequently, or if old instances are slow to terminate and release their connections, you can see a temporary spike in `sql.conns`.
  - Fix: Give your deployment process a graceful shutdown procedure: let in-flight requests complete and close the connection pool before the instance terminates. In cloud environments, configure health checks and termination grace periods to allow for this.
  - Why it works: Graceful shutdowns prevent abrupt connection drops and let the connection pool and database manage the transition smoothly, avoiding sudden surges in connection counts.
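A graceful shutdown can be sketched with a signal handler that only sets a flag, leaving the serving loop to finish its current request and close the pool. Everything here is illustrative: `ClosablePool`, `serve_until_stopped`, and `fake_request` are hypothetical names, and the demo simulates the orchestrator’s SIGTERM by calling the handler directly.

```python
import signal
import threading


class ClosablePool:
    """Hypothetical stand-in for a driver pool that exposes close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


shutting_down = threading.Event()


def handle_sigterm(signum, frame):
    """Only set a flag; the serving loop does the actual cleanup."""
    shutting_down.set()


def install_handlers():
    """Call once at startup, from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def serve_until_stopped(pool, serve_one_request):
    try:
        while not shutting_down.is_set():
            serve_one_request()  # the in-flight request finishes before the flag is re-checked
    finally:
        pool.close()             # release every database connection before exiting


# Demo: simulate a SIGTERM arriving while the third request is being served.
served = []


def fake_request():
    served.append(1)
    if len(served) == 3:
        handle_sigterm(signal.SIGTERM, None)


pool = ClosablePool()
serve_until_stopped(pool, fake_request)
print(len(served), pool.closed)  # 3 True
```

In a real deployment, the termination grace period needs to be longer than your slowest request, or the process will be killed before `pool.close()` runs.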
- Misconfigured Load Balancers or Proxies
  - Diagnosis: If a load balancer or proxy sits in front of your application servers, or between them and CockroachDB, the proxy itself may be holding persistent connections open, or its pooling and idle-timeout settings may conflict with your application’s pool. This is less common when applications connect directly to the database, but it does happen in complex architectures.
  - Fix: Review the connection management settings of any intermediate network devices or application-level proxies. Make sure they are not holding connections open unnecessarily and that their limits and timeouts are compatible with your application’s connection pool.
  - Why it works: Every hop in the chain then respects the same connection limits, and none of them introduces phantom connections.
By systematically checking these points, you should be able to identify why your application is opening too many connections and resolve the overload on your CockroachDB cluster.
After fixing your connection limits, you’ll likely start looking at query performance more closely, and the next thing you’ll encounter is understanding how CockroachDB’s distributed query planner works.