CockroachDB transactions are designed to be resilient, but that resilience comes at the cost of occasional transient errors that your application code needs to handle.
Let’s see a simple Go example of how to manage these retries.

    package main

    import (
        "context"
        "database/sql"
        "fmt"
        "log"
        "strings"
        "time"

        _ "github.com/jackc/pgx/v4/stdlib"
    )

    func main() {
        // Replace with your actual connection string.
        connStr := "postgresql://root@localhost:26257/mydatabase?sslmode=disable"
        db, err := sql.Open("pgx", connStr)
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Ensure the table exists.
        _, err = db.Exec(`CREATE TABLE IF NOT EXISTS accounts (id INT PRIMARY KEY, balance DECIMAL)`)
        if err != nil {
            log.Fatal(err)
        }

        // Seed both accounts used by the transfer; rows that already exist are left untouched.
        _, err = db.Exec(`INSERT INTO accounts (id, balance) VALUES (1, 1000), (2, 0) ON CONFLICT DO NOTHING`)
        if err != nil {
            log.Fatal(err)
        }

        // Perform a transaction that might fail and need retries.
        ctx := context.Background()
        err = performTransfer(ctx, db, 1, 2, 100) // Transfer 100 from account 1 to account 2
        if err != nil {
            log.Fatalf("Transfer failed: %v", err)
        }
        fmt.Println("Transfer successful!")
    }

    func performTransfer(ctx context.Context, db *sql.DB, fromAccountID, toAccountID int, amount float64) error {
        const maxRetries = 5
        var lastErr error
        for i := 0; i < maxRetries; i++ {
            err := attemptTransfer(ctx, db, fromAccountID, toAccountID, amount)
            if err == nil {
                // Transaction committed successfully.
                return nil
            }
            if !isRetryableError(err) {
                // Not a retryable error, return immediately.
                return err
            }
            lastErr = fmt.Errorf("attempt %d/%d: %w", i+1, maxRetries, err)
            log.Printf("Retrying transaction: %v", lastErr)
            // Exponential backoff: 100ms, 200ms, 400ms, ...
            time.Sleep(time.Duration(1<<i) * 100 * time.Millisecond)
        }
        return fmt.Errorf("transaction failed after %d attempts: %w", maxRetries, lastErr)
    }

    // attemptTransfer runs the transfer once, in its own transaction. A retry
    // error can surface on any statement, not only on Commit, so every error is
    // returned to the caller, which decides whether to try again.
    func attemptTransfer(ctx context.Context, db *sql.DB, fromAccountID, toAccountID int, amount float64) error {
        tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
        if err != nil {
            return fmt.Errorf("failed to begin transaction: %w", err)
        }
        // Rollback does nothing once Commit has been called, so deferring it is safe.
        defer tx.Rollback()

        // Lock the source row so a concurrent transfer cannot change it underneath us.
        var fromBalance DECIMAL
        err = tx.QueryRowContext(ctx, "SELECT balance FROM accounts WHERE id = $1 FOR UPDATE", fromAccountID).Scan(&fromBalance)
        if err != nil {
            return fmt.Errorf("failed to get from account balance: %w", err)
        }
        if float64(fromBalance) < amount {
            return fmt.Errorf("insufficient funds in account %d", fromAccountID)
        }

        _, err = tx.ExecContext(ctx, "UPDATE accounts SET balance = balance - $1 WHERE id = $2", amount, fromAccountID)
        if err != nil {
            return fmt.Errorf("failed to debit from account %d: %w", fromAccountID, err)
        }
        _, err = tx.ExecContext(ctx, "UPDATE accounts SET balance = balance + $1 WHERE id = $2", amount, toAccountID)
        if err != nil {
            return fmt.Errorf("failed to credit to account %d: %w", toAccountID, err)
        }

        if err = tx.Commit(); err != nil {
            return fmt.Errorf("transaction commit failed: %w", err)
        }
        return nil
    }

    // isRetryableError reports whether err looks like a CockroachDB retry error.
    // Matching on message text is a simplification for this demonstration; in a
    // real application, check the SQLSTATE code (40001) exposed by the driver.
    func isRetryableError(err error) bool {
        if err == nil {
            return false
        }
        msg := err.Error()
        return strings.Contains(msg, "restart transaction") || strings.Contains(msg, "serialization_failure")
    }

    // DECIMAL is a placeholder for a decimal type; in production you would use an
    // arbitrary-precision type such as `decimal.Decimal` from a library, because
    // float64 cannot represent every monetary amount exactly.
    type DECIMAL float64

The core idea is that CockroachDB does not promise that a transaction will commit on its first attempt. Under SERIALIZABLE isolation, if a concurrent transaction has modified data yours depends on and the two cannot be placed in a consistent serial order, CockroachDB aborts your transaction with a retry error: SQLSTATE 40001 (serialization_failure), with a message that starts with "restart transaction". The error can surface on any statement, not only on COMMIT, and the remedy is always the same: roll back and run the whole transaction again from the beginning. The SELECT ... FOR UPDATE clause helps here; it tells CockroachDB that you intend to modify the selected rows, which prevents certain types of conflicts.
Passing sql.LevelSerializable makes the isolation level explicit. SERIALIZABLE is CockroachDB's default and its strongest isolation level: it guarantees that concurrent transactions produce the same result as if they had run one after another. That guarantee is the reason retry errors exist. When CockroachDB cannot find a valid serial order, it aborts a transaction and asks the client to retry rather than let an anomaly through.
When tx.Commit(), or any statement inside the transaction, returns an error, you must inspect it. CockroachDB retry errors carry SQLSTATE 40001 and contain text such as "restart transaction". If you detect one, discard the failed transaction and run the entire transaction again from the start. After a failed Commit, database/sql already treats the transaction as finished, so a further Rollback call is a harmless no-op that returns sql.ErrTxDone. Waiting between attempts with a delay that grows each time, ideally exponential backoff, avoids overwhelming the database and gives competing transactions a chance to complete.
The FOR UPDATE clause in SELECT balance FROM accounts WHERE id = $1 FOR UPDATE is key. It acquires an exclusive lock on the selected row(s), so other transactions that want to modify or lock them must wait until your transaction commits or rolls back. Because conflicting writers queue behind the lock instead of racing past the read, retry errors on those rows become much less likely. It does not eliminate them, though: conflicts on other rows the transaction touches can still force a retry, so the retry loop remains necessary.
The isRetryableError function is a simplification. Message text can change between versions, so a production system should match on the SQLSTATE code instead. CockroachDB reports retryable transaction errors as SQLSTATE 40001, which pgx v4 exposes in the Code field of *pgconn.PgError. CockroachDB’s documentation lists the error codes that indicate a retryable condition.
The next hurdle you’ll face is handling application-level deadlocks, which occur when two or more transactions are waiting for each other to release locks.