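Couchbase buckets, scopes, and collections exist primarily to provide a flexible, multi-tenant data modeling layer. The hierarchy loosely mirrors the relational one (a bucket resembles a database, a scope a schema, and a collection a table) without imposing a fixed schema on the documents themselves.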
Let’s see this in action. Imagine we have a Couchbase cluster and we want to store user data for different applications. We can create a bucket named users to house all this data.
{
  "name": "users",
  "bucketType": "couchbase",
  "ramQuotaMB": 256,
  "numReplicas": 1,
  "maxTTL": 0
}
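As a rough sketch of how these settings might be submitted, the snippet below builds (but does not send) a request for the cluster's REST bucket endpoint. The cluster address is hypothetical, and the mapping from the JSON names above to the form fields `ramQuota` and `replicaNumber` is an assumption to check against the REST reference for your server version.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical cluster address; 8091 is the usual management port.
CLUSTER = "http://localhost:8091"


def build_create_bucket_request(settings: dict) -> Request:
    """Build, without sending, a POST request that would create a bucket."""
    # Assumed mapping from the JSON names used above to REST form fields.
    form = {
        "name": settings["name"],
        "bucketType": settings["bucketType"],
        "ramQuota": settings["ramQuotaMB"],        # quota in MB
        "replicaNumber": settings["numReplicas"],
        "maxTTL": settings["maxTTL"],              # 0 means no expiry cap
    }
    body = urlencode(form).encode("utf-8")
    return Request(f"{CLUSTER}/pools/default/buckets", data=body, method="POST")


request = build_create_bucket_request({
    "name": "users",
    "bucketType": "couchbase",
    "ramQuotaMB": 256,
    "numReplicas": 1,
    "maxTTL": 0,
})
```

Sending it would also require administrator credentials (for example, an HTTP Basic `Authorization` header), which are omitted here.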
Inside the users bucket, we can create scopes. A scope is like a logical grouping within a bucket. Let’s say we have two applications, app1 and app2, and we want to isolate their user data. We’d create scopes for each:
app1_scope
app2_scope
Within each scope, we can define collections. Collections are where the actual documents (data) reside. For our user data, we might have a profiles collection and an activity_log collection within each scope.
So, a user document for app1 might live in users.app1_scope.profiles.
And a user document for app2 might live in users.app2_scope.profiles.
The structure looks like this:
Bucket: users
    Scope: app1_scope
        Collection: profiles
            { "user_id": "user123", "username": "alice", "email": "alice@example.com" }
        Collection: activity_log
            { "user_id": "user123", "timestamp": "2023-10-27T10:00:00Z", "action": "login" }
    Scope: app2_scope
        Collection: profiles
            { "user_id": "user456", "username": "bob", "email": "bob@example.com" }
        Collection: activity_log
            { "user_id": "user456", "timestamp": "2023-10-27T10:05:00Z", "action": "signup" }
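One way to materialize this layout is with SQL++ `CREATE SCOPE` and `CREATE COLLECTION` statements. The sketch below only generates the statement text from a dictionary describing the hierarchy; `LAYOUT`, `quote`, and `ddl_for` are illustrative names, and running the statements against a cluster is left to whichever client or query tool you use.

```python
def quote(identifier: str) -> str:
    """Wrap an identifier in backticks, as SQL++ expects for keyspace names."""
    return f"`{identifier}`"


def ddl_for(bucket: str, layout: dict) -> list:
    """Return CREATE SCOPE / CREATE COLLECTION statements for a bucket layout."""
    statements = []
    for scope, collections in layout.items():
        statements.append(f"CREATE SCOPE {quote(bucket)}.{quote(scope)};")
        for collection in collections:
            path = ".".join(quote(part) for part in (bucket, scope, collection))
            statements.append(f"CREATE COLLECTION {path};")
    return statements


# Illustrative description of the hierarchy shown above.
LAYOUT = {
    "app1_scope": ["profiles", "activity_log"],
    "app2_scope": ["profiles", "activity_log"],
}

statements = ddl_for("users", LAYOUT)
for statement in statements:
    print(statement)
```

Each scope is emitted before its collections, since a collection can only be created inside a scope that already exists.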
This hierarchical structure provides powerful organization. Couchbase’s role-based access control (RBAC) lets you grant roles at the bucket, scope, or collection level, so the credentials used by app1 can be restricted to app1_scope and cannot accidentally or intentionally access data belonging to app2. You can also set a different maximum Time-To-Live (TTL) for each collection, meaning data in the activity_log collection could automatically expire after, say, 30 days, while user profiles are kept indefinitely.
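To make the 30-day example concrete, the following sketch builds (without sending) a REST request that would create the activity_log collection with a TTL cap expressed in seconds. The cluster address is hypothetical, and the snippet assumes the collection-creation endpoint accepts `name` and `maxTTL` form fields; confirm this against the REST reference for your server version.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical cluster address; 8091 is the usual management port.
CLUSTER = "http://localhost:8091"

# 30 days expressed in seconds.
THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60


def build_create_collection_request(bucket: str, scope: str, collection: str,
                                    max_ttl_seconds: int = 0) -> Request:
    """Build, without sending, a POST that would create a collection with a TTL cap."""
    url = f"{CLUSTER}/pools/default/buckets/{bucket}/scopes/{scope}/collections"
    body = urlencode({"name": collection, "maxTTL": max_ttl_seconds}).encode("utf-8")
    return Request(url, data=body, method="POST")


# activity_log entries expire after 30 days; profiles keep the default of 0 (no cap).
log_request = build_create_collection_request(
    "users", "app1_scope", "activity_log", THIRTY_DAYS_SECONDS
)
profiles_request = build_create_collection_request("users", "app1_scope", "profiles")
```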
The key problem this solves is managing complexity in NoSQL data. Without these constructs, you’d be forced to prefix document keys with application names (e.g., app1_user_user123, app2_user_user456) or rely on a field within the document itself to denote its origin. This leads to messy queries, potential for data leakage, and a lack of granular control over data lifecycle. Buckets, scopes, and collections provide a native, performant way to achieve logical separation and structured data management.
When you query for a document, you specify the full keyspace path, users.app1_scope.profiles, and supply the document key (user123) separately; the key is not part of the path. If you omit the scope and collection, Couchbase defaults to the _default scope and _default collection within the specified bucket. This implicit default behavior is a common source of confusion when starting out, as applications might be writing to unexpected locations if they don’t explicitly target a collection.
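The defaulting rule can be modeled in a few lines. This is only an illustration of the behavior described here, not SDK code: `resolve_keyspace` and `lookup_statement` are hypothetical helpers that fill in `_default` where a scope or collection is missing and then format a SQL++ key lookup with `USE KEYS`.

```python
DEFAULT_SCOPE = "_default"
DEFAULT_COLLECTION = "_default"


def resolve_keyspace(bucket, scope=None, collection=None):
    """Return (bucket, scope, collection), filling in defaults for omitted parts."""
    return (bucket, scope or DEFAULT_SCOPE, collection or DEFAULT_COLLECTION)


def lookup_statement(doc_key, bucket, scope=None, collection=None):
    """Format a SQL++ lookup by document key against the resolved keyspace."""
    parts = resolve_keyspace(bucket, scope, collection)
    path = ".".join(f"`{part}`" for part in parts)
    return f'SELECT * FROM {path} USE KEYS "{doc_key}";'


explicit = lookup_statement("user123", "users", "app1_scope", "profiles")
implicit = lookup_statement("user123", "users")

print(explicit)
print(implicit)
```

The second statement targets the default collection, which is exactly where a document ends up if the application never names a scope and collection.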
The next step in understanding Couchbase data modeling is exploring how SQL++ (formerly N1QL) queries interact with these collections and how to define global secondary indexes on them.