DynamoDB can’t directly query for a "missing" attribute, but it can efficiently find items that have a specific attribute.

Let’s say you have a table of Products with attributes like product_id, name, category, and tags (a set of strings). You want to find all products that don’t have a specific tag, say, is_discounted. A direct Scan with a FilterExpression like NOT contains(tags, :tag) returns the right items, but it is inefficient: Scan reads every item in the table and applies the filter only after the read, so you pay read capacity for the whole table no matter how few items match.
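To make the cost of the Scan approach concrete, here is a minimal sketch. The helper names build_absent_tag_scan and simulate_scan are hypothetical, and the simulation is an in-memory stand-in, not a call to DynamoDB. The parameter dict is shaped for the low-level Scan API; the attribute_not_exists clause is an added assumption so that products with no tags attribute at all are also matched.

```python
def build_absent_tag_scan(tag, table_name="Products"):
    """Build Scan parameters that keep only items whose `tags` set lacks `tag`."""
    return {
        "TableName": table_name,
        # #tags is a name placeholder; it sidesteps any reserved-word clash.
        "FilterExpression": "attribute_not_exists(#tags) OR (NOT contains(#tags, :tag))",
        "ExpressionAttributeNames": {"#tags": "tags"},
        "ExpressionAttributeValues": {":tag": {"S": tag}},
    }


def simulate_scan(items, tag):
    """In-memory stand-in for Scan: every item is read before the filter runs."""
    items_read = 0
    matches = []
    for item in items:
        items_read += 1  # the read is paid for whether or not the item matches
        if tag not in item.get("tags", set()):
            matches.append(item["product_id"])
    return matches, items_read
```

In the simulation, the number of items read always equals the table size, regardless of how many items pass the filter.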

The standard solution for this kind of "absence" query is an inverted index pattern, and that is what we’ll build: an index that maps each tag to the products that carry it. The "reverse query direction" comes from how we use it. Instead of asking which products lack a tag, we ask which products have it and infer absence from the answer.

Here’s how we’ll set it up:

Table Structure (Products)

{
  "TableName": "Products",
  "AttributeDefinitions": [
    {"AttributeName": "product_id", "AttributeType": "S"},
    {"AttributeName": "has_tag", "AttributeType": "S"},
    {"AttributeName": "tag_value", "AttributeType": "S"}
  ],
  "KeySchema": [
    {"AttributeName": "product_id", "KeyType": "HASH"}
  ],
  "ProvisionedThroughput": {
    "ReadCapacityUnits": 5,
    "WriteCapacityUnits": 5
  },
  "GlobalSecondaryIndexes": [
    {
      "IndexName": "TagIndex",
      "KeySchema": [
        {"AttributeName": "has_tag", "KeyType": "HASH"},
        {"AttributeName": "tag_value", "KeyType": "RANGE"}
      ],
      "Projection": {
        "ProjectionType": "KEYS_ONLY"
      },
      "ProvisionedThroughput": {
        "ReadCapacityUnits": 5,
        "WriteCapacityUnits": 5
      }
    }
  ]
}

In this setup:

  • product_id is the primary key.
  • TagIndex is a Global Secondary Index (GSI).
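  • has_tag is the GSI partition key and our "inverted" attribute. We’ll use it to signify the presence of a specific tag. Items without has_tag and tag_value, such as the main product item, do not appear in the index at all.
  • tag_value is the GSI sort key and stores the actual tag name.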
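One detail of table definitions like this is easy to get wrong: AttributeDefinitions may list only attributes that serve as a key of the table or of one of its indexes, so a non-key attribute such as category listed there makes CreateTable fail validation. The check below is a small sketch of that rule; the function name unused_attribute_definitions is hypothetical and it operates on a plain dict, not on a live table.

```python
def unused_attribute_definitions(table_def):
    """Return attribute names that are defined but not used in any key schema."""
    defined = {a["AttributeName"] for a in table_def["AttributeDefinitions"]}
    used = {k["AttributeName"] for k in table_def["KeySchema"]}
    indexes = (
        table_def.get("GlobalSecondaryIndexes", [])
        + table_def.get("LocalSecondaryIndexes", [])
    )
    for index in indexes:
        used |= {k["AttributeName"] for k in index["KeySchema"]}
    return defined - used
```

An empty result means every defined attribute is a key somewhere, which is what the table definition needs.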

The "Reverse Query" Logic

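When we add a product, we’ll write one extra item to the table for each tag it possesses. We never write to the GSI directly: DynamoDB copies an item into TagIndex automatically once it carries both has_tag and tag_value.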

Let’s say we have a product: { "product_id": "prod-123", "name": "Wireless Mouse", "category": "Electronics", "tags": {"wireless", "ergonomic", "bluetooth"} }

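We would write four items to the Products table: the main item plus one tag item per tag. Each item needs its own primary key value. With product_id as the table’s only key attribute, the items listed here all share the key prod-123 and each write would replace the previous one, so a real table needs a composite key (for example, a sort key that distinguishes the tag items from the main item).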

  1. Main item: { "product_id": "prod-123", "name": "Wireless Mouse", "category": "Electronics", "tags": {"wireless", "ergonomic", "bluetooth"} }

  2. GSI item for "wireless": { "product_id": "prod-123", "has_tag": "wireless", "tag_value": "wireless" }

  3. GSI item for "ergonomic": { "product_id": "prod-123", "has_tag": "ergonomic", "tag_value": "ergonomic" }

  4. GSI item for "bluetooth": { "product_id": "prod-123", "has_tag": "bluetooth", "tag_value": "bluetooth" }

Notice how has_tag and tag_value are the same for positive tag presence.
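The tag items can be derived mechanically from the product. This is a minimal sketch with a hypothetical helper name, tag_items; it reproduces the item shape used in this article and deliberately leaves out the table-level key that keeps the items from colliding with the main item.

```python
def tag_items(product):
    """Build one tag item per tag, with has_tag and tag_value both set to the tag."""
    return [
        {"product_id": product["product_id"], "has_tag": tag, "tag_value": tag}
        # sorted() gives a stable order; sets themselves are unordered
        for tag in sorted(product.get("tags", ()))
    ]
```

A product without a tags attribute simply produces no tag items, so it never shows up in TagIndex.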

Querying for Products Without a Tag

Now, to find products that do not have the tag is_discounted, we start with a Query on the TagIndex GSI.

A KeyConditionExpression must test the partition key for equality. Operators such as <> are rejected there, so we cannot ask the index for every entry whose has_tag is not is_discounted. This is where the "reverse" comes in: we query for the entries where has_tag equals is_discounted, which gives us the products that do have the tag.

{
  "TableName": "Products",
  "IndexName": "TagIndex",
  "KeyConditionExpression": "has_tag = :tag_name",
  "ExpressionAttributeValues": {
    ":tag_name": {"S": "is_discounted"}
  },
  "Select": "ALL_PROJECTED_ATTRIBUTES"
}

Select is set to ALL_PROJECTED_ATTRIBUTES because we need the product_id of each match; COUNT would return only the number of matches. Since the index projection is KEYS_ONLY, each returned item contains has_tag, tag_value, and product_id. Removing those product_ids from the full set of product_ids leaves precisely the products that do not have the is_discounted tag.

Why this works:

By creating an entry in the GSI for every tag a product has, we’ve effectively inverted the relationship: the GSI indexes the presence of tags. Querying for has_tag = 'is_discounted' reads only the entries for that one tag, and any product missing from the result lacks the tag. An inequality over the index could not give the right answer even if it were allowed, because a product tagged both wireless and is_discounted would still match through its wireless entry. The one remaining input is the full set of product_ids, which has to come from somewhere, for example from a list the application already maintains.
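Because a Query key condition accepts only equality on the partition key, one workable way to get the "absent" set is to query for the products that have the tag and subtract them from a known set of product_ids. The sketch below assumes that approach. The function names are hypothetical, query is any callable with the shape of the low-level client's query method (for example a boto3 DynamoDB client's query), and all_product_ids is assumed to be supplied by the application.

```python
def product_ids_with_tag(query, tag):
    """Collect product_ids of every TagIndex entry for `tag`, following pagination."""
    params = {
        "TableName": "Products",
        "IndexName": "TagIndex",
        "KeyConditionExpression": "has_tag = :tag_name",
        "ExpressionAttributeValues": {":tag_name": {"S": tag}},
    }
    found = set()
    while True:
        page = query(**params)
        # Low-level responses wrap values in type descriptors such as {"S": ...}
        found.update(item["product_id"]["S"] for item in page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return found
        params["ExclusiveStartKey"] = last_key  # continue from the previous page


def product_ids_without_tag(query, tag, all_product_ids):
    """Infer absence: every known product that the tag query did not return."""
    return set(all_product_ids) - product_ids_with_tag(query, tag)
```

Passing the query callable in keeps the sketch independent of any particular client and makes the set-difference logic easy to exercise with a stub.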

Handling Multiple Tags and Complex Conditions

For more complex scenarios, like finding products that have tag A but not tag B, you can combine queries. First, query TagIndex for products with tag A. Then query it again for tag B and drop every product_id that appears in the second result; this costs two queries rather than one lookup per product. Alternatively, you could design a more elaborate GSI structure, perhaps using composite keys that encode tag combinations, but that adds write complexity.
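The "has tag A but not tag B" case reduces to a set difference between two tag lookups. The sketch below models the index in memory to show the logic; build_tag_index and has_a_not_b are hypothetical names, and the dict stands in for two Query calls against TagIndex.

```python
from collections import defaultdict


def build_tag_index(tag_items):
    """In-memory stand-in for TagIndex: tag name -> set of product_ids."""
    index = defaultdict(set)
    for item in tag_items:
        index[item["has_tag"]].add(item["product_id"])
    return index


def has_a_not_b(index, tag_a, tag_b):
    """Products that carry tag_a and do not carry tag_b."""
    # .get avoids creating empty entries for tags nobody has
    return index.get(tag_a, set()) - index.get(tag_b, set())
```

The same subtraction applies to the product_id sets returned by two real queries.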

The most surprising thing about this pattern is that you’re using DynamoDB’s ability to efficiently query for presence to infer absence: the query is turned inside out, and the tag lookup itself never scans the table.

The next problem you’ll likely encounter is managing the write amplification – ensuring every tag addition/removal correctly updates the GSI entries.
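One way to keep that write amplification under control is to compute only the difference between the old and new tag sets, so unchanged tags cause no writes. This is a sketch with a hypothetical helper, tag_item_changes; how the resulting writes are grouped with the main item update (for example in a transaction) is left to the application.

```python
def tag_item_changes(product_id, old_tags, new_tags):
    """Return (tag items to put, tag names whose items should be deleted)."""
    old, new = set(old_tags), set(new_tags)
    to_put = [
        {"product_id": product_id, "has_tag": tag, "tag_value": tag}
        for tag in sorted(new - old)  # tags that were added
    ]
    to_delete = sorted(old - new)  # tags that were removed
    return to_put, to_delete
```

Tags present in both sets appear in neither result, so re-saving a product with unchanged tags produces no index writes.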
