Azure Blob Storage lifecycle management automatically moves your data between access tiers and deletes it when it’s no longer needed, which cuts storage costs and removes manual housekeeping.
Let’s see this in action. Imagine you have a blob container named raw-data where you ingest daily CSV files. These files are accessed frequently for the first 30 days, less often for the next 90 days, and rarely after that, so they can sit in the Archive tier until they are deleted after a year.
Here’s a simplified lifecycle management rule applied to this scenario:
{
  "rules": [
    {
      "name": "MoveRawToCoolAndArchive",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "actions": {
          "baseBlob": {
            "tierToCool": { "daysAfterModificationGreaterThan": 30 },
            "tierToArchive": { "daysAfterModificationGreaterThan": 120 }
          },
          "snapshot": {
            "tierToCool": { "daysAfterCreationGreaterThan": 30 },
            "tierToArchive": { "daysAfterCreationGreaterThan": 120 }
          },
          "version": {
            "tierToCool": { "daysAfterCreationGreaterThan": 30 },
            "tierToArchive": { "daysAfterCreationGreaterThan": 120 }
          }
        },
        "filters": {
          "blobTypes": [ "blockBlob" ],
          "prefixMatch": [ "raw-data/" ]
        }
      }
    },
    {
      "name": "DeleteOldData",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "actions": {
          "baseBlob": {
            "delete": { "daysAfterModificationGreaterThan": 365 }
          },
          "snapshot": {
            "delete": { "daysAfterCreationGreaterThan": 365 }
          },
          "version": {
            "delete": { "daysAfterCreationGreaterThan": 365 }
          }
        },
        "filters": {
          "blobTypes": [ "blockBlob" ],
          "prefixMatch": [ "raw-data/" ]
        }
      }
    }
  ]
}
This JSON defines two rules. The first, MoveRawToCoolAndArchive, moves blobs to the Cool tier once more than 30 days have passed since they were last modified, and to the Archive tier once more than 120 days have passed. It covers base blobs, snapshots, and versions under the raw-data/ prefix, with snapshots and versions aged from their creation time instead. The second, DeleteOldData, deletes base blobs that haven’t been modified for more than 365 days, and snapshots and versions created more than 365 days ago, within the same prefix.
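To make the timeline concrete, here’s a minimal Python sketch of where a base blob should end up at a given age under these two rules. The function name expected_state is made up for illustration, and the sketch assumes each threshold is a strict "greater than" comparison, as the property names suggest, and that the blob starts in the Hot tier.

```python
# Thresholds copied from the base-blob actions of the two rules above.
COOL_AFTER_DAYS = 30
ARCHIVE_AFTER_DAYS = 120
DELETE_AFTER_DAYS = 365


def expected_state(days_since_modified: int) -> str:
    """Return the state a base blob should reach once the policy has run.

    Assumption: thresholds are strict (age must exceed the value) and the
    blob was written to the Hot tier.
    """
    if days_since_modified > DELETE_AFTER_DAYS:
        return "deleted"
    if days_since_modified > ARCHIVE_AFTER_DAYS:
        return "Archive"
    if days_since_modified > COOL_AFTER_DAYS:
        return "Cool"
    return "Hot"


for age in (10, 31, 121, 366):
    print(age, expected_state(age))
```

Under these assumptions, a file that is exactly 30 days old is still Hot, and it only becomes eligible for Cool on day 31.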
The problem this solves is the manual effort and potential cost overruns associated with managing large volumes of data with varying access patterns. Instead of manually moving data to cheaper tiers or deleting it, Azure handles it automatically based on your defined policies.
Internally, Azure Blob Storage runs a background service that scans your storage account for blobs matching the criteria in your lifecycle rules. Azure’s documentation describes the policy as running about once a day, and a new or edited policy can take up to 24 hours to take effect, so actions are not instantaneous. When a blob’s age (measured from its modification or creation date) crosses a threshold defined in a rule, the service applies the specified action: either changing the access tier (for example, to Cool or Archive) or deleting the blob. The filters section scopes each rule to specific blob types and prefixes, so you don’t accidentally apply a deletion rule to your frequently accessed data.
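The scoping step can be sketched in a few lines of Python. This is only an illustration of the filter logic, not how the service is implemented: the Blob record and matches_filters function are hypothetical, and the sketch assumes the prefix is compared against the container name followed by the blob name.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Blob:
    # Hypothetical record for illustration; not an Azure SDK type.
    container: str
    name: str
    blob_type: str  # "blockBlob" or "appendBlob"


def matches_filters(blob: Blob, prefix_match: List[str], blob_types: List[str]) -> bool:
    """Return True when the blob falls inside a rule's filter scope."""
    full_path = f"{blob.container}/{blob.name}"
    type_ok = blob.blob_type in blob_types
    # Assumption: an empty prefix list means "match every blob".
    prefix_ok = not prefix_match or any(full_path.startswith(p) for p in prefix_match)
    return type_ok and prefix_ok


blobs = [
    Blob("raw-data", "2024/01/01.csv", "blockBlob"),
    Blob("reports", "summary.csv", "blockBlob"),
    Blob("raw-data", "ingest.log", "appendBlob"),
]
in_scope = [b.name for b in blobs if matches_filters(b, ["raw-data/"], ["blockBlob"])]
print(in_scope)  # ['2024/01/01.csv']
```

Only the first blob is in scope: the second sits in a different container, and the third is an append blob, which the block-blob filter excludes.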
The levers you control are the age thresholds, daysAfterModificationGreaterThan and daysAfterCreationGreaterThan, on actions such as tierToCool, tierToArchive, and delete. You choose which subtypes each action targets (baseBlob, snapshot, version), and you scope the rule with the blobTypes and prefixMatch filters. blobTypes accepts blockBlob and appendBlob; tiering actions apply only to block blobs, while append blobs support only delete. Importantly, the order of rules does not matter. If a blob qualifies for more than one action, whether within one rule or across several, Azure applies the least expensive one: delete takes precedence over tierToArchive, which takes precedence over tierToCool. For instance, a 150-day-old blob qualifies for both tierToCool and tierToArchive, so it goes straight to Archive. A 400-day-old blob also qualifies for delete, so it is deleted instead of being tiered.
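When several actions qualify for the same blob, Azure’s documentation describes the service as applying the least expensive one, with delete ranked cheapest, then tierToArchive, then tierToCool. Here’s a rough Python sketch of that selection for the base-blob thresholds in the policy above. The names pick_action and ACTION_COST_RANK are invented for this example.

```python
from typing import Optional

# Lower rank = cheaper outcome, following the documented precedence:
# delete, then tierToArchive, then tierToCool.
ACTION_COST_RANK = {"delete": 0, "tierToArchive": 1, "tierToCool": 2}

# Base-blob thresholds gathered from both rules in the example policy.
BASE_BLOB_THRESHOLDS = {"tierToCool": 30, "tierToArchive": 120, "delete": 365}


def pick_action(days_since_modified: int) -> Optional[str]:
    """Return the single action to apply, or None if nothing qualifies."""
    eligible = [
        action
        for action, days in BASE_BLOB_THRESHOLDS.items()
        if days_since_modified > days
    ]
    if not eligible:
        return None
    # Rule order plays no part; only the cost ranking decides.
    return min(eligible, key=ACTION_COST_RANK.__getitem__)


print(pick_action(10))   # None
print(pick_action(150))  # tierToArchive
print(pick_action(400))  # delete
```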
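The cost savings come from the significant price difference between access tiers. Archive storage is the cheapest per gigabyte, but reads cost more and the data is offline: a blob must be rehydrated to an online tier before you can read it, which can take hours. Cool and Archive also carry minimum retention periods (30 and 180 days respectively), and moving or deleting a blob earlier incurs an early deletion charge. As long as your thresholds respect those minimums, moving infrequently accessed data to Archive reduces your monthly storage bill considerably.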
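A quick back-of-the-envelope calculation shows the shape of the savings. The per-gigabyte prices below are hypothetical placeholders chosen for round numbers, not Azure’s actual rates, which vary by region, redundancy option, and date. The sketch also ignores transaction, retrieval, and early deletion charges.

```python
from typing import Dict

# HYPOTHETICAL monthly prices per GB, for illustration only.
PRICE_PER_GB_MONTH = {"Hot": 0.020, "Cool": 0.010, "Archive": 0.002}


def monthly_storage_cost(gb_by_tier: Dict[str, float]) -> float:
    """Sum storage-only cost across tiers (no transaction or retrieval fees)."""
    return sum(PRICE_PER_GB_MONTH[tier] * gb for tier, gb in gb_by_tier.items())


# 10 TB kept entirely in Hot versus the same data spread across tiers.
all_hot = monthly_storage_cost({"Hot": 10_000})
tiered = monthly_storage_cost({"Hot": 1_000, "Cool": 3_000, "Archive": 6_000})

print(f"all Hot: ${all_hot:.2f}")
print(f"tiered:  ${tiered:.2f}")
print(f"saving:  {100 * (1 - tiered / all_hot):.0f}%")
```

With these placeholder prices the tiered layout costs roughly a third of the all-Hot layout. Substitute current prices for your region before drawing conclusions.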
A common misconception is that daysAfterModificationGreaterThan applies to every subtype. It applies to base blobs only; for snapshots and versions, age is measured with daysAfterCreationGreaterThan. This distinction is crucial for accurate lifecycle management. A snapshot or version is an immutable point-in-time copy, so its content never changes after creation and creation time is the only meaningful clock. Overwriting a base blob, by contrast, resets its last-modified time and restarts the countdown.
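The difference between the two clocks is easy to see in a small Python sketch. The age_in_days helper is hypothetical and simply mirrors the conditions used in the example policy: base blobs aged from last modification, snapshots and versions from creation.

```python
from datetime import datetime, timedelta, timezone


def age_in_days(kind: str, created: datetime, last_modified: datetime, now: datetime) -> int:
    """Age that the example policy would compare against its thresholds.

    kind is "baseBlob", "snapshot", or "version".
    """
    start = last_modified if kind == "baseBlob" else created
    return (now - start).days


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
created = now - timedelta(days=200)
rewritten = now - timedelta(days=5)  # the base blob was overwritten 5 days ago

print(age_in_days("baseBlob", created, rewritten, now))  # 5
print(age_in_days("snapshot", created, created, now))    # 200
```

The base blob looks only 5 days old to the policy because it was recently overwritten, while a snapshot taken when the blob was first written is already 200 days old and eligible for Archive.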
The next concept you’ll likely encounter is managing blob versions and snapshots alongside your base blobs, and how lifecycle rules interact with them, especially regarding deletion and tiering.