We've been hitting PostgreSQL limitations when storing the content -> revision cache. Azure Table storage looks like a good candidate for storing that cache.
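Table storage provides a schemaless storage API. Each entity is identified by a compound primary key made of a PartitionKey and a RowKey: entities sharing a PartitionKey are stored together in the same partition, and query results come back sorted by PartitionKey, then by RowKey. Each entity can have up to 255 properties (three of which are the PartitionKey, RowKey and Timestamp system properties) and be up to 1MB in size.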
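To make the key model concrete, here is a toy in-memory stand-in. It is not the Azure SDK and `ToyTable` is a hypothetical name; it only mimics the behaviour described above: the (PartitionKey, RowKey) pair is the primary key, a write with an existing pair overwrites the entity, a query on one partition returns entities ordered by RowKey, and at most 252 custom properties are allowed besides the three system ones.

```python
from collections import defaultdict

# 255 properties per entity, 3 of which are system properties,
# leaving 252 custom properties.
SYSTEM_PROPERTIES = ("PartitionKey", "RowKey", "Timestamp")
MAX_CUSTOM_PROPERTIES = 252


class ToyTable:
    """Hypothetical in-memory model of a table; NOT the Azure SDK."""

    def __init__(self):
        # partition key -> {row key -> entity}
        self._partitions = defaultdict(dict)

    def upsert(self, entity):
        custom = [k for k in entity if k not in SYSTEM_PROPERTIES]
        if len(custom) > MAX_CUSTOM_PROPERTIES:
            raise ValueError("too many properties")
        pk, rk = entity["PartitionKey"], entity["RowKey"]
        # (PartitionKey, RowKey) is the primary key: same pair overwrites.
        self._partitions[pk][rk] = dict(entity)

    def query_partition(self, pk):
        # Entities of one partition come back ordered by RowKey.
        rows = self._partitions.get(pk, {})
        return [rows[rk] for rk in sorted(rows)]
```

For example, inserting RowKeys "b" then "a" into the same partition and querying it returns the "a" entity first.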
A good candidate for PartitionKey would be the content identifier (well distributed except for corner cases).
We need to find a RowKey that is derived only from the entry's own properties (revision identifier, path) and that gives us a meaningful ordering for files with multiple entries.
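One possible RowKey, sketched under assumptions: concatenate the lowercase hex of the revision identifier and the hex of the path. Lowercase hex only uses `[0-9a-f]` and preserves byte-wise lexicographic order, so entries for one content would be sorted by revision, then by path. The sketch assumes 20-byte identifiers (so the revision part has a fixed width of 40 characters and the split is unambiguous) and treats the 1KB key limit as 1024 ASCII characters; `make_entity` and the property names are hypothetical.

```python
# Assumptions: 20-byte identifiers, 1KB key limit = 1024 ASCII chars.
ID_LEN = 20
MAX_KEY_LEN = 1024


def make_entity(content_id: bytes, revision_id: bytes, path: bytes) -> dict:
    """Hypothetical cache entry for one (content, revision, path) line."""
    if len(content_id) != ID_LEN or len(revision_id) != ID_LEN:
        raise ValueError("identifiers must be %d bytes" % ID_LEN)
    # Fixed-width revision hex followed by path hex: sorting RowKeys
    # sorts by revision first, then by the raw bytes of the path.
    row_key = revision_id.hex() + path.hex()
    if len(row_key) > MAX_KEY_LEN:
        raise ValueError("path too long to fit in a RowKey")
    return {
        "PartitionKey": content_id.hex(),
        "RowKey": row_key,
        "revision": revision_id.hex(),
        "path": path.hex(),
    }
```

With these assumptions a path can be at most (1024 - 40) / 2 = 492 bytes; longer paths would need some other scheme (hashing the path, for instance, at the cost of the path ordering).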
Limitations:
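PartitionKey and RowKey are strings, and several characters aren't allowed in them: '/', '\', '#', '?', and the control characters U+0000 to U+001F and U+007F to U+009F. We should probably stick to some restricted ASCII encoding, I suppose. Both keys can be up to 1KB in size.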
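A small check for these constraints could look like the following sketch. `is_valid_key` is a hypothetical helper, and counting the 1KB limit as 1024 characters is an assumption that only holds for ASCII keys.

```python
# Characters rejected in PartitionKey / RowKey, besides control characters.
FORBIDDEN = set("/\\#?")
MAX_KEY_LEN = 1024  # assumption: 1KB limit counted in ASCII characters


def is_valid_key(key: str) -> bool:
    """Hypothetical check of a PartitionKey/RowKey against the rules above."""
    if len(key) > MAX_KEY_LEN:
        return False
    for ch in key:
        cp = ord(ch)
        # Reject '/', '\', '#', '?' and U+0000-U+001F, U+007F-U+009F.
        if ch in FORBIDDEN or cp <= 0x1F or 0x7F <= cp <= 0x9F:
            return False
    return True
```

Hex-encoded identifiers only contain `[0-9a-f]`, so they always pass this check as long as they fit in the length limit.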
Resources:
- Azure table storage patterns: https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/
- How to use Azure Table Storage from Python: https://azure.microsoft.com/en-us/documentation/articles/storage-python-how-to-use-table-storage/
- Understanding the table service data model: https://msdn.microsoft.com/en-us/library/dd179338.aspx
- Scalable partitioning strategy for azure table storage: https://msdn.microsoft.com/en-us/library/hh508997.aspx