We've been hitting PostgreSQL limitations when storing the content -> revision cache. Azure Table storage looks like a good candidate to hold that cache instead.
Table storage provides a schemaless storage API whose compound primary key is made of a `PartitionKey` and a `RowKey`: entities are clustered by `PartitionKey` and, within a partition, query results come back ordered by `RowKey`. Each entity can have up to 255 properties (a count that includes the three system properties `PartitionKey`, `RowKey` and `Timestamp`) and weigh up to 1MB.
A good candidate for `PartitionKey` would be the content identifier (well distributed except for corner cases).
We need to figure out a `RowKey` that's intrinsic to the line being stored (properties: revision identifier, path), and that gives us a relevant ordering for contents with multiple entries.
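To make the key question concrete, here is one possible way to derive both keys from a cache line. This is a sketch, not a decision: the function name `make_keys`, the hex encoding of identifiers, the `_` separator and the SHA-1 hash of the path are all assumptions made for illustration. Hashing the path keeps the `RowKey` at a fixed length and made of plain ASCII hex digits.

```python
import hashlib
from typing import Tuple


def make_keys(content_id: bytes, revision_id: bytes, path: bytes) -> Tuple[str, str]:
    """Build a (PartitionKey, RowKey) pair for one content -> revision line.

    Hypothetical layout, for discussion only:
      PartitionKey = hex(content_id)
      RowKey       = hex(revision_id) + "_" + sha1(path)
    """
    partition_key = content_id.hex()
    # Hash the path so the key has a bounded length and only contains [0-9a-f].
    path_digest = hashlib.sha1(path).hexdigest()
    row_key = f"{revision_id.hex()}_{path_digest}"
    return partition_key, row_key


if __name__ == "__main__":
    pk, rk = make_keys(b"\x01\x02", b"\xab\xcd", b"src/main.c")
    print(pk, rk)
```

With this layout, the entries for one content sort by the hex form of the revision identifier, which is stable but not meaningful. If we want a more useful ordering (for instance by revision date), the `RowKey` would need a sortable prefix such as a zero-padded timestamp.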
Limitations:
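`PartitionKey` and `RowKey` are strings, and some characters aren't allowed in them: `/`, `\`, `#`, `?` and the control characters (U+0000 to U+001F, U+007F to U+009F). That rules out using a raw path as part of a key, so better use some kind of ASCII encoding (hex for instance) I suppose. Both can be up to 1KB in size.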
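A small check along these lines could catch bad keys before they are sent to the service. It is a sketch under assumptions: the helper name `is_valid_key` is made up, the forbidden set is the one listed in the table service data model documentation (`/`, `\`, `#`, `?` and control characters), and the 1KB limit is conservatively measured on the UTF-16 encoding of the key, which is an assumption about how the service counts size.

```python
import re

# Characters rejected in PartitionKey / RowKey values:
# '/', '\', '#', '?' and control characters U+0000-U+001F, U+007F-U+009F.
_FORBIDDEN = re.compile(r"[/\\#?\x00-\x1f\x7f-\x9f]")

# 1KB limit; measured here on the UTF-16 encoding (assumption, conservative).
MAX_KEY_BYTES = 1024


def is_valid_key(key: str) -> bool:
    """Return True if `key` looks acceptable as a PartitionKey or RowKey."""
    if _FORBIDDEN.search(key):
        return False
    return len(key.encode("utf-16-le")) <= MAX_KEY_BYTES


if __name__ == "__main__":
    for candidate in ("0102abcd", "src/main.c", "a" * 600):
        print(repr(candidate[:20]), is_valid_key(candidate))
```

Keys made only of hex digits pass this check trivially as long as they stay under the size limit, which is one more argument for encoding identifiers rather than storing them raw.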
Resources:
- How to use Azure Table Storage from Python: https://azure.microsoft.com/en-us/documentation/articles/storage-python-how-to-use-table-storage/
- Understanding the table service data model: https://msdn.microsoft.com/en-us/library/dd179338.aspx
- Scalable partitioning strategy for azure table storage: https://msdn.microsoft.com/en-us/library/hh508997.aspx