Change Details

We've been hitting PostgreSQL limitations when storing the content -> revision cache. Azure Table storage looks like a good candidate to store that cache.

Table storage provides a schemaless storage API which uses a compound primary key made of a `PartitionKey` and a `RowKey`: entries are clustered by `PartitionKey`, and ordered by `RowKey` within a partition. Each entry can have up to 255 properties and weigh up to 1MB.

A good candidate for `PartitionKey` would be the content identifier (well distributed except for corner cases). We still need to figure out a `RowKey` that's intrinsic to the line being stored (properties: revision identifier, path), and that gives us a relevant ordering for files with multiple entries.

Limitations:
- `PartitionKey` and `RowKey` are strings, and some characters aren't allowed in them: control characters, as well as `/`, `\`, `#` and `?`. Since paths contain `/`, we'd better use some kind of ASCII-safe encoding, I suppose.
- Both can be up to 1KB in size.

Resources:
- Azure table storage patterns: https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/
- How to use Azure Table Storage from Python: https://azure.microsoft.com/en-us/documentation/articles/storage-python-how-to-use-table-storage/
- Understanding the table service data model: https://msdn.microsoft.com/en-us/library/dd179338.aspx
- Scalable partitioning strategy for azure table storage: https://msdn.microsoft.com/en-us/library/hh508997.aspx
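
As a starting point for the `RowKey` discussion, here is a minimal sketch of one possible key scheme. It is not a settled design. It assumes content and revision identifiers are available as raw bytes (e.g. SHA-1 digests) and that paths are bytes; the helper names `make_keys` and `make_entity` and the property names `revision` and `path` are hypothetical. The `RowKey` is the hex revision identifier followed by a hash of the path, which keeps it ASCII-only, short, and intrinsic to the line; whether the resulting ordering (by revision, then by path hash) is the "relevant" one remains an open question.

```python
import hashlib
from typing import Tuple

# Characters that are not allowed in key values (besides control characters).
FORBIDDEN = set("/\\#?")


def make_keys(content_id: bytes, revision_id: bytes, path: bytes) -> Tuple[str, str]:
    """Build a (PartitionKey, RowKey) pair for one cache line.

    Assumption: identifiers are raw digests, so hex-encoding them yields
    short ASCII strings that stay well under the 1KB key size limit.
    """
    partition_key = content_id.hex()
    # Hash the path instead of embedding it: paths contain '/', which is
    # forbidden in keys, and can be arbitrarily long.
    path_digest = hashlib.sha1(path).hexdigest()
    row_key = f"{revision_id.hex()}_{path_digest}"
    return partition_key, row_key


def make_entity(content_id: bytes, revision_id: bytes, path: bytes) -> dict:
    """Build a plain dict describing the entry (hypothetical property names)."""
    partition_key, row_key = make_keys(content_id, revision_id, path)
    return {
        "PartitionKey": partition_key,
        "RowKey": row_key,
        "revision": revision_id.hex(),
        # The raw path is kept as a property, since the key only holds its hash.
        "path": path,
    }
```

The dict returned by `make_entity` is only a description of the entry; how it gets written to the table depends on the client library we end up using.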