
Refactor output of indexer storage's `get` methods.
Open, Normal, Public

Description

In the Indexer Storage API, most `get` methods (e.g. content_ctags_get) yield items in this format:

{"id": sha1, "tool": TOOL, "ctags": ctags1}
{"id": sha1, "tool": TOOL, "ctags": ctags2}

Starting with T782/D301, content_fossology_license_get yields items in this format:

{sha1: {"tool": TOOL, "licenses": [license1, license2]}}

This task is twofold:

  • first, change content_fossology_license_get to return a single dictionary instead of yielding dictionaries that each hold a single key-value pair
  • second, refactor the other _get methods to use the same format.
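The two steps above can be sketched roughly as follows. This is only an illustration, not code from swh-indexer: the helper names `merge_single_key_dicts` and `regroup_by_id` are hypothetical, and the sketch assumes each sha1 is associated with a single tool, as in the format shown above.

```python
def merge_single_key_dicts(items):
    """Merge an iterable of {sha1: data} dicts into one dictionary.

    Assumption: each sha1 appears at most once; if it appears again,
    the later entry replaces the earlier one.
    """
    merged = {}
    for item in items:
        merged.update(item)
    return merged


def regroup_by_id(items, key):
    """Regroup flat {"id": ..., "tool": ..., key: ...} items by id.

    Assumption: all items sharing an id also share the same tool, so the
    tool of the first item seen for that id is kept.
    """
    grouped = {}
    for item in items:
        entry = grouped.setdefault(item["id"], {"tool": item["tool"], key: []})
        entry[key].append(item[key])
    return grouped
```

For example, `regroup_by_id(rows, "ctags")` would turn the two content_ctags_get items shown earlier into one entry keyed by sha1 whose "ctags" value is a list. Whether a sha1 can carry results from several tools is not stated here, so a real implementation may need a different value shape.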

The files that should be edited are:

  • swh/indexer/tests/storage/test_storage.py: these are the test cases for both Indexer Storage implementations. They should be adapted to test for the new format.
  • swh/indexer/storage/in_memory.py: a fully in-memory implementation of the Indexer Storage. This is the easiest implementation to start with.
  • swh/indexer/storage/__init__.py and swh/indexer/storage/converters.py: an implementation of the Indexer Storage backed by PostgreSQL. Look at D301 for examples of how to do it.
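As a starting point for the in-memory side and its test, here is a minimal sketch of a `get` that returns one dictionary keyed by sha1. The class `InMemoryLicenseStore`, its `add` and `get` methods, and the check function are hypothetical stand-ins written for this illustration; they are not the actual classes or signatures in swh/indexer/storage/in_memory.py or test_storage.py.

```python
class InMemoryLicenseStore:
    """Hypothetical stand-in for an in-memory indexer storage."""

    def __init__(self):
        # Maps sha1 -> {"tool": ..., "licenses": [...]}
        self._data = {}

    def add(self, sha1, tool, licenses):
        """Record the licenses that a tool detected for one content."""
        self._data[sha1] = {"tool": tool, "licenses": list(licenses)}

    def get(self, ids):
        """Return a single dictionary keyed by sha1.

        Assumption: ids with no stored result are simply left out.
        """
        return {sha1: self._data[sha1] for sha1 in ids if sha1 in self._data}


def check_get_returns_single_dict():
    """Test-style check of the new return format."""
    known, unknown = b"\x01" * 20, b"\x02" * 20
    tool = {"name": "nomos"}  # placeholder tool description
    store = InMemoryLicenseStore()
    store.add(known, tool, ["GPL-3.0"])
    result = store.get([known, unknown])
    assert result == {known: {"tool": tool, "licenses": ["GPL-3.0"]}}
    return result
```

A test in this style asserts on the whole returned dictionary at once, rather than iterating over yielded items.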

Event Timeline

vlorentz created this task. Dec 6 2018, 4:04 PM
vlorentz triaged this task as Low priority.
vlorentz updated the task description. Dec 6 2018, 4:07 PM
vlorentz updated the task description.
vlorentz raised the priority of this task from Low to Normal. Dec 13 2018, 1:56 PM
Sowmya added a subscriber: Sowmya. Sat, Mar 9, 3:43 AM