Software Heritage

Refactor output of indexer storage's `get` methods.
Closed, Migrated

Description

In the Indexer Storage API, most `_get` methods (e.g. content_ctags_get) yield items in this format:

{"id": sha1, "tool": TOOL, "ctags": ctags1}
{"id": sha1, "tool": TOOL, "ctags": ctags2}
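As a rough illustration of this shape, the sketch below models a hypothetical in-memory store (a plain dict from `(sha1, tool id)` to a list of ctags entries; this layout, `TOOL`, and the entry fields are assumptions, not the real in_memory.py) and a generator that yields one flat item per ctags entry, repeating `id` and `tool` in each item.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical tool and storage layout, for illustration only:
# (sha1, tool id) -> list of ctags entries.
TOOL = {"id": 1, "name": "hypothetical-ctags-tool"}
_CTAGS: Dict[Tuple[bytes, int], List[dict]] = {
    (b"\x01" * 20, 1): [
        {"name": "main", "kind": "function", "line": 10},
        {"name": "helper", "kind": "function", "line": 25},
    ],
}
_TOOLS = {1: TOOL}


def content_ctags_get(ids: Iterable[bytes]) -> Iterator[dict]:
    """Yield one flat item per ctags entry, repeating "id" and "tool" each time."""
    wanted = set(ids)
    for (sha1, tool_id), entries in _CTAGS.items():
        if sha1 in wanted:
            for ctags in entries:
                yield {"id": sha1, "tool": _TOOLS[tool_id], "ctags": ctags}


for item in content_ctags_get([b"\x01" * 20]):
    print(item["id"].hex()[:8], item["ctags"]["name"])
```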

Starting with T782/D301, content_fossology_license_get yields items in this format:

{sha1: {"tool": TOOL, "licenses": [license1, license2]}}
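A minimal sketch of what consuming this shape looks like, assuming a hypothetical in-memory license table and tool dict (neither is the real storage, and skipping unknown sha1s is an assumption): every yielded item is a dict with exactly one key, so callers have to unpack it to recover the sha1.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical tool and storage contents, for illustration only.
TOOL = {"id": 2, "name": "hypothetical-license-tool"}
_LICENSES: Dict[bytes, List[str]] = {b"\x01" * 20: ["GPL-3.0-or-later", "MIT"]}


def content_fossology_license_get(ids: Iterable[bytes]) -> Iterator[Dict[bytes, dict]]:
    """Yield one single-key dict per known sha1 (the T782/D301 shape)."""
    for sha1 in ids:
        if sha1 in _LICENSES:  # assumption: unknown sha1s are skipped
            yield {sha1: {"tool": TOOL, "licenses": _LICENSES[sha1]}}


# Each yielded item must be unpacked to get at its sha1 and result.
for item in content_fossology_license_get([b"\x01" * 20, b"\x02" * 20]):
    ((sha1, result),) = item.items()
    print(sha1.hex()[:8], result["licenses"])
```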

This task is twofold:

  • first, change content_fossology_license_get so that it returns a single dictionary instead of yielding dictionaries that each hold a single key-value pair;
  • second, refactor the other _get methods to use the same format.
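For the first point, one way to see the change is as collapsing the old stream of single-key dicts into one dict. The sketch below uses a hypothetical helper, `merge_single_key_items` (not part of swh.indexer); a real refactor would more likely build the dict directly inside the getter. Duplicate sha1s raise instead of being silently overwritten, since a flat `{sha1: result}` dict can hold only one result per sha1.

```python
from typing import Dict, Iterable


def merge_single_key_items(items: Iterable[Dict[bytes, dict]]) -> Dict[bytes, dict]:
    """Collapse an iterable of {sha1: result} dicts into a single dict."""
    merged: Dict[bytes, dict] = {}
    for item in items:
        if len(item) != 1:
            raise ValueError(f"expected a single-key dict, got {len(item)} keys")
        ((sha1, result),) = item.items()
        if sha1 in merged:
            # A plain dict cannot keep two results for the same sha1
            # (e.g. from two tools), so fail loudly instead of overwriting.
            raise ValueError(f"duplicate result for {sha1.hex()}")
        merged[sha1] = result
    return merged


tool = {"id": 1, "name": "hypothetical-license-tool"}
print(merge_single_key_items([
    {b"\x01" * 20: {"tool": tool, "licenses": ["MIT"]}},
    {b"\x02" * 20: {"tool": tool, "licenses": []}},
]))
```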

The files that should be edited are:

  • swh/indexer/tests/storage/test_storage.py: the test cases for both Indexer Storage implementations. They should be adapted to test for the new format.
  • swh/indexer/storage/in_memory.py: a fully in-memory implementation of the Indexer Storage. This is the easiest implementation to start with.
  • swh/indexer/storage/__init__.py and swh/indexer/storage/converters.py: an implementation of the Indexer Storage backed by PostgreSQL. See D301 for examples of how to do this.
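A sketch of how an adapted test might compare against a single dict rather than a list of items. `HypotheticalInMemoryLicenseStorage` and its `add_licenses` method are stand-ins invented for illustration (they are not the real in_memory.py class or its add API), and the sketch assumes at most one tool per sha1, as in the format above.

```python
from typing import Dict, Iterable, List


class HypotheticalInMemoryLicenseStorage:
    """Toy stand-in for in_memory.py; NOT the real IndexerStorage API."""

    def __init__(self) -> None:
        # sha1 -> {"tool": ..., "licenses": [...]}; assumes one tool per sha1.
        self._data: Dict[bytes, dict] = {}

    def add_licenses(self, sha1: bytes, tool: dict, licenses: List[str]) -> None:
        self._data[sha1] = {"tool": tool, "licenses": list(licenses)}

    def content_fossology_license_get(self, ids: Iterable[bytes]) -> Dict[bytes, dict]:
        # Return one dict (shallow copies); unknown sha1s are simply absent.
        return {sha1: dict(self._data[sha1]) for sha1 in ids if sha1 in self._data}


def test_content_fossology_license_get() -> None:
    storage = HypotheticalInMemoryLicenseStorage()
    tool = {"id": 1, "name": "hypothetical-license-tool"}
    sha1 = b"\x01" * 20
    storage.add_licenses(sha1, tool, ["MIT"])

    # The whole result is compared at once, as a single dictionary.
    actual = storage.content_fossology_license_get([sha1, b"\x02" * 20])
    assert actual == {sha1: {"tool": tool, "licenses": ["MIT"]}}


test_content_fossology_license_get()
```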

Event Timeline

vlorentz created this task.
vlorentz updated the task description.
vlorentz raised the priority of this task from Low to Normal. Dec 13 2018, 1:56 PM

I am familiar with the web APIs, and I went through the discussion in T782. When you say "return a single dictionary", I believe you mean something like this:

{
  sha1_1: [
    {"tool": TOOL, "licenses": [licenses]},
    {"tool": TOOL, "licenses": [licenses]}
  ],

  sha1_2: [
    {"tool": TOOL, "licenses": [licenses]},
    {"tool": TOOL, "licenses": [licenses]}
  ]
}
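A structure like this can be built from flat rows with `collections.defaultdict(list)`. The sketch assumes the backend returns one row per `(sha1, tool)` pair, e.g. from a SQL query; the row shape and `group_by_sha1` are hypothetical, not taken from converters.py.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical flat row: (sha1, tool, licenses), one per (sha1, tool) pair.
Row = Tuple[bytes, dict, List[str]]


def group_by_sha1(rows: Iterable[Row]) -> Dict[bytes, List[dict]]:
    """Build {sha1: [{"tool": ..., "licenses": [...]}, ...]} from flat rows."""
    grouped: Dict[bytes, List[dict]] = defaultdict(list)
    for sha1, tool, licenses in rows:
        grouped[sha1].append({"tool": tool, "licenses": licenses})
    # Return a plain dict so that lookups of missing sha1s raise KeyError.
    return dict(grouped)


tool_a = {"id": 1, "name": "tool-a"}
tool_b = {"id": 2, "name": "tool-b"}
sha1_1, sha1_2 = b"\x01" * 20, b"\x02" * 20
result = group_by_sha1([
    (sha1_1, tool_a, ["MIT"]),
    (sha1_1, tool_b, ["MIT", "Apache-2.0"]),
    (sha1_2, tool_a, []),
])
```

Keeping a list per sha1 means results from several tools for the same content can coexist, which a flat `{sha1: result}` mapping cannot represent.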

Following the setup guide, I have set up the indexer locally and will start refactoring the APIs one by one.

I went through all the tests in test_storage.py. It appears that only content_fossology_license_get needs to be refactored. All other storage methods return a dictionary or a list of dictionaries, where each dictionary has multiple keys.

vlorentz changed the task status from Open to Work in Progress. Oct 1 2020, 12:40 PM
vlorentz claimed this task.