Page MenuHomeSoftware Heritage

Create origin_metadata tables and logic
ClosedPublic

Authored by moranegg on Oct 6 2017, 3:50 PM.

Diff Detail

Repository
rDSTO Storage manager
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 1066
Build 1404: Software Heritage Python tests
Build 1403: arc lint + arc unit

Event Timeline

ardumont requested changes to this revision.Oct 6 2017, 5:03 PM

I only have nitpicks about it:

  • indentation consistency
  • documentation (missing types or Args section)
  • a test which misses something (at least another origin_metadata entry)
swh/storage/db.py
1107

Please add the variable's "type".

1132

what's the \?
Is it to go to the next line?

1154

tabulation here is not the same as prior functions.

swh/storage/storage.py
1750

Missing the Args entry.

swh/storage/tests/test_storage.py
3561

Maybe add another origin_metadata from another provenance.
That way, this shows the filtering.

This revision now requires changes to proceed.Oct 6 2017, 5:03 PM
moranegg marked an inline comment as done.
  • Refactor origin_metadata and adding provider table and logic
  • Merge branch 'master' of ssh://forge.softwareheritage.org/diffusion/DSTO/swh-storage
  • Refactor origin_metadata and adding provider table and logic

Added some remarks.

I did not check the tests yet though.

sql/swh-schema.sql
17

Why 111 and not 110?

swh/storage/api/client.py
270

We should merge those methods (get/get_all).

If someone wants only 1 origin_metadata, (s)he should simply asks for one origin_metadata (list of 1 element).

That's what we do for most of our endpoints content, revision, etc...

Not sure why we diverge for cache_content_get/get_all endpoint.
That was either a mistake, either possibly unavoidable.
(Plus that endpoint seems to no longer be used so it should go away anyways).

To sum up, as we can avoid the duplication here, we should stick to the standard :)

276

indentation :)

286

provider/get?

swh/storage/db.py
1032

missing one space tool and metadata.

1037

time when the metadata was found?

1045
cur = self._cursor(cur)
1074

It should probably be named origin_metadata_get_by_origin.

1094

This probably could be merged with the method above as get_by.

1120
cur = self._cursor(cur)
swh/storage/storage.py
1771

a nitpick, if we take a look either at db or storage module, the pattern is inverted.

if not om:
    return None
return dict...
1794
yield dict(...
1797

Maybe we could merge origin_metadata_get_by_provider_type and origin_metadata_get_all (which is a get by origin actually) to origin_metadata_get_by.

This could either take an origin_id and a provider_type (defaulting to None) or a dict of filters.
And depending on the parameters, would call inside the db module the right sql function?

You'd also need to unify the information returned (that is join on the provider table to return those information even in the case of the get_by_origin, to return more than just the provider id).

1825

same :)

moranegg marked 16 inline comments as done.
  • Refactor origin_metadata and adding provider table and logic
  • Refactor entry point to origin_metadata table
sql/swh-schema.sql
17

because i added origin_metadata on the 110
and now i add provider on the 111

but i can put it back to 110

swh/storage/db.py
1074

you had a great idea making all get method, so now we have only origin_metadata_get
where by default it's:

  • get by origin_id

if a second param (provider_type) is given it's:

  • get by origin_id and provider_type

This sounds quite good.

I believe there remains some dead code though. Can you please check and adapt accordingly?

Cheers,

swh/storage/api/client.py
277

I think this is some dead code.

swh/storage/api/server.py
434

If my previous assertion is correct, this should disappear as well.

swh/storage/tests/test_storage.py
3612

This tends to agree with my previous assertions :)

This revision now requires changes to proceed.Oct 23 2017, 7:01 PM

Refactor origin_metadata_get to origin_metadata_get_by
deleting dead code

  • Refactor entry point to origin_metadata table
  • Refactor entry points to origin_metadata table with get_by function
ardumont added inline comments.
swh/storage/db.py
1106
cur = self._cursor(cur)
This revision now requires changes to proceed.Oct 24 2017, 12:23 PM
moranegg marked an inline comment as done.
  • Refactor entry points to origin_metadata table with get_by function
This revision is now accepted and ready to land.Oct 24 2017, 2:34 PM
This revision was automatically updated to reflect the committed changes.