Page MenuHomeSoftware Heritage

Create origin_metadata tables and logic
ClosedPublic

Authored by moranegg on Oct 6 2017, 3:50 PM.

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont requested changes to this revision.Oct 6 2017, 5:03 PM

I only have nitpicks about it:

  • indentation consistency
  • documentation (missing types or Args section)
  • a test which misses something (at least another origin_metadata entry)
swh/storage/db.py
1053 ↗(On Diff #848)

Please add the variable's "type".

1078 ↗(On Diff #848)

what's the \?
Is it to go to the next line?

1100 ↗(On Diff #848)

tabulation here is not the same as prior functions.

swh/storage/storage.py
1751 ↗(On Diff #848)

Missing the Args entry.

swh/storage/tests/test_storage.py
3542

Maybe add another origin_metadata from another provenance.
That way, this shows the filtering.

This revision now requires changes to proceed.Oct 6 2017, 5:03 PM
moranegg marked an inline comment as done.
  • Refactor origin_metadata and adding provider table and logic
  • Merge branch 'master' of ssh://forge.softwareheritage.org/diffusion/DSTO/swh-storage
  • Refactor origin_metadata and adding provider table and logic

Added some remarks.

I did not check the tests yet though.

sql/swh-schema.sql
17

Why 111 and not 110?

swh/storage/api/client.py
287

We should merge those methods (get/get_all).

If someone wants only 1 origin_metadata, (s)he should simply asks for one origin_metadata (list of 1 element).

That's what we do for most of our endpoints content, revision, etc...

Not sure why we diverge for cache_content_get/get_all endpoint.
That was either a mistake, either possibly unavoidable.
(Plus that endpoint seems to no longer be used so it should go away anyways).

To sum up, as we can avoid the duplication here, we should stick to the standard :)

293

indentation :)

303

provider/get?

swh/storage/db.py
1033 ↗(On Diff #851)

missing one space tool and metadata.

1038 ↗(On Diff #851)

time when the metadata was found?

1046 ↗(On Diff #851)
cur = self._cursor(cur)
1075 ↗(On Diff #851)

It should probably be named origin_metadata_get_by_origin.

1095 ↗(On Diff #851)

This probably could be merged with the method above as get_by.

1147 ↗(On Diff #851)
cur = self._cursor(cur)
swh/storage/storage.py
1772 ↗(On Diff #851)

a nitpick, if we take a look either at db or storage module, the pattern is inverted.

if not om:
    return None
return dict...
1795 ↗(On Diff #851)
yield dict(...
1798 ↗(On Diff #851)

Maybe we could merge origin_metadata_get_by_provider_type and origin_metadata_get_all (which is a get by origin actually) to origin_metadata_get_by.

This could either take an origin_id and a provider_type (defaulting to None) or a dict of filters.
And depending on the parameters, would call inside the db module the right sql function?

You'd also need to unify the information returned (that is join on the provider table to return those information even in the case of the get_by_origin, to return more than just the provider id).

1826 ↗(On Diff #851)

same :)

moranegg marked 16 inline comments as done.
  • Refactor origin_metadata and adding provider table and logic
  • Refactor entry point to origin_metadata table
sql/swh-schema.sql
17

because i added origin_metadata on the 110
and now i add provider on the 111

but i can put it back to 110

swh/storage/db.py
1075 ↗(On Diff #851)

you had a great idea making all get method, so now we have only origin_metadata_get
where by default it's:

  • get by origin_id

if a second param (provider_type) is given it's:

  • get by origin_id and provider_type

This sounds quite good.

I believe there remains some dead code though. Can you please check and adapt accordingly?

Cheers,

swh/storage/api/client.py
294

I think this is some dead code.

swh/storage/api/server.py
434

If my previous assertion is correct, this should disappear as well.

swh/storage/tests/test_storage.py
3593

This tends to agree with my previous assertions :)

This revision now requires changes to proceed.Oct 23 2017, 7:01 PM

Refactor origin_metadata_get to origin_metadata_get_by
deleting dead code

  • Refactor entry point to origin_metadata table
  • Refactor entry points to origin_metadata table with get_by function
ardumont added inline comments.
swh/storage/db.py
1107 ↗(On Diff #868)
cur = self._cursor(cur)
This revision now requires changes to proceed.Oct 24 2017, 12:23 PM
moranegg marked an inline comment as done.
  • Refactor entry points to origin_metadata table with get_by function
This revision is now accepted and ready to land.Oct 24 2017, 2:34 PM
This revision was automatically updated to reflect the committed changes.