Split Content class into two classes, for missing and non-missing contents.
Closed, Public

Authored by vlorentz on Feb 4 2020, 3:50 PM.

Details

Summary

I'm not very happy with the names, though.
Suggestions welcome.

Diff Detail

Repository
rDMOD Data model
Branch
missing-content
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 10445
Build 15557: tox-on-jenkins (Jenkins)
Build 15556: arc lint + arc unit

Event Timeline

s/non missing/present/
would be an improvement.

"Missing" isn't perfect, maybe, but it's consistent with SQL storage tables at least.

In D2623#62430, @zack wrote:

s/non missing/present/
would be an improvement.

Indeed

"Missing" isn't perfect, maybe, but it's consistent with SQL storage tables at least.

No, SQL tables use "skipped". But I prefer "missing" because it's more generic (it also includes content we couldn't find)

Ah, good point — I was misremembering.
I agree with you that "missing" is better.

"Missing" isn't perfect, maybe, but it's consistent with SQL storage tables at least.

I remembered as much, but the table is named "skipped", not "missing".
What uses "missing" are the storage endpoints.

So bonus points for improving consistency.

swh/model/hypothesis_strategies.py:135

As zack said, "present" is better.

I also like "existing", but it can be ambiguous:
if it's missing, it does not exist within the archive...
but it does exist for real... like I said, ambiguous.

Go for "present" ;)

Is this already covered by current tests?

Yes, via the hypothesis strategies

This revision is now accepted and ready to land. (Feb 5 2020, 11:19 AM)

rename non-missing -> present

  • rename missing -> skipped

it avoids confusion with the terminology used in swh-storage,
as "content missing" means we never saw that content before,
while "skipped content" means we saw it but didn't ingest it
for some reason.

as per oral discussion, agreed!
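
For readers skimming the thread, the split with the names settled on here could look roughly like the sketch below. It is not the code in this diff: the field names (`sha1`, `length`, `data`, `reason`), the status values, and the use of plain dataclasses are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BaseContent:
    """Fields shared by both kinds of content (hypothetical subset)."""

    sha1: bytes
    length: int


@dataclass(frozen=True)
class Content(BaseContent):
    """A present content: we saw it and ingested its data."""

    data: Optional[bytes] = None
    status: str = "visible"  # assumed status values, for illustration only

    def __post_init__(self):
        if self.status not in ("visible", "hidden"):
            raise ValueError(f"unexpected status for a present content: {self.status!r}")


@dataclass(frozen=True)
class SkippedContent(BaseContent):
    """A skipped content: we saw it but did not ingest it, for some reason."""

    reason: str = ""
    status: str = "absent"  # assumed status value, for illustration only

    def __post_init__(self):
        if not self.reason:
            raise ValueError("a skipped content must say why it was skipped")


# Hypothetical usage: one ingested content, one skipped content.
present = Content(sha1=b"\x01" * 20, length=5, data=b"hello")
skipped = SkippedContent(sha1=b"\x02" * 20, length=10**9, reason="too large")
```

In this sketch a content that was never seen at all ("missing" in the swh-storage sense) has no object; only contents that were seen get one of the two classes.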