
Deduplicate origin-metadata when they have the same authority + discovery_date + fetcher.
Closed · Public

Authored by vlorentz on Jun 4 2020, 12:34 PM.

Details

Reviewers
ardumont
Group Reviewers
Reviewers
Summary

Duplicates are resolved by replacing the old value with the new one.
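
As a rough illustration of the replace-on-duplicate behaviour described here, the following is a minimal sketch only. The class name, method names, and the choice to include the origin in the key are assumptions made for the example; this is not the actual swh.storage code.

```python
import datetime
from typing import Dict, List, Tuple

# Hypothetical key: (origin, authority, discovery_date, fetcher).
Key = Tuple[str, str, datetime.datetime, str]


class OriginMetadataStore:
    """Toy in-memory store used only to illustrate the deduplication rule."""

    def __init__(self) -> None:
        self._by_key: Dict[Key, bytes] = {}

    def add(self, origin: str, authority: str,
            discovery_date: datetime.datetime, fetcher: str,
            metadata: bytes) -> None:
        # A second write with the same key replaces the earlier value.
        self._by_key[(origin, authority, discovery_date, fetcher)] = metadata

    def get(self, origin: str, authority: str) -> List[tuple]:
        # Return (discovery_date, fetcher, metadata) for one origin/authority.
        return [
            (key[2], key[3], value)
            for key, value in self._by_key.items()
            if key[0] == origin and key[1] == authority
        ]


store = OriginMetadataStore()
date = datetime.datetime(2020, 6, 4, tzinfo=datetime.timezone.utc)
store.add("https://example.org/repo", "authority-1", date, "fetcher-1", b"old")
store.add("https://example.org/repo", "authority-1", date, "fetcher-1", b"new")
results = store.get("https://example.org/repo", "authority-1")
```

After the second `add`, `results` holds a single entry carrying the newer metadata.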

This will allow an easy implementation of pagination, using the fetcher
id as an opaque page_token.

Plus, it did not make sense logically to have different metadata from
the same authority at the same time (especially with the same fetcher).
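
The pagination idea mentioned in the summary could look roughly like the sketch below. It assumes results are ordered by (discovery_date, fetcher id) and that the token encodes the last pair returned; the function names, the integer fetcher id, and the token encoding are all hypothetical and chosen only for illustration.

```python
import base64
import datetime
import json
from typing import List, Optional, Tuple

# Hypothetical row shape: (discovery_date, fetcher_id, metadata).
Row = Tuple[datetime.datetime, int, bytes]


def encode_token(date: datetime.datetime, fetcher_id: int) -> str:
    # The token is opaque to clients; here it is base64-encoded JSON.
    raw = json.dumps([date.isoformat(), fetcher_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_token(token: str) -> Tuple[datetime.datetime, int]:
    date_str, fetcher_id = json.loads(base64.urlsafe_b64decode(token.encode()))
    return datetime.datetime.fromisoformat(date_str), fetcher_id


def get_page(rows: List[Row], limit: int,
             page_token: Optional[str] = None):
    # Uniqueness of (discovery_date, fetcher_id) makes this a total order,
    # so resuming strictly after the last pair never skips or repeats a row.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    if page_token is not None:
        last = decode_token(page_token)
        ordered = [r for r in ordered if (r[0], r[1]) > last]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_token = encode_token(page[-1][0], page[-1][1]) if has_more else None
    return page, next_token


utc = datetime.timezone.utc
d1 = datetime.datetime(2020, 6, 4, tzinfo=utc)
d2 = datetime.datetime(2020, 6, 5, tzinfo=utc)
rows = [(d2, 1, b"c"), (d1, 2, b"b"), (d1, 1, b"a")]

page1, token = get_page(rows, limit=2)
page2, token2 = get_page(rows, limit=2, page_token=token)
```

Without the deduplication, two rows could share the same (discovery_date, fetcher id) pair and a page boundary falling between them would be ambiguous.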

Diff Detail

Repository
rDSTO Storage manager
Branch
om-unique-fetcher
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 12631
Build 19194: Phabricator diff pipeline on jenkins
Build 19193: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D3221 (id=11427)

Rebasing onto f9b2ca3fec...

First, rewinding head to replay your work on top of it...
Applying: Deduplicate origin-metadata when they have the same authority + discovery_date + fetcher.
Changes applied before test
commit 089e2275159587643ffed9d5a4a569e82358af6f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jun 4 12:33:17 2020 +0200

    Deduplicate origin-metadata when they have the same authority + discovery_date + fetcher.
    
    By replacing the old value with the new one.
    
    This will allow an easy implementation of pagination, using the fetcher
    id as an opaque page_token.
    
    Plus, it did not make sense logically to have different metadata from
    the same authority at the same time (especially with the same fetcher).

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/224/ for more details.

ardumont added a subscriber: ardumont.

looks good, some remarks in the diff.

swh/storage/in_memory.py, line 1094

is it a list or a set?

.add suggests a set but you named it list ;)

swh/storage/sql/60-swh-indexes.sql, line 174

Does this need an SQL upgrade script?
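
To make the question concrete: adding a unique index to a table that already contains duplicates fails, so an upgrade would typically need to deduplicate first. The sketch below illustrates that two-step shape using Python's `sqlite3` module as a stand-in for the real database; the table name, column names, and index name are hypothetical and not taken from the diff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; the real schema is not shown in this review.
conn.execute(
    "CREATE TABLE origin_metadata ("
    " id INTEGER PRIMARY KEY, origin TEXT, authority TEXT,"
    " discovery_date TEXT, fetcher TEXT, metadata TEXT)"
)
conn.executemany(
    "INSERT INTO origin_metadata"
    " (origin, authority, discovery_date, fetcher, metadata)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        ("o1", "a1", "2020-06-04", "f1", "old"),
        ("o1", "a1", "2020-06-04", "f1", "new"),
        ("o1", "a1", "2020-06-04", "f2", "other"),
    ],
)

# Step 1: keep only the most recently inserted row for each key.
conn.execute(
    "DELETE FROM origin_metadata WHERE id NOT IN ("
    " SELECT MAX(id) FROM origin_metadata"
    " GROUP BY origin, authority, discovery_date, fetcher)"
)

# Step 2: with duplicates gone, the unique index can be created.
conn.execute(
    "CREATE UNIQUE INDEX origin_metadata_unique ON origin_metadata"
    " (origin, authority, discovery_date, fetcher)"
)

remaining = conn.execute(
    "SELECT fetcher, metadata FROM origin_metadata ORDER BY id"
).fetchall()
```

Here "most recent" is approximated by the highest `id`, which is an assumption of the sketch; a real migration would need to pick the surviving row according to the project's own rules.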

swh/storage/tests/test_storage.py, line 3229

silently "updated"?

This revision is now accepted and ready to land. (Jun 4 2020, 1:20 PM)
vlorentz marked an inline comment as done.

apply comments

Build is green

Patch application report for D3221 (id=11484)

Rebasing onto dcef916e5e...

First, rewinding head to replay your work on top of it...
Applying: Deduplicate origin-metadata when they have the same authority + discovery_date + fetcher.
Changes applied before test
commit d122d2ad85538b6e1014c65865b61babb008479b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jun 4 12:33:17 2020 +0200


See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/238/ for more details.

Build is green

Patch application report for D3221 (id=11494)

Rebasing onto dcef916e5e...

Current branch diff-target is up to date.
Changes applied before test
commit 6ebdc2f76e294c888c9b121a222d4d360df4507c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jun 4 12:33:17 2020 +0200


See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/244/ for more details.