Page MenuHomeSoftware Heritage

loader: Read snapshot out of the last origin visit status
ClosedPublic

Authored by ardumont on Jun 17 2020, 4:59 PM.

Details

Summary

Preparatory work to drop unused fields in OriginVisit.

Related to T2310

Test Plan

tox

Diff Detail

Event Timeline

Build is green

Patch application report for D3307 (id=11713)

Rebasing onto 2f08b0dfb7...

Current branch diff-target is up to date.
Changes applied before test
commit 475e95b710ce83ee8c02ea0652ccf73f9038e997
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 17 17:00:03 2020 +0200

    loader: Read snapshot out of the last origin visit status
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/17/ for more details.

swh/loader/svn/loader.py
206

rahhh, come on...
(apparently missing test for that part...)

swh/loader/svn/tests/test_loader.py
52

This is already defined in loader-core now... [1]
Unsure if we want to avoid duplication here...

[1] D3305

Align assert_last_visit_ok helper function

Build is green

Patch application report for D3307 (id=11715)

Rebasing onto 2f08b0dfb7...

Current branch diff-target is up to date.
Changes applied before test
commit 370f660e10b3f8106d855c2bdeacfc874f52c99f
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 17 17:00:03 2020 +0200

    loader: Read snapshot out of the last origin visit status
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/18/ for more details.

test_loader: Load twice the same repository to try and increase coverage

Build is green

Patch application report for D3307 (id=11717)

Rebasing onto 2f08b0dfb7...

Current branch diff-target is up to date.
Changes applied before test
commit 6f504132277f7eadcda44da0a058dcec15f24be2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 17 17:53:31 2020 +0200

    test_loader: Load twice the same repository

commit 370f660e10b3f8106d855c2bdeacfc874f52c99f
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 17 17:00:03 2020 +0200

    loader: Read snapshot out of the last origin visit status
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/19/ for more details.

swh/loader/svn/loader.py
206

The current test scaffolding prevent this from running there.
This needs more work.

Drop duplication in helper test function declaration

Note: Tests will fail as a new release of loader-core is needed

Build has FAILED

Patch application report for D3307 (id=11771)

Rebasing onto 2f08b0dfb7...

Current branch diff-target is up to date.
Changes applied before test
commit 91787b96bfc9735bf2619d9951252804e2f0560c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 17 17:00:03 2020 +0200

    loader: Read snapshot out of the last origin visit status
    
    Related to T2310

Link to build: https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/20/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/20/console

swh/loader/svn/loader.py
206

tbc, that's for another diff heh.

Why set last_visit as an attribute? It's only needed once when getting the snapshot

Why set last_visit as an attribute? It's only needed once when getting the snapshot

Because right now (storage wise), as we just created a new visit, there is no
snapshot. Thus, when requiring the latest snapshot of the latest visit, we got
None...

I'm currently reflecting on that issue because this is a current
limitation(/regression?) of the storage api/algo modules.

As discussed on irc, there must be an iteration done somewhere, looking from
last visit up until we find what we are looking for (snapshot here). I'm unsure
yet as to what needs to change though...

api:

  • origin-visit-get-latest (which would now iterate other visit status?)
  • origin-visit-status-get-latest (currently taking the visit id parameter, if we remove it, we might be able to search for more data)

algo:

  • snapshot-get-latest which iterates other visit and then visit status up until it finds something worse?

The last entry sounds like the better location as we keep the api dealing with
their respective model objects. What do you think?

Why set last_visit as an attribute? It's only needed once when getting the snapshot

Because right now (storage wise), as we just created a new visit, there is no
snapshot. Thus, when requiring the latest snapshot of the latest visit, we got
None...

We can get the previous visit + snapshot before creating the new visit

  • snapshot-get-latest which iterates other visit and then visit status up until it finds something worse?

worse than what?

The last entry sounds like the better location as we keep the api dealing with
their respective model objects. What do you think?

yes, all loaders need to do the same visit->status->snapshot thing, so it's good to have a function that does it

We can get the previous visit + snapshot before creating the new visit

yes, we could do that also...
I found it less clear than the current separation for some reason.

  • snapshot-get-latest which iterates other visit and then visit status up until it finds something worse?

worse than what?

my bad, i meant something "worth" ("what we are looking for")

The last entry sounds like the better location as we keep the api dealing with
their respective model objects. What do you think?

yes, all loaders need to do the same visit->status->snapshot thing, so it's good to have a function that does it

Yes, I'll do that then ;)

That way that code here could be simplified and look almost the same as before.

yes, all loaders need to do the same visit->status->snapshot thing, so it's good to have a function that does it

Yes, I'll do that then ;)
That way that code here could be simplified and look almost the same as before.

Done in D3330

Rewrite to use the latest swh.storage.algos.snapshot.snapshot_get_latest
implementation

Depends on D3330

Build has FAILED

Patch application report for D3307 (id=11802)

Rebasing onto 600b86bbe9...

Current branch diff-target is up to date.
Changes applied before test
commit b117df6be252b8bc811fbee86c812e2b6cb5de3c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Jun 22 15:16:28 2020 +0200

    loader: Read snapshot out of the last origin visit status
    
    Related to T2310

Link to build: https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/21/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/21/console

Drop no longer needed self.last_visit

Also tests won't work as storage 0.7.0 is not released yet (whose part needed
is D3330).

Build has FAILED

Patch application report for D3307 (id=11803)

Rebasing onto 600b86bbe9...

Current branch diff-target is up to date.
Changes applied before test
commit 3337028ae9e93abf573b5332d3a6883411c1a122
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Jun 22 15:16:28 2020 +0200

    loader: Read snapshot out of the last origin visit status
    
    Related to T2310

Link to build: https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/22/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/22/console

Build is green

Patch application report for D3307 (id=11803)

Rebasing onto 600b86bbe9...

Current branch diff-target is up to date.
Changes applied before test
commit 3337028ae9e93abf573b5332d3a6883411c1a122
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Jun 22 15:16:28 2020 +0200

    loader: Read snapshot out of the last origin visit status
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/23/ for more details.

lgtm, but could you add a test?

That will actually be difficult... i tried (you can look at the diff history here)

My conclusion is that the test scaffolding here needs to be reworked to allow it...
so i'm inclined towards it but not right now...

olasd added inline comments.
swh/loader/svn/loader.py
215–216

Will that ever happen now?

swh/loader/svn/loader.py
215–216

I actually have mistyped the method...
For previous_swh_revision, the type should be Optional[Union[bytes, dict]].

Yes, i think that can still happen... That parameter is coming from the constructor and i did not touched that part.

Build is green

Patch application report for D3307 (id=11829)

Rebasing onto 600b86bbe9...

Current branch diff-target is up to date.
Changes applied before test
commit 1e332e68755a5ecc06e2e8c739f1caba783fc74c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Jun 22 15:16:28 2020 +0200

    loader: Read snapshot out of the last origin visit status
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/24/ for more details.

Rework docstring sentence phrasing

Build is green

Patch application report for D3307 (id=11832)

Rebasing onto 600b86bbe9...

Current branch diff-target is up to date.
Changes applied before test
commit 4ee911a462619349bf089774926355b14e0997da
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Jun 22 15:16:28 2020 +0200

    loader: Read snapshot out of the last origin visit status
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/25/ for more details.

Add missing returned type to method

Build is green

Patch application report for D3307 (id=11834)

Rebasing onto 600b86bbe9...

Current branch diff-target is up to date.
Changes applied before test
commit 782690bd27d3b78cd039a66b48cba03c3ae5ab54
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Jun 22 15:16:28 2020 +0200

    loader: Read snapshot out of the last origin visit status
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/26/ for more details.

This revision was not accepted when it landed; it landed in state Needs Review.Jun 23 2020, 3:23 PM
This revision was automatically updated to reflect the committed changes.