Page MenuHomeSoftware Heritage

Refactor svn loader to respect loader-core's interface (fetch_data/store_data)
ClosedPublic

Authored by ardumont on Sep 20 2018, 6:15 PM.

Details

Summary

This now respects the fetch_data/store_data pattern from the
loader-core.

So fetch_data is only one iteration on one svn revision. In that
call, we consume one svn revision as before and compute the
contents/directories/revision. The result is stored in internal
instance variable (as in pypi/debian loader for example).

And store_data, also in the same iteration, is in charge of passing
those data to the storage.

If the iteration is over, we build the snapshot and stops appropriately.

The internal computations did not change.

Note: I'm willing to rebase all this on @anlambert's current work to improve the loading speed (D433)

Test Plan

make test

Diff Detail

Repository
rDLDSVN Subversion (SVN) loader
Branch
refactor-svn
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 1422
Build 1766: arc lint + arc unit

Event Timeline

ardumont edited the summary of this revision. (Show Details)
ardumont added a project: SVN Loader.
  • docs: Remove old comparison file
ardumont retitled this revision from Refactor svn loader to clarify what it does to Refactor svn loader to respect loader-core's interface (fetch_data/store_data).Sep 20 2018, 6:36 PM
ardumont added inline comments.
swh/loader/svn/loader.py
400

Removing this check block will speed up the loader.
Especially on big repositories.

This was added because at the time, the reason of the hash divergence detected was not yet known (T570).

swh/loader/svn/producer.py
112

For info, that will impact the scheduler's setup (when we'll package this)

swh/loader/svn/tests/test_loader.py
115

All this green is needed because now we test the full loading of the loader (and not some internal hidden implementation detail, the old and now deleted process_repository method).

132

That's a gory detail.
I added it as inheritance because that allowed to split the setup somewhere else once and not be bothered by it afterwards.

805

That's was what i referred to orally.
Before that diff, we grouped the svn revisions into groups of (see config).

The test's setup was group of 3.
So the svn revisions read were groups of 3.

The error was detected in the last iteration (6th).
Thus, the error was detected in the 6th iteration. Meaning the last revision was the 18th.

Now, we no longer group the revision, we read them 1 by 1.
In effect, we read 2 more revisions before stopping due to the error.
The last revision is the 20th (since there is 21 commits in the tested repository and it's the last one with the external thing).

So we have 2 more revisions here (2 more green lines)

Really nice rework! Implementing fetch_data / store_data really helps to better understand the loader processing.
I added a couple of comments but it's all good for me.

swh/loader/svn/loader.py
102

I would rather initialize the svnrepo to None in the constructor then test again self.svnrepo here.

400

Instead of dropping it, maybe enabling/disabling it through configuration would be a better choice. This is useful for debugging so I think we should keep it.

This revision is now accepted and ready to land.Sep 21 2018, 11:41 AM
ardumont added inline comments.
swh/loader/svn/loader.py
102

Agreed.
As this is already done, in your diff, i'll let you fix it.

400

Even better suggestion! Agreed.

swh/loader/svn/tests/test_loader.py
805

Sorry for the repetition here... Must have been tired (T.T)

ardumont marked an inline comment as done.

Amend and improve the svnrepo initialization

  • test_loader: Unplug storage
  • svn.loader: cosmetic: Reduce number of lines
  • docs: Remove old notes
  • svn.loader: Improve svnrepo initialization

Really nice rework!

Thanks ;)

swh/loader/svn/loader.py
102

I did it as it annoyed me too much, thx.

This revision was automatically updated to reflect the committed changes.
ardumont marked an inline comment as done.

Note: I'm willing to rebase all this on @anlambert's current work to improve the loading speed (D433)

Well, we agreed to do it the other way around (@anlambert wants to improve further the diff \m/).