
loader: Add logs displaying path differences after revision replay
Closed · Public

Authored by anlambert on Nov 22 2022, 5:40 PM.

Details

Summary

When a tree computation divergence is detected after replaying a revision,
add debug logs displaying the paths that differ or are missing between the
reconstructed repository filesystem and the exported one at that specific
revision.

This should save time when debugging such issues.
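
The comparison described above can be sketched roughly as follows. This is not the loader's actual implementation: it assumes both trees are available as local directories, hashes them with plain SHA-1 from `hashlib` rather than the loader's own directory model, and the names `hash_tree` and `log_path_differences` are hypothetical. The "has different hash" message mirrors the debug log quoted later in this thread; the wording of the "is missing" messages is an assumption.

```python
import hashlib
import logging
import os
from typing import Dict, List, Tuple

logger = logging.getLogger(__name__)


def hash_tree(root: str) -> Dict[bytes, Tuple[str, str]]:
    """Map each path relative to root to a (kind, hex digest) pair.

    Assumption: a directory digest is derived from the sorted names and
    digests of its entries, so a change in one file also changes the
    digest of every parent directory.
    """
    hashes: Dict[bytes, Tuple[str, str]] = {}

    def visit(path: str, rel: bytes) -> str:
        if os.path.isdir(path):
            h = hashlib.sha1()
            for name in sorted(os.listdir(path)):
                raw = os.fsencode(name)
                child_rel = rel + b"/" + raw if rel else raw
                child_digest = visit(os.path.join(path, name), child_rel)
                h.update(raw + b"\0" + child_digest.encode())
            kind, digest = "directory", h.hexdigest()
        else:
            with open(path, "rb") as f:
                kind, digest = "content", hashlib.sha1(f.read()).hexdigest()
        if rel:  # the root itself is not recorded
            hashes[rel] = (kind, digest)
        return digest

    visit(root, b"")
    return hashes


def log_path_differences(reconstructed: str, exported: str) -> List[str]:
    """Log (and return) one message per differing or missing path."""
    rec = hash_tree(reconstructed)
    exp = hash_tree(exported)
    messages: List[str] = []
    for path in sorted(exp):
        kind, digest = exp[path]
        if path not in rec:
            messages.append(
                f"{kind} with path {path!r} is missing in "
                "reconstructed repository filesystem"
            )
        elif rec[path][1] != digest:
            messages.append(
                f"{kind} with path {path!r} has different hash in "
                "reconstructed repository filesystem"
            )
    for path in sorted(set(rec) - set(exp)):
        messages.append(
            f"{rec[path][0]} with path {path!r} is missing in "
            "exported repository filesystem"
        )
    for message in messages:
        logger.debug(message)
    return messages
```

Because directory digests depend on their entries, a single divergent file is reported together with all of its parent directories.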

Diff Detail

Repository
rDLDSVN Subversion (SVN) loader
Branch
tree-divergence-debug-helper
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 32919
Build 51599: Phabricator diff pipeline on jenkins
Build 51598: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8872 (id=31992)

Rebasing onto 04566a7f36...

Current branch diff-target is up to date.
Changes applied before test
commit a843858b0c2f34ba144d164e0dba97e46611aade
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Nov 22 17:33:06 2022 +0100

    loader: Add logs displaying path differences after revision replay
    
    When a tree computation divergence is detected after replaying a revision
    add debug logs displaying the paths that differ or are missing between the
    reconstructed repository filesystem and the exported one at that specific
    revision.
    
    It should help to gain some time when debugging such issues.

See https://jenkins.softwareheritage.org/job/DLDSVN/job/tests-on-diff/355/ for more details.

ardumont added a subscriber: ardumont.

Awesome!

One non-blocking suggestion inline.

swh/loader/svn/tests/test_loader.py
2347

Those are the top-level directories, which also show up as divergent because of the way we compute the hashes.
Maybe it would be worth filtering those out? Otherwise, we could have a hard time parsing the debug log.

This revision is now accepted and ready to land. Nov 22 2022, 6:08 PM
swh/loader/svn/tests/test_loader.py
2347

I thought about it too; indeed, we should only check hash differences for contents.

swh/loader/svn/tests/test_loader.py
2347

It's not blocking (if it's a bit hard to do immediately): you can always land this now and iterate on it in another diff, as you wish heh ;)

swh/loader/svn/tests/test_loader.py
2347

In fact, we need to keep those checks, as the directory model from the replay module can miss some hash updates when copying files/directories. See below a bug that I am currently tracking:

DEBUG:swh.loader.svn.loader.SvnLoader:rev: 10364, swhrev: 8538c37dccb886ca848151787174d022993485c7, dir: 816e429fd284c5775b97c95ab723e302c44c6f55
DEBUG:swh.loader.svn.loader.SvnLoader:Checking hash computations on revision 10364...
DEBUG:swh.loader.svn.svn:svn export -r 10364 --depth infinity --ignore-keywords file:///home/anlambert/tmp/codeblocks_repo /tmp/swh.loader.svn.gji1ywo9-3153480/check-revision-10364.prxv4dgw/codeblocks_repo
DEBUG:swh.loader.svn.svn:cleanup /tmp/swh.loader.svn.gji1ywo9-3153480/check-revision-10364.prxv4dgw
DEBUG:swh.loader.svn.loader.SvnLoader:directory with path b'trunk' has different hash in reconstructed repository filesystem
DEBUG:swh.loader.svn.loader.SvnLoader:directory with path b'trunk/src' has different hash in reconstructed repository filesystem
DEBUG:swh.loader.svn.loader.SvnLoader:directory with path b'trunk/src/sdk' has different hash in reconstructed repository filesystem
DEBUG:swh.loader.svn.loader.SvnLoader:content with path b'trunk/src/sdk/filemanager.cpp' has different hash in reconstructed repository filesystem
DEBUG:swh.loader.svn.loader.SvnLoader:directory with path b'branches' has different hash in reconstructed repository filesystem
DEBUG:swh.loader.svn.loader.SvnLoader:directory with path b'branches/scintilla_3_5_x' has different hash in reconstructed repository filesystem
DEBUG:swh.loader.svn.loader.SvnLoader:directory with path b'branches/scintilla_3_5_x/src' has different hash in reconstructed repository filesystem
DEBUG:swh.loader.svn.loader.SvnLoader:directory with path b'branches/scintilla_3_5_x/src/plugins' has different hash in reconstructed repository filesystem
DEBUG:swh.loader.svn.loader.SvnLoader:directory with path b'branches/scintilla_3_5_x/src/plugins/contrib' has different hash in reconstructed repository filesystem
ERROR:swh.loader.svn.loader.SvnLoader:Hash tree computation divergence detected at revision 10364 (816e429fd284c5775b97c95ab723e302c44c6f55 != 576b09a85b208eb2439148b2be49a6fd7365cf32), stopping!
Traceback (most recent call last):
  File "/home/anlambert/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 476, in fetch_data
    data = next(self.swh_revision_gen)
  File "/home/anlambert/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 393, in process_svn_revisions
    self._check_revision_divergence(rev, dir_id, root_directory)
  File "/home/anlambert/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 343, in _check_revision_divergence
    raise ValueError(err)
ValueError: Hash tree computation divergence detected at revision 10364 (816e429fd284c5775b97c95ab723e302c44c6f55 != 576b09a85b208eb2439148b2be49a6fd7365cf32), stopping!
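
When reading a log like the one above, a small post-processing step can isolate the deepest reported paths without removing the directory checks from the loader. The following is a sketch under stated assumptions, not part of the loader: it assumes the message format shown above, and the helper name `deepest_divergent_paths` is hypothetical.

```python
import re
from typing import List, Tuple

# Matches the debug messages shown above, e.g.
# "directory with path b'trunk/src' has different hash in ..."
LINE_RE = re.compile(
    r"(directory|content) with path b'([^']*)' has different hash"
)


def deepest_divergent_paths(log_lines: List[str]) -> List[Tuple[str, str]]:
    """Keep only reported paths that have no reported descendant."""
    entries = []
    for line in log_lines:
        match = LINE_RE.search(line)
        if match:
            entries.append((match.group(1), match.group(2)))
    paths = [path for _, path in entries]
    return [
        (kind, path)
        for kind, path in entries
        if not any(other.startswith(path + "/") for other in paths)
    ]
```

Applied to the debug lines above, this keeps the content `trunk/src/sdk/filemanager.cpp` and the directory `branches/scintilla_3_5_x/src/plugins/contrib`, the latter being a directory whose hash differs although none of its entries is reported.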