Page MenuHomeSoftware Heritage

loader: mark visit as 'not_found' when relevant
ClosedPublic

Authored by vsellier on Tue, Feb 9, 5:09 PM.

Details

Summary

Related to T3030

Test Plan

tox failing until swh.loader.core > 0.16 released

Diff Detail

Repository
rDLDHG Mercurial loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D5052 (id=18006)

Could not rebase; Attempt merge onto 1965d6ec1e...

Updating 1965d6e..16970b5
Fast-forward
 requirements-swh.txt                      |  2 +-
 swh/loader/mercurial/loader.py            | 57 +++++++++++++++++--------------
 swh/loader/mercurial/tests/test_loader.py | 44 ++++++++++++++++++++++--
 3 files changed, 74 insertions(+), 29 deletions(-)
Changes applied before test
commit 16970b5405303519402f1119572de176830740fd
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Feb 9 17:01:15 2021 +0100

    loader: mark visit as 'not_found' when relevant
    
    Related to T3030

commit 1f96493ef318ed02a674efb4885c2d8d547154e0
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Feb 9 12:28:51 2021 +0100

    loader: Mark visit status as failed when relevant
    
    Related to T3030

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/140/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/140/console

Harbormaster returned this revision to the author for changes because remote builds failed.Tue, Feb 9, 5:09 PM
Harbormaster failed remote builds in B19126: Diff 18006!

Build is green

Patch application report for D5052 (id=18006)

Could not rebase; Attempt merge onto 1965d6ec1e...

Updating 1965d6e..16970b5
Fast-forward
 requirements-swh.txt                      |  2 +-
 swh/loader/mercurial/loader.py            | 57 +++++++++++++++++--------------
 swh/loader/mercurial/tests/test_loader.py | 44 ++++++++++++++++++++++--
 3 files changed, 74 insertions(+), 29 deletions(-)
Changes applied before test
commit 16970b5405303519402f1119572de176830740fd
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Feb 9 17:01:15 2021 +0100

    loader: mark visit as 'not_found' when relevant
    
    Related to T3030

commit 1f96493ef318ed02a674efb4885c2d8d547154e0
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Feb 9 12:28:51 2021 +0100

    loader: Mark visit status as failed when relevant
    
    Related to T3030

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/142/ for more details.

vlorentz added a subscriber: vlorentz.

This is too generic, this could raise "NotFound" for any number of reasons, eg. no disk space left or corrupted repo.

the try...except block should only wrap the call to self.clone_with_timeout, and be more specific about the exceptions to catch.

This revision now requires changes to proceed.Thu, Feb 11, 12:15 PM

The not_found status is now only set when the repository is really not found.

This changes was hard to test and we have lost a lot of time because we have
encountered a hidden bug relative the the timeout management.
The CommandError was never retrieved from the queue due to a
deserialization issue (a constructor supporting no parameters is
mandatory). We solved that by using a dedicated exception with a constructor
with an optional parameter.

This is a link to the errors catched in sentry: https://sentry.softwareheritage.org/share/issue/1fb005d7f6b94132a1c12d74eb9e0649/

Build is green

Patch application report for D5052 (id=18097)

Rebasing onto 1f96493ef3...

Current branch diff-target is up to date.
Changes applied before test
commit d2d77ef4e5c15bb0bf1a67231d39de5d549e44dc
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Feb 9 17:01:15 2021 +0100

    loader: mark visit as 'not_found' when relevant
    
    Related to T3030

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/143/ for more details.

Fix tests to increase code coverage

Build is green

Patch application report for D5052 (id=18100)

Rebasing onto 1f96493ef3...

Current branch diff-target is up to date.
Changes applied before test
commit aef6f4dec652fbf64c2af994efb2597a7227de53
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Feb 9 17:01:15 2021 +0100

    loader: mark visit as 'not_found' when relevant
    
    Related to T3030

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/144/ for more details.

vlorentz added inline comments.
swh/loader/mercurial/loader.py
79–81

Could you briefly explain what this exception is for? (eg. error raised when calling hg`, with the returned message as the err attribute)

83–84

Could you make it bytes? We already use bytes everywhere and the only difference afaict is mutability.

278–282

Did you check these messages don't change depending on $LANG? If yes, we may need to change how we call hglib

284

maybe?

(from None excludes the stack trace of the CommandErrorWrapper exception from the new one, and I don't think we need it here)

swh/loader/mercurial/tests/test_loader.py
262–278

good test, thanks :)

263
This revision now requires changes to proceed.Mon, Feb 15, 10:36 AM

Adapt according the review

swh/loader/mercurial/loader.py
278–282

It seems it's not locale sensible :

In [7]: import locale
In [8]: locale.setlocale(locale.LC_ALL, 'fr_FR')
Out[8]: 'fr_FR'
In [9]: import hglib
In [10]: hglib.clone("/fake")
---------------------------------------------------------------------------
CommandError                              Traceback (most recent call last)
<ipython-input-10-97847294dd82> in <module>
----> 1 hglib.clone("/fake")

~/src/swh/swh-environment/.venv/lib/python3.7/site-packages/hglib/__init__.py in clone(source, dest, noupdate, updaterev, rev, branch, pull, uncompressed, ssh, remotecmd, insecure, encoding, configs)
     36     out, err = proc.communicate()
     37     if proc.returncode:
---> 38         raise error.CommandError(args, proc.returncode, out, err)
     39 
     40     return client.hgclient(dest, encoding, configs, connect=False)

CommandError: (255, b'', b'abandon\xc2\xa0: repository /fake not found')

----> 1 hglib.clone("http://github.com/softwareheritage/swh-core")




~/src/swh/swh-environment/.venv/lib/python3.7/site-packages/hglib/__init__.py in clone(source, dest, noupdate, updaterev, rev, branch, pull, uncompressed, ssh, remotecmd, insecure, encoding, configs)
     36     out, err = proc.communicate()
     37     if proc.returncode:
---> 38         raise error.CommandError(args, proc.returncode, out, err)
     39 
     40     return client.hgclient(dest, encoding, configs, connect=False)

CommandError: (255, b'', b'real URL is https://github.com/softwareheritage/swh-core\nabandon\xc2\xa0: \'http://github.com/softwareheritage/swh-core\' does not appear to be an hg repository:\n---%<--- (text/html; charset=utf-8)\n\n\n\n\n\n\n<!DOCTYPE html>\n<html lang="en">\n  <head>\n    <meta charset="utf-8">\n  <link rel="dns-prefetch" href="https://github.githubassets.com">\n  <link rel="dns-prefetch" href="https://avatars.githubusercontent.com">\n  <link rel="dns-prefetch" href="https://github-cloud.s3.amazonaws.com">\n  <link rel="dns-prefetch" href="https://user-images.githubusercontent.com/">\n\n\n\n  <link crossorigin="anonymous" media="all" integrity="sha512-rF3cnLJE5IkKUWFkw54emxUMV82DhbZ9aJun83zhvBgJ7J7ZXC20bEFVuLY9RRRC60Ig+pHQO57DuYBrYO+cAA==" rel="stylesheet" href="https://github.githubassets.com/assets/frameworks-ac5ddc9cb244e4890a516164c39e1e9b.css" />\n  <link crossorigin="anonymous" media="all" integrity="sha512-tO1butB3aXG+Ab9M+171Fjde3B2uzMU0DEAKzjbXJ0GLJWfiaIVEhM9QS3/G9Ck32IEZLmaSTscoyA9Z66IglQ==" rel="stylesheet" href="https://github.githubassets.com/assets/site-b4ed5bbad0776971be01bf4cfb5ef516.css" />\n    <link crossorigin="anonymous" media="all" integrity="sha512-QbKgFXj+JoU12QsMYLRWqW9sWAzGHCCMC7FlsHunxzfLL4jGwfsmyAtbn4F/deHyB\n---%<---')

Build is green

Patch application report for D5052 (id=18108)

Rebasing onto 1f96493ef3...

Current branch diff-target is up to date.
Changes applied before test
commit 4c84a86b371e955f57649be24ccd344721b4110d
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Feb 9 17:01:15 2021 +0100

    loader: mark visit as 'not_found' when relevant
    
    Related to T3030

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/145/ for more details.

This revision is now accepted and ready to land.Mon, Feb 15, 4:21 PM