
initial CVS loader stub
ClosedPublic

Authored by stsp on Jul 12 2021, 6:38 PM.

Details

Reviewers
None
Group Reviewers
Reviewers
Maniphest Tasks
T3691: Implement CVS loader
Commits
rDLDCVS56338c735119: remove rcsparse testmodule; pointless test which tox tried to run
rDLDCVS970e86ad27dd: make 'black' ignore cvsclient.py; forced reformatting is unreadable
rDLDCVS26ab35aa07d1: fix pytest arguments in tox.ini
rDLDCVSea4c7fa15a3e: rcsparse.pyi: 'black' prefers empty line after imports
rDLDCVSc5e647809673: remove cvs2svndump; it fails linter checks and we don't use it
rDLDCVS71157f208bf0: tell flake8 to ignore invalid escape sequences in regex
rDLDCVS3e5f10269f51: tell 'black' to ignore rcsparse and cvs2gitdump sources
rDLDCVS26d6891c6ecf: fix typo in variable name; found by flake8
rDLDCVS9774173fc66f: fix reference to subprocess.TimeoutExpired; found by flake8
rDLDCVSacc88adda271: also add .* to the default exclude list
rDLDCVS9965c6b897ae: reformat cvsclient.py to appease flake8
rDLDCVS87da9cee4c6f: reformat test_loader.py to appease flake8
rDLDCVS91ca5ca35414: avoid use of undefined module vclib; found by flake8
rDLDCVS1c7635b9b3e8: wrap overlong lines in rlog.py to appease flake8
rDLDCVS73ad331915e3: add a comment which documents RCS/CVS branch numbers
rDLDCVSb80aa0c27997: make a line of code more readable
rDLDCVSd6c1817d39c0: document CVS path encoding issues
rDLDCVS871042808269: add an assert to avoid a None type error from mypy
rDLDCVS1dd3f43e9e46: tell flake8 to ignore the 'build' directory
rDLDCVS1080eb83c318: add a very simple .pyi for rcsparse to satisfy mypy
rDLDCVSd5c46c5cb5f2: merge nested functions which yield into single functions
rDLDCVSfa9121756333: simplify use of os.makedirs(); from vlorentz
rDLDCVS883aa39c8c93: fix incomplete docstring of swh_hash_data_per_cvs_changeset
rDLDCVS0fe0e68f29c2: acknowledge code derived from ViewVC in our README file
rDLDCVS52faf13b77f4: convert readme file from .md to .rst
rDLDCVS1c0acc70f1f5: avoid pointless conversion to bytes() and use (x,) tuple idiom
rDLDCVS2a4b87f9999f: Remove an unused import.
rDLDCVSb2fb227a2cf6: fix format string error found by pre-commit hook
rDLDCVS50ae8b09e12e: fix 'ssh' protocol support
rDLDCVSff50851e8183: exclude third-party sources from flake8 checks
rDLDCVS4e65e78bf063: commit reformatting of rlog.py done by pre-commit hook
rDLDCVS0872e9e61663: exclude third-party sources via mypy.ini, not pre-commit conf
rDLDCVSd125c5c7b0f0: the pre-commit hook complained about an unused rcsfile variable
rDLDCVS6548f6ca45dd: commit reformatting performed by pre-commit hook on setup.py
rDLDCVS82e8d797cf0a: wrap overlong lines
rDLDCVS98082f20e7e5: urllib doesn't have stubs for mypy
rDLDCVS5e3904dd50ab: committing reformatting which was performed by pre-commit hooks
rDLDCVSc479b0e26006: fix myphy errors on swh/loader/__init__.py
rDLDCVS0e1e757565ac: exclude upstream python code from pre-commit checks
rDLDCVS07ed9fc946b1: update README
rDLDCVSc04bb816c7f8: remove unused keyword parameter to fix loading of the cvs loader
rDLDCVSba728189a3c5: implement support for import via cvs pserver protocol
rDLDCVS1eaaa52b054c: simply assign to empty lists in order to clear per-changeset data
rDLDCVSc721a06cb62b: remove unused imports
rDLDCVSe7ba43fd93cb: add test case doing an incremental visit
rDLDCVS4ddc08250622: switch log level of an informative progress message from debug to info
rDLDCVS250bb9dbba27: avoid double-parsing of rcsfiles while processing a changeset
rDLDCVS76c249b4a8c3: copy over the pre_cleanup() handler from the SVN loader
rDLDCVS092b5263f936: replace global self.rcs variable with use of local variables
rDLDCVS0424433458bd: remove unused CvsLoader class members and constructor arguments
rDLDCVS89d7b0fdd2ef: cvs2gitdump: avoid parsing rcs files inside expand_keyword()
rDLDCVS551a12e24cb8: drop support for using previous snapshots as base for new ones
rDLDCVS71f22dbe195f: change logging level of per-revision info from DEBUG to INFO
rDLDCVScf5cf4594d05: link revisions to their parents and add another small test
rDLDCVSae77fa161841: test two consecutive visits
rDLDCVSedc28c215955: fix rsync URL processing in fetch_cvs_repo_with_rsync()
rDLDCVSed460a3280fe: do not call self.storage.revision_get() more often than necessary
rDLDCVScdbcdcc92e61: enable check_snapshot() in test_loader_cvs_visit()
rDLDCVS5097bf8eaab1: use an iterator to process swh revisions; reduces memory usage
rDLDCVS496ff263546a: the trivial first visit test is passing now
rDLDCVS49ce9a24d6f9: create subdirectories in the work tree
rDLDCVS6a2f6e50f61e: ignore ENOENT when removing files
rDLDCVS358a05d59956: add converted CVS revisions to storage
rDLDCVS59a7ff18d603: document fetch_data() method
rDLDCVSaf30f67f7403: populate a work tree with files checked out from the repository
rDLDCVS605d8f5862b8: get the 'prepare' step working
rDLDCVS15252a6ce822: force our local rcsparse dependency; upstream does not yet support py3
rDLDCVS799eea5a7b68: get fetch_data working
rDLDCVS757ce82f02eb: remove bogus import
rDLDCVS9dec8a2e4fdc: rcsparse: add missing allocation failure check in parsetoken()
rDLDCVS28cad58a0f32: add stub for an initial test
rDLDCVSaaec7d33bf74: remove the 'foo' module and references to it
rDLDCVSecf402cf6249: compile rcsparse extension
rDLDCVS9e3170f81c2b: rcsparse python3 support patches from OpenBSD ports
rDLDCVS75fe9f5ea285: Add 'swh/loader/cvs/rcsparse/'
rDLDCVSbb4362177743: Add 'swh/loader/cvs/cvs2gitdump/'
rDLDCVS65995f14b4c6: initial CVS loader stub
Required Signatures
L3 Software Heritage Contributor License Agreement, version 1.0
Summary

Add 'swh/loader/cvs/cvs2gitdump/'

Obtained from commit 301a72682d92b11d809eb7476a21ac354b826beb of
repository https://github.com/yasuoka/cvs2gitdump

Add 'swh/loader/cvs/rcsparse/'

Obtained from commit 206bca0b90f5780815c0b6c6cbccfd03f27f6985 of
repository https://github.com/corecode/rcsparse

rcsparse python3 support patches from OpenBSD ports

https://cvsweb.openbsd.org/cgi-bin/cvsweb/ports/devel/py-rcsparse/patches/
$OpenBSD: patch-py-rcsparse_c,v 1.4 2021/02/18 03:35:07 yasuoka Exp $
$OpenBSD: patch-testmodule_py,v 1.2 2021/02/18 03:35:07 yasuoka Exp $

compile rcsparse extension

remove the 'foo' module and references to it

rcsparse: add missing allocation failure check in parsetoken()

add stub for an initial test

remove bogus import

get the 'prepare' step working

force our local rcsparse dependency; upstream does not yet support py3

get fetch_data working

populate a work tree with files checked out from the repository

ignore ENOENT when removing files

create subdirectories in the work tree

document fetch_data() method

add converted CVS revisions to storage

the trivial first visit test is passing now

enable check_snapshot() in test_loader_cvs_visit()

test two consecutive visits

fix rsync URL processing in fetch_cvs_repo_with_rsync()

The last path component of the URL corresponds to the CVS module name,
and the CVSROOT directory is expected to be a path-wise sibling of
this module.

do not call self.storage.revision_get() more often than necessary

use an iterator to process swh revisions; reduces memory usage

change logging level of per-revision info from DEBUG to INFO

link revisions to their parents and add another small test

remove unused CvsLoader class members and constructor arguments

drop support for using previous snapshots as base for new ones

replace global self.rcs variable with use of local variables

cvs2gitdump: avoid parsing rcs files inside expand_keyword()

avoid double-parsing of rcsfiles while processing a changeset

copy over the pre_cleanup() handler from the SVN loader

remove unused imports

simply assign to empty lists in order to clear per-changeset data

switch log level of an informative progress message from debug to info

add test case doing an incremental visit

implement support for import via cvs pserver protocol

remove unused keyword parameter to fix loading of the cvs loader

update README

document how tests can be run

the loader expects a URL argument

Related to T2845
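
For illustration, the rsync URL convention described above (the last path component is the CVS module name, and CVSROOT is a sibling directory of that module) could be sketched as follows. This is only a minimal sketch under those assumptions, not the code in fetch_cvs_repo_with_rsync(); the helper name split_rsync_url and the example URL are hypothetical.

```python
import posixpath
from typing import Tuple
from urllib.parse import urlparse


def split_rsync_url(url: str) -> Tuple[str, str, str]:
    """Return (host, module_name, cvsroot_path) for an rsync:// URL.

    Hypothetical helper for illustration; not part of the loader.
    """
    parsed = urlparse(url)
    if parsed.scheme != "rsync":
        raise ValueError(f"not an rsync URL: {url!r}")

    # Drop a trailing slash so that basename() yields the module name.
    path = parsed.path.rstrip("/")
    module_name = posixpath.basename(path)
    if not module_name:
        raise ValueError(f"URL has no module component: {url!r}")

    # CVSROOT is expected next to the module, i.e. in the same parent directory.
    cvsroot_path = posixpath.join(posixpath.dirname(path), "CVSROOT")
    return parsed.netloc, module_name, cvsroot_path


# Example with a made-up URL:
host, module, cvsroot = split_rsync_url("rsync://example.org/cvs/mymodule/")
```

With the made-up URL above, the module name is the final component "mymodule" and the CVSROOT path is derived from the module's parent directory.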

Diff Detail

Repository
rDLDCVS CVS Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont removed comments

I've cleaned up old and now useless comments about the build failing, as the diff was painfully and needlessly long because of them.

Build has FAILED

I've updated the container to have the cvs dependency, so this should unblock the build once the container has been recreated (ongoing).

I'll trigger a build when it's ready.


I've also updated the diff with the task description link so this diff can be found more easily.

Build is green

Patch application report for D5988 (id=22736)

Rebasing onto 29bd1ed54e...

Current branch diff-target is up to date.
Changes applied before test
commit 56338c7351192927c9295313388f38457eb72d88
Author: Stefan Sperling <stsp@stsp.name>
Date:   Thu Sep 16 11:08:30 2021 +0200

    remove rcsparse testmodule; pointless test which tox tried to run

commit 970e86ad27dd8fa01a23a307cf7ab694140a7fea
Author: Stefan Sperling <stsp@stsp.name>
Date:   Thu Sep 16 11:02:05 2021 +0200

    make 'black' ignore cvsclient.py; forced reformatting is unreadable

commit ea4c7fa15a3eb7cc88bc8254794e1e8a4b771177
Author: Stefan Sperling <stsp@stsp.name>
Date:   Thu Sep 16 10:55:51 2021 +0200

    rcsparse.pyi: 'black' prefers empty line after imports

commit 3e5f10269f515202aa4f594f1bd3a89a5f9d3540
Author: Stefan Sperling <stsp@stsp.name>
Date:   Thu Sep 16 10:54:34 2021 +0200

    tell 'black' to ignore rcsparse and cvs2gitdump sources

commit c5e647809673a25c5d38fad1805b5c3ab0f59930
Author: Stefan Sperling <stsp@stsp.name>
Date:   Thu Sep 16 10:43:16 2021 +0200

    remove cvs2svndump; it fails linter checks and we don't use it

commit 26ab35aa07d1dae366c97b76aed24eded6e174a9
Author: Stefan Sperling <stsp@stsp.name>
Date:   Thu Sep 16 10:42:33 2021 +0200

    fix pytest arguments in tox.ini

commit 71157f208bf06f9ad85ba83e7a265afaccefbe01
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Sep 15 16:13:05 2021 +0200

    tell flake8 to ignore invalid escape sequences in regex

commit 26d6891c6ecf1b5087a7a36c5838ac14be9086fa
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Sep 15 16:06:41 2021 +0200

    fix typo in variable name; found by flake8

commit 9774173fc66fbd28bcb4cb99a2033f3763d2c7cc
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Sep 15 16:06:21 2021 +0200

    fix reference to subprocess.TimeoutExpired; found by flake8

commit 91ca5ca35414783e8a9d13096cbd356a200b5d93
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Sep 15 16:04:03 2021 +0200

    avoid use of undefined module vclib; found by flake8

commit 87da9cee4c6f65b574d0c9eb812c9f42b1f7d641
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Sep 15 16:01:06 2021 +0200

    reformat test_loader.py to appease flake8

commit 9965c6b897ae986e57eba0ce30a13c3694a8cf60
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Sep 15 15:57:10 2021 +0200

    reformat cvsclient.py to appease flake8

commit 1c7635b9b3e82fa352e8166544c9f08874ee4907
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Sep 15 15:45:04 2021 +0200

    wrap overlong lines in rlog.py to appease flake8

commit acc88adda27184e0eb3ecf022336b347df932022
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Sep 15 15:41:40 2021 +0200

    also add .* to the default exclude list
    
    Otherwise flake8 on jenkins won't skip the .tox dir.
    
    Found by vlorentz.

commit 1dd3f43e9e46a439064fab664666132b38b702a6
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Sep 15 15:31:03 2021 +0200

    tell flake8 to ignore the 'build' directory

commit b80aa0c2799731cb2461a921f363c14f03052049
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Sep 15 15:10:13 2021 +0200

    make a line of code more readable
    
    Suggested by vlorentz.

commit d6c1817d39c0e3574e1fb5e42d25be6015e97418
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Sep 15 15:09:03 2021 +0200

    document CVS path encoding issues
    
    Suggested by vlorentz.

commit 73ad331915e31283fbde49618e41eb3eef3c5f0d
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Sep 15 14:55:21 2021 +0200

    add a comment which documents RCS/CVS branch numbers
    
    Suggested by vlorentz.

commit 871042808269120aa241cb299a97700081475ef2
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Sep 15 14:14:32 2021 +0200

    add an assert to avoid a None type error from mypy

commit 1080eb83c31873b4d65c5182a176c28f6ed2c986
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Sep 15 14:11:41 2021 +0200

    add a very simple .pyi for rcsparse to satisfy mypy

commit d5c46c5cb5f23de6dc8bd3b5bc21e362ae6abfc8
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Sep 15 13:51:26 2021 +0200

    merge nested functions which yield into single functions
    
    Merge swh_hash_data_per_cvs_changeset() + process_cvs_changesets().
    Merge swh_hash_data_per_cvs_rlog_changeset() + process_cvs_rlog_changesets().
    
    Makes the code easier to follow and makes return values typed.
    
    Also, cvs_changesets does not need to be a class member.
    Make it a local variable instead. Pointed out by mypy.
    
    Suggested by vlorentz.

commit fa9121756333881b5532716938788df56434ef13
Author: Stefan Sperling <stsp@stsp.name>
Date:   Tue Sep 14 14:10:57 2021 +0200

    simplify use of os.makedirs(); from vlorentz

commit 883aa39c8c93a7668c86b3df4847a541d25e4295
Author: Stefan Sperling <stsp@stsp.name>
Date:   Tue Sep 14 14:09:00 2021 +0200

    fix incomplete docstring of swh_hash_data_per_cvs_changeset
    
    Found by vlorentz

commit 1c0acc70f1f5611476c58b95dad607c9fa6833a9
Author: Stefan Sperling <stsp@stsp.name>
Date:   Tue Sep 14 14:06:26 2021 +0200

    avoid pointless conversion to bytes() and use (x,) tuple idiom
    
    From vlorentz

commit 52faf13b77f4ebcdbe97f641368859754abed8e9
Author: Stefan Sperling <stsp@stsp.name>
Date:   Tue Sep 14 14:03:39 2021 +0200

    convert readme file from .md to .rst

commit 0fe0e68f29c289743d62bef01c87ccc3d5435212
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Jul 14 17:06:59 2021 +0200

    acknowledge code derived from ViewVC in our README file

commit 2a4b87f9999f6c71a6bd2e212835be8bea414637
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Jul 14 16:33:04 2021 +0200

    Remove an unused import.
    
    The pre-commit hook decided to reformat of the line again after
    it was shortened. Apply this reformatting change, too.

commit 4e65e78bf0638f92e18b3530ed7b5540a076a134
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Jul 14 16:24:40 2021 +0200

    commit reformatting of rlog.py done by pre-commit hook
    
    This file contains a mix of our own code and some code inherited from ViewVC.
    Parts derived from ViewVC now no longer match formatting used by upstream.
    But automatic merges from upstream were already impossible anyway.

commit ff50851e8183098a1ab33f7118c355e0c5484928
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Jul 14 14:56:38 2021 +0200

    exclude third-party sources from flake8 checks

commit 0872e9e61663dd71727866d0b41f1d458aa92bf8
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Jul 14 13:34:50 2021 +0200

    exclude third-party sources via mypy.ini, not pre-commit conf

commit 50ae8b09e12e5f4e40430b7f4ff15393e2be0a81
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Jul 14 10:47:26 2021 +0200

    fix 'ssh' protocol support

commit d125c5c7b0f07ddf8d495d63fdc226eb57450ee6
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Jul 14 10:41:24 2021 +0200

    the pre-commit hook complained about an unused rcsfile variable

commit b2fb227a2cf61ecdf6b257c92b922c5cd8eec93b
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Jul 14 10:41:00 2021 +0200

    fix format string error found by pre-commit hook

commit 6548f6ca45dd98fdf9cf56e12fd5e7c904867359
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Jul 14 10:40:11 2021 +0200

    commit reformatting performed by pre-commit hook on setup.py

commit 82e8d797cf0a916ada22172562bcbb122846b0f1
Author: Stefan Sperling <stsp@stsp.name>
Date:   Tue Jul 13 17:14:50 2021 +0200

    wrap overlong lines

commit c479b0e260068157abce5a98b1b49300c036c981
Author: Stefan Sperling <stsp@stsp.name>
Date:   Tue Jul 13 17:11:42 2021 +0200

    fix myphy errors on swh/loader/__init__.py

commit 98082f20e7e52256531b0b097f22bd8e149d64bc
Author: Stefan Sperling <stsp@stsp.name>
Date:   Tue Jul 13 17:10:48 2021 +0200

    urllib doesn't have stubs for mypy

commit 0e1e757565ac3f47e2cce72bb5538eac98a85ff7
Author: Stefan Sperling <stsp@stsp.name>
Date:   Tue Jul 13 16:55:21 2021 +0200

    exclude upstream python code from pre-commit checks

commit 5e3904dd50ab76cf7383254bce1eb5c2fff96812
Author: Stefan Sperling <stsp@stsp.name>
Date:   Tue Jul 13 16:27:37 2021 +0200

    committing reformatting which was performed by pre-commit hooks

commit 07ed9fc946b1ea87e12d02fad36783d1eec38536
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 12 18:16:50 2021 +0200

    update README

commit c04bb816c7f8ac19a4101f5647b7de0c3f7af4a8
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 12 16:35:01 2021 +0200

    remove unused keyword parameter to fix loading of the cvs loader

commit ba728189a3c5160c4fec39d347a13a9d676d8373
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    implement support for import via cvs pserver protocol

commit e7ba43fd93cb72ca44168317b394ebfa9076b902
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    add test case doing an incremental visit

commit 4ddc08250622718702358afdcdd89edcb69f2dc8
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    switch log level of an informative progress message from debug to info

commit 1eaaa52b054cd4a5411952f46054c7fccab5b64f
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    simply assign to empty lists in order to clear per-changeset data

commit c721a06cb62b769b7493bd5d41d08ce7391fe784
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    remove unused imports

commit 76c249b4a8c35a3b98cd47d7e1fa7c93dcb439f1
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    copy over the pre_cleanup() handler from the SVN loader

commit 250bb9dbba27850ba3d16ddba10d7078063dab16
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    avoid double-parsing of rcsfiles while processing a changeset

commit 89d7b0fdd2ef1b16e5c0cbd1c83689dd9eaea623
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    cvs2gitdump: avoid parsing rcs files inside expand_keyword()

commit 092b5263f936afeb741204f7d5bc8ade69fa474b
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    replace global self.rcs variable with use of local variables

commit 551a12e24cb8f5aecc1f9b0b194a7758ca041ee2
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    drop support for using previous snapshots as base for new ones

commit 0424433458bd10f5d14f1bc3cc1a4ae65bb38418
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    remove unused CvsLoader class members and constructor arguments

commit cf5cf4594d05526ea872faacfa104b1d32e784b5
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    link revisions to their parents and add another small test

commit 71f22dbe195fccc9614cbd126dfcfb8f488bc0fa
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    change logging level of per-revision info from DEBUG to INFO

commit 5097bf8eaab1f583a9810dbffecabbe2146fffe6
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    use an iterator to process swh revisions; reduces memory usage

commit ed460a3280fe85366581fe4ac6601cc6eb4be99e
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    do not call self.storage.revision_get() more often than necessary

commit edc28c2159553bb51d5355acd903888ea120a751
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    fix rsync URL processing in fetch_cvs_repo_with_rsync()
    
    The last path component of the URL corresponds to the CVS module name,
    and the CVSROOT directory is expected to be a path-wise sibling of
    this module.

commit ae77fa16184141721d94216b024597e05dd40849
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    test two consecutive visits

commit cdbcdcc92e619be7f2f5617b35da2c6b760adb5c
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    enable check_snapshot() in test_loader_cvs_visit()

commit 496ff263546a1f037a39a9f069475a5b8787a8f2
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    the trivial first visit test is passing now

commit 358a05d59956491240cccaa06fcc17db7d0e3015
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    add converted CVS revisions to storage

commit 59a7ff18d6030d691ad9dd61607d7711d55c531f
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    document fetch_data() method

commit 49ce9a24d6f93e20c911684826b8c625314c3020
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    create subdirectories in the work tree

commit 6a2f6e50f61e353429ed3018a3de62d0c3046147
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    ignore ENOENT when removing files

commit af30f67f740345e2d85a900c43f47690915b5b6b
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    populate a work tree with files checked out from the repository

commit 799eea5a7b6830c5a38c900fdf9b1d88ccb958f4
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    get fetch_data working

commit 15252a6ce822cb78a5da37745459dcd08f6e9279
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    force our local rcsparse dependency; upstream does not yet support py3

commit 605d8f5862b81a47a99db7180652fe9d2898a0e1
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    get the 'prepare' step working

commit 757ce82f02eba97703b7a94f44c8dfad4ce78b8e
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    remove bogus import

commit 28cad58a0f325b977d6c2a0e3079cb74df34cad7
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    add stub for an initial test

commit 9dec8a2e4fdc2be70d41c992190e75fc5afa8662
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    rcsparse: add missing allocation failure check in parsetoken()

commit aaec7d33bf74a83eaefaaecc914731bcbf9a1957
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 14:51:43 2021 +0200

    remove the 'foo' module and references to it

commit ecf402cf624994da6aeb21539c0dbb1efcccb57e
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    compile rcsparse extension

commit 9e3170f81c2b5a7cfc93abf3358fd5e505dcc133
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    rcsparse python3 support patches from OpenBSD ports
    
    https://cvsweb.openbsd.org/cgi-bin/cvsweb/ports/devel/py-rcsparse/patches/
    $OpenBSD: patch-py-rcsparse_c,v 1.4 2021/02/18 03:35:07 yasuoka Exp $
    $OpenBSD: patch-testmodule_py,v 1.2 2021/02/18 03:35:07 yasuoka Exp $

commit 75fe9f5ea285a31b234983011743bde3d10c4c97
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    Add 'swh/loader/cvs/rcsparse/'
    
    Obtained from commit 206bca0b90f5780815c0b6c6cbccfd03f27f6985 of
    repository https://github.com/corecode/rcsparse

commit bb436217774344e00c57bb927491af5d02526776
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    Add 'swh/loader/cvs/cvs2gitdump/'
    
    Obtained from commit 301a72682d92b11d809eb7476a21ac354b826beb of
    repository https://github.com/yasuoka/cvs2gitdump

commit 65995f14b4c658e3113380688f1c97ed67f22008
Author: Stefan Sperling <stsp@stsp.name>
Date:   Mon Jul 5 13:41:49 2021 +0200

    initial CVS loader stub

See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/12/ for more details.

stsp requested review of this revision. Sep 16 2021, 11:59 AM
This revision was not accepted when it landed; it landed in state Needs Review. Sep 17 2021, 2:49 PM
This revision was automatically updated to reflect the committed changes.

Batch of comments I drafted:

(Phabricator once again lost track of which file they are attached to...)

swh/loader/cvs/loader.py
138–140

I think you can remove this part of the docstring; it's redundant with the type annotation

143–150

can you use longer variable names?

154

this is going to be noisy otherwise

170–174

this should work too

178–182
264

Are all names guaranteed to be ascii?

348–352
359–369

you missed this comment

405

the exception and its traceback are automatically added by self.log.exception to the message
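
To illustrate the last comment: the standard library's logging.Logger.exception() logs at ERROR level and appends the active exception and its traceback to the record, so the message does not need to format the exception itself. A minimal sketch, assuming self.log is a standard logging.Logger; the logger name, the message text and process_changeset() are made up for the example.

```python
import io
import logging

# Hypothetical logger standing in for self.log in the loader.
log = logging.getLogger("cvs_loader_example")
stream = io.StringIO()
log.addHandler(logging.StreamHandler(stream))
log.setLevel(logging.ERROR)
log.propagate = False


def process_changeset():
    # Stand-in for any loader step that may fail.
    raise ValueError("bad RCS file")


try:
    process_changeset()
except ValueError:
    # No need to interpolate the exception into the message:
    # exception() attaches it, together with the traceback.
    log.exception("failed to process changeset")

output = stream.getvalue()
```

The captured output contains the message followed by the traceback and the "ValueError: bad RCS file" line, without the message having to mention the exception.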