support custom keywords during rsync:// conversion
Closed, Public

Authored by stsp on Dec 8 2021, 3:53 PM.

Details

Summary

CVS supports the definition of custom keywords. A common use case
is to use the project name as a keyword. This avoids confusion when
files are copied between projects that use CVS and contain a keyword
that both projects expand. A file copied from project A retains its
expanded custom keyword, so its initial version can still be traced
back to its origin after the file has been added to project B's CVS
repository.

This feature is in active use by OpenBSD and NetBSD, for example.
Existing conversions of their CVS repositories to Git expand
the corresponding custom keywords as well, and so should we.
Historically, X11 and FreeBSD were also using custom keywords.

During conversion via rsync:// we copy the CVSROOT directory and the
desired CVS module from the rsync server. The file CVSROOT/config
contains directives which configure the use of custom keywords.
Parse this file and expand keywords accordingly when checking out
versions of files from our local copy of the CVS repository.
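
As an illustration of the parsing step described above, here is a minimal
sketch. It assumes that custom keywords are declared in CVSROOT/config with
lines of the form `tag=NAME`; that directive form, the function name
`parse_custom_keywords`, and the regular expression are assumptions made for
this example and are not taken from the loader's code.

```python
import re
from typing import List

# Assumption: a custom keyword is declared as "tag=NAME" on its own line.
TAG_DIRECTIVE = re.compile(r"^tag\s*=\s*(\w+)$")


def parse_custom_keywords(config_text: str) -> List[str]:
    """Return the custom keyword names declared in CVSROOT/config text."""
    keywords = []
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        match = TAG_DIRECTIVE.match(line)
        if match:
            keywords.append(match.group(1))
    return keywords
```

Other directives in the file are ignored, so a config without any such
line simply yields an empty list.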

For now, we only support custom keywords which expand like the
Id keyword, since this is the form known to be in common use by projects.
The latest releases of CVS (1.12.x) have optional support for arbitrary
keyword aliases via custom keywords. Support for this could be added
later, should there be a need to do so. In any case, the pserver access
method already supports arbitrary custom keywords because such keywords
will be expanded by the CVS server when we check out files from it.
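
To make "expands like the Id keyword" concrete, the following sketch
substitutes a custom keyword using the classic Id layout (file name with a
`,v` suffix, revision, date, author, state). The function name
`expand_id_like`, its parameters, and the sample values are hypothetical;
the sketch works on `str` for brevity, whereas a real loader would most
likely operate on bytes.

```python
import re


def expand_id_like(text, keyword, filename, rev, date, author, state="Exp"):
    """Expand $keyword$ (or an already expanded form) like the Id keyword."""
    expansion = f"${keyword}: {filename},v {rev} {date} {author} {state} $"
    # Match "$KEYWORD$" as well as "$KEYWORD: old value $" on a single line.
    pattern = re.compile(r"\$" + re.escape(keyword) + r"(?::[^$\n]*)?\$")
    # Use a function as replacement so the expansion is inserted literally.
    return pattern.sub(lambda _match: expansion, text)
```

Keywords other than the given one, such as a plain `$Id$`, are left
untouched by this sketch.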

While here, optimize our use of rsync a bit.
Fetch only CVSROOT and the desired CVS module over rsync, rather
than fetching the entire CVS repository directory, which may contain
unrelated CVS modules that require disk space but will not be used.
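
The narrower fetch could look roughly like the sketch below, which only
builds the two rsync command lines and does not run them. The helper name
`rsync_commands`, the example URL, and the choice of the `-a` (archive)
option are assumptions for illustration; the loader's actual invocation may
differ.

```python
from typing import List


def rsync_commands(base_url: str, module: str, dest: str) -> List[List[str]]:
    """Build rsync commands that fetch only CVSROOT and one CVS module."""
    base = base_url.rstrip("/")
    # No trailing slash on the source, so each directory is created in dest.
    return [
        ["rsync", "-a", f"{base}/{name}", dest]
        for name in ("CVSROOT", module)
    ]
```

Each command list could then be passed to `subprocess.run(cmd, check=True)`.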

Diff Detail

Repository
rDLDCVS CVS Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D6791 (id=24633)

Rebasing onto 965629d6c8...

Current branch diff-target is up to date.
Changes applied before test
commit fd42519c0c0da0c72b62ae26d7d0b264c4623a2e
Author: Stefan Sperling <stsp@stsp.name>
Date:   Tue Dec 7 15:23:34 2021 +0100

    support custom keywords during rsync:// conversion
    

Link to build: https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/77/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/77/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Dec 8 2021, 3:55 PM)
Harbormaster failed remote builds in B25492: Diff 24633!
  • change docstring indentation in response to jenkins build failure

Build has FAILED

Patch application report for D6791 (id=24658)

Rebasing onto 965629d6c8...

Current branch diff-target is up to date.
Changes applied before test
commit c3b144fa176a8b453fa65b3c343c24e48e5cb956
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Dec 8 18:03:37 2021 +0100

    change docstring indentation in response to jenkins build failure

commit fd42519c0c0da0c72b62ae26d7d0b264c4623a2e
Author: Stefan Sperling <stsp@stsp.name>
Date:   Tue Dec 7 15:23:34 2021 +0100

    support custom keywords during rsync:// conversion
    

Link to build: https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/78/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/78/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Dec 8 2021, 6:06 PM)
Harbormaster failed remote builds in B25515: Diff 24658!
  • more docstring formatting tweaks in response to jenkins build failure

Build is green

Patch application report for D6791 (id=24659)

Rebasing onto 965629d6c8...

Current branch diff-target is up to date.
Changes applied before test
commit 5cc4bebcca123a12ff93664a5478e005cabe0bf1
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Dec 8 18:07:25 2021 +0100

    more docstring formatting tweaks in response to jenkins build failure

commit c3b144fa176a8b453fa65b3c343c24e48e5cb956
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Dec 8 18:03:37 2021 +0100

    change docstring indentation in response to jenkins build failure

commit fd42519c0c0da0c72b62ae26d7d0b264c4623a2e
Author: Stefan Sperling <stsp@stsp.name>
Date:   Tue Dec 7 15:23:34 2021 +0100

    support custom keywords during rsync:// conversion
    

See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/79/ for more details.

stsp requested review of this revision. (Dec 8 2021, 6:09 PM)
vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/loader/cvs/cvs2gitdump/cvs2gitdump.py
Line 611: Isn't line[m.start(1):dsign] the same thing as m.group(1)?

swh/loader/cvs/loader.py
Line 310: avoids crashing when value is empty
Lines 415–417: Sounds good. If this becomes an issue, you could use chardet.

This revision is now accepted and ready to land. (Dec 9 2021, 12:02 PM)

integrate tweaks suggested by vlorentz and squash commits

Build is green

Patch application report for D6791 (id=24689)

Rebasing onto 965629d6c8...

Current branch diff-target is up to date.
Changes applied before test
commit dcb895ca2ff176daeba34ad8047a372caa3cd2ee
Author: Stefan Sperling <stsp@stsp.name>
Date:   Tue Dec 7 15:23:34 2021 +0100

    support custom keywords during rsync:// conversion
    

See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/80/ for more details.