
rlog: fix loading of CVS commits which have a commit ID
ClosedPublic

Authored by stsp on Oct 27 2021, 12:26 PM.

Details

Summary

The CVS commit ID is an optional attribute which is only generated
by relatively recent releases of CVS clients. Our rlog parser was
skipping such commits because it failed to match on them due to an
error in a regular expression.
This resulted in an incomplete import of CVS revision history.

Here is a sample line from cvs rlog output that carries a
commit ID. It was not matched because the regex did not allow
for the trailing semicolon:
date: 2007-07-17 15:02:50 +0200;  author: larsl;  state: Exp;  lines: +619 -285;  commitid: oju0x8tTc9aUB7qs;

Found while testing ingestion of the GNU dino repository from
cvs.sannah.gnu.org/sources/dino
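
The failure mode described above can be reproduced with a minimal Python sketch. The two patterns below are hypothetical and are not the loader's actual regular expressions; they only illustrate how a pattern that expects the commit ID to run to the end of the line rejects the sample, while one that also accepts the trailing semicolon matches it. The names BROKEN, FIXED and parse_rev_line are assumptions made for this illustration.

```python
import re

# Sample line quoted in the summary (cvs rlog separates fields with two spaces).
SAMPLE = (
    "date: 2007-07-17 15:02:50 +0200;  author: larsl;  state: Exp;"
    "  lines: +619 -285;  commitid: oju0x8tTc9aUB7qs;"
)

# Fields shared by both hypothetical patterns.
_PREFIX = (
    r"^date: (?P<date>[^;]+);\s+author: (?P<author>[^;]+);"
    r"\s+state: (?P<state>[^;]+);"
    r"(?:\s+lines: (?P<lines>[^;]+);?)?"
)

# Hypothetical "before" pattern: the commit ID must be the last thing on the
# line, so the semicolon that follows it makes the whole match fail.
BROKEN = re.compile(_PREFIX + r"(?:\s+commitid: (?P<commitid>[A-Za-z0-9]+))?$")

# Hypothetical "after" pattern: the semicolon after the commit ID is accepted.
FIXED = re.compile(_PREFIX + r"(?:\s+commitid: (?P<commitid>[A-Za-z0-9]+);)?$")


def parse_rev_line(line, pattern=FIXED):
    """Return the parsed fields as a dict, or None if the line does not match."""
    m = pattern.match(line)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(parse_rev_line(SAMPLE, BROKEN))  # the revision would be skipped
    print(parse_rev_line(SAMPLE, FIXED))   # the revision is parsed
```

Because the commit ID group is optional in both patterns, lines written by older CVS clients without a commit ID still match; only lines that carry one are affected by the missing semicolon.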

Diff Detail

Repository
rDLDCVS CVS Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6561 (id=23835)

Rebasing onto 7f761b8550...

Current branch diff-target is up to date.
Changes applied before test
commit 3c5e365fee4ae71c1a39111171ff1261d0c22eb6
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Oct 27 12:20:05 2021 +0200

    rlog: fix loading of CVS commits which have a commit ID
    
    The CVS commit ID is an optional attribute which is only generated
    by relatively recent releases of CVS clients. Our rlog parser was
    skipping such commits because it failed to match on them due to an
    error in a regular expression.
    This resulted in an incomplete import of CVS revision history.
    
    Here is a sample line from cvs rlog output which carries a
    commit ID and was not matched because the regex lacked the
    trailing semicolon:
    date: 2007-07-17 15:02:50 +0200;  author: larsl;  state: Exp;  lines: +619 -285;  commitid: oju0x8tTc9aUB7qs;
    
    Found while testing ingestion of the GNU dino repository from
    cvs.sannah.gnu.org/sources/dino

See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/36/ for more details.

stsp requested review of this revision. (Oct 27 2021, 12:28 PM)
This revision is now accepted and ready to land. (Oct 27 2021, 1:49 PM)
This revision was landed with ongoing or failed builds. (Oct 27 2021, 4:01 PM)
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D6561 (id=23854)

Rebasing onto 0829dc3309...

Current branch diff-target is up to date.
Changes applied before test
commit 509ac801df7440a95cdf9b4b3bc60af7cb5ac356
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Oct 27 12:20:05 2021 +0200

See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/40/ for more details.