Page MenuHomeSoftware Heritage

fix expansion of multiple RCS keywords on a line via rsync
ClosedPublic

Authored by stsp on Nov 29 2021, 6:33 PM.

Details

Summary

The function RcsKeywords.expand_keyword() is used to expand keywords
when fetching an origin over rsync. This function failed to process
multiple keywords on a single line, even though the existing code
already keeps looping in an attempt to expand multiple keywords.

For example, consider this line from a file in the ccvs CVS repository:

#ident	"@(#)cvs/contrib/pcl-cvs:$Name:  $Id$"

Here, a regular CVS server expands both keywords on this line.

The Name keyword is special; It expands only if an explicit tag name was
given on the CVS command line. This keyword always expands to an empty
string for now, until perhaps one day the CVS loader learns about tags.

Our regular expression which attempts to match keywords on a line splits
the above example into two match groups:

1: #ident	"@(#)cvs/contrib/pcl-cvs:$Name:  $
2: Id$

The Name keyword was then expanded as expected, but the Id keyword was missed.
To fix this, attempt another match starting from the terminating character of
the previous match, such that we match the following two strings:

1: #ident	"@(#)cvs/contrib/pcl-cvs:$Name:  $
2: $Id$

Now our CVS loader expands both keywords like the CVS server does.
Add new test data to confirm that this works as intended.

Diff Detail

Repository
rDLDCVS CVS Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6708 (id=24377)

Rebasing onto bc00d6b169...

Current branch diff-target is up to date.
Changes applied before test
commit 939dd546b05048333f337d0a4e99f87eae272869
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Nov 24 12:13:11 2021 +0100

    fix expansion of multiple RCS keywords on a line via rsync
    
    The function RcsKeywords.expand_keyword() is used to expand keywords
    when fetching an origin over rsync. This function failed to process
    multiple keywords on a single line, even though the existing code
    already keeps looping in an attempt to expand multiple keywords.
    
    For example, consider this line from a file in the ccvs CVS repository:
    
      #ident        "@(#)cvs/contrib/pcl-cvs:$Name:  $Id$"
    
    Here, a regular CVS server expands both keywords on this line.
    
    The Name keyword is special; It expands only if an explicit tag name was
    given on the CVS command line. This keyword always expands to an empty
    string for now, until perhaps one day the CVS loader learns about tags.
    
    Our regular expression which attempts to match keywords on a line splits
    the above example into two match groups:
    
      1: #ident     "@(#)cvs/contrib/pcl-cvs:$Name:  $
      2: Id$
    
    The Name keyword was then expanded as expected, but the Id keyword was missed.
    To fix this, attempt another match starting from the terminating character of
    the previous match, such that we match the following two strings:
    
      1: #ident     "@(#)cvs/contrib/pcl-cvs:$Name:  $
      2: $Id$
    
    Now our CVS loader expands both keywords like the CVS server does.
    Add new test data to confirm that this works as intended.

See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/68/ for more details.

stsp requested review of this revision.Nov 29 2021, 6:35 PM
ardumont added a subscriber: ardumont.

Nice catch.

I guess it's currently limited to deal with "only" a 2 keywords in the same line case.
And we'll try and fix futher if any other cases is detected later.

Another curiosity question inline ;)

swh/loader/cvs/tests/test_loader.py
445

interesting, how come you have so many snapshots?

Usually, on loaders (both "dvcs" and "package" ones) end up creating one snapshot after a load (what we call a visit).

This revision is now accepted and ready to land.Dec 1 2021, 10:27 AM
swh/loader/cvs/tests/test_loader.py
445

interesting, how come you have so many snapshots?

Hmm, I don't know. Will try to figure it out.

I guess it's currently limited to deal with "only" a 2 keywords in the same line case.
And we'll try and fix futher if any other cases is detected later.

It should work with more than just 2. The code keeps matching the regex against the remainder of the line in a loop.