Page MenuHomeSoftware Heritage

fix Log keyword expansion with trailing whitespace in prefix
ClosedPublic

Authored by stsp on Dec 9 2021, 3:45 PM.

Details

Summary

Our expansion of the Log keyword was slightly wrong. We need to
trim trailing whitespace from the "prefix" line content which
preceeds the Log keyword when we write out line content which
followed the Log keyword. Update the Log expansion example given
in a comment to document this (see there for details; this behaviour
of CVS is hard to explain without illustration).

Found while testing conversion of the OpenBSD CVS repository.
Add a new test which uses an RCS file from this repository to
reproduce this problem.

Diff Detail

Repository
rDLDCVS CVS Loader
Branch
log-keyword-fix
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 25544
Build 39933: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 39932: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D6813 (id=24691)

Rebasing onto dcb895ca2f...

Current branch diff-target is up to date.
Changes applied before test
commit a66c6b4937d4acfbd8791ebf7d97470fc5875707
Author: Stefan Sperling <stsp@stsp.name>
Date:   Thu Dec 9 14:15:02 2021 +0100

    fix Log keyword expansion with trailing whitespace in prefix
    
    Our expansion of the Log keyword was slightly wrong. We need to
    trim trailing whitespace from the "prefix" line content which
    preceeds the Log keyword when we write out line content which
    followed the Log keyword. Update the Log expansion example given
    in a comment to document this (see there for details; this behaviour
    of CVS is hard to explain without illustration).
    
    Found while testing conversion of the OpenBSD CVS repository.
    Add a new test which uses an RCS file from this repository to
    reproduce this problem.

See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/81/ for more details.

stsp requested review of this revision.Dec 9 2021, 3:47 PM
ardumont added a subscriber: ardumont.

lgtm, one suggestion inline.

swh/loader/cvs/cvs2gitdump/cvs2gitdump.py
678–682

since we are already doing some formatting strings, might as well go all the way?

This revision is now accepted and ready to land.Dec 10 2021, 9:49 AM
swh/loader/cvs/cvs2gitdump/cvs2gitdump.py
678–682

since we are already doing some formatting strings, might as well go all the way?

This suggestion does not work as-is because of the .encode('ascii') call applied to the format string. The prefix is a byte string and not necessarily ASCII-encoded. Yet the data in the format string should contain ASCII characters only. There may be other ways to use a format string which would make it work, of course.

(The author name encoded in this keyword is a UNIX account name which is usually ASCII but could in theory be encoded differently. This is one of numerous potential encoding-related issues in the CVS loader which we may have to address once we see a relevant failure. But it is out of scope for this patch.)

swh/loader/cvs/cvs2gitdump/cvs2gitdump.py
678–682

right, i missed that.
nvm then ;)