Page MenuHomeSoftware Heritage

cvsclient: handle files which lack a trailing newline
ClosedPublic

Authored by stsp on Oct 27 2021, 12:24 PM.

Details

Summary

CVS uses \n as a protocol message separator, which forces us
to read protocol message line-by-line. File content sent by
the server has a length known and is transmitted in bytes.
The server appends a final "ok\n" message (or perhaps an error
message) when it is done sending file contents.

Properly handle the case where this final message gets buffered
along with file contents and is not delimited from file contents
by \n because the file lacks a trailing newline. Previously, the
final protocol message ended up being written out to file contents
in this case.

Found while testing ingestion of the GNU dino CVS repository from
cvs.savannah.gnu.org/sources/dino.

Related to T3691

Diff Detail

Repository
rDLDCVS CVS Loader
Branch
cvsclient-no-trailing-nl
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 24715
Build 38576: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 38575: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D6558 (id=23832)

Rebasing onto 7f761b8550...

Current branch diff-target is up to date.
Changes applied before test
commit d3b3344bc26d90acb081fe827aed60257d931cfe
Author: Stefan Sperling <stsp@stsp.name>
Date:   Wed Oct 27 11:55:04 2021 +0200

    cvsclient: handle files which lack a trailing newline
    
    CVS uses \n as a protocol message separator, which forces us
    to read protocol message line-by-line. File content sent by
    the server has a length known and is transmitted in bytes.
    The server appends a final "ok\n" message (or perhaps an error
    message) when it is done sending file contents.
    
    Properly handle the case where this final message gets buffered
    along with file contents and is not delimited from file contents
    by \n because the file lacks a trailing newline. Previously, the
    final protocol message ended up being written out to file contents
    in this case.
    
    Found while testing ingestion of the GNU dino CVS repository from
    cvs.savannah.gnu.org/sources/dino.

See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/33/ for more details.

stsp requested review of this revision.Oct 27 2021, 12:26 PM

Could you add a test?

Yes, I have a test in progress.

It would be easier to add the test later, because the test will require other outstanding pserver protocol fixes to be merged as well.
Otherwise the pserver protocol handling is too broken and triggers unrelated failures.

I could add the test stacked on top of all required fixes now, but perhaps we can simply do this once all the fixes are merged?

In D6558#170347, @stsp wrote:

perhaps we can simply do this once all the fixes are merged?

Sure!

This revision is now accepted and ready to land.Oct 27 2021, 1:48 PM