Page MenuHomeSoftware Heritage

validate input paths in the CVS loader
ClosedPublic

Authored by stsp on Dec 10 2021, 2:59 PM.

Details

Summary

The CVS loader creates files on the local file system based on
paths which were read from a local copy of a CVS repository or
sent by a CVS server as part of its "cvs rlog" response.

Ensure that such paths will not be able to escape the temporary
directory which stores checked out versions of files.

Diff Detail

Repository
rDLDCVS CVS Loader
Branch
path-check
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 25563
Build 39968: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 39967: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D6823 (id=24711)

Rebasing onto a66c6b4937...

Current branch diff-target is up to date.
Changes applied before test
commit 1c48bd3b97200b069c6b66264872b8cdcdb8b868
Author: Stefan Sperling <stsp@stsp.name>
Date:   Fri Dec 10 13:36:52 2021 +0100

    validate input paths in the CVS loader
    
    The CVS loader creates files on the local file system based on
    paths which were read from a local copy of a CVS repository or
    sent by a CVS server as part of its "cvs rlog" response.
    
    Ensure that such paths will not be able to escape the temporary
    directory which stores checked out versions of files.

See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/82/ for more details.

stsp requested review of this revision.Dec 10 2021, 3:01 PM
  • add missing check for "/../" to the rcsparse code path

Build is green

Patch application report for D6823 (id=24714)

Rebasing onto a66c6b4937...

Current branch diff-target is up to date.
Changes applied before test
commit 043a20c85e956e4bf2eb4be0767d70f443d82b7c
Author: Stefan Sperling <stsp@stsp.name>
Date:   Fri Dec 10 15:04:55 2021 +0100

    add missing check for "/../" to the rcsparse code path

commit 1c48bd3b97200b069c6b66264872b8cdcdb8b868
Author: Stefan Sperling <stsp@stsp.name>
Date:   Fri Dec 10 13:36:52 2021 +0100

    validate input paths in the CVS loader
    
    The CVS loader creates files on the local file system based on
    paths which were read from a local copy of a CVS repository or
    sent by a CVS server as part of its "cvs rlog" response.
    
    Ensure that such paths will not be able to escape the temporary
    directory which stores checked out versions of files.

See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/83/ for more details.

olasd added inline comments.
swh/loader/cvs/loader.py
150–158

These statements should probably be broken out into a common method that can be unit-tested separately?

swh/loader/cvs/loader.py
150–158

(it's not 100% clear to me what kinds of paths are rejected that wouldn't be caught by the first ".." statement)

ardumont added inline comments.
swh/loader/cvs/loader.py
485
swh/loader/cvs/loader.py
150–158

Given the repetition and the only change between the 2 chunks (here and below) is
the log message, I second what proposed olasd.

Please, extract a function (or method) to check for the path correctness. I gather that
function will do nothing if the check passes but raises in case the check fails
otherwise.

That allows to document the function and indeed eventually cover it with dedicated
tests.

157

Also, ^ feels more neutral to me.

482–487

I'm a bit edgy when it comes to open code solely for tests.

Please, use mocks instead if you can (tentatively suggested something that probably need to be improved).
That might be less clear though.

swh/loader/cvs/tests/test_loader.py
1086–1126

using mocks instead of opening the runtime code

swh/loader/cvs/loader.py
150–158

(it's not 100% clear to me what kinds of paths are rejected that wouldn't be caught by the first ".." statement)

It intends to catch absolute paths that point to anywhere in the file system, instead of into the temporary directory where we store the repository. In case of rlog we obtain the list of existing paths from the server, which is why this matters. In case of rsync we mirror the repository and walk a local tree. The only possible problem there would be symlinks that point outside that tree, but that is a separate issue which I intend to fix after this change is done.

swh/loader/cvs/tests/test_loader.py
1086–1126

I need a way to ensure that the mocker returns the file only the first time it gets called. Afterwards it must return None. To see why, search loader.py for the string "Try to fetch more rlog data from this Attic directory."

move path-safety check into a helper function

This does not yet address the test code problem pointed out by ardumont.

Build is green

Patch application report for D6823 (id=24862)

Rebasing onto cbde981229...

First, rewinding head to replay your work on top of it...
Applying: validate input paths in the CVS loader
Changes applied before test
commit 7e22b880957575b02068d345e1bf183b7c80cc98
Author: Stefan Sperling <stsp@stsp.name>
Date:   Fri Dec 10 13:36:52 2021 +0100

    validate input paths in the CVS loader
    
    The CVS loader creates files on the local file system based on
    paths which were read from a local copy of a CVS repository or
    sent by a CVS server as part of its "cvs rlog" response.
    
    Ensure that such paths will not be able to escape the temporary
    directory which stores checked out versions of files.

See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/89/ for more details.

swh/loader/cvs/tests/test_loader.py
1086–1126

Looks like I can use mock.side_effect to achieve this.
I have cobbled something together that seems to work.

new version which uses a mock class for the test

Build is green

Patch application report for D6823 (id=24863)

Rebasing onto cbde981229...

First, rewinding head to replay your work on top of it...
Applying: validate input paths in the CVS loader
Changes applied before test
commit e4244eef1454734f714faf839eef407f027fbe64
Author: Stefan Sperling <stsp@stsp.name>
Date:   Fri Dec 10 13:36:52 2021 +0100

    validate input paths in the CVS loader
    
    The CVS loader creates files on the local file system based on
    paths which were read from a local copy of a CVS repository or
    sent by a CVS server as part of its "cvs rlog" response.
    
    Ensure that such paths will not be able to escape the temporary
    directory which stores checked out versions of files.

See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/90/ for more details.

remove self.log.info() call that was used for debugging

Build is green

Patch application report for D6823 (id=24864)

Rebasing onto cbde981229...

First, rewinding head to replay your work on top of it...
Applying: validate input paths in the CVS loader
Changes applied before test
commit 41e00281bdeb8d24a8af21362388d7fabba5db8a
Author: Stefan Sperling <stsp@stsp.name>
Date:   Fri Dec 10 13:36:52 2021 +0100

    validate input paths in the CVS loader
    
    The CVS loader creates files on the local file system based on
    paths which were read from a local copy of a CVS repository or
    sent by a CVS server as part of its "cvs rlog" response.
    
    Ensure that such paths will not be able to escape the temporary
    directory which stores checked out versions of files.

See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/91/ for more details.

@stsp Thanks a lot!

I've suggested one more consistent change to add the missing coverage part and then i'll let you be ;)

swh/loader/cvs/loader.py
154

Can you add the other unsafe rlog file which tests this (I've called it "unsafe_rlog_wrong_arborescence.rlog" below)?

I've proposed a patch below to deal with the testing part slight adaptation (to avoid duplication).

swh/loader/cvs/tests/test_loader.py
1086–1121

A couple of suggestions to deal with multiple unsafe rlog files (the ones currently covered by the code).

@stsp did you see my previous comment by any chance?

We'll figure out my last suggestion later.

This revision is now accepted and ready to land.Jan 6 2022, 12:17 PM

new patch with some suggestions from ardumont applied

Build is green

Patch application report for D6823 (id=24962)

Rebasing onto cbde981229...

First, rewinding head to replay your work on top of it...
Applying: validate input paths in the CVS loader
Changes applied before test
commit 113dcd68d355dd65af3a090d3230010cce542dde
Author: Stefan Sperling <stsp@stsp.name>
Date:   Fri Dec 10 13:36:52 2021 +0100

    validate input paths in the CVS loader
    
    The CVS loader creates files on the local file system based on
    paths which were read from a local copy of a CVS repository or
    sent by a CVS server as part of its "cvs rlog" response.
    
    Ensure that such paths will not be able to escape the temporary
    directory which stores checked out versions of files.

See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/92/ for more details.

Build is green

Patch application report for D6823 (id=24964)

Rebasing onto cbde981229...

Current branch diff-target is up to date.
Changes applied before test
commit 238c9c0335af32fa6e07a1bad7017c116cd99628
Author: Stefan Sperling <stsp@stsp.name>
Date:   Fri Dec 10 13:36:52 2021 +0100

    validate input paths in the CVS loader
    
    The CVS loader creates files on the local file system based on
    paths which were read from a local copy of a CVS repository or
    sent by a CVS server as part of its "cvs rlog" response.
    
    Ensure that such paths will not be able to escape the temporary
    directory which stores checked out versions of files.

See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/93/ for more details.

This revision was automatically updated to reflect the committed changes.