Page MenuHomeSoftware Heritage

from_disks: fix some of the pattern checking logic
ClosedPublic

Authored by marmoute on Sep 28 2022, 7:30 PM.

Details

Summary

The patterns were validated from $PWD and later applied on path relative
to root_path. So we shuffle a bit the code to test them against
root_path. We make the absolute pattern relative in the same go.

This code is coming from swh-scanner and should probably get an
overhaul, however for now we start by making it not broken.

Diff Detail

Repository
rDMOD Data model
Branch
fix-pattern
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 31947
Build 50000: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 49999: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D8571 (id=30923)

Rebasing onto 2d65a24a5f...

Current branch diff-target is up to date.
Changes applied before test
commit 847dcd28b3449dc1acb9631088e7eb4a66945fdc
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Wed Sep 28 19:28:23 2022 +0200

    from_disks: fix some of the pattern checking logic
    
    The pattern were validated from $PWD and later applied on path relative
    to `root_path`. So we shuffle a bit of code to test them againt
    root_path. We make the absolute pattern relative in the same go.
    
    This code is coming from swh-scanner and should probably get an
    overhaul, how ever for now we start with making it no broken.

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/530/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/530/console

Harbormaster returned this revision to the author for changes because remote builds failed.Sep 28 2022, 7:31 PM
Harbormaster failed remote builds in B31895: Diff 30923!

Compatibility python < 3.10

Build is green

Patch application report for D8571 (id=30926)

Rebasing onto 2d65a24a5f...

Current branch diff-target is up to date.
Changes applied before test
commit 35f85a59b0970e92a9e75d8b39f52a431ffd13ef
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Wed Sep 28 19:28:23 2022 +0200

    from_disks: fix some of the pattern checking logic
    
    The pattern were validated from $PWD and later applied on path relative
    to `root_path`. So we shuffle a bit of code to test them againt
    root_path. We make the absolute pattern relative in the same go.
    
    This code is coming from swh-scanner and should probably get an
    overhaul, how ever for now we start with making it no broken.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/531/ for more details.

Thanks!

This lgtm overall.

Given it's a well covered module though, that'd be neat to add the necessary tests to
cover at least the new introduced path (line 311). That'd clarify the current code
change (for the reviewer).

Other than that, couple of remarks inline mostly regarding typos to fix.

Ii've also fixed the typos in the description, please amend the commit accordingly ;)

Cheers,

swh/model/from_disk.py
302–304

Those were documented when we did not have types yet.
So now it's ok to drop those (one less typo \o/.

312
This revision now requires changes to proceed.Sep 29 2022, 9:42 AM

Thanks!

This lgtm overall.

Given it's a well covered module though, that'd be neat to add the necessary tests to
cover at least the new introduced path (line 311). That'd clarify the current code
change (for the reviewer).

I went for a minimal changes for that untested code as I am mostly planning a full overhaul of that area. So I did not spent to effort to add new test (the code isn't currently directly tested either as far as I understand).
The check I ran in swh-scanner confirm this fix the issue (not great, I agree).

Here is a short list of issue I noted about the current filtering code:

  • only filter directories (while we will need filtering for files too (eg .pyc))
  • only support rooted glob unless jumping through hoops (while we need "matching anywhere" filtering (eg pycache directory)
  • no support for other matching (like regexp, or plain matching).

The current code does not have a lot of user, so wider rewrite seems achievable as a follow up.

Other than that, couple of remarks inline mostly regarding typos to fix.

Ii've also fixed the typos in the description, please amend the commit accordingly ;)

The typo are fixed, Pushing an update

Build is green

Patch application report for D8571 (id=30976)

Rebasing onto 2d65a24a5f...

Current branch diff-target is up to date.
Changes applied before test
commit 6a38c4ad047df00f8c41cf2072b18babc68522f5
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Wed Sep 28 19:28:23 2022 +0200

    from_disks: fix some of the pattern checking logic
    
    The pattern were validated from $PWD and later applied on path relative
    to `root_path`. So we shuffle a bit of code to test them againt
    root_path. We make the absolute pattern relative in the same go.
    
    This code is coming from swh-scanner and should probably get an
    overhaul, how ever for now we start with making it no broken.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/533/ for more details.

I went for a minimal changes for that untested code as I am mostly planning a full
overhaul of that area. So I did not spent to effort to add new test (the code isn't
currently directly tested either as far as I understand).

ok then.


To clarify something, it's possible I mixed up terms again. When i said module, in
coverage context, i was not talking about swh.model.from_disk , but the package
swh.model which is well covered in general ;).

This revision is now accepted and ready to land.Sep 30 2022, 9:49 AM