Page MenuHomeSoftware Heritage

identifiers.parse_swhid: Make SWHIDs with whitespaces invalid
ClosedPublic

Authored by ardumont on Nov 12 2020, 10:42 AM.

Details

Summary

This should address the last concern demonstrated during our deposit discussion [1]

when the following stack of diffs lands, this probably needs an email on swh-devel as a heads-up.

"swh.model.identifiers.parse_swhid is getting more strict. This might notify you on invalid SWHIDs when it did not before"

[1] D4442#110837

Related to T2769
Depends on D4461

Test Plan

tox

Diff Detail

Repository
rDMOD Data model
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 17035
Build 26290: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 26289: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D4462 (id=15822)

Could not rebase; Attempt merge onto 4e3fdc062c...

Updating 4e3fdc0..5b9c395
Fast-forward
 swh/model/identifiers.py            | 79 ++++++++++++++++++++++++++++++------
 swh/model/tests/test_identifiers.py | 81 +++++++++++++++++++++++--------------
 2 files changed, 117 insertions(+), 43 deletions(-)
Changes applied before test
commit 5b9c395b17a0fe030fcabded5527a7f11145720f
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:39:24 2020 +0100

    identifiers.parse_swhid: Ensure qualifier values respect the grammar as well
    
    Related to T2769

commit b924c077d8681cbc82cbd809e9fbc4a9eb29c6f2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:08:45 2020 +0100

    identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid
    
    Related to T2769

commit cb11afe5c7fae1301a26a9b500ec2c2932170e5c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 18:21:01 2020 +0100

    model.identifiers: Improve error message by exposing the grammar
    
    Related to T2769

commit 47946aac0220fcfd55c7998908c71a6f402a5ddb
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 16:11:32 2020 +0100

    test: Migrate parse_swhid test cases to pytest
    
    Related to T2769

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/170/ for more details.

ardumont added inline comments.
swh/model/identifiers.py
862

I'm not so keen on that error message.
Feel free to propose something better here.

  • Rebase
  • Adapt according to D4458's review

Build is green

Patch application report for D4462 (id=15825)

Could not rebase; Attempt merge onto 4e3fdc062c...

Updating 4e3fdc0..4976d15
Fast-forward
 swh/model/identifiers.py            | 43 +++++++++++++++-----
 swh/model/tests/test_identifiers.py | 81 +++++++++++++++++++++++--------------
 2 files changed, 84 insertions(+), 40 deletions(-)
Changes applied before test
commit 4976d15a10d5c3a19125567fbd3d18b50b9bc8df
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:39:24 2020 +0100

    identifiers.parse_swhid: Ensure qualifier values respect the grammar as well
    
    Related to T2769

commit 20d736195e8ec28918b90ec598a672fabedee702
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:08:45 2020 +0100

    identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid
    
    Related to T2769

commit 8130a27e0261962bc5b36bddf914856ca24d15d2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 18:21:01 2020 +0100

    model.identifiers: Improve error messages in case of invalid SWHIDs
    
    Related to T2769

commit 47946aac0220fcfd55c7998908c71a6f402a5ddb
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 16:11:32 2020 +0100

    test: Migrate parse_swhid test cases to pytest
    
    Related to T2769

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/173/ for more details.

vlorentz added inline comments.
swh/model/identifiers.py
862

At the beginning of the function: if re.search(r"\s", swhid): raise ValidationError("Invalid SWHID: SWHIDs can not contain whitespaces.")

Build is green

Patch application report for D4462 (id=15828)

Could not rebase; Attempt merge onto 4e3fdc062c...

Updating 4e3fdc0..3a6f5e3
Fast-forward
 swh/model/identifiers.py            | 43 +++++++++++++++-----
 swh/model/tests/test_identifiers.py | 81 +++++++++++++++++++++++--------------
 2 files changed, 84 insertions(+), 40 deletions(-)
Changes applied before test
commit 3a6f5e37480dfc0584c707d119b3cfb62cf45269
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:39:24 2020 +0100

    identifiers.parse_swhid: Ensure qualifier values respect the grammar as well
    
    Related to T2769

commit 1a71edf9830257f300605bebaf097b89b52e4d8a
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:08:45 2020 +0100

    identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid
    
    Related to T2769

commit 22c7c888870986e9303554184f42d4429d4b2ba2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 18:21:01 2020 +0100

    model.identifiers: Improve error messages in case of invalid SWHIDs
    
    Related to T2769

commit 47946aac0220fcfd55c7998908c71a6f402a5ddb
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 16:11:32 2020 +0100

    test: Migrate parse_swhid test cases to pytest
    
    Related to T2769

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/176/ for more details.

swh/model/identifiers.py
862

oh that'd be nice, but is that true?

ardumont added inline comments.
swh/model/identifiers.py
862

I read back the grammar and it's true alright ;)

thanks.

swh/model/identifiers.py
862

uuuuh, actually no. nbsp would be matched by \s but is allowed in IRIs via ucschar. So check for [ \t\n\r\f\v] instead

  • Rebase
  • Refuse whitespaces anywhere in the SWHID
ardumont retitled this revision from identifiers.parse_swhid: Ensure qualifier values respect the grammar as well to identifiers.parse_swhid: Make SWHIDs with whitespaces invalid.Nov 12 2020, 11:37 AM

Build is green

Patch application report for D4462 (id=15830)

Could not rebase; Attempt merge onto 4e3fdc062c...

Updating 4e3fdc0..8fd439a
Fast-forward
 swh/model/identifiers.py            | 44 ++++++++++++++-----
 swh/model/tests/test_identifiers.py | 88 ++++++++++++++++++++++++-------------
 2 files changed, 91 insertions(+), 41 deletions(-)
Changes applied before test
commit 8fd439a0c4329d90b77fa6e05fc5fda1e59519e0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:39:24 2020 +0100

    identifiers.parse_swhid: whitespaces are not allowed in SWHIDs
    
    So parse_swhid raises a ValidationError when that is detected.
    
    Related to T2769

commit 71365044b5f386da3827ccc29184d640256fb070
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:08:45 2020 +0100

    identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid
    
    Related to T2769

commit 22c7c888870986e9303554184f42d4429d4b2ba2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 18:21:01 2020 +0100

    model.identifiers: Improve error messages in case of invalid SWHIDs
    
    Related to T2769

commit 47946aac0220fcfd55c7998908c71a6f402a5ddb
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 16:11:32 2020 +0100

    test: Migrate parse_swhid test cases to pytest
    
    Related to T2769

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/178/ for more details.

Build is green

Patch application report for D4462 (id=15831)

Could not rebase; Attempt merge onto 4e3fdc062c...

Updating 4e3fdc0..8fd439a
Fast-forward
 swh/model/identifiers.py            | 44 ++++++++++++++-----
 swh/model/tests/test_identifiers.py | 88 ++++++++++++++++++++++++-------------
 2 files changed, 91 insertions(+), 41 deletions(-)
Changes applied before test
commit 8fd439a0c4329d90b77fa6e05fc5fda1e59519e0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:39:24 2020 +0100

    identifiers.parse_swhid: whitespaces are not allowed in SWHIDs
    
    So parse_swhid raises a ValidationError when that is detected.
    
    Related to T2769

commit 71365044b5f386da3827ccc29184d640256fb070
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:08:45 2020 +0100

    identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid
    
    Related to T2769

commit 22c7c888870986e9303554184f42d4429d4b2ba2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 18:21:01 2020 +0100

    model.identifiers: Improve error messages in case of invalid SWHIDs
    
    Related to T2769

commit 47946aac0220fcfd55c7998908c71a6f402a5ddb
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 16:11:32 2020 +0100

    test: Migrate parse_swhid test cases to pytest
    
    Related to T2769

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/179/ for more details.

Actually rewrite commit message (i had forgotten to close my commit message
buffer ¯\_(ツ)_/¯)

Build is green

Patch application report for D4462 (id=15834)

Could not rebase; Attempt merge onto 22c7c88887...

Updating 22c7c88..f225435
Fast-forward
 swh/model/identifiers.py            | 22 +++++++++++++++++++---
 swh/model/tests/test_identifiers.py | 21 +++++++++++++++++++++
 2 files changed, 40 insertions(+), 3 deletions(-)
Changes applied before test
commit f225435ef7ae8f8ce239f819448b6c84c6c721ff
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:39:24 2020 +0100

    identifiers.parse_swhid: Make SWHIDs with whitespaces invalid
    
    So parse_swhid raises a ValidationError when that is detected.
    
    Related to T2769

commit 71365044b5f386da3827ccc29184d640256fb070
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:08:45 2020 +0100

    identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid
    
    Related to T2769

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/180/ for more details.

Adapt regexp and add more test failing data

Build is green

Patch application report for D4462 (id=15835)

Could not rebase; Attempt merge onto 22c7c88887...

Updating 22c7c88..9e354bf
Fast-forward
 swh/model/identifiers.py            | 22 +++++++++++++++++++---
 swh/model/tests/test_identifiers.py | 27 +++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 3 deletions(-)
Changes applied before test
commit 9e354bf79bd17116eb99aeef8337cbf91d0ce607
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:39:24 2020 +0100

    identifiers.parse_swhid: Make SWHIDs with whitespaces invalid
    
    So parse_swhid raises a ValidationError when that is detected.
    
    Related to T2769

commit 71365044b5f386da3827ccc29184d640256fb070
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:08:45 2020 +0100

    identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid
    
    Related to T2769

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/181/ for more details.

This revision is now accepted and ready to land.Nov 12 2020, 11:57 AM

Build is green

Patch application report for D4462 (id=15837)

Could not rebase; Attempt merge onto 22c7c88887...

Updating 22c7c88..559a283
Fast-forward
 swh/model/identifiers.py            | 22 +++++++++++++++++++---
 swh/model/tests/test_identifiers.py | 27 +++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 3 deletions(-)
Changes applied before test
commit 559a283cb095de2b7da173e5995883b0af801f64
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:39:24 2020 +0100

    identifiers.parse_swhid: Make SWHIDs with whitespaces invalid
    
    So parse_swhid raises a ValidationError when that is detected.
    
    Related to T2769

commit fb504b400ba90fe8a0815769b6a431489fc8a560
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:08:45 2020 +0100

    identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid
    
    Related to T2769

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/183/ for more details.