Page MenuHomeSoftware Heritage

identifiers.parse_swhid: Make SWHIDs with whitespaces invalid
ClosedPublic

Authored by ardumont on Nov 12 2020, 10:42 AM.

Details

Summary

This should address the last concern demonstrated during our deposit discussion [1]

when the following stack of diffs lands, this probably needs an email on swh-devel as a heads-up.

"swh.model.identifiers.parse_swhid is getting more strict. This might notify you on invalid SWHIDs when it did not before"

[1] D4442#110837

Related to T2769
Depends on D4461

Test Plan

tox

Diff Detail

Repository
rDMOD Data model
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D4462 (id=15822)

Could not rebase; Attempt merge onto 4e3fdc062c...

Updating 4e3fdc0..5b9c395
Fast-forward
 swh/model/identifiers.py            | 79 ++++++++++++++++++++++++++++++------
 swh/model/tests/test_identifiers.py | 81 +++++++++++++++++++++++--------------
 2 files changed, 117 insertions(+), 43 deletions(-)
Changes applied before test
commit 5b9c395b17a0fe030fcabded5527a7f11145720f
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:39:24 2020 +0100

    identifiers.parse_swhid: Ensure qualifier values respect the grammar as well
    
    Related to T2769

commit b924c077d8681cbc82cbd809e9fbc4a9eb29c6f2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:08:45 2020 +0100

    identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid
    
    Related to T2769

commit cb11afe5c7fae1301a26a9b500ec2c2932170e5c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 18:21:01 2020 +0100

    model.identifiers: Improve error message by exposing the grammar
    
    Related to T2769

commit 47946aac0220fcfd55c7998908c71a6f402a5ddb
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 16:11:32 2020 +0100

    test: Migrate parse_swhid test cases to pytest
    
    Related to T2769

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/170/ for more details.

ardumont added inline comments.
swh/model/identifiers.py
860

I'm not so keen on that error message.
Feel free to propose something better here.

  • Rebase
  • Adapt according to D4458's review

Build is green

Patch application report for D4462 (id=15825)

Could not rebase; Attempt merge onto 4e3fdc062c...

Updating 4e3fdc0..4976d15
Fast-forward
 swh/model/identifiers.py            | 43 +++++++++++++++-----
 swh/model/tests/test_identifiers.py | 81 +++++++++++++++++++++++--------------
 2 files changed, 84 insertions(+), 40 deletions(-)
Changes applied before test
commit 4976d15a10d5c3a19125567fbd3d18b50b9bc8df
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:39:24 2020 +0100

    identifiers.parse_swhid: Ensure qualifier values respect the grammar as well
    
    Related to T2769

commit 20d736195e8ec28918b90ec598a672fabedee702
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:08:45 2020 +0100

    identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid
    
    Related to T2769

commit 8130a27e0261962bc5b36bddf914856ca24d15d2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 18:21:01 2020 +0100

    model.identifiers: Improve error messages in case of invalid SWHIDs
    
    Related to T2769

commit 47946aac0220fcfd55c7998908c71a6f402a5ddb
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 16:11:32 2020 +0100

    test: Migrate parse_swhid test cases to pytest
    
    Related to T2769

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/173/ for more details.

vlorentz added inline comments.
swh/model/identifiers.py
860

At the beginning of the function: if re.search(r"\s", swhid): raise ValidationError("Invalid SWHID: SWHIDs can not contain whitespaces.")

Build is green

Patch application report for D4462 (id=15828)

Could not rebase; Attempt merge onto 4e3fdc062c...

Updating 4e3fdc0..3a6f5e3
Fast-forward
 swh/model/identifiers.py            | 43 +++++++++++++++-----
 swh/model/tests/test_identifiers.py | 81 +++++++++++++++++++++++--------------
 2 files changed, 84 insertions(+), 40 deletions(-)
Changes applied before test
commit 3a6f5e37480dfc0584c707d119b3cfb62cf45269
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:39:24 2020 +0100

    identifiers.parse_swhid: Ensure qualifier values respect the grammar as well
    
    Related to T2769

commit 1a71edf9830257f300605bebaf097b89b52e4d8a
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:08:45 2020 +0100

    identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid
    
    Related to T2769

commit 22c7c888870986e9303554184f42d4429d4b2ba2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 18:21:01 2020 +0100

    model.identifiers: Improve error messages in case of invalid SWHIDs
    
    Related to T2769

commit 47946aac0220fcfd55c7998908c71a6f402a5ddb
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 16:11:32 2020 +0100

    test: Migrate parse_swhid test cases to pytest
    
    Related to T2769

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/176/ for more details.

swh/model/identifiers.py
860

oh that'd be nice, but is that true?

ardumont added inline comments.
swh/model/identifiers.py
860

I read back the grammar and it's true alright ;)

thanks.

swh/model/identifiers.py
860

uuuuh, actually no. nbsp would be matched by \s but is allowed in IRIs via ucschar. So check for [ \t\n\r\f\v] instead

  • Rebase
  • Refuse whitespaces anywhere in the SWHID
ardumont retitled this revision from identifiers.parse_swhid: Ensure qualifier values respect the grammar as well to identifiers.parse_swhid: Make SWHIDs with whitespaces invalid.Nov 12 2020, 11:37 AM

Build is green

Patch application report for D4462 (id=15830)

Could not rebase; Attempt merge onto 4e3fdc062c...

Updating 4e3fdc0..8fd439a
Fast-forward
 swh/model/identifiers.py            | 44 ++++++++++++++-----
 swh/model/tests/test_identifiers.py | 88 ++++++++++++++++++++++++-------------
 2 files changed, 91 insertions(+), 41 deletions(-)
Changes applied before test
commit 8fd439a0c4329d90b77fa6e05fc5fda1e59519e0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:39:24 2020 +0100

    identifiers.parse_swhid: whitespaces are not allowed in SWHIDs
    
    So parse_swhid raises a ValidationError when that is detected.
    
    Related to T2769

commit 71365044b5f386da3827ccc29184d640256fb070
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:08:45 2020 +0100

    identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid
    
    Related to T2769

commit 22c7c888870986e9303554184f42d4429d4b2ba2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 18:21:01 2020 +0100

    model.identifiers: Improve error messages in case of invalid SWHIDs
    
    Related to T2769

commit 47946aac0220fcfd55c7998908c71a6f402a5ddb
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 16:11:32 2020 +0100

    test: Migrate parse_swhid test cases to pytest
    
    Related to T2769

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/178/ for more details.

Build is green

Patch application report for D4462 (id=15831)

Could not rebase; Attempt merge onto 4e3fdc062c...

Updating 4e3fdc0..8fd439a
Fast-forward
 swh/model/identifiers.py            | 44 ++++++++++++++-----
 swh/model/tests/test_identifiers.py | 88 ++++++++++++++++++++++++-------------
 2 files changed, 91 insertions(+), 41 deletions(-)
Changes applied before test
commit 8fd439a0c4329d90b77fa6e05fc5fda1e59519e0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:39:24 2020 +0100

    identifiers.parse_swhid: whitespaces are not allowed in SWHIDs
    
    So parse_swhid raises a ValidationError when that is detected.
    
    Related to T2769

commit 71365044b5f386da3827ccc29184d640256fb070
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:08:45 2020 +0100

    identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid
    
    Related to T2769

commit 22c7c888870986e9303554184f42d4429d4b2ba2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 18:21:01 2020 +0100

    model.identifiers: Improve error messages in case of invalid SWHIDs
    
    Related to T2769

commit 47946aac0220fcfd55c7998908c71a6f402a5ddb
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Nov 10 16:11:32 2020 +0100

    test: Migrate parse_swhid test cases to pytest
    
    Related to T2769

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/179/ for more details.

Actually rewrite commit message (i had forgotten to close my commit message
buffer ¯\_(ツ)_/¯)

Build is green

Patch application report for D4462 (id=15834)

Could not rebase; Attempt merge onto 22c7c88887...

Updating 22c7c88..f225435
Fast-forward
 swh/model/identifiers.py            | 22 +++++++++++++++++++---
 swh/model/tests/test_identifiers.py | 21 +++++++++++++++++++++
 2 files changed, 40 insertions(+), 3 deletions(-)
Changes applied before test
commit f225435ef7ae8f8ce239f819448b6c84c6c721ff
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:39:24 2020 +0100

    identifiers.parse_swhid: Make SWHIDs with whitespaces invalid
    
    So parse_swhid raises a ValidationError when that is detected.
    
    Related to T2769

commit 71365044b5f386da3827ccc29184d640256fb070
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:08:45 2020 +0100

    identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid
    
    Related to T2769

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/180/ for more details.

Adapt regexp and add more test failing data

Build is green

Patch application report for D4462 (id=15835)

Could not rebase; Attempt merge onto 22c7c88887...

Updating 22c7c88..9e354bf
Fast-forward
 swh/model/identifiers.py            | 22 +++++++++++++++++++---
 swh/model/tests/test_identifiers.py | 27 +++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 3 deletions(-)
Changes applied before test
commit 9e354bf79bd17116eb99aeef8337cbf91d0ce607
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:39:24 2020 +0100

    identifiers.parse_swhid: Make SWHIDs with whitespaces invalid
    
    So parse_swhid raises a ValidationError when that is detected.
    
    Related to T2769

commit 71365044b5f386da3827ccc29184d640256fb070
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:08:45 2020 +0100

    identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid
    
    Related to T2769

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/181/ for more details.

This revision is now accepted and ready to land.Nov 12 2020, 11:57 AM

Build is green

Patch application report for D4462 (id=15837)

Could not rebase; Attempt merge onto 22c7c88887...

Updating 22c7c88..559a283
Fast-forward
 swh/model/identifiers.py            | 22 +++++++++++++++++++---
 swh/model/tests/test_identifiers.py | 27 +++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 3 deletions(-)
Changes applied before test
commit 559a283cb095de2b7da173e5995883b0af801f64
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:39:24 2020 +0100

    identifiers.parse_swhid: Make SWHIDs with whitespaces invalid
    
    So parse_swhid raises a ValidationError when that is detected.
    
    Related to T2769

commit fb504b400ba90fe8a0815769b6a431489fc8a560
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Nov 12 10:08:45 2020 +0100

    identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid
    
    Related to T2769

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/183/ for more details.