Page MenuHomeSoftware Heritage

Add support for uppercase sha1 url arguments
AbandonedPublic

Authored by kalpitk on Mar 25 2019, 10:21 AM.

Details

Summary

url with sha1 in uppercase (or mixed upper-lower case) was gives 404.

Convert snapshot_id to snapshot_id.lower(), so that SHA1 remains lowercase consistently. I will also work with SHA1 with mixed upper and lower cases.

Related T1505

Currently passing a sha1 as url argument to the webapp endpoints works only if it is in lowercase form.
If a sha1 is passed in uppercase form, we end up in a 404 error page (see [1] for instance).

For commodity of use, we should support passing sha1s in uppercase form but also with
mixed uppercase and lowercase parts.

Redirecting to the lowercase sha1 endpoints when such cases are encountered seems the
right way to implement this.

[1] https://archive.softwareheritage.org/browse/snapshot/1A8893E6A86F444E8BE8E7BDA6CB34FB1735A00E/

Diff Detail

Repository
rDWAPPS Web applications
Branch
upper-case-sha1
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 4788
Build 6385: tox-on-jenkinsJenkins
Build 6384: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
vlorentz edited the summary of this revision. (Show Details)
vlorentz edited the summary of this revision. (Show Details)
vlorentz added a subscriber: vlorentz.

Thanks for the diff!

Could you add tests for each of these endpoints? Could you also do the same changes for all pages that accept a sha1 as argument?

This revision now requires changes to proceed.Mar 25 2019, 10:34 AM
kalpitk edited the summary of this revision. (Show Details)
  • fix upper case SHA1 in multiple files

@kalpitk, thanks for handling this. At first, I wanted to implement this using url redirection
but simply relying on the application route regexps is indeed simpler.

In the code you have submitted so far, there is no need to convert the input sha1 parameters
to lowercase, there will be correctly converted to their bytes representation by dedicated
functions from the swh stack.

Can you also add support for uppercase sha1s in the api part of swh-web ?

Last but not least, adding tests for these new features is a requirement before I can
land this diff.

swh/web/browse/views/content.py
41 ↗(On Diff #4073)

no need to convert to lowercase here

175 ↗(On Diff #4073)

same here and below

swh/web/browse/views/snapshot.py
33

too much ' characters here

@anlambert I'll update for api endpoints also.

I am currently unfamiliar with writing tests for django, so may take some time.

swh/web/browse/views/snapshot.py
33

Since string continues to multiple lines, I must use triple quotes.

@kalpitk , regarding the tests implementation, you should not have to write new testcases but simply modify the
tests input as generated in the file swh/web/tests/strategies.py. By providing multiple hex representations (lowercase,
uppercase, mixed cases) of the same checksum to testcases, you will be able to validate your changes.

swh/web/browse/views/snapshot.py
33

Oh I see, you can write the route regexp in one line (I think its more readable) and add # noqa afterwards to make flake8 happy

This revision now requires changes to proceed.Mar 25 2019, 11:59 AM
  • fix uppercase SHA1 for api
  • add tests for uppercase sha1

@anlambert Oh i didn't read your comment. Anyways, I just wrote some tests.

Many tests assert that the page has same sha1 as passed in url. This would fail tests for sha1 in uppercase.

@kalpitk, I am ok with your tests plan but there is still some work to do here:

  • remove the calls to the lower method inside endpoint implementations (as I said, this is not needed)
  • there is numerous other endpoints that need to be processed (simply search for [0-9a-f] inside the swh-web codebase to identify them)
  • tests implementation must be factorized as test_* and test_*_with_uppercase_sha1 are basically the same thing
This revision now requires changes to proceed.Mar 25 2019, 3:23 PM

@kalpitk, I am ok with your tests plan but there is still some work to do here:

  • remove the calls to the lower method inside endpoint implementations (as I said, this is not needed)
  • there is numerous other endpoints that need to be processed (simply search for [0-9a-f] inside the swh-web codebase to identify them)
  • tests implementation must be factorized as test_* and test_*_with_uppercase_sha1 are basically the same thing

I didnt't get the third point.

I didnt't get the third point.

For your test implementations, you simply duplicated the code from the test_* method and change the sha1 parameters
to their uppercase version. But code duplication is not a good practice as it complicates future code maintenance.

For instance in swh/web/browse/views/test_release.py, the tests can be factorized using a new "private" method,
see below:

def _release_browse_test(self, release):
    url = reverse('browse-release',
                  url_args={'sha1_git': release})

    release_data = self.release_get(release)
    # release may contain uppercase characters in it
    release_data['id'] = release

    resp = self.client.get(url)

    self._release_browse_checks(resp, release_data)

@given(release())
def test_release_browse(self, release):
    self._release_browse_test(release)

@given(release())
def test_release_browse_with_uppercase_sha1(self, release):
    self._release_browse_test(release.upper())

@kalpitk, I am ok with your tests plan but there is still some work to do here:

  • remove the calls to the lower method inside endpoint implementations (as I said, this is not needed)
  • there is numerous other endpoints that need to be processed (simply search for [0-9a-f] inside the swh-web codebase to identify them)
  • tests implementation must be factorized as test_* and test_*_with_uppercase_sha1 are basically the same thing

And regarding first, I had removed lower calls when hash_to_hex was called. Where else I should remove those calls?

  • fix uppercase sha1 and tests

And regarding first, I had removed lower calls when hash_to_hex was called. Where else I should remove those calls?

You have to remove them everywhere and adapt tests if needed as I pointed it for the browse/release case.

  • remove lower calls from endpoints

@kalpitk, we are getting close to something landable.

There is some couple of nitpicks regarding code style to address (see my inline comments).

Also, can you update the commit message to this:

Add support for uppercase sha1 in url arguments

Closes T1505
swh/web/browse/views/snapshot.py
60

you forgot to remove this line

swh/web/tests/api/views/test_directory.py
38

Nitpick: replace this two lines by:

self._api_directory_test(directory.upper())
swh/web/tests/api/views/test_release.py
47

same here

swh/web/tests/api/views/test_revision.py
26 ↗(On Diff #4089)

There is no need to pass the sha1_git parameter here, you can directly use the revision one.

41 ↗(On Diff #4089)

adapt method above and replace this two lines by one

46 ↗(On Diff #4089)

same here

swh/web/tests/api/views/test_snapshot.py
123

same here

swh/web/tests/browse/views/test_directory.py
99

replace this line with

directory_entries[0]['dir_id'] = directory
swh/web/tests/browse/views/test_release.py
39

Here, I would rather write:

release = release.upper()
release_data = self.release_get(release)
release_data['id'] = release
self._release_browse_test(release, release_data)
swh/web/tests/browse/views/test_revision.py
79 ↗(On Diff #4089)

Turn the two lines into one

This revision now requires changes to proceed.Mar 26 2019, 1:18 PM
anlambert retitled this revision from fix snapshot for upper case SHA1 to Add support for uppercase sha1 url arguments.Mar 26 2019, 1:21 PM

Add support for uppercase sha1 in url arguments

Closes T1505

This revision is now accepted and ready to land.Mar 26 2019, 5:18 PM
anlambert added inline comments.
swh/web/urls.py
52 ↗(On Diff #4092)

After some discussion with my team, we think that for this endpoint uppercase characters should not be authorized (see swh ids spec here: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#persistent-identifiers).

Can you remove it and then I land the diff ?

This revision now requires changes to proceed.Mar 26 2019, 5:32 PM
kalpitk added inline comments.
swh/web/urls.py
52 ↗(On Diff #4092)

Then I need to remove upper case support from browse/directory also

anlambert added inline comments.
swh/web/urls.py
52 ↗(On Diff #4092)

Oh I see, tests broke again. Let me handle that. I will accept the diff again, amend your commit and land the diff.

This revision is now accepted and ready to land.Mar 26 2019, 11:06 PM

@kalpitk , After testing your changes, I have the feeling that modifying url regexps to handle the uppercase sha1s is not the good solution here.

I think this would be better handled using url redirections. There must be some way to do that automatically by reading the Django urls index
of swh-web and adding the adequate redirections. I need to think more about that. I get back to you tomorrow.

This revision now requires changes to proceed.Mar 26 2019, 11:38 PM

@kalpitk , after some brainstorming on how to easily add url redirections for uppercase sha1s, I ended up
with the following solution showcased in that separate branch: https://forge.softwareheritage.org/source/swh-web/history/redirect_upper_sha1/

I did not handle all endpoints so far neither updated the tests (which in these case will simply consist to check that
url redirection have been performed). Can you update that diff (or abandon this one in favor of a new one) using
the proposed approach ?