Page MenuHomeSoftware Heritage

Fix display of directories named 'directory' in some views
AbandonedPublic

Authored by anlambert on Nov 27 2019, 6:14 PM.

Details

Reviewers
vlorentz
Group Reviewers
Reviewers
Summary

Browsing a directory named 'directory' from a snapshot context view was raising a 404 error.

Reorder URL regexps of the concerned views to fix that issue.

Related to T2115

Depends on D2375

Diff Detail

Repository
rDWAPPS Web applications
Branch
fix-directory-named-directory-browsing
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 9437
Build 13852: Cypress tests for swh-web diffsJenkins
Build 13851: tox-on-jenkinsJenkins
Build 13850: arc lint + arc unit

Event Timeline

vlorentz added inline comments.
swh/web/tests/browse/views/test_origin.py
542–599

Shouldn't test data be out of the tests now?

swh/web/tests/browse/views/test_origin.py
542–599

Indeed, I could move the data creation inside an hypothesis strategy. I should also use the class from swh.model.model instead of raw dicts.

Update: Move sample origin creation in tests.data and add an hypothesis strategy to get
origins containing a directory named 'directory'.

Instead of using raw dicts to create directory, revision and snapshot data, I used
model classes from swh.model.model. I improved a bit some of those models to
automatically compute the object identifiers for commodity of use (D2375).

The CI build will fail as it requires D2375 to be landed first.

Update: Rebase, fix a comment and a test as there is now an origin that contains the empty directory in tests data

Ugh. If this becomes a recurring (rolling? *g*) problem, we should consider pinning the modules we depend on for static checks. There is no much value in having those changing too often.

vlorentz added inline comments.
swh/web/browse/views/origin.py
29–34

They are still ambiguous.

origin/https://foobar/visit/1234/directory/directory/ can be parsed in these ways:

  • first pattern, with origin_url=https://foobar/, timestamp=1234, path=directory
  • second pattern, with origin_url=https://foobar/visit/1234, path=directory
  • fourth pattern, with origin_url=https://foobar/visit/1234/directory

origin/https://foobar/visit/1234/directory/ can be parsed in these ways:

  • third pattern, with origin_url=https://foobar/, timestamp=1234
  • fourth pattern, with origin_url=https://foobar/visit/1234

And none of these are more valid than others, so changing order won't fix the issue.

swh/web/tests/data.py
255–297

this is way too long for such a simple thing to add :/

This revision now requires changes to proceed.Dec 3 2019, 12:14 PM
swh/web/browse/views/origin.py
29–34

Indeed, this will not work for origins containing visit or directory in their url (sigh).

Currently, the origins that felt into those categories in the archive are:

softwareheritage=> select * from origin where url ~ '.*/visit/.*';
    id    |                          url                           
----------+--------------------------------------------------------
    55571 | https://github.com/visit/CitybreakSsoDemo
   137981 | https://github.com/visit/template_fix
   142252 | https://github.com/visit/html_fix
  2600737 | https://github.com/visit/spark-svn-mirror
  2703707 | https://github.com/visit/Cbis.ProductManagement.Client
  3448566 | https://github.com/visit/Citybreak.DistanceLookup
  3786892 | https://github.com/visit/QConsole.NET
 13264749 | https://github.com/visit/turisthack.se
 13496506 | https://github.com/visit/crystalquartz
 24401239 | https://github.com/visit/CBIS.WriteAPI.Client
 27434679 | https://github.com/visit/api-doc
 69842117 | https://github.com/visit/galaxy-docs
 28586454 | https://github.com/visit/ha-api
 29281950 | https://github.com/visit/cbisimport
 32562703 | https://github.com/visit/ngit
 34607943 | https://github.com/visit/reveal.js
 35033245 | https://github.com/visit/gelf4net
 37858795 | https://github.com/visit/structuremap
 46831477 | https://github.com/visit/Spark
 61764716 | https://github.com/visit/citybreak-online
 64497527 | https://github.com/visit/visit-doc-template
 84983803 | https://pypi.org/project/visit/
(22 lignes)

softwareheritage=> select * from origin where url ~ '.*/directory/.*';
    id    |                                url                                 
----------+--------------------------------------------------------------------
 84497127 | https://gitlab.com/TownPound/Cyclos/directory/directory-api-JS.git
 84958944 | https://pypi.org/project/directory/
 89452895 | http://hdiff.luite.com/cgit/directory/
 89492532 | https://git.savannah.gnu.org/cgit/directory/dev/
 89492517 | https://git.savannah.gnu.org/cgit/directory/sbin/
 89493242 | https://git.savannah.gnu.org/cgit/directory/cvsroot/
 89494005 | http://git.savannah.gnu.org/git/directory/run
 89493912 | http://git.savannah.gnu.org/git/directory/home
 89492511 | https://git.savannah.gnu.org/cgit/directory/lib64/
 89493249 | https://git.savannah.gnu.org/cgit/directory/net/
 89494075 | http://git.savannah.gnu.org/git/directory/media
 89493014 | https://git.savannah.gnu.org/cgit/directory/lib/
 89493140 | https://git.savannah.gnu.org/cgit/directory/tmp/
 89493752 | http://git.savannah.gnu.org/git/directory/srv
 89494342 | http://git.savannah.gnu.org/git/directory/sbin
 89492933 | https://git.savannah.gnu.org/cgit/directory/etc/
 89493575 | http://git.savannah.gnu.org/git/directory/usr
 89493439 | http://git.savannah.gnu.org/git/directory/var
 89492699 | https://git.savannah.gnu.org/cgit/directory/local/
 89493186 | https://git.savannah.gnu.org/cgit/directory/sources/
 89492476 | https://git.savannah.gnu.org/cgit/directory/home/
 89493683 | http://git.savannah.gnu.org/git/directory/net
 89493457 | http://git.savannah.gnu.org/git/directory/sources
 89493547 | http://git.savannah.gnu.org/git/directory/lib
 89492783 | https://git.savannah.gnu.org/cgit/directory/mnt/
 89492998 | https://git.savannah.gnu.org/cgit/directory/proc/
 89493023 | https://git.savannah.gnu.org/cgit/directory/root/
 89492575 | https://git.savannah.gnu.org/cgit/directory/boot/
 89492673 | https://git.savannah.gnu.org/cgit/directory/usr/
 89492674 | https://git.savannah.gnu.org/cgit/directory/var/
 89492781 | https://git.savannah.gnu.org/cgit/directory/srv/
 89493307 | https://git.savannah.gnu.org/cgit/directory/sys/
 89493314 | https://git.savannah.gnu.org/cgit/directory/archives/
 89492689 | https://git.savannah.gnu.org/cgit/directory/lost+found/
 89492690 | https://git.savannah.gnu.org/cgit/directory/run/
 89492966 | https://git.savannah.gnu.org/cgit/directory/bin/
 89493899 | http://git.savannah.gnu.org/git/directory/sys
 89493911 | http://git.savannah.gnu.org/git/directory/bin
 89493178 | https://git.savannah.gnu.org/cgit/directory/opt/
 89494044 | http://git.savannah.gnu.org/git/directory/root
 89493954 | http://git.savannah.gnu.org/git/directory/archives
 89493347 | https://git.savannah.gnu.org/cgit/directory/media/
 89493849 | http://git.savannah.gnu.org/git/directory/opt
 89493662 | http://git.savannah.gnu.org/git/directory/local
 89493930 | http://git.savannah.gnu.org/git/directory/tmp
 89494298 | http://git.savannah.gnu.org/git/directory/mnt
 89493889 | http://git.savannah.gnu.org/git/directory/lib64
 89494119 | http://git.savannah.gnu.org/git/directory/boot
 89494306 | http://git.savannah.gnu.org/git/directory/lost+found
 89494452 | http://git.savannah.gnu.org/git/directory/etc
 89494362 | http://git.savannah.gnu.org/git/directory/dev
 89494559 | http://git.savannah.gnu.org/git/directory/proc
 89494501 | http://git.savannah.gnu.org/git/directory/cvsroot
(53 lignes)

I will abandon that diff and create a task to globally disambiguate browse urls and rely on query parameters instead of url arguments as this is the only solution to fix that mess.

swh/web/tests/data.py
255–297

So imagine how long it was when using raw dicts instead models.

Abandoning this as the issue is more global than it seems and the whole URLs scheme of swh-web/browse needs to be disambiguated (T2135).