Page MenuHomeSoftware Heritage

Fix display of directories named 'directory' in some views
AbandonedPublic

Authored by anlambert on Nov 27 2019, 6:14 PM.

Details

Reviewers
vlorentz
Group Reviewers
Reviewers
Summary

Browsing a directory named 'directory' from a snapshot context view was raising a 404 error.

Reorder URL regexps of the concerned views to fix that issue.

Related to T2115

Depends on D2375

Diff Detail

Repository
rDWAPPS Web applications
Branch
fix-directory-named-directory-browsing
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 9416
Build 13811: Cypress tests for swh-web diffsJenkins
Build 13810: tox-on-jenkinsJenkins
Build 13809: arc lint + arc unit

Event Timeline

vlorentz added inline comments.
swh/web/tests/browse/views/test_origin.py
547–604

Shouldn't test data be out of the tests now?

swh/web/tests/browse/views/test_origin.py
547–604

Indeed, I could move the data creation inside an hypothesis strategy. I should also use the class from swh.model.model instead of raw dicts.

Update: Move sample origin creation in tests.data and add an hypothesis strategy to get
origins containing a directory named 'directory'.

Instead of using raw dicts to create directory, revision and snapshot data, I used
model classes from swh.model.model. I improved a bit some of those models to
automatically compute the object identifiers for commodity of use (D2375).

The CI build will fail as it requires D2375 to be landed first.

Update: Rebase, fix a comment and a test as there is now an origin that contains the empty directory in tests data

Ugh. If this becomes a recurring (rolling? *g*) problem, we should consider pinning the modules we depend on for static checks. There is no much value in having those changing too often.

vlorentz added inline comments.
swh/web/browse/views/origin.py
29–34

They are still ambiguous.

origin/https://foobar/visit/1234/directory/directory/ can be parsed in these ways:

  • first pattern, with origin_url=https://foobar/, timestamp=1234, path=directory
  • second pattern, with origin_url=https://foobar/visit/1234, path=directory
  • fourth pattern, with origin_url=https://foobar/visit/1234/directory

origin/https://foobar/visit/1234/directory/ can be parsed in these ways:

  • third pattern, with origin_url=https://foobar/, timestamp=1234
  • fourth pattern, with origin_url=https://foobar/visit/1234

And none of these are more valid than others, so changing order won't fix the issue.

swh/web/tests/data.py
256–298 ↗(On Diff #8369)

this is way too long for such a simple thing to add :/

This revision now requires changes to proceed.Dec 3 2019, 12:14 PM
swh/web/browse/views/origin.py
29–34

Indeed, this will not work for origins containing visit or directory in their url (sigh).

Currently, the origins that felt into those categories in the archive are:

softwareheritage=> select * from origin where url ~ '.*/visit/.*';
    id    |                          url                           
----------+--------------------------------------------------------
    55571 | https://github.com/visit/CitybreakSsoDemo
   137981 | https://github.com/visit/template_fix
   142252 | https://github.com/visit/html_fix
  2600737 | https://github.com/visit/spark-svn-mirror
  2703707 | https://github.com/visit/Cbis.ProductManagement.Client
  3448566 | https://github.com/visit/Citybreak.DistanceLookup
  3786892 | https://github.com/visit/QConsole.NET
 13264749 | https://github.com/visit/turisthack.se
 13496506 | https://github.com/visit/crystalquartz
 24401239 | https://github.com/visit/CBIS.WriteAPI.Client
 27434679 | https://github.com/visit/api-doc
 69842117 | https://github.com/visit/galaxy-docs
 28586454 | https://github.com/visit/ha-api
 29281950 | https://github.com/visit/cbisimport
 32562703 | https://github.com/visit/ngit
 34607943 | https://github.com/visit/reveal.js
 35033245 | https://github.com/visit/gelf4net
 37858795 | https://github.com/visit/structuremap
 46831477 | https://github.com/visit/Spark
 61764716 | https://github.com/visit/citybreak-online
 64497527 | https://github.com/visit/visit-doc-template
 84983803 | https://pypi.org/project/visit/
(22 lignes)

softwareheritage=> select * from origin where url ~ '.*/directory/.*';
    id    |                                url                                 
----------+--------------------------------------------------------------------
 84497127 | https://gitlab.com/TownPound/Cyclos/directory/directory-api-JS.git
 84958944 | https://pypi.org/project/directory/
 89452895 | http://hdiff.luite.com/cgit/directory/
 89492532 | https://git.savannah.gnu.org/cgit/directory/dev/
 89492517 | https://git.savannah.gnu.org/cgit/directory/sbin/
 89493242 | https://git.savannah.gnu.org/cgit/directory/cvsroot/
 89494005 | http://git.savannah.gnu.org/git/directory/run
 89493912 | http://git.savannah.gnu.org/git/directory/home
 89492511 | https://git.savannah.gnu.org/cgit/directory/lib64/
 89493249 | https://git.savannah.gnu.org/cgit/directory/net/
 89494075 | http://git.savannah.gnu.org/git/directory/media
 89493014 | https://git.savannah.gnu.org/cgit/directory/lib/
 89493140 | https://git.savannah.gnu.org/cgit/directory/tmp/
 89493752 | http://git.savannah.gnu.org/git/directory/srv
 89494342 | http://git.savannah.gnu.org/git/directory/sbin
 89492933 | https://git.savannah.gnu.org/cgit/directory/etc/
 89493575 | http://git.savannah.gnu.org/git/directory/usr
 89493439 | http://git.savannah.gnu.org/git/directory/var
 89492699 | https://git.savannah.gnu.org/cgit/directory/local/
 89493186 | https://git.savannah.gnu.org/cgit/directory/sources/
 89492476 | https://git.savannah.gnu.org/cgit/directory/home/
 89493683 | http://git.savannah.gnu.org/git/directory/net
 89493457 | http://git.savannah.gnu.org/git/directory/sources
 89493547 | http://git.savannah.gnu.org/git/directory/lib
 89492783 | https://git.savannah.gnu.org/cgit/directory/mnt/
 89492998 | https://git.savannah.gnu.org/cgit/directory/proc/
 89493023 | https://git.savannah.gnu.org/cgit/directory/root/
 89492575 | https://git.savannah.gnu.org/cgit/directory/boot/
 89492673 | https://git.savannah.gnu.org/cgit/directory/usr/
 89492674 | https://git.savannah.gnu.org/cgit/directory/var/
 89492781 | https://git.savannah.gnu.org/cgit/directory/srv/
 89493307 | https://git.savannah.gnu.org/cgit/directory/sys/
 89493314 | https://git.savannah.gnu.org/cgit/directory/archives/
 89492689 | https://git.savannah.gnu.org/cgit/directory/lost+found/
 89492690 | https://git.savannah.gnu.org/cgit/directory/run/
 89492966 | https://git.savannah.gnu.org/cgit/directory/bin/
 89493899 | http://git.savannah.gnu.org/git/directory/sys
 89493911 | http://git.savannah.gnu.org/git/directory/bin
 89493178 | https://git.savannah.gnu.org/cgit/directory/opt/
 89494044 | http://git.savannah.gnu.org/git/directory/root
 89493954 | http://git.savannah.gnu.org/git/directory/archives
 89493347 | https://git.savannah.gnu.org/cgit/directory/media/
 89493849 | http://git.savannah.gnu.org/git/directory/opt
 89493662 | http://git.savannah.gnu.org/git/directory/local
 89493930 | http://git.savannah.gnu.org/git/directory/tmp
 89494298 | http://git.savannah.gnu.org/git/directory/mnt
 89493889 | http://git.savannah.gnu.org/git/directory/lib64
 89494119 | http://git.savannah.gnu.org/git/directory/boot
 89494306 | http://git.savannah.gnu.org/git/directory/lost+found
 89494452 | http://git.savannah.gnu.org/git/directory/etc
 89494362 | http://git.savannah.gnu.org/git/directory/dev
 89494559 | http://git.savannah.gnu.org/git/directory/proc
 89494501 | http://git.savannah.gnu.org/git/directory/cvsroot
(53 lignes)

I will abandon that diff and create a task to globally disambiguate browse urls and rely on query parameters instead of url arguments as this is the only solution to fix that mess.

swh/web/tests/data.py
256–298 ↗(On Diff #8369)

So imagine how long it was when using raw dicts instead models.

Abandoning this as the issue is more global than it seems and the whole URLs scheme of swh-web/browse needs to be disambiguated (T2135).