
/browse/search/ crashes when swh-search results are not in swh-storage
Closed, Migrated

Description

swh-search is up to date with the main swh-storage cluster, while swh-web uses a replica that is currently lagging behind. This means that search results returned by swh-search may not exist in the storage.

When this happens, the JS code querying https://archive.softwareheritage.org/api/1/origin/visit/latest/doc/ gets a 404 response with an HTML body.

However, it expects to always get a JSON body, causing this crash:

https://sentry.softwareheritage.org/organizations/swh/issues/105105/?referrer=phabricator_plugin

SyntaxError: Unexpected token '<', "<!DOCTYPE "... is not valid JSON
  at None (./swh/web/browse/assets/browse/origin-search.js:61:66)
  at Array.map (<anonymous>)
  at call (./swh/web/browse/assets/browse/origin-search.js:61:55)
...
(3 additional frame(s) were not displayed)

As a result, the page gets stuck on the "Checking" spinner.
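
One way to avoid this kind of crash is to check the response before parsing it. The sketch below is a hypothetical helper, not the actual code in origin-search.js: `parseApiBody` and its return shape are assumptions made for illustration. It only calls `JSON.parse` when the `Content-Type` header announces JSON, so an HTML 404 page yields an error value instead of a `SyntaxError`.

```javascript
// Hypothetical helper (not the actual swh-web code): parse an HTTP response
// body as JSON only when the server says it is JSON.
function parseApiBody(status, contentType, bodyText) {
  const type = (contentType || '').toLowerCase();
  if (!type.startsWith('application/json')) {
    // Typically an HTML error page produced by a proxy or web server.
    return {
      ok: false,
      status,
      data: null,
      error: `expected JSON, got ${contentType || 'no content type'}`
    };
  }
  try {
    const data = JSON.parse(bodyText);
    // A JSON body can still describe an error (e.g. a 404 with NotFoundExc).
    return {ok: status >= 200 && status < 300, status, data, error: null};
  } catch (e) {
    // The header claimed JSON but the body could not be parsed.
    return {ok: false, status, data: null, error: e.message};
  }
}
```

With `fetch`, the three arguments would come from `response.status`, `response.headers.get('Content-Type')` and `await response.text()`. The caller can then replace the "Checking" spinner with an error state whenever `ok` is false.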

Event Timeline

vlorentz triaged this task as Normal priority.
vlorentz raised the priority of this task from Normal to High.
vlorentz updated the task description.
vlorentz added a subscriber: vlorentz.

Ah right, I noticed that behavior when fixing a recent bug in the webapp (D8820). I will fix that asap.

Actually, this is not related to the replica lag; it is a regression introduced by the recent commit rDWAPPS4cc9676a54cc368394c05b7f19c92ea072f8041e.

Basically, all requests end up with a 404 now. Weird that I did not catch the issue with the Cypress tests; my guess is that everything is mocked.

So yes let's fix that asap.

Hmm, I do not hit the issue locally, which is why the tests did not spot it. This seems to only happen in production.

(swh) ✔ ~/swh/swh-environment/swh-web [master|⚑ 165] 
16:34 $ curl -i "http://localhost:5004/api/1/origin/https%3A%2F%2Fgithub.com%2FwereHamster%2Fpercy-material-ui-stylesheet-issue/visit/latest/?require_snapshot=true"
HTTP/1.1 404 Not Found
Date: Wed, 09 Nov 2022 15:34:58 GMT
Server: WSGIServer/0.2 CPython/3.9.2
Content-Type: application/json
Vary: Accept, Origin, Cookie, Accept-Encoding
Allow: GET, OPTIONS, HEAD, OPTIONS
X-Frame-Options: SAMEORIGIN
Content-Length: 1657

{"exception":"NotFoundExc","reason":"Origin with url https://github.com/wereHamster/percy-material-ui-stylesheet-issue not found!","traceback":"Traceback (most recent call last):\n  File \"/home/anlambert/.virtualenvs/swh/lib/python3.9/site-packages/rest_framework/views.py\", line 506, in dispatch\n    response = handler(request, *args, **kwargs)\n  File \"/home/anlambert/.virtualenvs/swh/lib/python3.9/site-packages/rest_framework/decorators.py\", line 50, in handler\n    return func(*args, **kwargs)\n  File \"/home/anlambert/swh/swh-environment/swh-web/swh/web/api/apiurls.py\", line 98, in api_view_f\n    response = f(request, **kwargs)\n  File \"/home/anlambert/swh/swh-environment/swh-web/swh/web/api/apidoc.py\", line 395, in documented_view\n    raise exc\n  File \"/home/anlambert/swh/swh-environment/swh-web/swh/web/api/apidoc.py\", line 392, in documented_view\n    return {\"data\": f(request, **kwargs), \"doc_data\": doc_data}\n  File \"/home/anlambert/swh/swh-environment/swh-web/swh/web/api/views/origin.py\", line 422, in api_origin_visit_latest\n    return api_lookup(\n  File \"/home/anlambert/swh/swh-environment/swh-web/swh/web/api/views/utils.py\", line 67, in api_lookup\n    res = lookup_fn(*args, **kwargs)\n  File \"/home/anlambert/swh/swh-environment/swh-web/swh/web/utils/archive.py\", line 997, in lookup_origin_visit_latest\n    origin_url = lookup_origin(\n  File \"/home/anlambert/swh/swh-environment/swh-web/swh/web/utils/archive.py\", line 228, in lookup_origin\n    raise NotFoundExc(msg)\nswh.web.utils.exc.NotFoundExc: Origin with url https://github.com/wereHamster/percy-material-ui-stylesheet-issue not found!\n"}(swh) ✔ ~/swh/swh-environment/swh-web [master|⚑ 165] 
16:34 $ curl -i "https://archive.softwareheritage.org/api/1/origin/https%3A%2F%2Fgithub.com%2FwereHamster%2Fpercy-material-ui-stylesheet-issue/visit/latest/?require_snapshot=true"
HTTP/1.1 404 Not Found
Date: Wed, 09 Nov 2022 15:34:46 GMT
Server: Apache
Content-Length: 196
Content-Type: text/html; charset=iso-8859-1
X-Varnish: 30298984 30419887
Age: 23
Via: 1.1 varnish (Varnish/6.1)
Strict-Transport-Security: max-age=15768000;
Connection: keep-alive

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL was not found on this server.</p>
</body></html>

It seems the fix is to only encode the ? in an origin URL when it is provided as a URL argument.
The issue we hit in production might be related to the Varnish cache, I think.
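
The reason the `?` matters can be checked with the standard `URL` class: an unencoded `?` ends the path and starts the query string, so the server never sees the `/visit/latest/` suffix as part of the path. The snippet below is only an illustration using the staging URL from this task; `splitPathAndQuery` is a hypothetical helper name.

```javascript
// Hypothetical helper for illustration: split a URL into the path and the
// query string, the way the standard URL parser does.
function splitPathAndQuery(url) {
  const parsed = new URL(url);
  return {path: parsed.pathname, query: parsed.search};
}

const base = 'https://webapp.staging.swh.network/api/1/origin/';
const origin = 'http://apps.fz-juelich.de/jsc/jube/jube2/download.php?version=2.2.2';
const suffix = '/visit/latest/?require_snapshot=true';

// Origin URL embedded as-is: its '?' is taken as the start of the query string.
const raw = splitPathAndQuery(base + origin + suffix);

// Only the '?' of the origin URL is encoded: it stays inside the path.
const encoded = splitPathAndQuery(base + origin.replace('?', '%3F') + suffix);

console.log(raw.path, raw.query);
console.log(encoded.path, encoded.query);
```

In the first case the path stops at `download.php`, which is the resource named in the staging error page. In the second case the whole origin URL and the `/visit/latest/` suffix remain in the path.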

16:48 $ curl "https://webapp.staging.swh.network/api/1/origin/http%3A%2F%2Fapps.fz-juelich.de%2Fjsc%2Fjube%2Fjube2%2Fdownload.php%3Fversion%3D2.2.2/visit/latest/?require_snapshot=true"
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL was not found on this server.</p>
</body></html>


(swh) ✔ ~/swh/swh-environment/swh-web [master|⚑ 165] 
16:48 $ curl "https://webapp.staging.swh.network/api/1/origin/http://apps.fz-juelich.de/jsc/jube/jube2/download.php?version=2.2.2/visit/latest/?require_snapshot=true"
<!DOCTYPE html><html lang=en> <head><meta charset=utf-8><meta http-equiv=X-UA-Compatible content="IE=edge"><meta name=viewport content="width=device-width, initial-scale=1, shrink-to-fit=no"><title>Error 404 &ndash; Software Heritage archive</title><link type=text/css href=/static/css/vendors.dfce793c06269b88766c.css rel=stylesheet><script type=text/javascript src=/static/js/vendors.13c95cb68a145a1a92cc.js></script><link type=text/css href=/static/css/webapp.a5ee322795df19e158b7.css rel=stylesheet><script type=text/javascript src=/static/js/webapp.af2bd555c622bfbed4af.js></script><link type=text/css href=/static/css/guided_tour.877fa77fcebc36a05b5c.css rel=stylesheet><script type=text/javascript src=/static/js/guided_tour.bdbc068176205b1438f3.js></script><script>
/*
@licstart  The following is the entire license notice for the JavaScript code in this page.

Copyright (C) 2015-2022  The Software Heritage developers

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

@licend  The above is the entire license notice for the JavaScript code in this page.
*/
    </script><script>
      SWH_CONFIG = {"sentry_dsn": "https://e7b7c32e211048f0bc370112a252fd49@sentry.softwareheritage.org/13"};
      swh.webapp.sentryInit(SWH_CONFIG.sentry_dsn);
    </script><script src=/jsreverse/ type=text/javascript></script><script>
      swh.webapp.setSwhObjectIcons({"alias": "mdi mdi-star", "branch": "mdi mdi-source-branch", "branches": "mdi mdi-source-branch", "content": "mdi mdi-file-document", "cnt": "mdi mdi-file-document", "directory": "mdi mdi-folder", "dir": "mdi mdi-folder", "origin": "mdi mdi-source-repository", "ori": "mdi mdi-source-repository", "person": "mdi mdi-account", "revisions history": "mdi mdi-history", "release": "mdi mdi-tag", "rel": "mdi mdi-tag", "releases": "mdi mdi-tag", "revision": "mdi mdi-rotate-90 mdi-source-commit", "rev": "mdi mdi-rotate-90 mdi-source-commit", "snapshot": "mdi mdi-camera", "snp": "mdi mdi-camera", "visits": "mdi mdi-calendar-month"});
    </script><script id=swh_user_logged_in type=application/json>false</script><link rel=icon href=/static/img/icons/swh-logo-32x32.png sizes=32x32><link rel=icon href=/static/img/icons/swh-logo-archive-192x192.png sizes=192x192><link rel=apple-touch-icon-precomposed href=/static/img/icons/swh-logo-archive-180x180.png><link rel=search type=application/opensearchdescription+xml title="Software Heritage archive of public source code" href=/static/xml/swh-opensearch.xml><meta name=msapplication-TileImage content=/static/img/icons/swh-logo-archive-270x270.png></head> <body class="hold-transition layout-fixed sidebar-mini "> <a id=top></a> <div class=wrapper> <div class=swh-top-bar> <ul> <li class=swh-position-left> <div id=swh-full-width-switch-container class="custom-control custom-switch d-none d-lg-block d-xl-block"> <input type=checkbox class=custom-control-input id=swh-full-width-switch onclick=swh.webapp.fullWidthToggled(event)> <label class="custom-control-label font-weight-normal" for=swh-full-width-switch>Full width</label> </div> </li> <li> <a href=https://www.softwareheritage.org>Home</a> </li> <li> <a href=https://forge.softwareheritage.org/ >Development</a> </li> <li> <a href=https://docs.softwareheritage.org/devel/ >Documentation</a> </li> <li> <a class=swh-donate-link href=https://www.softwareheritage.org/donate>Donate</a> </li> <li class=swh-position-right> <a href=https://status.softwareheritage.org/ target=_blank class="swh-current-status mr-3 d-none d-lg-inline-block d-xl-inline-block"> <span id=swh-current-status-description>Operational</span> <i class="swh-current-status-indicator green"></i> </a> <a id=swh-login href="/oidc/login/?next=https://webapp.staging.swh.network/api/1/origin/http://apps.fz-juelich.de/jsc/jube/jube2/download.php?version=2.2.2/visit/latest/?require_snapshot=true">login</a> </li> </ul> </div> <div class=swh-banner> <div class=content> <div class=container-fluid> <div class="row align-items-center justify-content-end"> <div 
class="col-12 text-center align-middle pr-3"> We are hiring ! See our <a target=_blank href=https://www.softwareheritage.org/jobs/ rel="noopener noreferrer">job offers</a>. </div> </div> </div> </div> </div> <nav class="main-header navbar navbar-expand-lg navbar-light navbar-static-top swh-navbar
                  swh-navbar-banner "> <div class=navbar-header> <a class="nav-link swh-push-menu" data-widget=pushmenu data-enable-remember=true href=#> <i class="mdi mdi-24px mdi-menu mdi-fw" aria-hidden=true></i> </a> </div> <div class=navbar style="width: 94%;"> <div class=swh-navbar-content> <h4>Error</h4> <form class="form-horizontal d-none d-md-flex input-group swh-search-navbar needs-validation" id=swh-origins-search-top> <input class=form-control placeholder="Enter a SWHID to resolve or keyword(s) to search for in origin URLs" type=text id=swh-origins-search-top-input oninput=swh.webapp.validateSWHIDInput(this) required> <div class=input-group-append> <button class="btn btn-primary" type=submit> <i class="swh-search-icon mdi mdi-24px mdi-magnify" aria-hidden=true></i> </button> </div> </form> </div> </div> </nav> </div> <aside class="swh-sidebar main-sidebar  main-sidebar-banner 
                  sidebar-no-expand sidebar-light-primary elevation-4 swh-sidebar-expanded"> <a href=/ class=brand-link> <img class=brand-image src=/static/img/swh-logo.png> <div href=/ class="brand-text sitename"> <span class=first-word>Software</span> <span class=second-word>Heritage</span> </div> </a> <a href=/ class=swh-words-logo> <div class=swh-words-logo-swh> <span class=first-word>Software</span> <span class=second-word>Heritage</span> </div> <span>Archive</span> </a> <div class=sidebar> <nav class=mt-2> <ul class="nav nav-pills nav-sidebar flex-column" data-widget=treeview role=menu data-accordion=false> <li class=nav-header>Features</li> <li class="nav-item swh-search-item" title="Search archived software"> <a href=/browse/search/ class="nav-link swh-search-link"> <i style="color: #e20026;" class="nav-icon mdi mdi-24px mdi-magnify"></i> <p>Search</p> </a> </li> <li class="nav-item swh-vault-item" title="Download archived software from the Vault"> <a href=/vault/ class="nav-link swh-vault-link"> <i style="color: #e20026;" class="nav-icon mdi mdi-24px mdi-download"></i> <p>Downloads</p> </a> </li> <li class="nav-item swh-origin-save-item" title="Request the saving of a software origin into the archive"> <a href=/save/ class="nav-link swh-origin-save-link"> <i style="color: #e20026;" class="nav-icon mdi mdi-24px mdi-camera"></i> <p>Save code now</p> </a> </li> <li class="nav-item swh-add-forge-now-item" title="Request adding a new forge listing"> <a href=/add-forge/request/create/ class="nav-link swh-add-forge-now-link"> <i style="color: #e20026;" class="nav-icon mdi mdi-24px mdi-anvil"></i> <p>Add forge now</p> </a> </li> <li class="nav-item swh-help-item" title="How to browse the archive ?"> <a href=# class="nav-link swh-help-link" onclick=swh.guided_tour.guidedTourButtonClick(event)> <i style="color: #e20026;" class="nav-icon mdi mdi-24px mdi-help-circle"></i> <p>Help</p> </a> </li> </ul> </nav> </div> </aside> <div class=content-wrapper> <section 
class=content> <div class=container id=swh-web-content> <div class="swh-corner-ribbon
                         swh-corner-ribbon-banner "> Staging<br>v0.0.398 </div> <div class=swh-http-error> <div class=swh-http-error-head>Error</div> <div class=swh-http-error-code> <img class="hidden-xs swh-image-error" src=/static/img/swh-logo.svg> 404 <img class="hidden-xs swh-image-error" src=/static/img/swh-logo.svg> </div> <h3>Resource not found</h3> <div class=swh-http-error-desc> <pre>The resource /api/1/origin/http://apps.fz-juelich.de/jsc/jube/jube2/download.php could not be found on the server.</pre> <div> <a class=btn onclick=window.history.back();> <i class="mdi mdi-arrow-left" aria-hidden=true></i> Go back to previous page </a> or <a href=/ class=btn> <i class="mdi mdi-arrow-left" aria-hidden=true></i> Go back to homepage </a> </div> </div> </div> </div> </section> </div> <div class="modal fade" id=swh-web-modal-message tabindex=-1 role=dialog aria-labelledby=swh-web-modal-message-label aria-hidden=true> <div class=modal-dialog> <div class=modal-content> <div class=modal-header> <h6 class=modal-title id=swh-web-modal-message-label></h6> <button type=button class=close data-dismiss=modal aria-label=Close> <span aria-hidden=true>&times;</span> </button> </div> <div class=modal-body> <p></p> </div> <div class=modal-footer> <button type=button class="btn btn-default btn-sm" data-dismiss=modal>Ok</button> </div> </div> </div> </div> <div class="modal fade" id=swh-web-modal-confirm tabindex=-1 role=dialog aria-labelledby=swh-web-modal-confirm-label aria-hidden=true> <div class=modal-dialog> <div class=modal-content> <div class=modal-header> <h6 class=modal-title id=swh-web-modal-confirm-label></h6> <button type=button class=close data-dismiss=modal aria-label=Close> <span aria-hidden=true>&times;</span> </button> </div> <div class=modal-body> <p></p> </div> <div class=modal-footer> <button type=button class="btn btn-default btn-sm" data-dismiss=modal>Cancel</button> <button type=button id=swh-web-modal-confirm-ok-btn class="btn btn-default btn-sm" 
data-dismiss=modal>Ok</button> </div> </div> </div> </div> <div class="modal fade" id=swh-web-modal-html tabindex=-1 role=dialog aria-labelledby=swh-web-modal-html-label aria-hidden=true> <div class=modal-dialog> <div class=modal-content> <div class=modal-header> <h6 class=modal-title id=swh-web-modal-html-label></h6> <button type=button class=close data-dismiss=modal aria-label=Close> <span aria-hidden=true>&times;</span> </button> </div> <div class=modal-body> </div> </div> </div> </div> <footer class=footer> <div class="container text-center"> <a href=https://www.softwareheritage.org>Software Heritage</a> &mdash; Copyright (C) 2015&ndash;2022, The Software Heritage developers. License: <a href=https://www.gnu.org/licenses/agpl.html>GNU AGPLv3+</a>. <br> The source code of Software Heritage <em>itself</em> is available on our <a href=https://forge.softwareheritage.org/ >development forge</a>. <br> The source code files <em>archived</em> by Software Heritage are available under their own copyright and licenses. <br> <span class=link-color>Terms of use: </span> <a href=https://www.softwareheritage.org/legal/bulk-access-terms-of-use/ >Archive access</a>, <a href=https://www.softwareheritage.org/legal/api-terms-of-use/ >API</a>- <a href=https://www.softwareheritage.org/contact/ >Contact</a>- <a href=/jslicenses/ rel=jslicense>JavaScript license information</a>- <a href=/api/ >Web API</a><br> </div> </footer> <div id=back-to-top> <a href=#top><img alt="back to top" src=/static/img/arrow-up-small.png></a> </div> <script>
      swh.webapp.setContainerFullWidth();
      var statusServerURL = "https://status.softwareheritage.org/";
      var statusJsonPath = "1.0/status/578e5eddcdc0cc7951000520";
      swh.webapp.initStatusWidget(statusServerURL + statusJsonPath);
    </script> </body> </html> 


(swh) ✔ ~/swh/swh-environment/swh-web [master|⚑ 165] 
16:48 $ curl "https://webapp.staging.swh.network/api/1/origin/http://apps.fz-juelich.de/jsc/jube/jube2/download.php%3Fversion=2.2.2/visit/latest/?require_snapshot=true"
{"origin":"http://apps.fz-juelich.de/jsc/jube/jube2/download.php?version=2.2.2","visit":1,"date":"2022-10-31T09:14:01.790901+00:00","status":"full","snapshot":"2757925df6e6b34a3235b60749e77cc4d691aa80","type":"content","metadata":{},"origin_url":"https://webapp.staging.swh.network/api/1/origin/http://apps.fz-juelich.de/jsc/jube/jube2/download.php%3Fversion=2.2.2/get/","snapshot_url":"https://webapp.staging.swh.network/api/1/snapshot/2757925df6e6b34a3235b60749e77cc4d691aa80/"}
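
The successful request above suggests what the client-side fix can look like. The sketch below is an assumption about its shape, not the code from the actual fix: `originUrlAsPathArg` and `latestVisitApiPath` are hypothetical names, and only the `?` character is escaped because that is the only character this task discusses.

```javascript
// Hypothetical helper: escape only '?' so the origin URL can sit in a URL path.
function originUrlAsPathArg(originUrl) {
  return originUrl.replace(/\?/g, '%3F');
}

// Hypothetical helper: build the "latest visit" API path for an origin URL.
function latestVisitApiPath(originUrl) {
  return `/api/1/origin/${originUrlAsPathArg(originUrl)}/visit/latest/?require_snapshot=true`;
}

const originUrl = 'http://apps.fz-juelich.de/jsc/jube/jube2/download.php?version=2.2.2';

// Full encoding also escapes ':', '/' and '=', which is the form that
// returned the plain 404 page in the first staging request.
const fullyEncoded = encodeURIComponent(originUrl);
console.log(fullyEncoded);

// Escaping only '?' gives the path of the request that succeeded on staging.
const apiPath = latestVisitApiPath(originUrl);
console.log(apiPath);
```
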
anlambert claimed this task.

Fixed and deployed.

> It seems the fix is to only encode the ? in an origin URL when it is provided as URL argument.

Ah yes, that makes sense