Page MenuHomeSoftware Heritage

investigate the persistent seemingly benign errors on cooking progress
Open, NormalPublic

Description

A directory not found error persistently appears in vault logs when cooking from the web UI.
It seems to have no negative impact on functionality, the cooking runs and the bundle is OK.

Search for "ard"

swh-web_1                       | [10/Dec/2020 15:55:03] [DEBUG] urllib3.connectionpool._get_conn:272 - Resetting dropped connection: swh-storage
swh-storage_1                   | [2020-12-10 15:55:03 +0000] [85] [DEBUG] POST /origin/search
swh-web_1                       | [10/Dec/2020 15:55:03] [DEBUG] urllib3.connectionpool._make_request:461 - http://swh-storage:5002 "POST /origin/search HTTP/1.1" 200 135
...

Click on origin https://github.com/ardumont/home-manager

swh-web_1                       | [10/Dec/2020 15:55:11] [DEBUG] urllib3.connectionpool._new_conn:231 - Starting new HTTP connection (1): swh-storage:5002
swh-storage_1                   | [2020-12-10 15:55:11 +0000] [85] [DEBUG] POST /origin/get
swh-web_1                       | [10/Dec/2020 15:55:11] [DEBUG] urllib3.connectionpool._make_request:461 - http://swh-storage:5002 "POST /origin/get HTTP/1.1" 200 84
swh-storage_1                   | [2020-12-10 15:55:11 +0000] [85] [DEBUG] POST /origin/visit/get_latest
swh-web_1                       | [10/Dec/2020 15:55:11] [DEBUG] urllib3.connectionpool._make_request:461 - http://swh-storage:5002 "POST /origin/visit/get_latest HTTP/1.1" 200 167
...
swh-objstorage_1                | 192.168.16.19 [10/Dec/2020:15:55:12 +0000] "POST /content/get HTTP/1.1" 200 10838 "-" "python-requests/2.25.0"
swh-web_1                       | [10/Dec/2020 15:55:12] [DEBUG] urllib3.connectionpool._make_request:461 - http://swh-storage:5002 "POST /content/data HTTP/1.1" 200 10688

Click "Download"

swh-web_1                       | [10/Dec/2020 15:55:25] [DEBUG] urllib3.connectionpool._new_conn:231 - Starting new HTTP connection (1): swh-vault:5005
swh-vault_1                     | config: {'db': 'postgresql:///?service=swh-vault', 'cache': {'slicing': '0:1/1:4', 'cls': 'pathslicing', 'root': '/srv/softwareheritage/vault'}, 'storage': {'storage': {'cls': 'remote', 'url': 'http://swh-storage:5002/'}, 'cls': 'retry'}, 'scheduler': {'cls': 'remote', 'url': 'http://swh-scheduler:5008/'}, 'client_max_size': 1073741824}
swh-vault_1                     | ERROR:root:directory 28ed8ee75346dde7547fbc82ddbe0ed7ce2978a7 was not found.
swh-vault_1                     | Traceback (most recent call last):
swh-vault_1                     |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/asynchronous.py", line 71, in middleware_handler
swh-vault_1                     |     return await handler(request)
swh-vault_1                     |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/asynchronous.py", line 178, in decorated_meth
swh-vault_1                     |     result = obj_meth(**kw)
swh-vault_1                     |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/db/common.py", line 62, in _meth
swh-vault_1                     |     return meth(self, *args, db=db, cur=cur, **kwargs)
swh-vault_1                     |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/vault/backend.py", line 131, in progress
swh-vault_1                     |     raise NotFoundExc(f"{obj_type} {hex_id} was not found.")
swh-vault_1                     | swh.vault.exc.NotFoundExc: directory 28ed8ee75346dde7547fbc82ddbe0ed7ce2978a7 was not found.
swh-vault_1                     | INFO:aiohttp.access:192.168.16.24 [10/Dec/2020:15:55:25 +0000] "POST /progress HTTP/1.1" 400 1133 "-" "python-requests/2.25.0"
swh-web_1                       | [10/Dec/2020 15:55:25] [DEBUG] urllib3.connectionpool._make_request:461 - http://swh-vault:5005 "POST /progress HTTP/1.1" 400 976

Click "OK"

swh-storage_1                   | [2020-12-10 15:56:10 +0000] [86] [DEBUG] POST /directory/missing
swh-scheduler_1                 | [2020-12-10 15:56:10 +0000] [78] [DEBUG] POST /task/create
swh-vault_1                     | INFO:aiohttp.access:192.168.16.24 [10/Dec/2020:15:56:10 +0000] "POST /cook HTTP/1.1" 200 414 "-" "python-requests/2.25.0"
swh-web_1                       | [10/Dec/2020 15:56:10] [DEBUG] urllib3.connectionpool._make_request:461 - http://swh-vault:5005 "POST /cook HTTP/1.1" 200 266
swh-scheduler-runner_1          | INFO:swh.scheduler.celery_backend.runner:Grabbed 1 tasks cook-vault-bundle
swh-vault-worker_1              | [2020-12-10 15:56:10,779: INFO/MainProcess] Received task: swh.vault.cooking_tasks.SWHCookingTask[f79a7ea7-bf25-441f-8d15-bae8647e4519]
swh-vault-worker_1              | [2020-12-10 15:56:10,781: DEBUG/ForkPoolWorker-1] Loading config file /cooker.yml
swh-vault_1                     | INFO:aiohttp.access:192.168.16.30 [10/Dec/2020:15:56:10 +0000] "POST /set_status HTTP/1.1" 200 147 "-" "python-requests/2.25.0"
swh-vault_1                     | INFO:aiohttp.access:192.168.16.30 [10/Dec/2020:15:56:10 +0000] "POST /set_progress HTTP/1.1" 200 147 "-" "python-requests/2.25.0"
swh-scheduler-runner_1          | INFO:swh.scheduler.cli.admin.runner:Scheduled 1 tasks
swh-storage_1                   | [2020-12-10 15:56:10 +0000] [86] [DEBUG] POST /directory/ls
...
swh-storage_1                   | [2020-12-10 15:56:11 +0000] [86] [DEBUG] POST /directory/ls
swh-storage_1                   | [2020-12-10 15:56:11 +0000] [86] [DEBUG] POST /content/data
swh-objstorage_1                | 192.168.16.19 [10/Dec/2020:15:56:11 +0000] "POST /content/get HTTP/1.1" 200 8202 "-" "python-requests/2.25.0"
...
swh-storage_1                   | [2020-12-10 15:56:14 +0000] [86] [DEBUG] POST /content/data
swh-objstorage_1                | 192.168.16.19 [10/Dec/2020:15:56:14 +0000] "POST /content/get HTTP/1.1" 200 596 "-" "python-requests/2.25.0"
swh-vault_1                     | INFO:aiohttp.access:192.168.16.30 [10/Dec/2020:15:56:14 +0000] "POST /put_bundle HTTP/1.1" 200 147 "-" "python-requests/2.25.0"
swh-vault_1                     | INFO:aiohttp.access:192.168.16.30 [10/Dec/2020:15:56:14 +0000] "POST /set_status HTTP/1.1" 200 147 "-" "python-requests/2.25.0"
swh-vault_1                     | INFO:aiohttp.access:192.168.16.30 [10/Dec/2020:15:56:14 +0000] "POST /set_progress HTTP/1.1" 200 147 "-" "python-requests/2.25.0"
swh-vault_1                     | INFO:aiohttp.access:192.168.16.30 [10/Dec/2020:15:56:14 +0000] "POST /send_notif HTTP/1.1" 200 147 "-" "python-requests/2.25.0"
swh-storage_1                   | [2020-12-10 15:56:14 +0000] [86] [DEBUG] Closing connection.
...
swh-vault-worker_1              | [2020-12-10 15:56:14,370: INFO/ForkPoolWorker-1] Task swh.vault.cooking_tasks.SWHCookingTask[f79a7ea7-bf25-441f-8d15-bae8647e4519] succeeded in 3.5817152590025216s: None

Go to Downloads page

swh-web_1                       | [10/Dec/2020 15:57:22] [DEBUG] urllib3.connectionpool._new_conn:231 - Starting new HTTP connection (1): swh-vault:5005
swh-vault_1                     | INFO:aiohttp.access:192.168.16.24 [10/Dec/2020:15:57:22 +0000] "POST /progress HTTP/1.1" 200 470 "-" "python-requests/2.25.0"
swh-web_1                       | [10/Dec/2020 15:57:22] [DEBUG] urllib3.connectionpool._make_request:461 - http://swh-vault:5005 "POST /progress HTTP/1.1" 200 322
swh-vault_1                     | INFO:aiohttp.access:192.168.16.24 [10/Dec/2020:15:57:22 +0000] "POST /progress HTTP/1.1" 200 470 "-" "python-requests/2.25.0"
swh-web_1                       | [10/Dec/2020 15:57:22] [DEBUG] urllib3.connectionpool._make_request:461 - http://swh-vault:5005 "POST /progress HTTP/1.1" 200 322

Click "Download"

swh-web_1                       | [10/Dec/2020 15:57:32] [DEBUG] urllib3.connectionpool._make_request:461 - http://swh-vault:5005 "POST /fetch HTTP/1.1" 200 264286
swh-vault_1                     | INFO:aiohttp.access:192.168.16.24 [10/Dec/2020:15:57:32 +0000] "POST /fetch HTTP/1.1" 200 264437 "-" "python-requests/2.25.0"
swh-vault_1                     | INFO:aiohttp.access:192.168.16.24 [10/Dec/2020:15:57:32 +0000] "POST /fetch HTTP/1.1" 200 264437 "-" "python-requests/2.25.0"
swh-web_1                       | [10/Dec/2020 15:57:32] [DEBUG] urllib3.connectionpool._make_request:461 - http://swh-vault:5005 "POST /fetch HTTP/1.1" 200 264286

Bundle is OK:
$ tar xf 28ed8ee75346dde7547fbc82ddbe0ed7ce2978a7.tar.gz
$ swh identify 28ed8ee75346dde7547fbc82ddbe0ed7ce2978a7/
swh:1:dir:28ed8ee75346dde7547fbc82ddbe0ed7ce2978a7	./28ed8ee75346dde7547fbc82ddbe0ed7ce2978a7/

Related Objects

Event Timeline

tenma triaged this task as Normal priority.Dec 14 2020, 3:00 PM
tenma created this task.
tenma updated the task description. (Show Details)
tenma updated the task description. (Show Details)

Yes, i think it's just the pattern the webapp uses.

First request, asking the vault server for a given directory id.
It's not found so it's logged as not being found vault server side.

Then the webapp goes on asking the vault server for cooking
such directory id.

If the user then requires it again a second time, nothing should
be logged as it's found immediately and passed along as downloadable.

or some such.