
FUSE: directories referencing artifacts missing from the archive are reported as empty
Closed, Migrated

Description

$ ls archive/swh:1:dir:9e7cba45f1ef65a45647bce181bdf78f65da90a4
$ 

The listing comes back empty, whereas the directory swh:1:dir:9e7cba45f1ef65a45647bce181bdf78f65da90a4 is not empty: it references a revision (as a submodule named "sha1collisiondetection") that is itself missing from the archive, swh:1:rev:855827c583bc30645ba427885caa40c5b81764d2.

Here are the logs:

ERROR:swh.fuse:Cannot fetch metadata for object swh:1:rev:855827c583bc30645ba427885caa40c5b81764d2: 404 Client Error: Not Found for url: https://archive.softwareheritage.org/api/1/revision/855827c583bc30645ba427885caa40c5b81764d2/
ERROR:swh.fuse:Cannot lookup: 404 Client Error: Not Found for url: https://archive.softwareheritage.org/api/1/revision/855827c583bc30645ba427885caa40c5b81764d2/
Traceback (most recent call last):
  File "/home/haltode/work/swh/swh-environment/swh-fuse/swh/fuse/fuse.py", line 290, in lookup
    lookup_entry = await parent_entry.lookup(name)
  File "/home/haltode/work/swh/swh-environment/swh-fuse/swh/fuse/fs/entry.py", line 104, in lookup
    async for entry in self.get_entries():
  File "/home/haltode/work/swh/swh-environment/swh-fuse/swh/fuse/fs/entry.py", line 93, in get_entries
    entries = [x async for x in self.compute_entries()]
  File "/home/haltode/work/swh/swh-environment/swh-fuse/swh/fuse/fs/entry.py", line 93, in <listcomp>
    entries = [x async for x in self.compute_entries()]
  File "/home/haltode/work/swh/swh-environment/swh-fuse/swh/fuse/fs/artifact.py", line 115, in compute_entries
    await self.fuse.get_metadata(swhid)
  File "/home/haltode/work/swh/swh-environment/swh-fuse/swh/fuse/fuse.py", line 87, in get_metadata
    metadata = await loop.run_in_executor(None, self.web_api.get, swhid, typify)
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/haltode/work/swh/swh-environment/.venv/lib/python3.8/site-packages/swh/web/client/client.py", line 226, in get
    return self._getters[swhid_.object_type](swhid_, typify)
  File "/home/haltode/work/swh/swh-environment/.venv/lib/python3.8/site-packages/swh/web/client/client.py", line 306, in revision
    json = self._call(f"revision/{_get_swhid(swhid).object_id}/", **req_args).json()
  File "/home/haltode/work/swh/swh-environment/.venv/lib/python3.8/site-packages/swh/web/client/client.py", line 191, in _call
    r.raise_for_status()
  File "/home/haltode/work/swh/swh-environment/.venv/lib/python3.8/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://archive.softwareheritage.org/api/1/revision/855827c583bc30645ba427885caa40c5b81764d2/

Instead, we should be able to list the directory entries and either report the entry for the missing artifact as a broken symlink or not show it at all.
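One possible shape for that behaviour, sketched in isolation from swh-fuse: catch the lookup failure per entry instead of letting it abort the whole listing. Everything below is a stand-in and an assumption, not the actual swh-fuse code: `MissingArtifact` plays the role of the 404 `HTTPError`, `get_metadata` fakes the Web API call against a toy in-memory archive, and the `README` entry with its all-zero content id is made up for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Dict, List


class MissingArtifact(Exception):
    """Hypothetical stand-in for the 404 HTTPError raised by the Web API client."""


@dataclass
class Entry:
    name: str
    target: str           # SWHID the entry points to
    broken: bool = False  # True when the target is missing from the archive


# Toy archive (assumption): only the placeholder content object exists.
PLACEHOLDER_CNT = "swh:1:cnt:" + "0" * 40
ARCHIVE = {PLACEHOLDER_CNT: {"length": 42}}

# Directory children: one resolvable entry, plus the submodule from this task.
CHILDREN = {
    "README": PLACEHOLDER_CNT,
    "sha1collisiondetection": "swh:1:rev:855827c583bc30645ba427885caa40c5b81764d2",
}


async def get_metadata(swhid: str) -> dict:
    """Fake metadata fetch: raises when the object is not in the toy archive."""
    if swhid not in ARCHIVE:
        raise MissingArtifact(swhid)
    return ARCHIVE[swhid]


async def compute_entries(children: Dict[str, str]) -> AsyncIterator[Entry]:
    """Yield every child; a failed lookup marks the entry broken instead of raising."""
    for name, swhid in children.items():
        try:
            await get_metadata(swhid)
        except MissingArtifact:
            # Keep the entry visible, flagged so it can be exposed as a broken symlink.
            yield Entry(name, swhid, broken=True)
        else:
            yield Entry(name, swhid)


async def list_directory(children: Dict[str, str]) -> List[Entry]:
    return [entry async for entry in compute_entries(children)]


if __name__ == "__main__":
    for entry in asyncio.run(list_directory(CHILDREN)):
        status = "broken" if entry.broken else "ok"
        print(f"{entry.name} -> {entry.target} ({status})")
```

With the error handled inside the loop, the other entries still get listed; hiding the missing entry instead would amount to replacing the `yield` in the `except` branch with `continue`.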

Event Timeline

haltode renamed this task from "FUSE: directory completly empty when one artifact is missing from the archive" to "FUSE: directory completely empty when one artifact is missing from the archive". Dec 4 2020, 12:34 PM
haltode triaged this task as Normal priority.
haltode created this task.
haltode created this object in space S1 Public.
zack renamed this task from "FUSE: directory completely empty when one artifact is missing from the archive" to "FUSE: directories referencing artifacts missing from the archive are reported as empty". Dec 4 2020, 1:49 PM
zack updated the task description.
zack added a subscriber: zack.

Good catch! A broken symlink would be preferable to omitting the entry.

haltode changed the task status from Open to Work in Progress. Dec 8 2020, 10:29 AM
haltode moved this task from Backlog to In progress on the Software Heritage filesystem board.