Software Heritage

FUSE: history/by-date/.status file implementation does not seem reliable
Closed, Migrated (edits locked)

Description

  • During live testing (T2811), running ls several times in by-date/ made multiple .status files appear.
  • Relying on the presence of this file to end the waiting delay in tests resulted in random CI failures (T2829, D4628).
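The waiting logic in the second point amounts to polling the mount for a condition instead of sleeping for a fixed delay. Below is a minimal sketch of such a helper. The name wait_for and its parameters are hypothetical and are not the actual test code from D4628.

```python
import time
from typing import Callable


def wait_for(predicate: Callable[[], bool], timeout: float = 10.0,
             interval: float = 0.1) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One last check so a condition met right at the deadline is not missed.
    return predicate()
```

A test might call it as wait_for(lambda: not status_path.exists()). In that call, status_path is a hypothetical path to history/by-date/.status, and the test assumes the file disappears once fetching is done. The helper is only as reliable as the signal it polls. If .status is listed inconsistently, the predicate can succeed or time out at the wrong moment, which would be consistent with the random CI failures.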

Overall, we should rework its implementation and add more tests for this specific SwhFS behavior.

Event Timeline

haltode triaged this task as Normal priority (Nov 30 2020, 4:58 PM).
haltode created this task.
haltode created this object in space S1 Public.
haltode changed the task status from Open to Work in Progress (Dec 3 2020, 4:29 PM).
haltode moved this task from Backlog to In progress on the Software Heritage filesystem board.

While investigating this issue, I found two underlying problems:

  1. The history cache currently opens a second connection to the metadata db. This causes a locking problem on the db and is also unnecessary. It is leftover code from a few months ago, when the history had its own SQLite db, so a separate connection made sense back then.
  2. get_entries with an offset works as follows. It checks the direntry cache, and if the entries are not there, it calls compute_entries, stores the result in the cache, and yields from that list starting at the given offset. The problem with history/by-date/ is that it is not put in the direntry cache until fetching is finished, so compute_entries gets called every time. Unfortunately, the returned list of entries is not consistent over time, because it keeps being filled in the background with more history commits. The offset can therefore point into a different list than the one the previous readdir call saw, and readdir can receive the same entry multiple times.
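The first point can be fixed by letting the history cache reuse the metadata cache's connection instead of opening its own. Below is a minimal sketch using the standard-library sqlite3 module. The class names, table schemas, and open_caches() are hypothetical. SwhFS itself may use an async SQLite driver rather than sqlite3.

```python
import sqlite3
from typing import Tuple


class MetadataCache:
    """Hypothetical metadata cache; owns the single db connection."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS metadata_cache "
            "(swhid TEXT PRIMARY KEY, metadata TEXT)"
        )


class HistoryCache:
    """Hypothetical history cache that reuses the metadata connection
    instead of calling sqlite3.connect() a second time."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS history_graph "
            "(src TEXT, dst TEXT, UNIQUE(src, dst))"
        )

    def add(self, src: str, dst: str) -> None:
        # `with conn` commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR IGNORE INTO history_graph VALUES (?, ?)", (src, dst)
            )

    def count(self) -> int:
        return self.conn.execute(
            "SELECT COUNT(*) FROM history_graph"
        ).fetchone()[0]


def open_caches(db_path: str) -> Tuple[MetadataCache, HistoryCache]:
    # A single connection for the whole db file: both caches go through
    # the same handle, so one cache's open transaction cannot lock the
    # other out of the database.
    conn = sqlite3.connect(db_path)
    return MetadataCache(conn), HistoryCache(conn)
```

With one connection, there is no second handle left to run into "database is locked" errors. The leftover code path that created a dedicated history db connection simply goes away.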
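The offset drift in the second point can be reproduced without FUSE. The sketch below is a simplified, hypothetical model. GrowingDateListing stands in for history/by-date/, and each compute_entries() call discovers one more batch of dates. The model lists .status last, although the real ordering may differ. readdir() resumes from the offset after each batch, roughly as the kernel keeps calling readdir until it gets an empty reply.

```python
import itertools
from typing import Iterator, List


class GrowingDateListing:
    """Simplified model of history/by-date/ while history is still being
    fetched: nothing is in the direntry cache yet, so every get_entries()
    call recomputes the listing, and each recomputation sees more dates."""

    def __init__(self, batches: List[List[str]]) -> None:
        self._batches = batches  # dates "discovered" by each background step
        self._known: List[str] = []
        self._step = 0

    def compute_entries(self) -> List[str]:
        # Simulate the background fetcher adding one more batch of dates.
        if self._step < len(self._batches):
            self._known.extend(self._batches[self._step])
            self._step += 1
        # In this model .status is listed after the date directories.
        return sorted(self._known) + [".status"]

    def get_entries(self, offset: int) -> Iterator[str]:
        # Cache miss on every call: recompute, then slice the *new* list
        # at an offset that was computed against the *previous* list.
        yield from self.compute_entries()[offset:]


def readdir(listing: GrowingDateListing, batch_size: int) -> List[str]:
    """Drain the listing in batches, resuming from the last offset,
    until an empty batch comes back."""
    seen: List[str] = []
    offset = 0
    while True:
        batch = list(itertools.islice(listing.get_entries(offset), batch_size))
        if not batch:
            return seen
        seen.extend(batch)
        offset += len(batch)


if __name__ == "__main__":
    listing = GrowingDateListing([["2020"], ["2019"], ["2018"]])
    print(readdir(listing, batch_size=2))
    # ['2020', '.status', '.status', '.status']
```

Draining this model in batches of two yields .status three times and never returns 2019 or 2018. Each newly inserted date pushes .status back under the current offset. This is consistent with the duplicated .status entries observed in T2811.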

The first problem is pretty straightforward to fix. However, I am not sure about a good way to solve the second one. Maybe override get_entries in RevisionHistoryShardByDate and add a (somewhat hacky) special case there to make the offset work?
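One possible shape for the override suggested above is to freeze the entry list when a listing starts and serve later offsets from that frozen copy. This is a hypothetical sketch, not the SwhFS implementation. SnapshotListing, the fh handle argument, and release() are assumptions, and keying snapshots by directory handle is only one option.

```python
import itertools
from typing import Callable, Dict, Iterator, List


class SnapshotListing:
    """Hypothetical get_entries override for a directory whose entries keep
    growing in the background (e.g. history/by-date/ while fetching)."""

    def __init__(self, compute_entries: Callable[[], List[str]]) -> None:
        self._compute_entries = compute_entries
        # One frozen entry list per open directory handle.
        self._snapshots: Dict[int, List[str]] = {}

    def get_entries(self, fh: int, offset: int) -> Iterator[str]:
        # A listing (re)starts at offset 0: take a fresh snapshot. Later
        # offsets index into that same snapshot, so they stay consistent
        # even if compute_entries() would now return more entries.
        if offset == 0 or fh not in self._snapshots:
            self._snapshots[fh] = list(self._compute_entries())
        yield from self._snapshots[fh][offset:]

    def release(self, fh: int) -> None:
        # Forget the snapshot once the directory handle is closed.
        self._snapshots.pop(fh, None)


def list_dir(listing: SnapshotListing, fh: int, batch_size: int) -> List[str]:
    """Drain a listing in readdir-sized batches, resuming from the offset."""
    seen: List[str] = []
    offset = 0
    while True:
        batch = list(itertools.islice(listing.get_entries(fh, offset), batch_size))
        if not batch:
            return seen
        seen.extend(batch)
        offset += len(batch)


if __name__ == "__main__":
    discovered = iter([["2020"], ["2019"], ["2018"]])
    known: List[str] = []

    def compute_entries() -> List[str]:
        # Each call simulates the background fetcher finding more dates.
        known.extend(next(discovered, []))
        return sorted(known) + [".status"]

    listing = SnapshotListing(compute_entries)
    print(list_dir(listing, fh=1, batch_size=2))  # ['2020', '.status']
    listing.release(1)
    print(list_dir(listing, fh=1, batch_size=2))  # ['2019', '2020', '.status']
```

The trade-off is that a listing started while fetching is still running will not show dates found later until the directory is read again from offset 0. POSIX already leaves it unspecified whether entries added to a directory after opendir() are returned by readdir(), so this stays within normal readdir semantics. release() would need to be called when the handle is closed, for example from the FUSE releasedir operation, so that snapshots do not accumulate.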