
fuse: populate history/ incrementally to avoid temporary hangs on huge repos
Closed, Migrated

Description

For a Linux kernel commit (swh:1:rev:f39d7d78b70e0f39facb1e4fab77ad3df5c52a35), the swh-graph API call graph/visit/nodes/{SWHID}?edges=rev:rev takes around 40s to complete and process all the data, because we wait for the whole history to be retrieved before starting to process it.

We should instead process it in a streaming fashion, populating history/ incrementally as results arrive, and use a .status file to inform the user about progress.
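To make the idea concrete, here is a minimal sketch of incremental population with progress reporting. It is not the swh-fuse implementation: the `HistoryListing` class, the `populate` helper, and the wording of the status text are all hypothetical, and the directory is modelled as a plain in-memory list.

```python
from typing import Iterable, Iterator, List


class HistoryListing:
    """Hypothetical in-memory stand-in for the history/ directory."""

    def __init__(self) -> None:
        self.entries: List[str] = []
        self.done = False

    def status(self) -> str:
        # Text that a .status file could expose to the user (format is assumed).
        state = "done" if self.done else "in progress"
        return f"{len(self.entries)} revisions loaded ({state})\n"


def populate(listing: HistoryListing, swhids: Iterable[str]) -> Iterator[str]:
    """Add revisions one at a time, yielding the status text after each one."""
    for swhid in swhids:
        listing.entries.append(swhid)
        yield listing.status()
    # Only reached once the input is exhausted.
    listing.done = True
```

Because entries are appended as they arrive, a directory listing taken in the middle of the traversal already shows the revisions seen so far, and the status text tells the user that more are coming.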

Event Timeline

haltode triaged this task as Normal priority. Oct 20 2020, 4:04 PM
haltode created this task.
haltode created this object in space S1 Public.
zack renamed this task from Realistic benchmark to explore the Linux kernel history to history/ dir browsing is too slow on big repos like the Linux kernel. Oct 20 2020, 6:23 PM
zack renamed this task from history/ dir browsing is too slow on big repos like the Linux kernel to populate history/ incrementally to avoid temporary hangs on huge repos. Dec 15 2020, 11:58 AM
zack updated the task description.

I experimented a bit with the streaming idea. Here is what we could do:

  • When calling the swh-web-client inner _call method, pass a stream=True argument (it will be used by the requests.get() call), so we get a streaming response instead of waiting for the full results up front
  • Use asyncio.create_task(), as we did for the history/ filling, to iterate over the streaming content and fill the cache.
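The two steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the swh-web-client or swh-fuse code: `stream_history`, `fill_cache`, and `main` are hypothetical names, the cache is a plain dict, and the response is assumed to be plain text with one SWHID per line. It calls `requests.get()` directly rather than going through `_call`.

```python
import asyncio
from typing import Dict, Iterable, Iterator, List


def stream_history(url: str) -> Iterator[str]:
    """Yield one line of the response body at a time, as it arrives."""
    import requests  # imported lazily so the rest of the sketch runs offline

    # stream=True makes requests return before the body is fully downloaded.
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines():
            if raw:  # skip keep-alive blank lines
                yield raw.decode()


async def fill_cache(
    lines: Iterable[str], cache: Dict[str, List[str]], key: str
) -> None:
    """Consume a (possibly blocking) line iterator without stalling the loop."""
    loop = asyncio.get_running_loop()
    it = iter(lines)
    sentinel = object()
    while True:
        # next() may block on network I/O, so run it in a worker thread.
        item = await loop.run_in_executor(None, next, it, sentinel)
        if item is sentinel:
            break
        cache.setdefault(key, []).append(item)


async def main(lines: Iterable[str]) -> Dict[str, List[str]]:
    cache: Dict[str, List[str]] = {}
    task = asyncio.create_task(fill_cache(lines, cache, "history"))
    # Other coroutines could serve partial listings from `cache` here.
    await task
    return cache
```

In real use, `main(stream_history(url))` would be given the graph/visit/nodes URL. Since `requests` is blocking, each `next()` call is pushed to the default executor so the event loop stays free to answer other filesystem requests while the history is still downloading.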
haltode changed the task status from Open to Work in Progress. Jan 5 2021, 6:21 PM
haltode moved this task from Backlog to In progress on the Software Heritage filesystem board.
vlorentz renamed this task from populate history/ incrementally to avoid temporary hangs on huge repos to fuse: populate history/ incrementally to avoid temporary hangs on huge repos. Apr 19 2021, 1:43 PM