We should update the cache when new visits are added for a specific origin.
Description
Revisions and Commits
rDFUSE FUSE virtual file system
D4744 | rDFUSEa6cef6bad56e cache: update cache with new origin visits
D4744 | rDFUSEb6e8cf744f3e cache: add primary key to db tables
Event Timeline
The difficulty with this one is deciding when to re-query the backend to check whether there are new visits. Doing it too often makes the cache of visit metadata useless; doing it too seldom means missing new visits. Either way, we probably need to add a timestamp somewhere in the cache recording when the metadata were last fetched (!= most recent visit timestamp).
Considering the current rate of archive visits, it's probably pointless to re-fetch origin visits more than once per day.
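To make the idea concrete, here is a minimal sketch of a fetch-timestamp check, assuming a SQLite-backed cache. The table name `visits_cache`, its columns, the `fetch_visits` callback and the `REFRESH_INTERVAL` constant are all hypothetical names chosen for illustration, not the actual swh-fuse schema or API.

```python
import sqlite3
import time

# Assumption: refresh at most once per day, as suggested above.
REFRESH_INTERVAL = 24 * 60 * 60  # seconds


def open_cache(path=":memory:"):
    """Open a (hypothetical) visits cache with a fetch timestamp column."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits_cache ("
        " url TEXT PRIMARY KEY,"
        " visits TEXT NOT NULL,"
        " fetched_at REAL NOT NULL)"  # when we last queried the backend
    )
    return conn


def get_visits(conn, url, fetch_visits, now=None):
    """Return cached visits for url, re-fetching if the entry is stale.

    fetch_visits is a caller-supplied function that queries the backend.
    """
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT visits, fetched_at FROM visits_cache WHERE url = ?", (url,)
    ).fetchone()
    if row is not None and now - row[1] < REFRESH_INTERVAL:
        return row[0]  # fresh enough: no backend round trip
    visits = fetch_visits(url)
    conn.execute(
        "INSERT OR REPLACE INTO visits_cache (url, visits, fetched_at)"
        " VALUES (?, ?, ?)",
        (url, visits, now),
    )
    conn.commit()
    return visits
```

Note that `fetched_at` is the time of the last backend query, not the date of the most recent visit, so an origin that is rarely visited is still re-checked only once per interval.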
This also raises the question of whether we want to let users selectively remove cache entries, e.g., with a variant of "swh fs clean" that removes only part of the cache rather than all of it. One reasonable UI for that would be something like swh fs clean [ID]..., where ID is either a SWHID or an origin URL; calling it would make swh-fuse remove only the parts of the cache related to ID. (This would be a separate task though.)
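A rough sketch of what such selective cleaning could look like at the storage level, again assuming a SQLite cache. The table names `metadata_cache` and `visits_cache`, their columns, and the `clean_entries` helper are hypothetical; the only fact relied upon is that SWHIDs start with the `swh:1:` prefix, so anything else is treated as an origin URL.

```python
import sqlite3


def clean_entries(conn, ids):
    """Remove only the cache rows related to the given IDs.

    Each ID is either a SWHID (prefix "swh:1:") or an origin URL.
    Returns the number of rows deleted.
    """
    removed = 0
    for ident in ids:
        if ident.startswith("swh:1:"):
            # Hypothetical table holding per-object metadata.
            cur = conn.execute(
                "DELETE FROM metadata_cache WHERE swhid = ?", (ident,)
            )
        else:
            # Hypothetical table holding per-origin visit lists.
            cur = conn.execute(
                "DELETE FROM visits_cache WHERE url = ?", (ident,)
            )
        removed += cur.rowcount
    conn.commit()
    return removed
```

Entries for IDs that are not listed are left untouched, which is the difference from wiping the whole cache.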