I experimented a bit with the streaming idea; here is what we could do:
Dec 17 2020
Dec 16 2020
Dec 15 2020
The visibility part of this task is done. I filed a new task, T2889, about supporting the rm cache/{...} syntax.
Dec 14 2020
Dec 9 2020
Dec 8 2020
My API idea was to simply have something like ENTRIES_REGEXP = r'^.*:.*$' as a class attribute of each type of directory, and a validate_entry(self, name: str) method which, by default, just checks that it matches the regexp.
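To make the proposal concrete, here is a minimal sketch of what it could look like. Only ENTRIES_REGEXP and validate_entry(self, name: str) come from the comment above; the class names FuseDirEntry and ShardedDir, the bool return type, and the hex-prefix pattern are assumptions made up for illustration, not the actual swh-fuse code.

```python
import re
from typing import Pattern


class FuseDirEntry:
    """Hypothetical base class for a directory type (the name is an assumption)."""

    # Default pattern from the proposal: any entry name containing a colon.
    ENTRIES_REGEXP: Pattern[str] = re.compile(r"^.*:.*$")

    def validate_entry(self, name: str) -> bool:
        """Return True if `name` matches this directory's ENTRIES_REGEXP."""
        return self.ENTRIES_REGEXP.match(name) is not None


class ShardedDir(FuseDirEntry):
    """Hypothetical subclass: only accepts two-character lowercase hex names."""

    # Overriding the class attribute is enough; validate_entry is inherited.
    ENTRIES_REGEXP = re.compile(r"^[0-9a-f]{2}$")
```

A directory type with stricter needs could still override validate_entry itself, while the common case stays a one-line class attribute.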
Dec 7 2020
Dec 4 2020
Good catch! A broken symlink would be preferable to omitting the entry.
Dec 3 2020
In T2771#53972, @seirl wrote: We also need to discuss what exactly we put in cache/. I thought about symlinks to archive/ and meta/, what do you think? Removing the symlinks also means removing the data from the cache.
New proposal (lather, rinse, repeat…) based on an idea from @seirl:
Dec 2 2020
Yes, this is actually documented (https://docs.softwareheritage.org/devel/swh-fuse/design.html#ori-nodes-origins) and logged (at DEBUG level) when there is a date conflict. We will probably decide on a more fine-grained layout in the future to account for this, instead of simply ignoring the conflict and picking the first one.
$ swh fs mount /tmp/foobar/ -f
^CERROR:root:Error running FUSE:
ERROR:asyncio:Task exception was never retrieved
future: <Task finished coro=<_session_loop() done, defined at /home/dev/.local/lib/python3.7/site-packages/_pyfuse3.py:28> exception=AssertionError()>
Traceback (most recent call last):
  File "/home/dev/.local/lib/python3.7/site-packages/_pyfuse3.py", line 30, in wrapper
    await fn(*args, **kwargs)
  File "src/internal.pxi", line 253, in _session_loop
  File "src/handlers.pxi", line 189, in fuse_readlink_async
  File "/home/dev/.local/lib/python3.7/site-packages/swh/fuse/fuse.py", line 257, in readlink
    assert isinstance(entry, FuseSymlinkEntry)
AssertionError
@vlorentz: can we have the logs please?
Run mount with the "--foreground" option and/or check your user log in "journalctl --user".
TIA
See D4636.
The difficulty with this one is deciding when to re-query the backend to check whether there are new visits. Doing it too often makes the cache of visit metadata useless. Doing it too seldom means missing new visits. Either way, we probably need to add a timestamp somewhere in the cache recording when the metadata was last fetched (which is not the same as the most recent visit timestamp).
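One way to picture the trade-off is a cache entry that stores its own fetch time and is refreshed once it is older than some time-to-live. The sketch below is only an illustration under stated assumptions: VisitCache, CachedVisits, the fetch callback, and the one-hour default TTL are all hypothetical names and values, not part of swh-fuse.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CachedVisits:
    visits: List[str]
    # When the backend was last queried, NOT the date of the latest visit.
    fetched_at: float


class VisitCache:
    """Hypothetical in-memory cache of visit metadata with a time-to-live."""

    def __init__(
        self,
        fetch: Callable[[str], List[str]],  # assumed backend query function
        ttl: float = 3600.0,                # assumed default: one hour
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, CachedVisits] = {}

    def get(self, origin: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(origin)
        # Re-query only when there is no entry or the entry is older than the TTL.
        if entry is None or now - entry.fetched_at >= self._ttl:
            entry = CachedVisits(self._fetch(origin), now)
            self._entries[origin] = entry
        return entry.visits
```

A short TTL favours freshness at the cost of more backend queries, and a long one does the opposite. The right value is exactly the open question raised above.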
nice catch, thanks @vlorentz.