@rdicosmo a full example of what?
Oct 12 2020
The suggestion was to have extrinsic metadata on directories that come from a deposit of a bundle (e.g. .tar.gz or .zip file coming from HAL), instead of on a synthetic revision as is currently the case, so they can be accessed knowing the hash of the directory (which is an intrinsic id).
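To make the difference concrete, here is a purely illustrative sketch (plain Python dicts, not the actual swh.storage API; the SWHIDs and URL below are made up): today the deposit metadata hangs off a synthetic revision, while the suggestion is to key it directly on the directory SWHID, which can be recomputed from the deposited bundle.

```python
# Illustrative only -- not the swh.storage data model, just the lookup shape.

# Current layout: metadata is attached to a synthetic revision, so reaching
# it requires knowing (or finding) that revision first.
metadata_on_revision = {
    "swh:1:rev:aaaa...": {                      # hypothetical synthetic revision
        "directory": "swh:1:dir:bbbb...",
        "metadata": {"deposit_origin": "https://hal.example.org/hal-000000"},
    },
}

# Suggested layout: metadata is keyed on the directory SWHID itself, an
# intrinsic id that anyone holding the deposited bundle can recompute.
metadata_on_directory = {
    "swh:1:dir:bbbb...": {
        "metadata": {"deposit_origin": "https://hal.example.org/hal-000000"},
    },
}

# With the second layout, the directory hash alone is enough:
record = metadata_on_directory.get("swh:1:dir:bbbb...")
```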
Oct 8 2020
Alternatively, we could keep writing the metadata on revision/releases, and use the provenance service (when it's ready) to find them from a directory SWHID. What do you think?
Oct 6 2020
Oct 1 2020
Sep 29 2020
Sep 25 2020
Sep 24 2020
aka T1910
Sep 23 2020
Build is green
Rename tests db to "storage"
it's unclear to me whether the hunk in swh_storage_backend_config is related to the fix or not.
Build is green
isn't the very last diff hunk enough for the job?
- Drop unneeded postgresql_proc
Related to D4013
More information:
(Pdb) cfg['storage']
{'check_config': {'check_write': True}, 'cls': 'local', 'db': 'postgresql://postgres@127.0.0.1:16607/tests', 'objstorage': {'args': {}, 'cls': 'memory'}}
(Pdb) cfg['scheduler']
{'args': {'db': "user=postgres password=xxx dbname=tests host=127.0.0.1 port=16607 options=''"}, 'cls': 'local'}
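For context, a hedged sketch of what the test configuration would look like after the rename (values adapted from the pdb output above; the actual fixture code is not shown): the storage tests get their own "storage" database on the shared postgres instance instead of colliding with the scheduler's "tests" database.

```python
# Hypothetical post-rename configuration, adapted from the pdb dump above:
# storage and scheduler no longer point at the same "tests" database.
storage_config = {
    "cls": "local",
    "db": "postgresql://postgres@127.0.0.1:16607/storage",   # was .../tests
    "objstorage": {"cls": "memory", "args": {}},
    "check_config": {"check_write": True},
}
scheduler_config = {
    "cls": "local",
    "args": {"db": "user=postgres dbname=tests host=127.0.0.1 port=16607"},
}
```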
Sep 22 2020
This is very probably superseded by @vlorentz's work on swh.search.
Sep 21 2020
Sep 18 2020
Sep 17 2020
Btw, sqitch uses the database engine's native scripting (for postgres, psql scripts), so if we ever end up using it, we can keep this approach.
So, I first attempted something along the lines of https://www.depesz.com/2008/06/18/conditional-ddl/, which uses a function to execute DDL commands stored in a string.
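For reference, a minimal sketch of that technique driven from Python with psycopg2 (the DSN, table and column names below are made up for illustration): a small plpgsql helper takes a boolean condition and a DDL statement as text, and EXECUTEs the statement only when the condition holds.

```python
# Minimal sketch of the conditional-DDL trick from the depesz article.
# DSN, table and column names are hypothetical.
import psycopg2

DDL_HELPER = """
create or replace function ddl_if(cond boolean, ddl text) returns void
language plpgsql as $$
begin
    if cond then
        execute ddl;   -- run the DDL statement stored in the string
    end if;
end;
$$;
"""

CONDITIONAL_UPGRADE = """
select ddl_if(
    not exists (select 1 from information_schema.columns
                where table_name = 'origin' and column_name = 'notes'),
    'alter table origin add column notes text');
"""

with psycopg2.connect("dbname=softwareheritage-dev") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL_HELPER)          # install the helper once
        cur.execute(CONDITIONAL_UPGRADE) # add the column only if missing
```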
Sep 16 2020
Sep 15 2020
We considered three possibilities for the schema (assuming we want to get rid of the three separate tables for dir_entries, rev_entries and file_entries -- otherwise, there are six possibilities).
Sep 14 2020
everything has now run, up to and including D3936.
Sep 10 2020
restarted the migration, this time from a shared tmux session on belvedere (it was previously running under my user):
Sep 9 2020
revisions done so far (as of yesterday, when it got stopped):
on stand-by, related to T2561
Sep 8 2020
Sep 7 2020
Sep 6 2020
Noting down that I had a tentative, very preliminary implementation in the feature/fuse branch of swh-graph; see in particular fuse.py there.
It's probably not worth picking up and we should restart from scratch at this point, but it might still contain useful material.
(The webclient in there has since become a proper thing, see T2279. So that part is definitely obsolete.)
Sep 4 2020
Sep 3 2020
Sep 2 2020
Aug 31 2020
Aug 28 2020
ETA as of 28/08/2020, with 2 update processes running concurrently (plus workers): ~55 days (actual speed around 1.5M revisions per hour, which is not fast but steady and replication compliant).
I have started a 3rd update process; I'll update the task with a revised ETA on Monday.
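As a rough sanity check on the 55-day figure (back-of-the-envelope only; the backlog size below is inferred from the quoted rate and ETA, not measured):

```python
# Back-of-the-envelope: at ~1.5M revisions/hour, a ~55-day ETA implies a
# backlog on the order of 2 billion revisions still to process.
rate_per_hour = 1_500_000      # observed speed, revisions/hour
eta_days = 55                  # quoted ETA with 2 concurrent update processes
implied_backlog = rate_per_hour * 24 * eta_days
print(f"implied backlog: ~{implied_backlog / 1e9:.1f}B revisions")   # ~2.0B
```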
Aug 27 2020
I guess so, yes.
well, storage is typed now, but some endpoints remain inconsistent (as T645#47156 makes explicit, with a unification proposal that is not done yet, aside from content_get_data and content_get_metadata)
I guess this task can be closed, since this backfilling process will be part of the one we will run soon to fill the new kafka cluster.
what's missing for this task to be closed?
Aug 26 2020
actually modified again so that the 3 update queries (160, 161) run next to each other, to avoid the caveats [1].
Aug 25 2020
I don't know where to add the link about this. I started a WIP documentation page with pieces of what happens for a worker or database (data) upgrade on the intranet [1]