D1582 has been pushed; the task can be closed.
Jun 15 2019
All columns are commented in swh-scheduler; the change is awaiting review.
Some columns for swh-storage required a small discussion to frame appropriate comments.
Jun 14 2019
All columns are already commented in swh-indexer.
I have added a few comments in D1582.
Jun 13 2019
There seems to be an inconsistency between sql/upgrades and the latest SQL version in swh-storage: the latest upgrade is 136.sql, while the version in 30-swh-schema.sql is 133. Should I name the next upgrade 137?
Is there anything left to be done to close the task?
Jun 12 2019
The modules swh-scheduler, swh-indexer, and swh-storage all seem to have column comments written in 30-swh-schema.sql.
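For reference, these are plain PostgreSQL column comments written into 30-swh-schema.sql. An illustrative example follows; the wording is hypothetical, not the actual comment:

COMMENT ON COLUMN origin_intrinsic_metadata.mappings IS
  'List of metadata mappings used to translate the origin''s source metadata';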
Can you provide a few more details so I can work on this? Maybe which packages will be affected and what is expected in the comments.
Jun 7 2019
In the meantime, I've stopped those indexers as this impacts others (I see transactions piling up).
Apr 24 2019
We should investigate why they are there.
This is now done, aside from a minor issue noted below:
softwareheritage-indexer=# select count(*) from revision_intrinsic_metadata where metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb;
 count
-------
     0
(1 row)
Mar 15 2019
@vlorentz: lather, rinse, repeat.
softwareheritage-indexer=# DELETE FROM revision_intrinsic_metadata WHERE metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb;
ERROR:  deadlock detected
DETAIL:  Process 23900 waits for ShareLock on transaction 212164862; blocked by process 20175.
Process 20175 waits for ShareLock on transaction 212164381; blocked by process 23900.
HINT:  See server log for query details.
CONTEXT:  while deleting tuple (772424,55) in relation "revision_intrinsic_metadata"
Time: 33048,828 ms (00:33,049)
(this just happened, after the indexers were restarted including D1218)
Mar 14 2019
That's better:
For the cleanup to actually happen fast, I deactivated the constraints (done), executed the delete (done), and reinstalled the constraints (in progress).
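For the record, a minimal sketch of that constraint dance, assuming the blocking constraint is the foreign key from origin_intrinsic_metadata.from_revision to revision_intrinsic_metadata.id (the constraint name below is hypothetical):

-- Drop the FK so the bulk DELETE does not have to check referencing rows
ALTER TABLE origin_intrinsic_metadata
  DROP CONSTRAINT origin_intrinsic_metadata_from_revision_fkey;

DELETE FROM revision_intrinsic_metadata
WHERE metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb;

-- Re-adding the FK revalidates it against the whole table,
-- which is the slow "in progress" step
ALTER TABLE origin_intrinsic_metadata
  ADD CONSTRAINT origin_intrinsic_metadata_from_revision_fkey
  FOREIGN KEY (from_revision) REFERENCES revision_intrinsic_metadata (id);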
Mar 13 2019
Fixed in 4f6ab3c9ab17.
Current state: the last DELETE query is still running (on the indexer configuration).
Mar 4 2019
In T1549#29103, @vlorentz wrote: Once this is landed and deployed, ordering your DELETEs by revision_metadata.id will acquire locks in the same order as the idx_storage, solving the deadlock issue.
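For illustration, a hedged sketch of such a lock-ordered DELETE; the USING-subquery form is one common PostgreSQL pattern for acquiring row locks in a fixed order, not necessarily the exact query that was run:

DELETE FROM revision_metadata rm
USING (
  SELECT id FROM revision_metadata
  WHERE translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb
  ORDER BY id   -- lock rows in id order, matching the idx_storage
  FOR UPDATE
) locked
WHERE rm.id = locked.id;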
Mar 2 2019
The update completed, but a first attempt at the second DELETE failed with a deadlock (?!):
softwareheritage-indexer=# DELETE FROM revision_metadata WHERE translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb;
ERROR:  deadlock detected
DETAIL:  Process 10966 waits for ShareLock on transaction 197265813; blocked by process 11754.
Process 11754 waits for ShareLock on transaction 197264487; blocked by process 10966.
HINT:  See server log for query details.
CONTEXT:  while deleting tuple (1380733,15) in relation "revision_metadata"
Time: 170864,091 ms (02:50,864)
The following fix for the above (suggested by @vlorentz) is now running:
UPDATE revision_metadata
SET translated_metadata = origin_intrinsic_metadata.metadata
FROM origin_intrinsic_metadata
WHERE revision_metadata.id = origin_intrinsic_metadata.from_revision
  AND revision_metadata.translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'
  AND origin_intrinsic_metadata.metadata != '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}';
Mar 1 2019
As discussed on IRC, even after cleaning up origin_intrinsic_metadata, the DELETE on revision_metadata fails with:
softwareheritage-indexer=# DELETE FROM revision_metadata
softwareheritage-indexer-# WHERE translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb;
I've started the first of the following queries on somerset (in a screen session under my user):
DELETE FROM origin_intrinsic_metadata WHERE metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb;
Feb 27 2019
To know that we went over the origin and did not find anything, without needing additional tables.
Feb 26 2019
I don't understand your comment. What are the remaining arguments for using NULL instead of just deleting rows?
and the NULL option
Yes :-)
So, do we agree that the right fix for this task is just to get rid of empty-ish rows? Or are there other arguments that we haven't considered yet?
Sounds like a solution for T1528 :)
In T1549#28882, @vlorentz wrote: What is the provenance map?
My tentative proposal is to delete all table entries for which no metadata has been found.
The invariant will be: if an origin/revision has metadata, there will be an entry in the table(s); if not, the origin/revision will not appear.
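Once that cleanup lands, a simple sanity check for the invariant is that no empty-ish rows remain (a sketch, reusing the empty-context literal from the queries above):

SELECT count(*) FROM origin_intrinsic_metadata
WHERE metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb;
-- expected result under the invariant: 0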
(but of course if you want to have a CLI tool to generate the info, sure; I just wanted to highlight here that the end goal is the doc)
More than a CLI tool, I'd like to have documentation about how to use the CodeMeta metadata that we extract, sort of "typing information" for the content of the various intrinsic metadata tables.
It might be something as simple as:
s/clean up/deduplicate/
More like:
Feb 25 2019
SQL script to migrate (in progress):
No, only the context key, which doesn't make sense anymore (there's a mappings column in metadata tables).
A priori, the solution would be to remove the context from the tool_configuration column (discussed with @vlorentz).
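A minimal sketch of that migration, assuming tool_configuration is a jsonb column on indexer_configuration and the key is literally named "context" (the real script may differ):

UPDATE indexer_configuration
SET tool_configuration = tool_configuration - 'context'   -- drop the obsolete key
WHERE tool_configuration ? 'context';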
Feb 14 2019
Or, as the OriginMetadataIndexer already fetches this data anyway, it could write it to the indexer db when it's done.
Should be easy to do; we already have this info in the indexer db (in origin_intrinsic_metadata.mappings). Then it's just a matter of creating oneshot tasks.
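For instance, selecting the origins to re-schedule could be as simple as the following sketch (assuming mappings is an array column and id identifies the origin):

SELECT id FROM origin_intrinsic_metadata
WHERE mappings @> ARRAY['npm'];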
The naive solution to do this is adding a new indexer that pre-fetches the snapshot, revision, and root directory of an origin and writes its list of root files to the indexer db. Then we can read that to find which origins have a given file name pattern.
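A sketch of what that could look like; every name here is hypothetical, just to make the idea concrete:

-- Table written by the hypothetical new indexer
CREATE TABLE origin_root_files (
  origin_id  bigint NOT NULL,
  file_names text[] NOT NULL  -- entry names at the root directory of the origin
);

-- Reading it back: origins whose root contains a file matching a pattern
SELECT origin_id
FROM origin_root_files
WHERE EXISTS (
  SELECT 1 FROM unnest(file_names) AS f(name)
  WHERE f.name LIKE 'codemeta%'
);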