Previous runs of loaders didn't write to the metadata storage, only to revision objects
Description
Revisions and Commits
Event Timeline
This task will take us one step towards a searchable archive :-)
We should take a very conservative approach: I would suggest keeping the metadata on the revisions and just copying it.
This way, you don't need to distinguish between the fields that require migration and those that don't.
Finally, it will be less stressful to run a script that doesn't change the archive but is very useful for the search mechanisms we want to implement on the ERMDS (Extrinsic Raw MetaData Storage).
Tail of log:
Processed 0.46M rows (~0.2%, last revision: 0095624edf008b754fb1ed5bd656d22c63f984ff)
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 1204, in <module>
    main(storage_dbconn, storage_url, deposit_dbconn, bytes.fromhex(first_id), True)
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 1165, in main
    handle_row(row, storage, deposit_cur, dry_run)
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 975, in handle_row
    storage, row["id"], metadata["original_artifact"][0]["filename"]
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 261, in pypi_origin_from_filename
    project_name = pypi_project_from_filename(filename)
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 252, in pypi_project_from_filename
    assert match, original_filename
AssertionError: pypops-201408-r4.tar.gz
I'll make the script log the revisions it's unable to process, rather than uselessly fall flat on its face.
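A minimal sketch of that change, assuming the main loop iterates over database rows whose `id` is the revision id as bytes: each failing row is logged with its traceback and skipped. `process_rows` and the single-argument `handle_row` callback are hypothetical simplifications; the script's own `handle_row` also takes `storage`, `deposit_cur` and `dry_run`.

```python
import logging
from typing import Callable, Iterable, List, Mapping

logger = logging.getLogger(__name__)


def process_rows(
    rows: Iterable[Mapping],
    handle_row: Callable[[Mapping], None],
) -> List[bytes]:
    """Run handle_row on every row; log and skip the rows that fail.

    Hypothetical helper: returns the ids of the revisions that could not
    be processed, so they can be inspected after the run instead of
    stopping it at the first bad revision.
    """
    failed = []
    for row in rows:
        try:
            handle_row(row)
        except Exception:
            # logger.exception() logs at ERROR level and appends the
            # current traceback to the message.
            logger.exception("Could not parse revision metadata %s", row["id"].hex())
            failed.append(row["id"])
    return failed
```

Returning the failed ids, rather than only logging them, is one way to re-run just those revisions once the parsers are fixed.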
2021-04-06 20:19:19,898 __main__ ERROR Could not parse revision metadata 00959a167bd98452c98ce73382f4b42179d53d32
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 1161, in main
    handle_row(row, storage, deposit_cur, dry_run)
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 979, in handle_row
    storage, row["id"], metadata["original_artifact"][0]["filename"]
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 265, in pypi_origin_from_filename
    project_name = pypi_project_from_filename(filename)
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 256, in pypi_project_from_filename
    assert match, original_filename
AssertionError: pypops-201408-r4.tar.gz
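The run now logs `pypops-201408-r4.tar.gz` and continues. The regex inside `pypi_project_from_filename` isn't shown in this thread, so the sketch below is a hypothetical, more tolerant parser rather than the script's own: it strips an archive suffix (the suffix list is an assumption), takes everything before the first hyphen that is followed by a digit as the project name, and returns `None` instead of asserting.

```python
import re
from typing import Optional

# Assumed list of archive suffixes to strip before parsing.
_SUFFIXES = (".tar.gz", ".tar.bz2", ".tgz", ".zip", ".tar")

# Project name, a hyphen, then a version starting with a digit.
# The non-greedy name means "foo-bar-1.0" splits as ("foo-bar", "1.0").
_NAME_VERSION_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d.*)$")


def pypi_project_from_filename_tolerant(filename: str) -> Optional[str]:
    """Hypothetical parser: return the project name, or None if unparsable."""
    stem = filename
    for suffix in _SUFFIXES:
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
            break
    match = _NAME_VERSION_RE.match(stem)
    if match is None:
        # Let the caller log and skip the revision instead of asserting.
        return None
    return match.group("name")
```

With this split, `pypops-201408-r4.tar.gz` yields the project name `pypops` and the version `201408-r4`.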
2021-04-06 20:54:44,962 __main__ ERROR Could not parse revision metadata 00c6e2fe046dee3b5ef629f74f4801345840e70a
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 1161, in main
    handle_row(row, storage, deposit_cur, dry_run)
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 843, in handle_row
    assert "id" in actual_metadata or "title" in actual_metadata
AssertionError
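This failure comes from a bare `assert` in `handle_row` (line 843), so the log records which revision failed but not what its metadata looked like. A sketch of an explicit check that carries that context; `check_actual_metadata` and `UnexpectedMetadata` are hypothetical names, and raising (rather than returning a flag) assumes the main loop keeps catching and logging per-revision errors, as the log line shows it now does.

```python
from typing import Any, Mapping


class UnexpectedMetadata(ValueError):
    """Hypothetical error for metadata the migration cannot handle."""


def check_actual_metadata(revision_id: bytes, actual_metadata: Mapping[str, Any]) -> None:
    """Raise a descriptive error instead of a bare AssertionError.

    Unlike `assert`, this check is not stripped when Python runs with -O,
    and the message names the revision and the keys it actually has.
    """
    if "id" not in actual_metadata and "title" not in actual_metadata:
        raise UnexpectedMetadata(
            f"revision {revision_id.hex()}: metadata has neither 'id' nor 'title' "
            f"(keys: {sorted(actual_metadata)})"
        )
```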
I've relaunched the latest version of the migrate_extrinsic_metadata script on getty...