Fri, Apr 30
If we want to remove perms normalization from the tarball loader, then we need to discuss that in a separate task. This fixes and introduces tests for the behavior that was intended by the original code, which was buggy.
D5652 adds tests for the manifest format, so I don't think you need to do that here.
Thu, Apr 29
Don't forget to drop the notice calls ;)
Tue, Apr 27
Mon, Apr 26
Please mount the shared cache directory in the image to avoid re-downloading all of PyPI from the internet on every build. I'd suggest duplicating the existing includes/agent-docker.groovy.j2 to a new agent-docker-sphinx.groovy.j2 file, to add the options relevant for this build (for instance, I don't know how useful mounting a tmpdir on /tmp would be).
Fri, Apr 23
Now deployed in prod:
New swh.loader.core deployed in staging.
I'd probably keep the new package repo instructions in here, as they have no counterpart in the developer docs; move that to the bottom, maybe?
Mon, Apr 19
Some partitions have reached the tail of the journal and everything is still running smoothly, yay.
So D5246 landed a while ago. The s3 object copy process has now caught up on some partitions and I can confirm that the copy of the latest added objects happens without any race condition.
Fri, Apr 16
Wed, Apr 14
Sure, you can keep this implementation, that's why the diff was accepted in the first place.
Tue, Apr 13
Could you deduplicate swh_scheduler_peek_any_ready_priority_tasks and swh_scheduler_peek_tasks_with_priority?
Very, very nice! Thanks.
Please fix the coding-guidelines ref as well.
I don't understand why this is needed. Aren't we able to explicitly send instances of the existing swh.loader.git.tasks.UpdateGitRepository task to a separate queue, and have a celery process consume the "regular" tasks from that queue directly?
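For reference, a minimal sketch of the routing this comment suggests (the queue name "oneshot-git" is an assumption, not an existing queue):

```python
# Celery routing configuration fragment (hypothetical queue name):
# every UpdateGitRepository task sent through an app using this config
# lands on a dedicated queue, which a worker then consumes with:
#   celery worker -Q oneshot-git
task_routes = {
    "swh.loader.git.tasks.UpdateGitRepository": {"queue": "oneshot-git"},
}
```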
Mon, Apr 12
The process has been restarted and is progressing well (we have 800 million objects left to copy, at around 500 objects per second, so the ETA until reaching the tail of the log is around 3 weeks now).
Knobs to adjust the visibility of origins in the archive and in the web API
I'm sure some of the assumptions behind this script are going to break, eventually. But for now it's better than nothing!
Fri, Apr 9
I'd rephrase the suggestion to "Please install 'swh.model[cli]' for full functionality.", rather than try to guess what the user did. They likely got the swh-identify script from installing swh.model as a dependency of something else!
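A sketch of how such a hint could be produced when the optional CLI dependencies are missing (the helper name and probed module are assumptions for illustration, not swh.model's actual code):

```python
from typing import Optional


def cli_extras_hint(module_name: str = "click") -> Optional[str]:
    """Return the suggested install hint if an optional CLI dependency
    is missing, or None when it is available.

    The default probe module ("click") is a hypothetical example of a
    dependency pulled in by the 'cli' extra.
    """
    try:
        __import__(module_name)
    except ImportError:
        # Deliberately does not guess how the user installed the package.
        return "Please install 'swh.model[cli]' for full functionality."
    return None
```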
Jenkins runs the py3 tox environment explicitly, so this won't get run by CI. The same issue exists with the previously added "identify" environment.
Thu, Apr 8
And this is now available in production.
This has now been deployed and tested in staging with a canary origin (github.com/olasd/Pythagore). Time to deploy in production.
Apr 7 2021
Operationally, there are two axes we can play with:
Apr 6 2021
2021-04-06 20:54:44,962 __main__ ERROR Could not parse revision metadata 00c6e2fe046dee3b5ef629f74f4801345840e70a
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 1161, in main
    handle_row(row, storage, deposit_cur, dry_run)
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 843, in handle_row
    assert "id" in actual_metadata or "title" in actual_metadata
AssertionError

2021-04-06 20:19:19,898 __main__ ERROR Could not parse revision metadata 00959a167bd98452c98ce73382f4b42179d53d32
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 1161, in main
    handle_row(row, storage, deposit_cur, dry_run)
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 979, in handle_row
    storage, row["id"], metadata["original_artifact"]["filename"]
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 265, in pypi_origin_from_filename
    project_name = pypi_project_from_filename(filename)
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 256, in pypi_project_from_filename
    assert match, original_filename
AssertionError: pypops-201408-r4.tar.gz
(I've also noticed dry_run was set to True, so I fixed that as well :P)
Tail of log:
The script is now running on getty.
Add comment for the sha512 field
The migration script has now run to completion (took around a week).
Apr 2 2021
Mar 30 2021
I've deployed the extid schema changes on all storages, and I've started the migration script on getty.
Mar 29 2021
Mar 25 2021
+1 from me, this guide is great.
Mar 23 2021
Not quite sure about the split between BasePackageInfo / BaseManifestPackageInfo (and I really don't like the new name).
The following objects remain:
Shouldn't the extid() methods all return a tuple (extid_type, extid_value) rather than a plain bytes value? I can imagine a point where, for the same loader, we might want to change the extid_type, and the current implementation wouldn't be able to distinguish them.
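A minimal sketch of the return type this comment suggests (class and type names are hypothetical, not the actual swh.loader.core API):

```python
from typing import Tuple


class BasePackageInfo:
    """Hypothetical sketch: extid() returns an (extid_type, extid_value)
    pair instead of a plain bytes value, so a loader can later change
    its extid_type without old and new values becoming indistinguishable."""

    def extid(self) -> Tuple[str, bytes]:
        raise NotImplementedError


class TarballPackageInfo(BasePackageInfo):
    def __init__(self, checksum: bytes):
        self.checksum = checksum

    def extid(self) -> Tuple[str, bytes]:
        # A versioned type string: bumping to e.g. "tarball-sha256-v2"
        # later would not collide with values recorded under this type.
        return ("tarball-sha256", self.checksum)
```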
After a lot of back and forth, and the release of swh.model v2.3.0 and swh.storage v0.26.0, this is now all done and deployed in staging and production.
After the release of swh.model v2, this is now done.
The missing topic (raw_extrinsic_metadata) has been handled as part of the migration in T3019. Closing.