
Jan 8 2023

gitlab-migration changed the status of T3018: Allow querying raw_extrinsic_metadata by hash in swh-storage, a subtask of T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects, from Resolved to Migrated.
Jan 8 2023, 4:33 PM · Data Model, Storage manager, Extrinsic metadata
gitlab-migration changed the status of T2961: Add a "not_found" value in OriginVisitStatus.status from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:33 PM · Sprint 2021 01, Data Model
gitlab-migration changed the status of T2963: Add visit_type field to OriginVisitStatus model object from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:33 PM · Data Model, Sprint 2021 01
gitlab-migration changed the status of T2788: deduplicate validation logic between parse_swhid() and SWHID class constructor from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:32 PM · Data Model
gitlab-migration changed the status of T2769: Make function swh.model.identifiers.parse_swhid more strict from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:32 PM · Data Model
gitlab-migration changed the status of T2570: swh-identify: support exclusion patterns (e.g., for .git/) as swh-scanner does from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:31 PM · Data Model
gitlab-migration changed the status of T2494: tests: Use data model objects within tests (drop dicts), a subtask of T645: Type swh-storage endpoints with swh.model objects, from Resolved to Migrated.
Jan 8 2023, 4:31 PM · Data Model, Storage manager
gitlab-migration changed the status of T2423: Extract the `extra_headers` away from `Revision.metadata` into a top-level immutable object, a subtask of T2421: Make model objects immutable, from Resolved to Migrated.
Jan 8 2023, 4:30 PM · Data Model
gitlab-migration changed the status of T2423: Extract the `extra_headers` away from `Revision.metadata` into a top-level immutable object from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:30 PM · Data Model
gitlab-migration changed the status of T2421: Make model objects immutable from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:30 PM · Data Model
gitlab-migration changed the status of T2422: Add an `object_type` attribute to model classes from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:30 PM · Data Model
gitlab-migration changed the status of T2387: Clarify visits in SWHIDs from Wontfix to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:30 PM · Data Model
gitlab-migration changed the status of T2308: Better Validation in swh.model from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:29 PM · Data Model
gitlab-migration changed the status of T2288: pip install swh.model[cli] no longer provides a usable "swh" command from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:29 PM · Data Model
gitlab-migration changed the status of T2212: Specification for swh:2+: identifiers, a subtask of T2210: Data Model, from Duplicate to Migrated.
Jan 8 2023, 4:29 PM · Data Model, Roadmap 2020
gitlab-migration changed the status of T2212: Specification for swh:2+: identifiers from Duplicate to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:29 PM · Data Model, Roadmap 2020
gitlab-migration changed the status of T2210: Data Model from Invalid to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:29 PM · Data Model, Roadmap 2020
gitlab-migration changed the status of T1986: swh.model.identifiers: move validation from parsing_persistent_identifier to PersistentId constructor from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:28 PM · Easy hack, Data Model
gitlab-migration changed the status of T1687: Add filename as an optional part in persistent identifiers from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:27 PM · Data Model
gitlab-migration changed the status of T1645: Associating multiple authors to a revision from Duplicate to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:27 PM · Data Model
gitlab-migration changed the status of T1135: swh-identify: follow symlink by default for paths given as args from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:24 PM · Data Model
gitlab-migration changed the status of T1136: swh-identify: support recursive checksumming of directories from Invalid to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:24 PM · Data Model
gitlab-migration changed the status of T1134: swh-identify: support multiple path arguments from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:24 PM · Data Model
gitlab-migration changed the status of T1133: swh-identify: show filename in output from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:24 PM · Data Model
gitlab-migration changed the status of T1133: swh-identify: show filename in output, a subtask of T1136: swh-identify: support recursive checksumming of directories, from Resolved to Migrated.
Jan 8 2023, 4:24 PM · Data Model
gitlab-migration changed the status of T1125: parse_persistent_identifier: returned object type should not be abbreviated from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:24 PM · Data Model
gitlab-migration changed the status of T1112: swh.model.identifier: Improve persistent identifier representation from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:24 PM · Data Model
gitlab-migration changed the status of T1104: parse_persistent_identifier() should raise a parsing exception on invalid identifiers from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:24 PM · Data Model
gitlab-migration changed the status of T1039: add swh-model CLI front-end to compute persistent identifiers from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:24 PM · Data Model
gitlab-migration changed the status of T1003: Register a SWH uri scheme with IANA from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:24 PM · Data Model
gitlab-migration changed the status of T709: Clean up the swh.model.git API from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:20 PM · Data Model
gitlab-migration changed the status of T685: swh-model generates directory entries with bogus permissions from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:20 PM · Data Model, Restricted Project
gitlab-migration changed the status of T566: specify serialization format for repository snapshot objects from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:19 PM · Data Model
gitlab-migration changed the status of T270: TEXT → BYTEA conversion for columns that represent VCS tags/branches, a subtask of T271: Update clients on impacts + upgrade respective package dependencies, from Resolved to Migrated.
Jan 8 2023, 4:18 PM · Git loader, Directory loader, Data Model, Web app

Dec 22 2022

vlorentz added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

We have a new one that went unnoticed until 19 days ago: b'superduper/super/sub/bye.txt' is not a valid directory entry name.

Dec 22 2022, 12:40 PM · Archive integrity, Object storage, Data Model
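
A minimal sketch of the kind of check that rejects these names. It assumes the names are refused because they contain a path separator (both names reported in T4644 do). The function name `check_entry_name` is hypothetical and this is not the swh.model code itself; only the error message wording is taken from the comment above.

```python
def check_entry_name(name: bytes) -> bytes:
    """Return the name unchanged, or raise if it contains a path separator.

    Assumption: a directory entry names a single path component, so b"/"
    must not appear in it.
    """
    if b"/" in name:
        raise ValueError(f"{name!r} is not a valid directory entry name.")
    return name


if __name__ == "__main__":
    print(check_entry_name(b"gitter.xml"))
    try:
        check_entry_name(b"gitter/gitter.xml")
    except ValueError as exc:
        print(exc)
```

Under that assumption, b'gitter/gitter.xml' and b'superduper/super/sub/bye.txt' are both rejected, while b'gitter.xml' passes.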

Nov 22 2022

olasd added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

To force kafka compaction to run I've done the following:

Nov 22 2022, 5:19 PM · Archive integrity, Object storage, Data Model

Nov 10 2022

anlambert added a comment to T4632: Origin URLs generated for Fedora origins.

@anlambert What about non-Fedora RPM repositories? (RHEL, SUSE, Rocky Linux, ...)

Nov 10 2022, 1:57 PM · Archive coverage, Data Model
vlorentz added a comment to T4632: Origin URLs generated for Fedora origins.

@anlambert What about non-Fedora RPM repositories? (RHEL, SUSE, Rocky Linux, ...)

Nov 10 2022, 1:29 PM · Archive coverage, Data Model
anlambert added a comment to T4632: Origin URLs generated for Fedora origins.

Actually for fedora, I found a better origin URL pattern: https://src.fedoraproject.org/rpms/{pkg_name}

Nov 10 2022, 11:27 AM · Archive coverage, Data Model
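
For illustration, the URL pattern from the comment above could be applied as follows. The helper name `fedora_origin_url` and the sample package name are hypothetical; only the pattern itself comes from the discussion, and the sketch assumes package names can be substituted without escaping.

```python
# Pattern quoted in T4632; everything else in this sketch is an assumption.
FEDORA_ORIGIN_URL_PATTERN = "https://src.fedoraproject.org/rpms/{pkg_name}"


def fedora_origin_url(pkg_name: str) -> str:
    """Build an origin URL for a Fedora source package name."""
    return FEDORA_ORIGIN_URL_PATTERN.format(pkg_name=pkg_name)


if __name__ == "__main__":
    print(fedora_origin_url("bash"))
```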

Nov 9 2022

anlambert added a comment to T4632: Origin URLs generated for Fedora origins.

After reviewing and hacking on the fedora lister, I think we should use origin URL in the form https://packages.fedoraproject.org/pkgs/{src_pkg_name} for a fedora source package.

Nov 9 2022, 2:09 PM · Archive coverage, Data Model

Nov 3 2022

bchauvet added a parent task for T4662: staging: Deploy nixguix lister and loader: T4079: Extend archive coverage.
Nov 3 2022, 10:17 AM · Data Model, Nixguix loader

Nov 2 2022

vlorentz added a parent task for T4668: replayer: Crashes because of directories with duplicated entry names in journal: T2033: Run Cassandra storage backend with production data.
Nov 2 2022, 11:32 AM · Data Model, Journal
vlorentz added a comment to T4668: replayer: Crashes because of directories with duplicated entry names in journal.

New objects with duplicated entries go through this method to be fixed: https://docs.softwareheritage.org/devel/apidoc/swh.model.model.html#swh.model.model.Directory.from_possibly_duplicated_entries (currently only used by swh.storage.backfill); but old kafka messages still have duplicate entries, causing the crash above.

Nov 2 2022, 11:20 AM · Data Model, Journal
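
A rough sketch of what deduplicating entry names can look like. This is an illustration only, not the implementation of Directory.from_possibly_duplicated_entries: the `Entry` type, the `dedup_entries` function, and the choice of a numeric `_N` suffix are all assumptions made for the example.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Entry(NamedTuple):
    """Hypothetical stand-in for a directory entry."""
    name: bytes
    target: str


def dedup_entries(entries: Iterable[Entry]) -> Tuple[bool, List[Entry]]:
    """Rename later duplicates so that every entry name is unique.

    Returns (had_duplicates, entries). The suffix scheme is an assumption.
    """
    seen: Set[bytes] = set()
    had_duplicates = False
    result: List[Entry] = []
    for entry in entries:
        name = entry.name
        suffix = 1
        # Keep trying new suffixes until the name is unused.
        while name in seen:
            had_duplicates = True
            name = entry.name + b"_" + str(suffix).encode()
            suffix += 1
        seen.add(name)
        result.append(entry._replace(name=name))
    return had_duplicates, result


if __name__ == "__main__":
    print(dedup_entries([Entry(b"a", "1"), Entry(b"a", "2"), Entry(b"b", "3")]))
```

Messages written to kafka before such a fix was in place would still carry the duplicated names, which is consistent with the crash described in the task.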
vlorentz placed T4668: replayer: Crashes because of directories with duplicated entry names in journal up for grabs.
Nov 2 2022, 11:17 AM · Data Model, Journal
swh-sentry-integration assigned T4668: replayer: Crashes because of directories with duplicated entry names in journal to vlorentz.
Nov 2 2022, 11:17 AM · Data Model, Journal

Oct 26 2022

ardumont updated the task description for T3781: Replace the Nixguix loader with a lister.
Oct 26 2022, 5:48 PM · Data Model, Nixguix loader
ardumont added a comment to T3781: Replace the Nixguix loader with a lister.

With the latest diffs, the filtering seems to properly sort the files and tarballs for the guix manifest:

Oct 26 2022, 2:44 PM · Data Model, Nixguix loader
ardumont updated the task description for T3781: Replace the Nixguix loader with a lister.
Oct 26 2022, 2:34 PM · Data Model, Nixguix loader
ardumont closed T4662: staging: Deploy nixguix lister and loader, a subtask of T3781: Replace the Nixguix loader with a lister, as Invalid.
Oct 26 2022, 2:33 PM · Data Model, Nixguix loader
ardumont closed T4662: staging: Deploy nixguix lister and loader as Invalid.

/me sighs

Oct 26 2022, 2:33 PM · Data Model, Nixguix loader
ardumont triaged T4662: staging: Deploy nixguix lister and loader as Normal priority.
Oct 26 2022, 2:31 PM · Data Model, Nixguix loader
ardumont updated the task description for T3781: Replace the Nixguix loader with a lister.
Oct 26 2022, 2:29 PM · Data Model, Nixguix loader

Oct 25 2022

olasd added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

This is now done, the objects are fixed in the production DB and kafka.

Oct 25 2022, 8:10 PM · Archive integrity, Object storage, Data Model
olasd added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

@vlorentz I'm running the following adaptation to your script:

Oct 25 2022, 7:03 PM · Archive integrity, Object storage, Data Model
ardumont updated the task description for T3781: Replace the Nixguix loader with a lister.
Oct 25 2022, 6:57 PM · Data Model, Nixguix loader
ardumont updated subscribers of T3781: Replace the Nixguix loader with a lister.

Last analysis without [1]. That last diff should fix the key entries marked with the key 'only-version-should-be-tarball'.

Oct 25 2022, 6:53 PM · Data Model, Nixguix loader
seirl added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

Oh yeah, I was thinking of just removing the entire project, but your solution also works.

Oct 25 2022, 6:15 PM · Archive integrity, Object storage, Data Model
vlorentz updated subscribers of T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

Holes are bad. And I just opened a diff to make the git loader apply the same transformation, as @olasd made the same comment: D8776

Oct 25 2022, 6:10 PM · Archive integrity, Object storage, Data Model
seirl added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

Do you actually want to keep these objects? This would be inconsistent with the fixed loader behavior that would just reject those objects, and not load the repository at all.

Oct 25 2022, 6:06 PM · Archive integrity, Object storage, Data Model
vlorentz changed the visibility for T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').
Oct 25 2022, 5:57 PM · Archive integrity, Object storage, Data Model
ardumont added a revision to T3781: Replace the Nixguix loader with a lister: D8774: nixguix: Use content-disposition from http head request if provided.
Oct 25 2022, 5:50 PM · Data Model, Nixguix loader
vlorentz updated subscribers of T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

I tried to add a workaround in the backfiller, but it is incredibly hard to do properly, especially as entries are disordered, so raw_manifest needs to be fixed in two different ways.

Oct 25 2022, 5:41 PM · Archive integrity, Object storage, Data Model
ardumont added a revision to T3781: Replace the Nixguix loader with a lister: D8773: nixguix: Deal with edge case url with version instead of extension.
Oct 25 2022, 5:29 PM · Data Model, Nixguix loader
ardumont updated the task description for T3781: Replace the Nixguix loader with a lister.
Oct 25 2022, 2:50 PM · Data Model, Nixguix loader
ardumont added a revision to T3781: Replace the Nixguix loader with a lister: D8763: nixguix: Allow lister to ignore specific extensions.
Oct 25 2022, 10:41 AM · Data Model, Nixguix loader
ardumont added a revision to T3781: Replace the Nixguix loader with a lister: D8761: nixguix/test: Add all supported tarball extensions to test manifest.
Oct 25 2022, 10:09 AM · Data Model, Nixguix loader

Oct 24 2022

ardumont added a revision to T3781: Replace the Nixguix loader with a lister: D8758: swh.core.tarball: Wire support to existing tarball.
Oct 24 2022, 4:38 PM · Data Model, Nixguix loader
ardumont added a comment to T3781: Replace the Nixguix loader with a lister.

Checks that newly detected extensions are actually supported already.
Summary [1] and the actual checks [2]:

Oct 24 2022, 4:36 PM · Data Model, Nixguix loader
ardumont added a revision to T3781: Replace the Nixguix loader with a lister: D8757: Add support for more tarball/zip extensions.
Oct 24 2022, 3:53 PM · Data Model, Nixguix loader
ardumont added a comment to T3781: Replace the Nixguix loader with a lister.

I did a pass on extensions to further check what's a tarball or not [1]

Oct 24 2022, 3:19 PM · Data Model, Nixguix loader
ardumont updated the task description for T3781: Replace the Nixguix loader with a lister.
Oct 24 2022, 3:17 PM · Data Model, Nixguix loader
ardumont added a comment to T3781: Replace the Nixguix loader with a lister.

Improved version with noisy urls printed alongside the hash output [1] [2]:

Oct 24 2022, 3:16 PM · Data Model, Nixguix loader

Oct 19 2022

gitlab-migration changed the status of T3837: clean up swh-scheduler model db from Resolved to Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:05 PM · Scheduling utilities, Data Model, System administration
gitlab-migration changed the status of T3819: Deploy swh.model 4.1.0 / swh.storage 0.41.0 to production, a subtask of T3752: Store/represent time offsets as strings, from Resolved to Migrated.
Oct 19 2022, 6:05 PM · Data Model, Storage manager
gitlab-migration changed the status of T3745: production: Deploy package loader v1.1, deposit server v0.16, lister v2.3, a subtask of T3638: Make package loaders create releases objects instead of revisions, from Resolved to Migrated.
Oct 19 2022, 6:05 PM · Package Loader, Data Model, Archive content
gitlab-migration changed the status of T3745: production: Deploy package loader v1.1, deposit server v0.16, lister v2.3 from Resolved to Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:05 PM · System administration, Package Loader, Data Model, Archive content
gitlab-migration changed the status of T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3 from Resolved to Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:05 PM · System administration, SWORD deposit, Package Loader, Data Model, Archive content
gitlab-migration changed the status of T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3, a subtask of T3638: Make package loaders create releases objects instead of revisions, from Resolved to Migrated.
Oct 19 2022, 6:05 PM · Package Loader, Data Model, Archive content

Oct 18 2022

vlorentz added a parent task for T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml'): T2033: Run Cassandra storage backend with production data.
Oct 18 2022, 3:40 PM · Archive integrity, Object storage, Data Model
vlorentz triaged T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml') as High priority.
Oct 18 2022, 3:39 PM · Archive integrity, Object storage, Data Model
swh-sentry-integration assigned T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml') to vlorentz.
Oct 18 2022, 3:39 PM · Archive integrity, Object storage, Data Model

Oct 17 2022

anlambert closed T4633: Make MerkleNode.collect return a set of MerkleNode instead of a dict as Resolved by committing rDMOD13e7adc3e854: merkle: Make MerkleNode.collect return a set of nodes instead of a dict.
Oct 17 2022, 5:20 PM · Data Model

Oct 14 2022

anlambert added a revision to T4633: Make MerkleNode.collect return a set of MerkleNode instead of a dict: D8686: merkle: Make MerkleNode.collect return a set of nodes instead of a dict.
Oct 14 2022, 4:00 PM · Data Model
anlambert closed T3858: Add diff features for class from_disk.Directory as Invalid.

Closing this as invalid, as there already exists a method named collect in the merkle.MerkleNode class (base of from_disk.Directory) that does exactly what is detailed in the task description.
Nevertheless, that method could be improved to give more flexibility to client code (T4633).

Oct 14 2022, 3:50 PM · Data Model
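
To make the T4633 change concrete, here is a toy sketch of a collect-style method that returns a set of nodes rather than a dict. The `ToyNode` class, its `collected` flag, and the traversal order are assumptions for illustration; this is not the swh.model merkle.MerkleNode code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Set


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes fit in a set
class ToyNode:
    name: str
    children: List[ToyNode] = field(default_factory=list)
    collected: bool = False

    def collect(self) -> Set[ToyNode]:
        """Return the not-yet-collected nodes of this subtree as a set."""
        if self.collected:
            return set()
        self.collected = True
        nodes: Set[ToyNode] = {self}
        for child in self.children:
            nodes |= child.collect()
        return nodes


if __name__ == "__main__":
    root = ToyNode("root", [ToyNode("a"), ToyNode("b")])
    print(sorted(node.name for node in root.collect()))
```

With a set, client code can iterate over the nodes or filter them by type directly, which is one way to read the "more flexibility" mentioned in the comment.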
anlambert triaged T4633: Make MerkleNode.collect return a set of MerkleNode instead of a dict as Normal priority.
Oct 14 2022, 3:45 PM · Data Model

Oct 13 2022

vlorentz triaged T4632: Origin URLs generated for Fedora origins as Normal priority.
Oct 13 2022, 10:40 AM · Archive coverage, Data Model

Oct 7 2022

ardumont added a comment to T3781: Replace the Nixguix loader with a lister.

With D8637, listing is less noisy [1] (code [2]):

Oct 7 2022, 2:22 PM · Data Model, Nixguix loader

Oct 6 2022

ardumont updated the task description for T3781: Replace the Nixguix loader with a lister.
Oct 6 2022, 6:15 PM · Data Model, Nixguix loader
ardumont closed T3294: nixguix: Add support for pseudo-URLs with a missing schema, a subtask of T3781: Replace the Nixguix loader with a lister, as Resolved.
Oct 6 2022, 6:13 PM · Data Model, Nixguix loader
ardumont updated the task description for T4609: nixpkgs manifests list "git" origins as "urls".
Oct 6 2022, 6:12 PM · Data Model, Nixguix loader
ardumont triaged T4609: nixpkgs manifests list "git" origins as "urls" as Normal priority.
Oct 6 2022, 6:11 PM · Data Model, Nixguix loader
ardumont updated the task description for T4608: nixpkgs manifests list "recursive" file which are missing information to recompute their hashes.
Oct 6 2022, 6:08 PM · Data Model, Nixguix loader
ardumont added a comment to T4608: nixpkgs manifests list "recursive" file which are missing information to recompute their hashes.

I tried to have a look at the code and to run it, but failed [1].

Oct 6 2022, 4:54 PM · Data Model, Nixguix loader
ardumont updated the task description for T4608: nixpkgs manifests list "recursive" file which are missing information to recompute their hashes.
Oct 6 2022, 4:02 PM · Data Model, Nixguix loader
ardumont added a revision to T3781: Replace the Nixguix loader with a lister: D8637: nixguix: Exclude faulty "recursive" file origins from listing.
Oct 6 2022, 4:02 PM · Data Model, Nixguix loader
ardumont added a revision to T4608: nixpkgs manifests list "recursive" file which are missing information to recompute their hashes: D8637: nixguix: Exclude faulty "recursive" file origins from listing.
Oct 6 2022, 4:02 PM · Data Model, Nixguix loader
ardumont updated the task description for T3781: Replace the Nixguix loader with a lister.
Oct 6 2022, 3:36 PM · Data Model, Nixguix loader
ardumont triaged T4608: nixpkgs manifests list "recursive" file which are missing information to recompute their hashes as Normal priority.
Oct 6 2022, 3:32 PM · Data Model, Nixguix loader
ardumont updated the task description for T3781: Replace the Nixguix loader with a lister.
Oct 6 2022, 2:44 PM · Data Model, Nixguix loader
ardumont added a comment to T3781: Replace the Nixguix loader with a lister.

The current nixpkgs manifests are either not built properly or not complete yet. They
sometimes reference hashes we cannot recompute, as only the derivation
contains the information [1] [2].

Oct 6 2022, 2:37 PM · Data Model, Nixguix loader