Jan 28 2021

vlorentz updated the task description for T3004: swh-storage documentation needs a better introduction.
Jan 28 2021, 5:12 PM · Documentation, Storage manager
vlorentz triaged T3004: swh-storage documentation needs a better introduction as Normal priority.
Jan 28 2021, 5:12 PM · Documentation, Storage manager
vsellier closed T3001: Webapp is not displaying the origin type on the search results as Resolved.

The fix is deployed on webapp1 and solved the problem.

Jan 28 2021, 3:33 PM · Storage manager, Web app
vsellier added a comment to T3001: Webapp is not displaying the origin type on the search results.

Storage version v0.21.1 is deployed in staging, and the problem looks fixed:

❯ curl -s  https://webapp.staging.swh.network/api/1/origin/https://gitlab.com/miwc/miwc.github.io.git/visit/latest/\?require_snapshot\=true | jq ''
{
  "origin": "https://gitlab.com/miwc/miwc.github.io.git",
  "date": "2020-12-07T18:21:58.967952+00:00",
  "type": "git",
  "visit": 1,
  "status": "full",
  "snapshot": "759b36e0e3e81e8cbf601181829571daa645b5d2",
  "metadata": {},
  "origin_url": "https://webapp.staging.swh.network/api/1/origin/https://gitlab.com/miwc/miwc.github.io.git/get/",
  "snapshot_url": "https://webapp.staging.swh.network/api/1/snapshot/759b36e0e3e81e8cbf601181829571daa645b5d2/"
}
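
The same check can be scripted rather than read by eye. This is a minimal sketch that works on the response body shown above, pasted in as a string; `visit_has_type` is a hypothetical helper name, not part of swh-web or swh-storage.

```python
import json

# Response body copied from the curl output above.
RESPONSE = """
{
  "origin": "https://gitlab.com/miwc/miwc.github.io.git",
  "date": "2020-12-07T18:21:58.967952+00:00",
  "type": "git",
  "visit": 1,
  "status": "full",
  "snapshot": "759b36e0e3e81e8cbf601181829571daa645b5d2",
  "metadata": {}
}
"""


def visit_has_type(payload: str) -> bool:
    """Return True if the visit JSON carries a non-empty "type" field."""
    visit = json.loads(payload)
    return bool(visit.get("type"))


print(visit_has_type(RESPONSE))
```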
Jan 28 2021, 2:36 PM · Storage manager, Web app
vsellier added a revision to T3001: Webapp is not displaying the origin type on the search results: D4960: Correctly return origin_visit_status.type value everywhere.
Jan 28 2021, 12:12 PM · Storage manager, Web app
vsellier added projects to T3001: Webapp is not displaying the origin type on the search results: Web app, Storage manager.
Jan 28 2021, 12:11 PM · Storage manager, Web app

Jan 25 2021

vlorentz added a revision to T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects: D4935: identifiers: Add raw_extrinsic_metadata_identifier.
Jan 25 2021, 12:32 PM · Data Model, Storage manager, Extrinsic metadata

Jan 18 2021

vsellier closed T2966: Backfill origin_visit_status **with** the `visit_type` field properly given as Resolved.
Jan 18 2021, 12:02 PM · Storage manager, Sprint 2021 01, Scheduling utilities

Jan 15 2021

ardumont moved T2966: Backfill origin_visit_status **with** the `visit_type` field properly given from in-progress to code review on the Sprint 2021 01 board.
Jan 15 2021, 2:55 PM · Storage manager, Sprint 2021 01, Scheduling utilities
vsellier added a revision to T2966: Backfill origin_visit_status **with** the `visit_type` field properly given: D4871: Backfiller: Add type to the origin_visit_status topic.
Jan 15 2021, 2:41 PM · Storage manager, Sprint 2021 01, Scheduling utilities
vsellier changed the status of T2966: Backfill origin_visit_status **with** the `visit_type` field properly given from Open to Work in Progress.
Jan 15 2021, 2:40 PM · Storage manager, Sprint 2021 01, Scheduling utilities
vsellier closed T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type as Resolved.
Jan 15 2021, 2:01 PM · Storage manager, Sprint 2021 01

Jan 14 2021

ardumont moved T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic from in-progress to done on the Sprint 2021 01 board.
Jan 14 2021, 2:54 PM · Storage manager, Sprint 2021 01

Jan 13 2021

vsellier closed T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic as Resolved.

Implemented in T2964

Jan 13 2021, 4:51 PM · Storage manager, Sprint 2021 01
vsellier moved T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type from in-progress to code review on the Sprint 2021 01 board.
Jan 13 2021, 4:50 PM · Storage manager, Sprint 2021 01
vsellier added a revision to T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type: D4858: Add persistence of the field OriginVisitStatus.type.
Jan 13 2021, 4:47 PM · Storage manager, Sprint 2021 01
vsellier added a revision to T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic: D4857: Add new field OriginVisitStatus.type field on test data.
Jan 13 2021, 4:22 PM · Storage manager, Sprint 2021 01
douardda claimed T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type.
Jan 13 2021, 11:39 AM · Storage manager, Sprint 2021 01
douardda moved T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type from todo to in-progress on the Sprint 2021 01 board.
Jan 13 2021, 11:39 AM · Storage manager, Sprint 2021 01
vsellier moved T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type from in-progress to todo on the Sprint 2021 01 board.
Jan 13 2021, 11:39 AM · Storage manager, Sprint 2021 01
douardda changed the status of T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type from Open to Work in Progress.
Jan 13 2021, 11:39 AM · Storage manager, Sprint 2021 01
ardumont changed the status of T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic from Open to Work in Progress.
Jan 13 2021, 11:36 AM · Storage manager, Sprint 2021 01
vsellier claimed T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic.
Jan 13 2021, 11:34 AM · Storage manager, Sprint 2021 01
ardumont triaged T2968: Migrate origin_visit_status records to add the type value as Normal priority.
Jan 13 2021, 9:58 AM · System administration, Storage manager
ardumont removed a project from T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type: Scheduling utilities.
Jan 13 2021, 9:54 AM · Storage manager, Sprint 2021 01
ardumont removed a project from T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic: Scheduling utilities.
Jan 13 2021, 9:52 AM · Storage manager, Sprint 2021 01
ardumont renamed T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic from Adapt storage to actually write correctly the type in the origin_visit_status topics to Adapt storage to actually write the visit_type in the origin_visit_status topic.
Jan 13 2021, 9:52 AM · Storage manager, Sprint 2021 01
ardumont moved T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type from Backlog to todo on the Sprint 2021 01 board.
Jan 13 2021, 9:51 AM · Storage manager, Sprint 2021 01
ardumont moved T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic from Backlog to todo on the Sprint 2021 01 board.
Jan 13 2021, 9:51 AM · Storage manager, Sprint 2021 01
ardumont moved T2966: Backfill origin_visit_status **with** the `visit_type` field properly given from Backlog to todo on the Sprint 2021 01 board.
Jan 13 2021, 9:51 AM · Storage manager, Sprint 2021 01, Scheduling utilities
ardumont updated the task description for T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic.
Jan 13 2021, 9:47 AM · Storage manager, Sprint 2021 01
ardumont triaged T2966: Backfill origin_visit_status **with** the `visit_type` field properly given as High priority.
Jan 13 2021, 9:42 AM · Storage manager, Sprint 2021 01, Scheduling utilities
ardumont triaged T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic as High priority.
Jan 13 2021, 9:41 AM · Storage manager, Sprint 2021 01
ardumont placed T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type up for grabs.
Jan 13 2021, 9:39 AM · Storage manager, Sprint 2021 01
ardumont triaged T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type as High priority.
Jan 13 2021, 9:39 AM · Storage manager, Sprint 2021 01
vsellier closed T2905: Deploy swh-search for production, a subtask of T2182: Switch production swh-web to use swh-search instead of postgresql search., as Resolved.
Jan 13 2021, 9:23 AM · System administration, Archive search, Storage manager

Jan 11 2021

vlorentz claimed T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects.
Jan 11 2021, 4:27 PM · Data Model, Storage manager, Extrinsic metadata

Jan 7 2021

olasd added a comment to T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects.

Sounds good to me.

Jan 7 2021, 8:18 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz closed T2497: Create an ElasticSearch cluster tuned for origin/metadata search, a subtask of T2182: Switch production swh-web to use swh-search instead of postgresql search., as Resolved.
Jan 7 2021, 2:13 PM · System administration, Archive search, Storage manager
vlorentz added a parent task for T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects: T2779: Put information (client, collection and deposit-id) inside metadata for metadata-only deposit.
Jan 7 2021, 1:53 PM · Data Model, Storage manager, Extrinsic metadata

Jan 6 2021

ardumont moved T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161) from Backlog to deployed/landed/monitoring on the System administration board.
Jan 6 2021, 3:46 PM · System administration, Storage manager
olasd added a subtask for T2182: Switch production swh-web to use swh-search instead of postgresql search.: T2905: Deploy swh-search for production.
Jan 6 2021, 11:15 AM · System administration, Archive search, Storage manager

Jan 5 2021

douardda added a revision to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders: D4807: Add an ExtID object.
Jan 5 2021, 5:17 PM · Storage manager
vlorentz added a comment to T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects.

Proposed manifest format:

Jan 5 2021, 3:21 PM · Data Model, Storage manager, Extrinsic metadata

Dec 9 2020

douardda added a revision to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders: D4698: Add support for ExtID in the storage.
Dec 9 2020, 5:13 PM · Storage manager
douardda added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.

Question: who should be responsible for filling this table? The loader, or the storage (as a side effect of revision_add)?

Dec 9 2020, 9:21 AM · Storage manager

Dec 8 2020

marmoute added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.
In T2849#54256, @olasd wrote:
In T2849#54254, @zack wrote:

Thanks, this would be an important new feature. Some comments/random thoughts below.

A further design point, which I think is already implicit in what you wrote, but which would be useful to make explicit:

  • this new mapping table is needed "only" to speed up things, if we lose it, it will just mean we will be slower in doing future archival (at least for a while), but won't be the end of the world

assuming this is true, loaders will need to be designed with graceful degradation for the incompleteness (or entire disappearance) of the mapping table.

Agreed. Missing objects in the mapping table would just mean that the loader does a bit more work to re-create the associated objects, and, hopefully, generate the same SWHIDs again (which would then be used to populate the mapping table).
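
The graceful degradation discussed here could look roughly like the sketch below. It assumes a batch lookup and a batch write on the mapping table; `mapping_get`, `mapping_add`, `compute_swhid` and `resolve_swhids` are hypothetical names for illustration, not the swh-storage API.

```python
from typing import Callable, Dict, Iterable, List


def resolve_swhids(
    vcs_ids: Iterable[str],
    mapping_get: Callable[[List[str]], Dict[str, str]],
    compute_swhid: Callable[[str], str],
    mapping_add: Callable[[Dict[str, str]], None],
) -> Dict[str, str]:
    """Map original VCS ids to SWHIDs, tolerating a missing or partial table."""
    ids = list(vcs_ids)
    try:
        # Fast path: one batch lookup in the mapping table.
        known = dict(mapping_get(ids))
    except Exception:
        # The table is only an optimisation: if it is unavailable,
        # fall back to recomputing everything (broad catch for the sketch).
        known = {}

    # Slow path: recompute the SWHID of every id the table did not know.
    recomputed = {i: compute_swhid(i) for i in ids if i not in known}

    if recomputed:
        try:
            # Repopulate the table so the next run is fast again.
            mapping_add(recomputed)
        except Exception:
            pass  # Losing the write only costs time on a later run.

    return {**known, **recomputed}
```

With a plain dict standing in for the table, only the ids missing from it reach `compute_swhid`, and they are written back afterwards.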

Dec 8 2020, 3:28 PM · Storage manager
douardda added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.

OK, so the plan is to keep the first step as simple as possible: implement what @olasd proposed in the task, put this table in the storage, and provide a simple batch get API endpoint.

Dec 8 2020, 3:23 PM · Storage manager

Dec 7 2020

olasd added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.
In T2849#54254, @zack wrote:

Thanks, this would be an important new feature. Some comments/random thoughts below.

A further design point, which I think is already implicit in what you wrote, but which would be useful to make explicit:

  • this new mapping table is needed "only" to speed up things, if we lose it, it will just mean we will be slower in doing future archival (at least for a while), but won't be the end of the world

assuming this is true, loaders will need to be designed with graceful degradation for the incompleteness (or entire disappearance) of the mapping table.

Dec 7 2020, 9:30 PM · Storage manager
zack added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.

Thanks, this would be an important new feature. Some comments/random thoughts below.

Dec 7 2020, 9:08 PM · Storage manager
douardda raised the priority of T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders from Normal to High.
Dec 7 2020, 10:30 AM · Storage manager
acezar added a revision to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders: D4649: HgLoaderFromDisk: Only load new commits.
Dec 7 2020, 9:49 AM · Storage manager

Dec 3 2020

marmoute added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.
In T2849#53968, @olasd wrote:

scope: mercurial will only need the mapping for revisions;

Mercurial could also use that mapping for every object. However, just keeping that mapping for revisions already provides a very large complexity boost (or reduction, I should say), so that is "good enough" for us. I strongly suspect the same will apply to Bazaar.

Ack. I think having this for all objects in the short term would make it "too large", and having it for "objects with a history where computing the swhid is (deeply) recursive and expensive" (i.e. revisions and releases) is a decent balance.

Dec 3 2020, 1:48 PM · Storage manager
olasd added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.

scope: mercurial will only need the mapping for revisions;

Mercurial could also use that mapping for every object. However, just keeping that mapping for revisions already provides a very large complexity boost (or reduction, I should say), so that is "good enough" for us. I strongly suspect the same will apply to Bazaar.

Dec 3 2020, 1:33 PM · Storage manager
marmoute added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.

scope: mercurial will only need the mapping for revisions;

Dec 3 2020, 11:57 AM · Storage manager
olasd triaged T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders as Normal priority.
Dec 3 2020, 11:14 AM · Storage manager

Dec 2 2020

vsellier added a comment to T2761: Install webapp counters in the staging webapp/storage.

After T2828, it is clearer what must be deployed to have the counters working on staging:

  • the counters can be initialized via the /stat/refresh endpoint of the storage api (Note: It will create more counters than production as directory_entry_* and revision_history are not counted in production)
  • Add a script/service to execute the `swh_update_counter_bucketed` in an infinite loop
  • Create the buckets in the object_counts_bucketed
    • per object type: identifier|bucket_start|bucket_end. value and last_update will be updated by the stored procedures.
  • configure prometheus sql exporter for db1.staging [1]
  • configure profile_exporter on pergamon
    • Update the script to ensure the data are filtered by environment (to avoid staging data being included in production counts [2])
    • Configure a new cron
      • loading an empty file for historical data
      • creating a new export_file
  • update webapp to be able to configure the counter origin
Dec 2 2020, 9:55 AM · Storage manager, Web app, Staging environment
ardumont raised the priority of T2761: Install webapp counters in the staging webapp/storage from Low to Normal.
Dec 2 2020, 9:41 AM · Storage manager, Web app, Staging environment
ardumont updated the task description for T2761: Install webapp counters in the staging webapp/storage.
Dec 2 2020, 9:40 AM · Storage manager, Web app, Staging environment

Nov 23 2020

vlorentz added a parent task for T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects: T2513: Copy metadata on revisions to the extrinsic metadata storage.
Nov 23 2020, 12:00 PM · Data Model, Storage manager, Extrinsic metadata

Nov 16 2020

ardumont updated the task description for T2761: Install webapp counters in the staging webapp/storage.
Nov 16 2020, 10:46 AM · Storage manager, Web app, Staging environment

Nov 9 2020

ardumont renamed T2761: Install webapp counters in the staging webapp/storage from Install webapp counter in the staging webapp/storage to Install webapp counters in the staging webapp/storage.
Nov 9 2020, 10:32 AM · Storage manager, Web app, Staging environment

Nov 6 2020

ardumont triaged T2761: Install webapp counters in the staging webapp/storage as Low priority.
Nov 6 2020, 3:40 PM · Storage manager, Web app, Staging environment
anlambert closed T2734: Investigate timeout issues when querying snapshot data as Resolved by committing rDWAPPS3c59879e2fe6: browse/snapshot_context: Put processed snapshot sizes in cache.
Nov 6 2020, 11:26 AM · Storage manager, Web app

Nov 3 2020

ardumont moved T871: Migrate swh-storage api functions relative to indexers to swh-indexer from Backlog to Archived on the SWORD deposit board.
Nov 3 2020, 4:07 PM · SWORD deposit, Core Loader, Web app, Development environment, Storage manager, Indexer
ardumont moved T872: Deploy and restart indexers from Backlog to Archived on the SWORD deposit board.
Nov 3 2020, 4:07 PM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer
ardumont moved T1533: Make sure api server uses explicit configurations from Backlog to Archived on the SWORD deposit board.
Nov 3 2020, 4:04 PM · Scheduling utilities, Web app, SWORD deposit, Object storage, Storage manager, Vault
moranegg moved T2415: origin_metadata_add fails to write properly origin_metadata from Deployed to Archived on the SWORD deposit board.
Nov 3 2020, 10:08 AM · SWORD deposit, Storage manager
moranegg moved T2415: origin_metadata_add fails to write properly origin_metadata from Backlog to Deployed on the SWORD deposit board.
Nov 3 2020, 10:07 AM · SWORD deposit, Storage manager

Nov 2 2020

anlambert added a revision to T2734: Investigate timeout issues when querying snapshot data: D4393: algos/snapshot: Add function to resolve branch alias to real target.
Nov 2 2020, 5:57 PM · Storage manager, Web app
vlorentz closed T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases as Resolved.
Nov 2 2020, 12:23 PM · Package Loader, Storage manager, Extrinsic metadata

Oct 27 2020

anlambert added a revision to T2734: Investigate timeout issues when querying snapshot data: D4369: algos/snapshot: Add function to resolve branch alias to real target.
Oct 27 2020, 4:48 PM · Storage manager, Web app

Oct 26 2020

anlambert added a revision to T2734: Investigate timeout issues when querying snapshot data: D4356: common/archive: Avoid db timeouts in lookup_snapshot_sizes.
Oct 26 2020, 6:43 PM · Storage manager, Web app
anlambert added a comment to T2734: Investigate timeout issues when querying snapshot data.

I have generated a flame graph for the Django view rendering a directory in a snapshot context, and we can clearly see that the lookup_snapshot_sizes function puts quite a burden on the overall processing time.
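
One way to take that cost off the rendering path is to compute the sizes once per snapshot and reuse them. The sketch below uses an in-process dict; the actual web app may rely on a different cache backend, and `make_cached_sizes` and `compute_sizes` are hypothetical names.

```python
from typing import Callable, Dict


def make_cached_sizes(
    compute_sizes: Callable[[str], Dict[str, int]],
) -> Callable[[str], Dict[str, int]]:
    """Wrap an expensive per-snapshot size computation with a cache."""
    cache: Dict[str, Dict[str, int]] = {}

    def cached(snapshot_id: str) -> Dict[str, int]:
        if snapshot_id not in cache:
            # Only the first request for a snapshot pays the full cost.
            cache[snapshot_id] = compute_sizes(snapshot_id)
        # Return a copy so callers cannot mutate the cached entry.
        return dict(cache[snapshot_id])

    return cached
```

Keying on the snapshot identifier is safe as long as that identifier is derived from the snapshot's content, since a given id then always refers to the same set of branches.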

Oct 26 2020, 4:42 PM · Storage manager, Web app
ardumont added a comment to T2548: Restore CRAN visits deleted in january 2020 from backups.

And now it's here

;)

Oct 26 2020, 3:27 PM · Storage manager, System administration, Origin-CRAN
olasd added a comment to T2734: Investigate timeout issues when querying snapshot data.
09:16:44          +olasd ╡ in the storage backend, there's a provision to filter snapshot branches by target type; these queries aren't indexable properly because the table containing which branches have which type is completely separate from the join table between snapshots and branches
09:17:11               ⤷ ╡ so really this filter should be removed altogether, or rewritten to force reading the full list of branches of the snapshot first (indexed query) then filtering out the results
09:17:43               ⤷ ╡ (I'm not sure when this was introduced or why; it's much safer to do this filtering client-side)
09:19:49               ⤷ ╡ what happened starting friday is that postgresql decided to use the deduplication index on snapshot branches to try to do the filtering of branches by type; so it's basically pulling all branches of type aliases, and matching that to the list of branches of the snapshot we're trying to look at
09:20:13               ⤷ ╡ I guess the amount of alias branches passed a threshold that made the planner change its course
09:22:45               ⤷ ╡ (or the amount of queries for snapshots was low enough that the table got kicked off the cache somehow)
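
The client-side filtering suggested in this log can be sketched as follows: fetch the snapshot's full branch list first, then filter in memory. The branch layout (a dict of branch name to a `{"target", "target_type"}` dict, or `None` for a dangling branch) is an assumption for the example, and `filter_branches` is a hypothetical helper.

```python
from typing import Dict, Optional, Set

Branch = Optional[Dict[str, str]]


def filter_branches(
    branches: Dict[str, Branch], target_types: Set[str]
) -> Dict[str, Branch]:
    """Keep only the branches whose target type is in target_types."""
    return {
        name: branch
        for name, branch in branches.items()
        # Dangling branches (None) have no target type and are dropped.
        if branch is not None and branch["target_type"] in target_types
    }


# Branch list as it might come back from one indexed "all branches" query.
all_branches: Dict[str, Branch] = {
    "HEAD": {"target": "refs/heads/master", "target_type": "alias"},
    "refs/heads/master": {"target": "a" * 40, "target_type": "revision"},
    "refs/tags/v1.0": {"target": "b" * 40, "target_type": "release"},
    "refs/heads/gone": None,
}

print(sorted(filter_branches(all_branches, {"revision", "release"})))
```

This trades a larger result set from the database for a query shape that stays on the snapshot-to-branches index, whatever the planner decides about the type filter.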
Oct 26 2020, 12:14 PM · Storage manager, Web app
anlambert triaged T2734: Investigate timeout issues when querying snapshot data as High priority.
Oct 26 2020, 12:11 PM · Storage manager, Web app
olasd added a comment to T2548: Restore CRAN visits deleted in january 2020 from backups.

We don't have database backups going back 10 months (nor 8 months, which was the horizon we would have needed at the time the task was submitted).

Oct 26 2020, 11:44 AM · Storage manager, System administration, Origin-CRAN
olasd added a comment to T2549: Restore Mercurial visits deleted in august 2018 from backups.

We don't have easily accessible backups going back to (before) August 2018, so I don't think we'll be able to recover this data.

Oct 26 2020, 11:37 AM · Storage manager, System administration, Mercurial loader

Oct 23 2020

vlorentz added a revision to T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases: D4349: migrate_extrinsic_metadata: Write metadata on directories instead of revisions..
Oct 23 2020, 5:26 PM · Package Loader, Storage manager, Extrinsic metadata
olasd added a revision to T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects: D4348: Rename the RawExtrinsicMetadata id field to target.
Oct 23 2020, 5:20 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz added revisions to T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases: D4346: package loaders: write extrinsic metadata to directories instead of revisions., D4347: package loaders: write original_artifact metadata to directories instead of revisions..
Oct 23 2020, 5:01 PM · Package Loader, Storage manager, Extrinsic metadata

Oct 19 2020

olasd added a revision to T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects: D4307: Update the HashableObject interface to take the object itself.
Oct 19 2020, 4:26 PM · Data Model, Storage manager, Extrinsic metadata
olasd added a revision to T2704: Use a hash as id/ unicity key for MetadataFetcher and MetadataAuthority: D4307: Update the HashableObject interface to take the object itself.
Oct 19 2020, 4:26 PM · Data Model, Storage manager, Extrinsic metadata

Oct 14 2020

olasd added a comment to T2704: Use a hash as id/ unicity key for MetadataFetcher and MetadataAuthority.

This line of reasoning makes sense to me.

Oct 14 2020, 3:03 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz removed a parent task for T2686: Use hashes for all kafka keys: T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases.
Oct 14 2020, 2:08 PM · Data Model, Storage manager
vlorentz removed a subtask for T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases: T2686: Use hashes for all kafka keys.
Oct 14 2020, 2:08 PM · Package Loader, Storage manager, Extrinsic metadata
vlorentz added a parent task for T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects: T2686: Use hashes for all kafka keys.
Oct 14 2020, 2:08 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz added a parent task for T2704: Use a hash as id/ unicity key for MetadataFetcher and MetadataAuthority: T2686: Use hashes for all kafka keys.
Oct 14 2020, 2:08 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz added subtasks for T2686: Use hashes for all kafka keys: T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects, T2704: Use a hash as id/ unicity key for MetadataFetcher and MetadataAuthority.
Oct 14 2020, 2:08 PM · Data Model, Storage manager
vlorentz triaged T2704: Use a hash as id/ unicity key for MetadataFetcher and MetadataAuthority as High priority.
Oct 14 2020, 2:07 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz edited projects for T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects, added: Data Model; removed Package Loader.
Oct 14 2020, 2:01 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz renamed T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects from Use intrinsic identifiers for RawExtrinsicMetadata objects to Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects.
Oct 14 2020, 2:01 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz triaged T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects as High priority.
Oct 14 2020, 2:01 PM · Data Model, Storage manager, Extrinsic metadata
olasd closed T829: Remove duplication between fetch_history and origin_visit as Resolved.

The fetch_history table is gone since swh.storage v0.0.155 / swh-storage schema v141.

Oct 14 2020, 12:32 PM · Storage manager, Archive content

Oct 13 2020

ardumont updated the task description for T2686: Use hashes for all kafka keys.
Oct 13 2020, 8:51 AM · Data Model, Storage manager

Oct 12 2020

vlorentz added a comment to T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases.

FTR, olasd, douardda and I discussed an inconsistency in keys used in kafka, and decided to use hashes for all origin/visits/visit statuses; and doing the same for ext metadata in both kafka and the DB solves the issue about defining unicity.

Oct 12 2020, 1:52 PM · Package Loader, Storage manager, Extrinsic metadata
vlorentz updated the task description for T2686: Use hashes for all kafka keys.
Oct 12 2020, 1:07 PM · Data Model, Storage manager
vlorentz updated the task description for T2686: Use hashes for all kafka keys.
Oct 12 2020, 1:06 PM · Data Model, Storage manager
vlorentz added a subtask for T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases: T2686: Use hashes for all kafka keys.
Oct 12 2020, 1:06 PM · Package Loader, Storage manager, Extrinsic metadata