Software Heritage

Storage manager
Active, Public

Members

  • This project does not have any members.

Watchers

  • This project does not have any watchers.

Recent Activity

Fri, Jan 15

ardumont moved T2966: Backfill origin_visit_status **with** the `visit_type` field properly given from in-progress to code review on the Sprint 2021 01 board.
Fri, Jan 15, 2:55 PM · Storage manager, Sprint 2021 01, Scheduling utilities
vsellier added a revision to T2966: Backfill origin_visit_status **with** the `visit_type` field properly given: D4871: Backfiller: Add type to the origin_visit_status topic.
Fri, Jan 15, 2:41 PM · Storage manager, Sprint 2021 01, Scheduling utilities
vsellier changed the status of T2966: Backfill origin_visit_status **with** the `visit_type` field properly given from Open to Work in Progress.
Fri, Jan 15, 2:40 PM · Storage manager, Sprint 2021 01, Scheduling utilities
vsellier closed T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type as Resolved.
Fri, Jan 15, 2:01 PM · Storage manager, Sprint 2021 01

Thu, Jan 14

ardumont moved T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic from in-progress to done on the Sprint 2021 01 board.
Thu, Jan 14, 2:54 PM · Storage manager, Sprint 2021 01

Wed, Jan 13

vsellier closed T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic as Resolved.

Implemented in T2964

Wed, Jan 13, 4:51 PM · Storage manager, Sprint 2021 01
vsellier moved T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type from in-progress to code review on the Sprint 2021 01 board.
Wed, Jan 13, 4:50 PM · Storage manager, Sprint 2021 01
vsellier added a revision to T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type: D4858: Add persistence of the field OriginVisitStatus.type.
Wed, Jan 13, 4:47 PM · Storage manager, Sprint 2021 01
vsellier added a revision to T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic: D4857: Add new field OriginVisitStatus.type field on test data.
Wed, Jan 13, 4:22 PM · Storage manager, Sprint 2021 01
douardda claimed T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type.
Wed, Jan 13, 11:39 AM · Storage manager, Sprint 2021 01
douardda moved T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type from todo to in-progress on the Sprint 2021 01 board.
Wed, Jan 13, 11:39 AM · Storage manager, Sprint 2021 01
vsellier moved T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type from in-progress to todo on the Sprint 2021 01 board.
Wed, Jan 13, 11:39 AM · Storage manager, Sprint 2021 01
douardda changed the status of T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type from Open to Work in Progress.
Wed, Jan 13, 11:39 AM · Storage manager, Sprint 2021 01
ardumont changed the status of T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic from Open to Work in Progress.
Wed, Jan 13, 11:36 AM · Storage manager, Sprint 2021 01
vsellier claimed T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic.
Wed, Jan 13, 11:34 AM · Storage manager, Sprint 2021 01
ardumont triaged T2968: Migrate origin_visit_status records to add the type value as Normal priority.
Wed, Jan 13, 9:58 AM · Storage manager
ardumont removed a project from T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type: Scheduling utilities.
Wed, Jan 13, 9:54 AM · Storage manager, Sprint 2021 01
ardumont removed a project from T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic: Scheduling utilities.
Wed, Jan 13, 9:52 AM · Storage manager, Sprint 2021 01
ardumont renamed T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic from Adapt storage to actually write correctly the type in the origin_visit_status topics to Adapt storage to actually write the visit_type in the origin_visit_status topic.
Wed, Jan 13, 9:52 AM · Storage manager, Sprint 2021 01
ardumont moved T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type from Backlog to todo on the Sprint 2021 01 board.
Wed, Jan 13, 9:51 AM · Storage manager, Sprint 2021 01
ardumont moved T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic from Backlog to todo on the Sprint 2021 01 board.
Wed, Jan 13, 9:51 AM · Storage manager, Sprint 2021 01
ardumont moved T2966: Backfill origin_visit_status **with** the `visit_type` field properly given from Backlog to todo on the Sprint 2021 01 board.
Wed, Jan 13, 9:51 AM · Storage manager, Sprint 2021 01, Scheduling utilities
ardumont updated the task description for T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic.
Wed, Jan 13, 9:47 AM · Storage manager, Sprint 2021 01
ardumont triaged T2966: Backfill origin_visit_status **with** the `visit_type` field properly given as High priority.
Wed, Jan 13, 9:42 AM · Storage manager, Sprint 2021 01, Scheduling utilities
ardumont triaged T2965: Adapt storage to actually write the visit_type in the origin_visit_status topic as High priority.
Wed, Jan 13, 9:41 AM · Storage manager, Sprint 2021 01
ardumont placed T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type up for grabs.
Wed, Jan 13, 9:39 AM · Storage manager, Sprint 2021 01
ardumont triaged T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type as High priority.
Wed, Jan 13, 9:39 AM · Storage manager, Sprint 2021 01
vsellier closed T2905: Deploy swh-search for production, a subtask of T2182: Switch production swh-web to use swh-search instead of postgresql search, as Resolved.
Wed, Jan 13, 9:23 AM · Archive search, Storage manager

Mon, Jan 11

vlorentz claimed T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects.
Mon, Jan 11, 4:27 PM · Data Model, Storage manager, Extrinsic metadata

Thu, Jan 7

olasd added a comment to T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects.

Sounds good to me.

Thu, Jan 7, 8:18 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz closed T2497: Create an ElasticSearch cluster tuned for origin/metadata search, a subtask of T2182: Switch production swh-web to use swh-search instead of postgresql search, as Resolved.
Thu, Jan 7, 2:13 PM · Archive search, Storage manager
vlorentz added a parent task for T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects: T2779: Put information (client, collection and deposit-id) inside metadata for metadata-only deposit.
Thu, Jan 7, 1:53 PM · Data Model, Storage manager, Extrinsic metadata

Wed, Jan 6

ardumont moved T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161) from Backlog to deployed on the System administration board.
Wed, Jan 6, 3:46 PM · System administration, Storage manager
olasd added a subtask for T2182: Switch production swh-web to use swh-search instead of postgresql search: T2905: Deploy swh-search for production.
Wed, Jan 6, 11:15 AM · Archive search, Storage manager

Tue, Jan 5

douardda added a revision to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders: D4807: [WIP] Add an ExtID object.
Tue, Jan 5, 5:17 PM · Storage manager
vlorentz added a comment to T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects.

Proposed manifest format:

Tue, Jan 5, 3:21 PM · Data Model, Storage manager, Extrinsic metadata

Dec 9 2020

douardda added a revision to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders: D4698: [WIP][RFC] Add support for ExtID in the storage.
Dec 9 2020, 5:13 PM · Storage manager
douardda added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.

Question: who should be responsible for filling this table? The loader, or the storage (as a side effect of revision_add)?

Dec 9 2020, 9:21 AM · Storage manager

Dec 8 2020

marmoute added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.
In T2849#54256, @olasd wrote:
In T2849#54254, @zack wrote:

Thanks, this would be an important new feature. Some comments/random thoughts below.

Further design point, which I think is already implicit in what you wrote but would be useful to make explicit:

  • this new mapping table is needed "only" to speed things up: if we lose it, future archival will just be slower (at least for a while), but it won't be the end of the world

Assuming this is true, loaders will need to be designed to degrade gracefully when the mapping table is incomplete (or disappears entirely).

Agreed. Missing objects in the mapping table would just mean that the loader does a bit more work to re-create the associated objects, and, hopefully, generate the same SWHIDs again (which would then be used to populate the mapping table).

Dec 8 2020, 3:28 PM · Storage manager
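The graceful-degradation point agreed on above (the mapping is only a cache, so a missing entry just means more work) can be sketched as a lookup with a slow-path fallback. This is a minimal Python sketch, not the loader implementation: `ExtIDMapping`, `resolve_swhids` and the `recompute` callback are hypothetical names, and `fake_recompute` derives a SWHID-shaped string from the external id purely for illustration (a real loader would rebuild the object and hash it).

```python
from dataclasses import dataclass, field
from hashlib import sha1
from typing import Callable, Dict, Iterable, Optional


@dataclass
class ExtIDMapping:
    """Hypothetical in-memory stand-in for the mapping table (VCS id -> SWHID)."""

    table: Dict[str, str] = field(default_factory=dict)

    def get(self, extid: str) -> Optional[str]:
        return self.table.get(extid)

    def add(self, extid: str, swhid: str) -> None:
        self.table[extid] = swhid


def resolve_swhids(
    extids: Iterable[str],
    mapping: ExtIDMapping,
    recompute: Callable[[str], str],
) -> Dict[str, str]:
    """Return a SWHID for each external id, treating the mapping as a cache only."""
    result: Dict[str, str] = {}
    for extid in extids:
        swhid = mapping.get(extid)
        if swhid is None:
            # Slow path: the entry is missing (or the whole table was lost), so
            # re-create the object, derive its SWHID again and repopulate the table.
            swhid = recompute(extid)
            mapping.add(extid, swhid)
        result[extid] = swhid
    return result


def fake_recompute(extid: str) -> str:
    # Illustration only: a real loader would rebuild the revision and hash it.
    return "swh:1:rev:" + sha1(extid.encode()).hexdigest()
```

Because SWHIDs are intrinsic, recomputing a missing entry should yield the same value that was lost, which is why losing the table costs time rather than correctness.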
douardda added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.

OK, so the plan is to keep the first step as simple as possible: implement what @olasd proposed in the task, put this table in the storage, and provide a simple batch get API endpoint.

Dec 8 2020, 3:23 PM · Storage manager
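The first step planned above (a table in the storage plus a simple batch get endpoint) could look roughly like the following minimal Python sketch. It assumes keys of the form (extid_type, extid), targets restricted to revisions and releases as discussed on Dec 3, and a first-write-wins policy for duplicates. `ExtIDStore`, `add`, `get_batch` and the `"hg-nodeid"` type label are hypothetical and are not the swh-storage API; D4698 and D4807 hold the actual proposals.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical key: (extid_type, extid), e.g. ("hg-nodeid", <20-byte Mercurial node id>)
ExtIDKey = Tuple[str, bytes]

# Scope discussed in T2849: only revisions and releases get a mapping entry.
ALLOWED_TARGET_TYPES = {"rev", "rel"}


class ExtIDStore:
    """Minimal sketch of an ExtID table with a batch lookup (hypothetical API)."""

    def __init__(self) -> None:
        self._rows: Dict[ExtIDKey, str] = {}

    def add(self, rows: Sequence[Tuple[str, bytes, str]]) -> int:
        """Insert (extid_type, extid, target SWHID) rows; return how many were new."""
        added = 0
        for extid_type, extid, target in rows:
            # Core SWHIDs look like "swh:1:<type>:<40 hex digits>".
            parts = target.split(":")
            if len(parts) != 4 or parts[2] not in ALLOWED_TARGET_TYPES:
                raise ValueError(f"unsupported target: {target}")
            key = (extid_type, extid)
            if key not in self._rows:  # first write wins; duplicates are ignored
                self._rows[key] = target
                added += 1
        return added

    def get_batch(self, keys: Sequence[ExtIDKey]) -> List[Optional[str]]:
        """Return one entry per requested key, in request order (None on a miss)."""
        return [self._rows.get(key) for key in keys]
```

Returning a list aligned with the requested keys keeps the endpoint easy to batch, and lets a caller treat None as "fall back to the slow path".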

Dec 7 2020

olasd added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.
In T2849#54254, @zack wrote:

Thanks, this would be an important new feature. Some comments/random thoughts below.

Further design point, which I think is already implicit in what you wrote but would be useful to make explicit:

  • this new mapping table is needed "only" to speed things up: if we lose it, future archival will just be slower (at least for a while), but it won't be the end of the world

Assuming this is true, loaders will need to be designed to degrade gracefully when the mapping table is incomplete (or disappears entirely).

Dec 7 2020, 9:30 PM · Storage manager
zack added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.

Thanks, this would be an important new feature. Some comments/random thoughts below.

Dec 7 2020, 9:08 PM · Storage manager
douardda raised the priority of T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders from Normal to High.
Dec 7 2020, 10:30 AM · Storage manager
acezar added a revision to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders: D4649: HgLoaderFromDisk: Only load new commits.
Dec 7 2020, 9:49 AM · Storage manager

Dec 3 2020

marmoute added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.
In T2849#53968, @olasd wrote:

scope: mercurial will only need the mapping for revisions;

Mercurial could also use that mapping for every object. However, keeping the mapping for revisions alone already provides a very large complexity boost (or reduction, I should say), so that is "good enough" for us. I strongly suspect the same will apply to Bazaar.

Ack. I think having this for all objects in the short term would make it "too large", and having it for "objects with a history where computing the swhid is (deeply) recursive and expensive" (i.e. revisions and releases) is a decent balance.

Dec 3 2020, 1:48 PM · Storage manager
olasd added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.

scope: mercurial will only need the mapping for revisions;

Mercurial could also use that mapping for every object. However, keeping the mapping for revisions alone already provides a very large complexity boost (or reduction, I should say), so that is "good enough" for us. I strongly suspect the same will apply to Bazaar.

Dec 3 2020, 1:33 PM · Storage manager
marmoute added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.

scope: mercurial will only need the mapping for revisions;

Dec 3 2020, 11:57 AM · Storage manager
olasd triaged T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders as Normal priority.
Dec 3 2020, 11:14 AM · Storage manager

Dec 2 2020

vsellier added a comment to T2761: Install webapp counters in the staging webapp/storage.

After T2828, it's clearer what must be deployed to get the counters working on staging:

  • the counters can be initialized via the /stat/refresh endpoint of the storage API (note: this will create more counters than in production, as directory_entry_* and revision_history are not counted in production)
  • add a script/service to execute `swh_update_counter_bucketed` in an infinite loop
  • create the buckets in the object_counts_bucketed table
    • per object type: identifier|bucket_start|bucket_end; value and last_update will be updated by the stored procedures.
  • configure the Prometheus SQL exporter for db1.staging [1]
  • configure profile_exporter on pergamon
    • update the script to ensure the data are filtered by environment (to avoid including staging data in production counts [2])
    • configure a new cron job
      • loading an empty file for historical data
      • creating a new export_file
  • update the webapp so that the counter origin can be configured
Dec 2 2020, 9:55 AM · Storage manager, Web app, Staging environment
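Two of the steps listed above, creating the buckets in object_counts_bucketed (one set of rows per object type) and running swh_update_counter_bucketed in a loop, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: buckets are taken to be half-open ranges over the first byte of the object id, with None (SQL NULL) meaning an open bound; the stored procedure is assumed to take no arguments; and `make_buckets`, `run_counter_updates` and the `execute` callback are hypothetical helpers, not existing swh code. In a real deployment `execute` might be the `execute` method of a DB-API cursor on an autocommit connection.

```python
import time
from typing import Callable, List, Optional, Tuple

# (identifier, bucket_start, bucket_end); None stands for an open bound (SQL NULL).
BucketRow = Tuple[str, Optional[bytes], Optional[bytes]]


def make_buckets(identifier: str, nb_buckets: int) -> List[BucketRow]:
    """Split the id space into nb_buckets half-open ranges on the first byte.

    Assumption: a bucket covers ids with bucket_start <= id < bucket_end.
    value and last_update are left for the stored procedure to fill in.
    """
    if not 1 <= nb_buckets <= 256:
        raise ValueError("nb_buckets must be between 1 and 256")
    bounds = [bytes([i * 256 // nb_buckets]) for i in range(1, nb_buckets)]
    starts: List[Optional[bytes]] = [None, *bounds]
    ends: List[Optional[bytes]] = [*bounds, None]
    return [(identifier, start, end) for start, end in zip(starts, ends)]


def run_counter_updates(
    execute: Callable[[str], object],
    interval: float = 1.0,
    max_iterations: Optional[int] = None,
) -> int:
    """Call swh_update_counter_bucketed() forever, or max_iterations times."""
    done = 0
    while max_iterations is None or done < max_iterations:
        # The stored procedure does the counting; this loop only triggers it.
        execute("SELECT swh_update_counter_bucketed()")
        done += 1
        time.sleep(interval)
    return done
```

Each row from make_buckets would then become an INSERT of (identifier, bucket_start, bucket_end) into object_counts_bucketed, for instance through a parameterized `executemany` on a DB-API cursor.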
ardumont raised the priority of T2761: Install webapp counters in the staging webapp/storage from Low to Normal.
Dec 2 2020, 9:41 AM · Storage manager, Web app, Staging environment