Git loader
Active · Public

Recent Activity

Mon, Nov 15

vlorentz updated the task description for T3544: Deal with GitHub removing support for git:// URLs.
Mon, Nov 15, 3:14 PM · Origin-GitHub, Git loader

Nov 4 2021

olasd added a comment to T3627: Consider dropping pull request references from the git loader ingestion.
In T3627#73323, @zack wrote:

Thanks for the summaries @olasd, both here and on list.
I've followed up on list.

Meanwhile here's what I propose we do (spoiler!):

a) A4: add to the archive Merkle DAG only the filtered snapshot (referencing "intrinsic" branches only, as per A2) and its transitive closure

Nov 4 2021, 12:11 PM · Git loader
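
The filtering step proposed in (a) could look roughly like the sketch below. It is not the loader's actual code: the function name is hypothetical, and the prefixes are an assumption based on the ref namespaces GitHub (`refs/pull/`) and GitLab (`refs/merge-requests/`) use for pull and merge requests; the exact definition of "intrinsic" in A2 may differ.

```python
from typing import Dict, Tuple

# Assumption: refs under these prefixes are forge-generated pull/merge
# request refs, i.e. not "intrinsic" branches of the repository.
PR_REF_PREFIXES: Tuple[bytes, ...] = (b"refs/pull/", b"refs/merge-requests/")


def filter_intrinsic_refs(refs: Dict[bytes, bytes]) -> Dict[bytes, bytes]:
    """Return only the refs that do not live under a pull request namespace."""
    return {
        name: target
        for name, target in refs.items()
        if not name.startswith(PR_REF_PREFIXES)
    }


# Illustrative refs with made-up target ids.
remote_refs = {
    b"refs/heads/master": b"a" * 40,
    b"refs/tags/v1.0": b"b" * 40,
    b"refs/pull/42/head": b"c" * 40,
}
intrinsic = filter_intrinsic_refs(remote_refs)
```

Only the refs kept by such a filter, and the objects reachable from them, would then be added to the archive.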

Oct 31 2021

zack added a comment to T3627: Consider dropping pull request references from the git loader ingestion.

Thanks for the summaries @olasd, both here and on list.
I've followed up on list.

Oct 31 2021, 4:11 PM · Git loader

Oct 26 2021

ardumont updated the title for P1208 from "Patch (drop PR branches) or no patch (current version), ingestion fails" to "sampled origins: Patch (drop PR branches) or no patch (current version), failure to ingest with huge packfile".
Oct 26 2021, 12:12 PM · Git loader
ardumont added a comment to P1208 sampled origins: Patch (drop PR branches) or no patch (current version), failure to ingest with huge packfile.

Related to T3627

Oct 26 2021, 12:11 PM · Git loader
ardumont edited P1208 sampled origins: Patch (drop PR branches) or no patch (current version), failure to ingest with huge packfile.
Oct 26 2021, 12:07 PM · Git loader
ardumont added a revision to T3627: Consider dropping pull request references from the git loader ingestion: D6550: wip: Log full pack size and check pack file limit after log instruction.
Oct 26 2021, 10:50 AM · Git loader

Oct 25 2021

ardumont created P1208 sampled origins: Patch (drop PR branches) or no patch (current version), failure to ingest with huge packfile.
Oct 25 2021, 9:46 PM · Git loader
ardumont added a revision to T3627: Consider dropping pull request references from the git loader ingestion: D6548: Instantiate a noop objstorage for testing purposes.
Oct 25 2021, 5:25 PM · Git loader

Oct 19 2021

olasd added a comment to T3627: Consider dropping pull request references from the git loader ingestion.

Sent a summary of this discussion to the swh-devel list for input:

Oct 19 2021, 11:36 AM · Git loader

Oct 18 2021

olasd added a comment to T3627: Consider dropping pull request references from the git loader ingestion.

B3 I am not convinced a "synthetic" flag on the Snapshot branch makes sense, or at least I find this name confusing, especially considering we already have a synthetic flag on Revision: it's not synthetic in the sense that it's not an object crafted by SWH, it comes from the origin.

Oct 18 2021, 4:42 PM · Git loader
douardda added a comment to T3627: Consider dropping pull request references from the git loader ingestion.

B3 I am not convinced a "synthetic" flag on the Snapshot branch makes sense, or at least I find this name confusing, especially considering we already have a synthetic flag on Revision: it's not synthetic in the sense that it's not an object crafted by SWH, it comes from the origin.

Oct 18 2021, 11:59 AM · Git loader
olasd added a comment to T3627: Consider dropping pull request references from the git loader ingestion.

I would like us to conclude this discussion soon.

Oct 18 2021, 11:29 AM · Git loader

Oct 15 2021

ardumont added a comment to T3635: git loader: enable "partial" global deduplication of revisions via the extid mapping table.

Now, I still don't understand what mapping is to be stored in the extid table. What is
the meaning of (version 0, sha1-git of the commit/tag, revision/release id) above? (I
expect a mapping to be a couple).

Oct 15 2021, 6:25 PM · Git loader

Oct 14 2021

olasd added a comment to T3635: git loader: enable "partial" global deduplication of revisions via the extid mapping table.

Then I don't really get how this can help if we don't load revisions in topological order.

Oct 14 2021, 11:54 AM · Git loader
olasd updated the task description for T3655: loader git: enable global deduplication of head branches before fetching them.
Oct 14 2021, 11:41 AM · Git loader
douardda added a comment to T3635: git loader: enable "partial" global deduplication of revisions via the extid mapping table.

Ok, I think what puzzles me in this description is the fact that the first two bullets of the "git loader adaptations" are actually only one point: at the end of a successful loading, store a mapping in the extid table.

Oct 14 2021, 11:23 AM · Git loader
olasd added a parent task for T3635: git loader: enable "partial" global deduplication of revisions via the extid mapping table: T3655: loader git: enable global deduplication of head branches before fetching them.
Oct 14 2021, 11:18 AM · Git loader
olasd added subtasks for T3655: loader git: enable global deduplication of head branches before fetching them: T3635: git loader: enable "partial" global deduplication of revisions via the extid mapping table, T3654: loader git: load revisions in topological order.
Oct 14 2021, 11:18 AM · Git loader
olasd added a parent task for T3654: loader git: load revisions in topological order: T3655: loader git: enable global deduplication of head branches before fetching them.
Oct 14 2021, 11:18 AM · Git loader
olasd triaged T3655: loader git: enable global deduplication of head branches before fetching them as Normal priority.
Oct 14 2021, 11:18 AM · Git loader
olasd renamed T3635 from "Reduce git loader work (use extid mapping table)" to "git loader: enable "partial" global deduplication of revisions via the extid mapping table".
Oct 14 2021, 11:15 AM · Git loader
olasd added a comment to T3654: loader git: load revisions in topological order.

(I've removed T3653 as parent as this is a somewhat longer term endeavour. Not the topological sorting itself, but making sure that (most) existing revisions aren't dangling, before we can use this topological guarantee)

Oct 14 2021, 11:13 AM · Git loader
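
As a rough illustration of the topological sorting mentioned above, the sketch below orders revisions so that parents always come before their children. It uses Python's standard `graphlib` module (3.9+); the function name and the input shape (a mapping from revision id to parent ids) are assumptions made for the example, not the loader's data model.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def topological_order(parents: Dict[str, List[str]]) -> List[str]:
    """Order revisions so each one comes after all of its parents."""
    sorter = TopologicalSorter(parents)
    # static_order() also yields parents that are not keys of the mapping
    # (e.g. revisions outside the current pack); skip those here.
    return [rev for rev in sorter.static_order() if rev in parents]


# Hypothetical history: a <- b <- c, where b also has a parent "x"
# that is not part of the set being loaded.
history = {"c": ["b"], "b": ["a", "x"], "a": []}
ordered = topological_order(history)
```

With this ordering, a revision is only written once all of its parents in the batch have been written, which is the guarantee the comment above relies on.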
olasd removed a parent task for T3654: loader git: load revisions in topological order: T3653: Stabilize loader git.
Oct 14 2021, 11:12 AM · Git loader
olasd removed a subtask for T3653: Stabilize loader git: T3654: loader git: load revisions in topological order.
Oct 14 2021, 11:12 AM · Git loader
olasd triaged T3654: loader git: load revisions in topological order as Low priority.
Oct 14 2021, 11:11 AM · Git loader
ardumont updated the task description for T3653: Stabilize loader git.
Oct 14 2021, 10:46 AM · Git loader
ardumont added a subtask for T3653: Stabilize loader git: T3652: Ingest git loader origins with smaller packfiles.
Oct 14 2021, 10:44 AM · Git loader
ardumont added a parent task for T3652: Ingest git loader origins with smaller packfiles: T3653: Stabilize loader git.
Oct 14 2021, 10:44 AM · Git loader
ardumont updated the task description for T3653: Stabilize loader git.
Oct 14 2021, 10:41 AM · Git loader
ardumont updated the task description for T3653: Stabilize loader git.
Oct 14 2021, 10:39 AM · Git loader
ardumont closed T3625: Reduce git loader memory footprint as Resolved.

Actually deployed, and the number of OOMs did decrease.

Oct 14 2021, 10:38 AM · Git loader
ardumont closed T3625: Reduce git loader memory footprint, a subtask of T3653: Stabilize loader git, as Resolved.
Oct 14 2021, 10:38 AM · Git loader
ardumont updated the task description for T3653: Stabilize loader git.
Oct 14 2021, 10:38 AM · Git loader
ardumont added a parent task for T3625: Reduce git loader memory footprint: T3653: Stabilize loader git.
Oct 14 2021, 10:37 AM · Git loader
ardumont added a parent task for T3635: git loader: enable "partial" global deduplication of revisions via the extid mapping table: T3653: Stabilize loader git.
Oct 14 2021, 10:37 AM · Git loader
ardumont added subtasks for T3653: Stabilize loader git: T3625: Reduce git loader memory footprint, T3635: git loader: enable "partial" global deduplication of revisions via the extid mapping table, T3640: Make long running task stop fast when warm shutdown is triggered.
Oct 14 2021, 10:37 AM · Git loader
ardumont triaged T3653: Stabilize loader git as Normal priority.
Oct 14 2021, 10:36 AM · Git loader
ardumont triaged T3652: Ingest git loader origins with smaller packfiles as Normal priority.
Oct 14 2021, 10:29 AM · Git loader

Oct 11 2021

ardumont added a comment to T3625: Reduce git loader memory footprint.

Deployed storage v0.38 on the workers (proxy buffer/filter adaptations on the client/loader side).
Restarted all loaders with it.

Oct 11 2021, 6:19 PM · Git loader
anlambert closed T3618: Reschedule loading of dumb git origins submitted to "Save code now" service as Resolved.

Issues related to git dumb loading have been handled and the 5 dumb origins that were failing have been resubmitted through save code now and successfully loaded, closing this.

Oct 11 2021, 1:51 PM · Save Code Now, Archive coverage, Git loader
douardda added a comment to T3627: Consider dropping pull request references from the git loader ingestion.

An alternative to annotating synthetic refs: add a "type" or "forge_type" attribute to snapshots.

Oct 11 2021, 12:33 PM · Git loader

Oct 8 2021

ardumont changed the status of T3625: Reduce git loader memory footprint from Open to Work in Progress.

btw ^

Oct 8 2021, 5:55 PM · Git loader
olasd added a comment to T3625: Reduce git loader memory footprint.
In T3625#71799, @olasd wrote:

While we're at it, we should probably be adding some thresholds in the buffer proxy for:

  • cumulated length of messages for revisions and releases
Oct 8 2021, 4:02 PM · Git loader
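
A threshold on the cumulated length of messages, as suggested above, might be sketched as follows. The class and parameter names are hypothetical and this is not the swh.storage buffer proxy API; it only shows the idea of flushing a batch when either the object count or the accumulated message size reaches a limit.

```python
from typing import Callable, List


class MessageSizeBuffer:
    """Hypothetical buffer flushing on object count or cumulated message size."""

    def __init__(
        self,
        flush_callback: Callable[[List[bytes]], None],
        max_count: int = 1000,
        max_bytes: int = 1_000_000,
    ) -> None:
        self._flush_callback = flush_callback
        self.max_count = max_count
        self.max_bytes = max_bytes
        self._pending: List[bytes] = []
        self._pending_bytes = 0

    def add(self, message: bytes) -> None:
        """Buffer one message and flush if a threshold is reached."""
        self._pending.append(message)
        self._pending_bytes += len(message)
        if (
            len(self._pending) >= self.max_count
            or self._pending_bytes >= self.max_bytes
        ):
            self.flush()

    def flush(self) -> None:
        """Send the pending batch, if any, and reset the counters."""
        if self._pending:
            self._flush_callback(self._pending)
            self._pending = []
            self._pending_bytes = 0
```

Bounding the byte size as well as the count keeps a few very large messages from inflating a batch that would otherwise look small.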
olasd added a revision to T3625: Reduce git loader memory footprint: D6445: buffer: add a threshold for the number of revision parents in one batch.
Oct 8 2021, 4:01 PM · Git loader
olasd added a revision to T3625: Reduce git loader memory footprint: D6446: buffer: add a threshold for the estimated size of revision and release batches.
Oct 8 2021, 3:58 PM · Git loader
olasd added a revision to T3625: Reduce git loader memory footprint: D6443: buffer: add a threshold for the number of directory entries in one batch.
Oct 8 2021, 3:06 PM · Git loader
ardumont added a project to T3635: git loader: enable "partial" global deduplication of revisions via the extid mapping table: Git loader.
Oct 8 2021, 2:17 PM · Git loader
ardumont added a comment to T3625: Reduce git loader memory footprint.

I concur with this analysis btw

Oct 8 2021, 2:00 PM · Git loader

Oct 7 2021

vlorentz added a comment to T3627: Consider dropping pull request references from the git loader ingestion.

An alternative to annotating synthetic refs: add a "type" or "forge_type" attribute to snapshots.

Oct 7 2021, 2:10 PM · Git loader