Software Heritage

Origin-GitHub (Tag)
Active · Public

Members

  • This project does not have any members.

Watchers

  • This project does not have any watchers.

Details

Description

Projects related to GitHub

Recent Activity

Fri, Nov 4

olasd added a comment to T4219: Investigate why GitHub fork detection did not bring a speed-up.

swh.loader.git 2.1.0 has now been deployed on all workers.

Fri, Nov 4, 9:25 PM · Origin-GitHub, Git loader

Thu, Nov 3

olasd added a revision to T4219: Investigate why GitHub fork detection did not bring a speed-up: D8808: Eagerly populate the set of local heads in RepoRepresentation.__init__.
Thu, Nov 3, 5:28 PM · Origin-GitHub, Git loader

Oct 19 2022

gitlab-migration changed the status of T4242: Deployed loader.git v1.8, a subtask of T4219: Investigate why GitHub fork detection did not bring a speed-up, from Resolved to Migrated.
Oct 19 2022, 6:06 PM · Origin-GitHub, Git loader
gitlab-migration changed the status of T4225: Deploy a more recent version of prometheus-statsd-exporter on all nodes, a subtask of T4219: Investigate why GitHub fork detection did not bring a speed-up, from Resolved to Migrated.
Oct 19 2022, 6:06 PM · Origin-GitHub, Git loader
gitlab-migration changed the status of T4206: prod: Deploy metadata loader v0.0.2, a subtask of T3273: Use "fork" relationships to speed-up initial load of large repositories, from Resolved to Migrated.
Oct 19 2022, 6:06 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader
gitlab-migration changed the status of T4206: prod: Deploy metadata loader v0.0.2, a subtask of T1740: fetch extrinsic origin metadata from GitHub, from Resolved to Migrated.
Oct 19 2022, 6:06 PM · Metadata workflow, Origin-GitHub
gitlab-migration changed the status of T4193: staging: Deploy metadata loader, a subtask of T1740: fetch extrinsic origin metadata from GitHub, from Resolved to Migrated.
Oct 19 2022, 6:06 PM · Metadata workflow, Origin-GitHub

Jun 21 2022

vlorentz triaged T4344: Many NotFound repositories on GitHub since 2022-06-15 or 2022-06-16 as Normal priority.
Jun 21 2022, 10:08 AM · Origin-GitHub

May 30 2022

vlorentz added a parent task for T3273: Use "fork" relationships to speed-up initial load of large repositories: T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.
May 30 2022, 3:41 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

May 20 2022

vlorentz added revisions to T4219: Investigate why GitHub fork detection did not bring a speed-up: D7873: Add an unweighted average for filtered_objects + fix existing metric name, D7876: Log summary of filtered objects in store_data.
May 20 2022, 3:54 PM · Origin-GitHub, Git loader
vlorentz added a revision to T4219: Investigate why GitHub fork detection did not bring a speed-up: D7871: Add metrics in store_data on ratios of objects already stored.
May 20 2022, 1:48 PM · Origin-GitHub, Git loader
vlorentz added a comment to T4219: Investigate why GitHub fork detection did not bring a speed-up.

I did some profiling earlier this week and found that, when incrementally loading a Linux fork we already visited:

May 20 2022, 10:55 AM · Origin-GitHub, Git loader

May 16 2022

vlorentz added a comment to T4219: Investigate why GitHub fork detection did not bring a speed-up.

This indicates we should load incrementally from the last snapshot of the origin AND the last snapshot of its parent, so we would capture these new commits without reloading half of the parent's history. As @olasd puts it, "that's a (very) lightweight way of doing global deduplication".
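As an illustration of that idea only, here is a minimal sketch. It is not the swh.loader.git code: the `Snapshot` type, the `known_heads` helper, and this `determine_wants` signature are all hypothetical, and a snapshot is assumed to be a plain mapping from branch name to commit id. The point it shows is that taking the union of heads from the origin's last snapshot and its parent's last snapshot shrinks the set of commits that still have to be requested from the remote.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical type: a snapshot maps a branch name to the id of the commit it points to.
Snapshot = Dict[str, str]


def known_heads(base_snapshots: Iterable[Snapshot]) -> Set[str]:
    """Return the union of commit ids referenced by every base snapshot."""
    heads: Set[str] = set()
    for snapshot in base_snapshots:
        heads.update(snapshot.values())
    return heads


def determine_wants(
    remote_refs: Dict[str, str], base_snapshots: Iterable[Snapshot]
) -> List[str]:
    """Return the remote commit ids that no base snapshot already covers."""
    local = known_heads(base_snapshots)
    return sorted({target for target in remote_refs.values() if target not in local})


# Made-up ids: the fork has since caught up with its parent and added one branch.
origin_last = {"refs/heads/main": "aaa1"}
parent_last = {"refs/heads/main": "bbb2", "refs/heads/dev": "ccc3"}
remote = {"refs/heads/main": "bbb2", "refs/heads/feature": "ddd4"}

wants_origin_only = determine_wants(remote, [origin_last])
wants_with_parent = determine_wants(remote, [origin_last, parent_last])
```

With only the origin's snapshot as a base, both remote heads look new; adding the parent's snapshot leaves just the commit that is specific to the fork.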

May 16 2022, 3:33 PM · Origin-GitHub, Git loader

May 13 2022

ardumont added a subtask for T4219: Investigate why GitHub fork detection did not bring a speed-up: T4242: Deployed loader.git v1.8.
May 13 2022, 6:01 PM · Origin-GitHub, Git loader
olasd closed T4225: Deploy a more recent version of prometheus-statsd-exporter on all nodes, a subtask of T4219: Investigate why GitHub fork detection did not bring a speed-up, as Resolved.
May 13 2022, 4:20 PM · Origin-GitHub, Git loader
vlorentz added a revision to T3273: Use "fork" relationships to speed-up initial load of large repositories: D7831: Use all base snapshots in determine_wants().
May 13 2022, 3:23 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader
vlorentz added a revision to T4219: Investigate why GitHub fork detection did not bring a speed-up: D7831: Use all base snapshots in determine_wants().
May 13 2022, 3:23 PM · Origin-GitHub, Git loader
vlorentz updated subscribers of T4219: Investigate why GitHub fork detection did not bring a speed-up.

https://grafana.softwareheritage.org/d/FqGC4zu7z/vlorentz-loader-metrics?orgId=1&var-environment=production&var-interval=1h&var-visit_type=git&var-has_parent_origins=True shows we spend a considerable amount of time loading data from git repositories with an existing visit + a parent:

May 13 2022, 3:21 PM · Origin-GitHub, Git loader

May 10 2022

vlorentz added a comment to T3273: Use "fork" relationships to speed-up initial load of large repositories.

Currently can't do it on GitLab while logged out: https://gitlab.com/gitlab-org/gitlab/-/issues/361952

May 10 2022, 4:13 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

May 6 2022

olasd changed the status of T4225: Deploy a more recent version of prometheus-statsd-exporter on all nodes, a subtask of T4219: Investigate why GitHub fork detection did not bring a speed-up, from Open to Work in Progress.
May 6 2022, 5:00 PM · Origin-GitHub, Git loader

May 3 2022

vlorentz removed a subtask for T3273: Use "fork" relationships to speed-up initial load of large repositories: T2202: Collect extrinsic metadata.
May 3 2022, 11:16 AM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader
vlorentz added a parent task for T1740: fetch extrinsic origin metadata from GitHub: T3273: Use "fork" relationships to speed-up initial load of large repositories.
May 3 2022, 11:16 AM · Metadata workflow, Origin-GitHub
vlorentz added subtasks for T3273: Use "fork" relationships to speed-up initial load of large repositories: T1740: fetch extrinsic origin metadata from GitHub, T2202: Collect extrinsic metadata.
May 3 2022, 11:16 AM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader
vlorentz added a subtask for T3273: Use "fork" relationships to speed-up initial load of large repositories: T4219: Investigate why GitHub fork detection did not bring a speed-up.
May 3 2022, 11:15 AM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader
vlorentz added a parent task for T4219: Investigate why GitHub fork detection did not bring a speed-up: T3273: Use "fork" relationships to speed-up initial load of large repositories.
May 3 2022, 11:15 AM · Origin-GitHub, Git loader
vlorentz closed T1740: fetch extrinsic origin metadata from GitHub as Resolved.
May 3 2022, 11:08 AM · Metadata workflow, Origin-GitHub
vlorentz closed T4186: Allow loaders to fetch extrinsic metadata, a subtask of T1740: fetch extrinsic origin metadata from GitHub, as Resolved.
May 3 2022, 11:07 AM · Metadata workflow, Origin-GitHub

May 2 2022

vlorentz added revisions to T4219: Investigate why GitHub fork detection did not bring a speed-up: D7726: loader.core: Add statsd timing metrics, D7727: loader.core: Add statsd metrics on collected metadata.
May 2 2022, 3:29 PM · Origin-GitHub, Git loader
vlorentz triaged T4219: Investigate why GitHub fork detection did not bring a speed-up as Normal priority.
May 2 2022, 3:29 PM · Origin-GitHub, Git loader

Apr 29 2022

olasd closed T3544: Deal with GitHub removing support for git:// URLs, a subtask of T2207: Improve ingestion efficiency, as Resolved.
Apr 29 2022, 4:11 PM · Origin-GitLab, Origin-GitHub, Roadmap 2020
olasd closed T3544: Deal with GitHub removing support for git:// URLs as Resolved.

I'm closing this. I've submitted T4216 to track the actual packfile limit issue.

Apr 29 2022, 4:11 PM · Origin-GitHub, Git loader
ardumont closed T4206: prod: Deploy metadata loader v0.0.2, a subtask of T1740: fetch extrinsic origin metadata from GitHub, as Resolved.
Apr 29 2022, 11:27 AM · Metadata workflow, Origin-GitHub
ardumont closed T4206: prod: Deploy metadata loader v0.0.2, a subtask of T3273: Use "fork" relationships to speed-up initial load of large repositories, as Resolved.
Apr 29 2022, 11:27 AM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

Apr 28 2022

ardumont changed the status of T4206: prod: Deploy metadata loader v0.0.2, a subtask of T1740: fetch extrinsic origin metadata from GitHub, from Open to Work in Progress.
Apr 28 2022, 3:43 PM · Metadata workflow, Origin-GitHub
ardumont changed the status of T4206: prod: Deploy metadata loader v0.0.2, a subtask of T3273: Use "fork" relationships to speed-up initial load of large repositories, from Open to Work in Progress.
Apr 28 2022, 3:43 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader
vlorentz edited projects for T3273: Use "fork" relationships to speed-up initial load of large repositories, added: Origin-GitHub; removed GitHub lister.
Apr 28 2022, 3:27 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

Apr 27 2022

ardumont closed T4193: staging: Deploy metadata loader, a subtask of T1740: fetch extrinsic origin metadata from GitHub, as Resolved.
Apr 27 2022, 4:56 PM · Metadata workflow, Origin-GitHub

Apr 26 2022

ardumont changed the status of T4193: staging: Deploy metadata loader, a subtask of T1740: fetch extrinsic origin metadata from GitHub, from Open to Work in Progress.
Apr 26 2022, 3:33 PM · Metadata workflow, Origin-GitHub

Apr 22 2022

vlorentz added a subtask for T1740: fetch extrinsic origin metadata from GitHub: T4193: staging: Deploy metadata loader.
Apr 22 2022, 3:52 PM · Metadata workflow, Origin-GitHub
vlorentz added a revision to T1740: fetch extrinsic origin metadata from GitHub: D7633: Initialize base metadata fetcher + github.
Apr 22 2022, 3:52 PM · Metadata workflow, Origin-GitHub

Apr 21 2022

vlorentz removed a subtask for T1740: fetch extrinsic origin metadata from GitHub: T3859: investigate using metadata from GHTorrent.
Apr 21 2022, 8:39 PM · Metadata workflow, Origin-GitHub
vlorentz added revisions to T1740: fetch extrinsic origin metadata from GitHub: D7629: github: Remove dead code, D7630: github: Refactor rate-limiting out of the GitHubLister class.
Apr 21 2022, 8:36 PM · Metadata workflow, Origin-GitHub
vlorentz claimed T1740: fetch extrinsic origin metadata from GitHub.
Apr 21 2022, 9:02 AM · Metadata workflow, Origin-GitHub
vlorentz closed T1739: Define an architecture to fetch extrinsic metadata outside listers and loaders, a subtask of T1740: fetch extrinsic origin metadata from GitHub, as Resolved.
Apr 21 2022, 9:00 AM · Metadata workflow, Origin-GitHub

Apr 19 2022

vlorentz updated the task description for T3542: Decide what metadata we want to / can collect from GitHub.
Apr 19 2022, 12:10 PM · Origin-GitHub, Extrinsic metadata
vlorentz added a comment to T3542: Decide what metadata we want to / can collect from GitHub.

In summary, we would archive everything with priority "high" or "mid", as well as the "license" and "main language" fields, as they are all easy to fetch and store.

Apr 19 2022, 11:18 AM · Origin-GitHub, Extrinsic metadata

Apr 11 2022

bchauvet lowered the priority of T3544: Deal with GitHub removing support for git:// URLs from High to Normal.
Apr 11 2022, 11:57 AM · Origin-GitHub, Git loader
bchauvet added a comment to T3544: Deal with GitHub removing support for git:// URLs.

dealt with (at least in terms of only using https for clones)

Apr 11 2022, 11:56 AM · Origin-GitHub, Git loader
bchauvet renamed T2207: Improve ingestion efficiency from "0 lag" to "Improve ingestion efficiency".
Apr 11 2022, 11:45 AM · Origin-GitLab, Origin-GitHub, Roadmap 2020
bchauvet added a parent task for T2207: Improve ingestion efficiency: T4080: Minimize archival lag w.r.t. upstream code hosting platforms.
Apr 11 2022, 11:44 AM · Origin-GitLab, Origin-GitHub, Roadmap 2020