GitLab returns very little data while logged out, so we won't be able to collect much. This seems to differ from their documentation, so I opened a ticket: https://gitlab.com/gitlab-org/gitlab/-/issues/361952

Jul 6 2022, 12:41 PM · Extrinsic metadata, Origin-GitLab

Jul 5 2022

vlorentz closed T833: When listing an origin, add origin level metadata to RMD storage as Wontfix.

Replaced by loader-based metadata loading (T4188 / T4186).

Jul 5 2022, 6:33 PM · Extrinsic metadata, Restricted Project, GitHub lister

vlorentz closed T833: When listing an origin, add origin level metadata to RMD storage, a subtask of T2202: Collect extrinsic metadata, as Wontfix.

Jul 5 2022, 6:33 PM · Roadmap 2022, meta-task, Roadmap 2021, Extrinsic metadata

vlorentz closed T1747: Review APIs to get metadata from supported origins, a subtask of T1739: Define an architecture to fetch extrinsic metadata outside listers and loaders, as Resolved.

Jul 5 2022, 5:28 PM · Extrinsic metadata

vlorentz closed T1747: Review APIs to get metadata from supported origins as Resolved.

Jul 5 2022, 5:28 PM · Extrinsic metadata

vlorentz renamed T4377: Create API endpoint to expose raw extrinsic metadata on origins from Create API endpoint to expose raw extrinsic metadata from forges to Create API endpoint to expose raw extrinsic metadata on origins.

Jul 5 2022, 5:00 PM · Web app, Extrinsic metadata

vlorentz edited projects for T4377: Create API endpoint to expose raw extrinsic metadata on origins, added: Web app; removed Roadmap 2022.

Jul 5 2022, 12:17 PM · Web app, Extrinsic metadata

moranegg triaged T4377: Create API endpoint to expose raw extrinsic metadata on origins as High priority.

Jul 5 2022, 12:13 PM · Web app, Extrinsic metadata

Jun 2 2022

vlorentz moved T2202: Collect extrinsic metadata from Backlog to Work in progress on the Roadmap 2022 board.

Jun 2 2022, 9:57 AM · Roadmap 2022, meta-task, Roadmap 2021, Extrinsic metadata

vlorentz removed a project from T3490: Collect metadata from ClearlyDefined: meta-task.

Jun 2 2022, 9:56 AM · Extrinsic metadata

May 30 2022

vlorentz added a parent task for T3273: Use "fork" relationships to speed-up initial load of large repositories: T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

May 30 2022, 3:41 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

May 17 2022

vlorentz lowered the priority of T4252: Schedule recurring fetches of origin metadata from High to Normal.

May 17 2022, 3:11 PM · Extrinsic metadata

vlorentz updated the task description for T4252: Schedule recurring fetches of origin metadata.

May 17 2022, 3:06 PM · Extrinsic metadata

vlorentz triaged T4252: Schedule recurring fetches of origin metadata as High priority.

May 17 2022, 3:06 PM · Extrinsic metadata

May 13 2022

vlorentz added a revision to T3273: Use "fork" relationships to speed-up initial load of large repositories: D7831: Use all base snapshots in determine_wants().

May 13 2022, 3:23 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

May 10 2022

vlorentz added a comment to T3273: Use "fork" relationships to speed-up initial load of large repositories.

Currently can't do it on GitLab while logged out: https://gitlab.com/gitlab-org/gitlab/-/issues/361952

May 10 2022, 4:13 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

vlorentz placed T3859: investigate using metadata from GHTorrent up for grabs.

May 10 2022, 4:12 PM · Extrinsic metadata

May 3 2022

vlorentz removed a parent task for T2202: Collect extrinsic metadata: T3273: Use "fork" relationships to speed-up initial load of large repositories.

May 3 2022, 11:16 AM · Roadmap 2022, meta-task, Roadmap 2021, Extrinsic metadata

vlorentz removed a subtask for T3273: Use "fork" relationships to speed-up initial load of large repositories: T2202: Collect extrinsic metadata.

May 3 2022, 11:16 AM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

vlorentz added a parent task for T2202: Collect extrinsic metadata: T3273: Use "fork" relationships to speed-up initial load of large repositories.

May 3 2022, 11:16 AM · Roadmap 2022, meta-task, Roadmap 2021, Extrinsic metadata

vlorentz added subtasks for T3273: Use "fork" relationships to speed-up initial load of large repositories: T1740: fetch extrinsic origin metadata from GitHub, T2202: Collect extrinsic metadata.

May 3 2022, 11:16 AM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

vlorentz added a subtask for T3273: Use "fork" relationships to speed-up initial load of large repositories: T4219: Investigate why GitHub fork detection did not bring a speed-up.

May 3 2022, 11:15 AM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

vlorentz placed T3558: Enable the swh-search QL in production up for grabs.

May 3 2022, 11:08 AM · Archive search, System administration, Intrinsic metadata, Extrinsic metadata

vlorentz closed T1740: fetch extrinsic origin metadata from GitHub, a subtask of T833: When listing an origin, add origin level metadata to RMD storage, as Resolved.

May 3 2022, 11:08 AM · Extrinsic metadata, Restricted Project, GitHub lister

vlorentz closed T1740: fetch extrinsic origin metadata from GitHub, a subtask of T2202: Collect extrinsic metadata, as Resolved.

May 3 2022, 11:08 AM · Roadmap 2022, meta-task, Roadmap 2021, Extrinsic metadata

Apr 29 2022

ardumont closed T4206: prod: Deploy metadata loader v0.0.2, a subtask of T3273: Use "fork" relationships to speed-up initial load of large repositories, as Resolved.

Apr 29 2022, 11:27 AM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

Apr 28 2022

ardumont changed the status of T4206: prod: Deploy metadata loader v0.0.2, a subtask of T3273: Use "fork" relationships to speed-up initial load of large repositories, from Open to Work in Progress.

Apr 28 2022, 3:43 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

vlorentz edited projects for T3273: Use "fork" relationships to speed-up initial load of large repositories, added: Origin-GitHub; removed GitHub lister.

Apr 28 2022, 3:27 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

vlorentz edited projects for T3273: Use "fork" relationships to speed-up initial load of large repositories, added: Origin-GitLab; removed GitLab migration.

Apr 28 2022, 3:27 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

vlorentz added projects to T3273: Use "fork" relationships to speed-up initial load of large repositories: GitHub lister, GitLab migration.

Apr 28 2022, 3:27 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

vlorentz added a project to T3273: Use "fork" relationships to speed-up initial load of large repositories: Git loader.

Apr 28 2022, 3:26 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

vlorentz added a subtask for T3273: Use "fork" relationships to speed-up initial load of large repositories: T4206: prod: Deploy metadata loader v0.0.2.

Apr 28 2022, 3:26 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

Apr 27 2022

vlorentz placed T3559: Enable the swh-search QL in staging up for grabs.

Apr 27 2022, 2:28 PM · Archive search, System administration, Intrinsic metadata, Extrinsic metadata

vlorentz claimed T3273: Use "fork" relationships to speed-up initial load of large repositories.

Apr 27 2022, 2:12 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

vlorentz added revisions to T3273: Use "fork" relationships to speed-up initial load of large repositories: D7691: Store the result of MetadataFetcher.get_parent_origins, D7695: Replace 'base_url' argument with 'self.parent_origins' attribute, D7663: Add method get_parent_origins().

Apr 27 2022, 2:07 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader

Apr 22 2022

pratyush added a comment to T1747: Review APIs to get metadata from supported origins.

WIP

Apr 22 2022, 4:44 PM · Extrinsic metadata

Apr 21 2022

vlorentz removed a parent task for T3859: investigate using metadata from GHTorrent: T1740: fetch extrinsic origin metadata from GitHub.

Apr 21 2022, 8:39 PM · Extrinsic metadata

vlorentz added a comment to T3859: investigate using metadata from GHTorrent.

Removing this from the subtasks of T1740, given that we decided in T1739 to query the API at the same time as we load origins.

Apr 21 2022, 8:39 PM · Extrinsic metadata

vlorentz added a parent task for T2693: fetch extrinsic origin metadata from GitLab instances: T2202: Collect extrinsic metadata.

Apr 21 2022, 9:01 AM · Extrinsic metadata, Origin-GitLab

vlorentz added a subtask for T2202: Collect extrinsic metadata: T2693: fetch extrinsic origin metadata from GitLab instances.

Apr 21 2022, 9:01 AM · Roadmap 2022, meta-task, Roadmap 2021, Extrinsic metadata

vlorentz closed T1739: Define an architecture to fetch extrinsic metadata outside listers and loaders, a subtask of T2693: fetch extrinsic origin metadata from GitLab instances, as Resolved.

Apr 21 2022, 9:00 AM · Extrinsic metadata, Origin-GitLab

vlorentz closed T1739: Define an architecture to fetch extrinsic metadata outside listers and loaders as Resolved.

I started working on this design. We'll see if it needs to change later.

Apr 21 2022, 9:00 AM · Extrinsic metadata

Apr 19 2022

vlorentz updated the task description for T3542: Decide what metadata we want to / can collect from GitHub.

Apr 19 2022, 12:10 PM · Origin-GitHub, Extrinsic metadata

vlorentz added a comment to T3542: Decide what metadata we want to / can collect from GitHub.

In summary, we would archive everything with priority "high" or "mid", as well as the "license" and "main language" fields, as they are all easy to fetch and store.

Apr 19 2022, 11:18 AM · Origin-GitHub, Extrinsic metadata

Apr 18 2022

pratyush added a comment to T1747: Review APIs to get metadata from supported origins.

name [url] | type | methods | auth/throttle | code_source | metadata_source | metadata_conformance | etl_code | status

Apr 18 2022, 8:46 PM · Extrinsic metadata

pratyush added a comment to T1747: Review APIs to get metadata from supported origins.

Apr 18 2022, 7:55 PM · Extrinsic metadata

Apr 11 2022

vlorentz added a comment to T1739: Define an architecture to fetch extrinsic metadata outside listers and loaders.

In T1739#82939, @olasd wrote:

Yes, all these are good points. As long as forges don't provide a way of loading the metadata in bulk, it makes sense to do it at the same time as loading.

Apr 11 2022, 2:44 PM · Extrinsic metadata

olasd added a comment to T1739: Define an architecture to fetch extrinsic metadata outside listers and loaders.

In T1739#82920, @vlorentz wrote:

The original idea for this was to have separate tasks to fetch metadata, so that loaders did not have forge-specific code to fetch metadata.

However, the idea of loading metadata from the loader is more appealing the more I think about it:

1. Metadata are fetched at about the same time as we snapshot code, which would allow showing more consistent states of repositories.

2. Active repositories automatically have their metadata fetched more often than inactive ones.

3. We don't have one more moving part to monitor and schedule.

4. This allows the Git loader to know a new repo is a "forge fork" of another one before it starts loading, so it can do an incremental load.

Apr 11 2022, 2:36 PM · Extrinsic metadata
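
The trade-off discussed in the comment above can be illustrated with a minimal sketch. It is not the actual swh-loader code: `ForgeMetadata`, `Fetcher`, `Loader`, and the in-memory dictionaries are all hypothetical names invented for illustration. It only shows the idea of fetching forge metadata as the first step of a load, storing it, and using any known parent ("forge fork") origin to decide whether an incremental load is possible.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ForgeMetadata:
    """Hypothetical container for what a forge API returns about one origin."""
    raw: bytes
    parent_origins: List[str] = field(default_factory=list)


# Hypothetical fetcher signature: origin URL -> metadata, or None if the
# forge is unsupported or returns nothing (e.g. while logged out).
Fetcher = Callable[[str], Optional[ForgeMetadata]]


@dataclass
class Loader:
    fetch_metadata: Fetcher
    # In-memory stand-ins (assumptions) for the archive and metadata storage.
    snapshots: Dict[str, List[str]] = field(default_factory=dict)
    stored_metadata: Dict[str, bytes] = field(default_factory=dict)

    def load(self, origin_url: str) -> str:
        # Step 1: fetch metadata right before loading, so that metadata and
        # code reflect roughly the same state of the repository.
        metadata = self.fetch_metadata(origin_url)
        parents: List[str] = []
        if metadata is not None:
            self.stored_metadata[origin_url] = metadata.raw
            parents = metadata.parent_origins

        # Step 2: any already-archived snapshot of this origin or of one of
        # its parents can serve as a base for an incremental load.
        bases = [url for url in [origin_url, *parents] if url in self.snapshots]

        # Step 3: record a (placeholder) snapshot for this visit.
        self.snapshots.setdefault(origin_url, []).append("snapshot")
        return "incremental" if bases else "full"
```

With such a design, no separate scheduler is needed for metadata: each visit of an origin refreshes its metadata, so active repositories are refreshed more often simply because they are visited more often.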

moranegg added a comment to T1739: Define an architecture to fetch extrinsic metadata outside listers and loaders.

To me the advantages are strong, especially points 1 and 4.

Apr 11 2022, 2:11 PM · Extrinsic metadata

vlorentz added a comment to T1739: Define an architecture to fetch extrinsic metadata outside listers and loaders.

The original idea for this was to have separate tasks to fetch metadata, so that loaders did not have forge-specific code to fetch metadata.

Apr 11 2022, 1:45 PM · Extrinsic metadata

vlorentz closed T3542: Decide what metadata we want to / can collect from GitHub as Resolved.

Looks like *what* we want to collect is a solved issue.

Apr 11 2022, 9:46 AM · Origin-GitHub, Extrinsic metadata

vlorentz closed T3542: Decide what metadata we want to / can collect from GitHub, a subtask of T1747: Review APIs to get metadata from supported origins, as Resolved.

Apr 11 2022, 9:46 AM · Extrinsic metadata

Mar 25 2022

bchauvet raised the priority of T2202: Collect extrinsic metadata from Normal to High.

Mar 25 2022, 5:30 PM · Roadmap 2022, meta-task, Roadmap 2021, Extrinsic metadata

Mar 23 2022

bchauvet added a project to T2202: Collect extrinsic metadata: Roadmap 2022.

Mar 23 2022, 4:48 PM · Roadmap 2022, meta-task, Roadmap 2021, Extrinsic metadata

bchauvet added a parent task for T3097: Expose metadata in the WebApp and make it searchable: T4081: Show metadata on Web UI.

Mar 23 2022, 4:45 PM · Intrinsic metadata, Extrinsic metadata, Roadmap 2021, meta-task

Feb 23 2022

anlambert closed T3967: "Link" header is not properly displayed in apidoc when it contains [], a subtask of T3559: Enable the swh-search QL in staging, as Resolved.

Feb 23 2022, 5:39 PM · Archive search, System administration, Intrinsic metadata, Extrinsic metadata

Feb 22 2022

vlorentz added a parent task for T3558: Enable the swh-search QL in production: T3952: Make the search query language a first class citizen.

Feb 22 2022, 6:56 PM · Archive search, System administration, Intrinsic metadata, Extrinsic metadata

Feb 21 2022

vlorentz added a subtask for T3559: Enable the swh-search QL in staging: T3967: "Link" header is not properly displayed in apidoc when it contains [].

Feb 21 2022, 3:42 PM · Archive search, System administration, Intrinsic metadata, Extrinsic metadata

Feb 7 2022

vlorentz updated the task description for T3558: Enable the swh-search QL in production.

Feb 7 2022, 10:24 AM · Archive search, System administration, Intrinsic metadata, Extrinsic metadata

vlorentz updated the task description for T3559: Enable the swh-search QL in staging.

Feb 7 2022, 10:23 AM · Archive search, System administration, Intrinsic metadata, Extrinsic metadata

Jan 18 2022

vlorentz added a parent task for T3859: investigate using metadata from GHTorrent: T1740: fetch extrinsic origin metadata from GitHub.

Jan 18 2022, 12:43 PM · Extrinsic metadata

vlorentz triaged T3859: investigate using metadata from GHTorrent as Normal priority.

Jan 18 2022, 12:43 PM · Extrinsic metadata

Jan 13 2022

vsellier removed a revision from T833: When listing an origin, add origin level metadata to RMD storage: D6946: netbox: use the centralized admin db.

Jan 13 2022, 4:24 PM · Extrinsic metadata, Restricted Project, GitHub lister

vsellier added a revision to T833: When listing an origin, add origin level metadata to RMD storage: D6946: netbox: use the centralized admin db.

Jan 13 2022, 4:21 PM · Extrinsic metadata, Restricted Project, GitHub lister

Nov 22 2021

vlorentz closed T3636: Make the opam loader write extrinsic metadata as Resolved.

Nov 22 2021, 2:44 PM · Extrinsic metadata, Opam

Nov 10 2021

ardumont closed T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3, a subtask of T3636: Make the opam loader write extrinsic metadata, as Resolved.

Nov 10 2021, 4:43 PM · Extrinsic metadata, Opam

ardumont changed the status of T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3, a subtask of T3636: Make the opam loader write extrinsic metadata, from Open to Work in Progress.

Nov 10 2021, 3:33 PM · Extrinsic metadata, Opam

vlorentz added a subtask for T3636: Make the opam loader write extrinsic metadata: T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3.

Nov 10 2021, 3:20 PM · Extrinsic metadata, Opam

Nov 8 2021

vlorentz added a revision to T3636: Make the opam loader write extrinsic metadata: D6606: opam: Write package definitions to the extrinsic metadata storage.

Nov 8 2021, 11:58 AM · Extrinsic metadata, Opam

Oct 21 2021

vlorentz closed T1344: Write specs about metadata workflow, a subtask of T833: When listing an origin, add origin level metadata to RMD storage, as Resolved.

Oct 21 2021, 2:12 PM · Extrinsic metadata, Restricted Project, GitHub lister

moranegg added a subtask for T1739: Define an architecture to fetch extrinsic metadata outside listers and loaders: T3681: Review extrinsic metadata specification.

Oct 21 2021, 12:59 PM · Extrinsic metadata

moranegg added a parent task for T3681: Review extrinsic metadata specification: T1739: Define an architecture to fetch extrinsic metadata outside listers and loaders.

Oct 21 2021, 12:59 PM · Extrinsic metadata

moranegg added a comment to T3681: Review extrinsic metadata specification.

I think we (or, I should say, I) missed the ambiguity of the concept of origin.

Oct 21 2021, 12:36 PM · Extrinsic metadata

moranegg triaged T3681: Review extrinsic metadata specification as Normal priority.

Oct 21 2021, 12:33 PM · Extrinsic metadata

Oct 8 2021

vlorentz added a comment to T3636: Make the opam loader write extrinsic metadata.

(this task is a dependency of T3638, because author != committer in revisions created by opam, and releases don't have a "committer" field, so switching to releases would lose this data)

Oct 8 2021, 2:33 PM · Extrinsic metadata, Opam

vlorentz added a parent task for T3636: Make the opam loader write extrinsic metadata: T3638: Make package loaders create releases objects instead of revisions.

Oct 8 2021, 2:32 PM · Extrinsic metadata, Opam

vlorentz edited projects for T3636: Make the opam loader write extrinsic metadata, added: Extrinsic metadata; removed Metadata workflow.

Oct 8 2021, 2:26 PM · Extrinsic metadata, Opam