
Jun 17 2021

ardumont claimed T3374: Ingest sourceforge repositories (origins of type git, svn, hg).
Jun 17 2021, 10:24 AM · System administration, Archive coverage, Origin-SourceForge

Jun 14 2021

ardumont updated the task description for T3374: Ingest sourceforge repositories (origins of type git, svn, hg).
Jun 14 2021, 11:29 AM · System administration, Archive coverage, Origin-SourceForge

Jun 11 2021

ardumont updated the task description for T3374: Ingest sourceforge repositories (origins of type git, svn, hg).
Jun 11 2021, 12:39 PM · System administration, Archive coverage, Origin-SourceForge
ardumont updated the task description for T3374: Ingest sourceforge repositories (origins of type git, svn, hg).
Jun 11 2021, 12:37 PM · System administration, Archive coverage, Origin-SourceForge
ardumont added a comment to T3374: Ingest sourceforge repositories (origins of type git, svn, hg).

Note: when this is (reasonably) done, we should document the addition of SourceForge to the archive coverage page at archive.s.o and also to the archive changelog.

Jun 11 2021, 12:31 PM · System administration, Archive coverage, Origin-SourceForge
ardumont updated the task description for T3374: Ingest sourceforge repositories (origins of type git, svn, hg).
Jun 11 2021, 12:31 PM · System administration, Archive coverage, Origin-SourceForge
ardumont updated the task description for T3374: Ingest sourceforge repositories (origins of type git, svn, hg).
Jun 11 2021, 12:29 PM · System administration, Archive coverage, Origin-SourceForge
ardumont updated the task description for T3374: Ingest sourceforge repositories (origins of type git, svn, hg).
Jun 11 2021, 12:28 PM · System administration, Archive coverage, Origin-SourceForge
zack added a project to T3374: Ingest sourceforge repositories (origins of type git, svn, hg): Archive coverage.

Note: when this is (reasonably) done, we should document the addition of SourceForge to the archive coverage page at archive.s.o and also to the archive changelog.

Jun 11 2021, 12:27 PM · System administration, Archive coverage, Origin-SourceForge
ardumont closed T3350: Deploy sourceforge lister in production as Resolved.
Jun 11 2021, 12:24 PM · Archive coverage, System administration, Origin-SourceForge
ardumont added a comment to T3350: Deploy sourceforge lister in production.

It's deployed and the ingestion is ongoing.
Monitoring of the ingestion will be moved to a dedicated task [1]
Closing this now.

Jun 11 2021, 12:23 PM · Archive coverage, System administration, Origin-SourceForge

Jun 9 2021

borisbaldassari added a comment to T1724: Maven Central repository support.

Update for the Maven Indexer prototype: it works! (finally)

Jun 9 2021, 9:03 AM · Maven loader, Maven lister, GSoC 2019, Archive coverage

Jun 8 2021

zack added a project to T3350: Deploy sourceforge lister in production: Archive coverage.
Jun 8 2021, 11:45 AM · Archive coverage, System administration, Origin-SourceForge
borisbaldassari added a comment to T1724: Maven Central repository support.

Some more information about the Maven indexer. Beware, people: it's a bit dirty, and you're not going to like it infra-wise.

Jun 8 2021, 9:26 AM · Maven loader, Maven lister, GSoC 2019, Archive coverage
borisbaldassari added a comment to T1724: Maven Central repository support.

So, to sum up the options we have: basically we "just" need all artifact coordinates. From there, for each artifact we can:

Jun 8 2021, 8:40 AM · Maven loader, Maven lister, GSoC 2019, Archive coverage

Jun 5 2021

borisbaldassari added a comment to T1724: Maven Central repository support.

Few more cents in the bucket..

  • scraping is explicitly forbidden, see https://repo1.maven.org/terms.html -- however, making contact first will help us get through most of the abuse-limiting rules, I guess.
  • regarding fasten, there are indeed some bits that could be useful. However most of our difficulties are in getting a list of projects, whereas this information is already provided by the user in the case of fasten. So, interesting and useful, but not a game changer regarding the difficult part of our job.
Jun 5 2021, 2:51 PM · Maven loader, Maven lister, GSoC 2019, Archive coverage

May 31 2021

vlorentz updated the task description for T3353: Answer all items in the suggestion box.
May 31 2021, 4:33 PM · Archive coverage
vlorentz triaged T3353: Answer all items in the suggestion box as High priority.
May 31 2021, 4:31 PM · Archive coverage

May 7 2021

zack added a subtask for T3315: archive SourceForge: T735: SourceForge lister.
May 7 2021, 5:25 PM · Archive coverage
zack triaged T3315: archive SourceForge as Normal priority.
May 7 2021, 5:25 PM · Archive coverage

May 6 2021

zack added a comment to T3311: Use .gitmodules to discover origins.

I think the only issue with (3) is not being retroactive

May 6 2021, 6:49 PM · Archive coverage, Git loader
vlorentz added a project to T3311: Use .gitmodules to discover origins: Archive coverage.
May 6 2021, 6:34 PM · Archive coverage, Git loader

Apr 12 2021

vlorentz added a comment to T3235: Add archival of bug tracker databases as well as an unofficial bug tracker per-project.

You are likely doing a git pull on a periodic basis. Just add git bug bridge pull [<name>] next to it.

Apr 12 2021, 3:37 PM · Archive coverage, Data Model
libEqualizer added a comment to T3235: Add archival of bug tracker databases as well as an unofficial bug tracker per-project.

However, this would require considerable work

Apr 12 2021, 2:48 PM · Archive coverage, Data Model
vlorentz triaged T3235: Add archival of bug tracker databases as well as an unofficial bug tracker per-project as Wishlist priority.

Hi, thanks for the suggestion.

Apr 12 2021, 11:31 AM · Archive coverage, Data Model

Mar 30 2021

zack added a comment to T2833: cpan.loader - archive Perl modules from CPAN.

awesome, thanks @joenio ! you can also drop by our other devel communication channel if you want to discuss this in other ways: https://www.softwareheritage.org/community/developers/

Mar 30 2021, 3:29 PM · CPAN lister, Archive coverage
joenio added a comment to T2833: cpan.loader - archive Perl modules from CPAN.

Thanks @zack for the info, I'll start learning the SWH dev stack following the instructions I found in the wiki[1].

Mar 30 2021, 2:27 PM · CPAN lister, Archive coverage
zack renamed T2833: cpan.loader - archive Perl modules from CPAN from [feature request] cpan.loader - preserver Perl modules from CPAN to cpan.loader - preserver Perl modules from CPAN.
Mar 30 2021, 8:22 AM · CPAN lister, Archive coverage
zack raised the priority of T2833: cpan.loader - archive Perl modules from CPAN from Wishlist to Normal.
Mar 30 2021, 8:22 AM · CPAN lister, Archive coverage
zack added a comment to T2833: cpan.loader - archive Perl modules from CPAN.

Hey, yes, we want to have one, but nobody is working on it at the moment, and we'd rather have someone knowledgeable about that ecosystem work on it. So, if you're interested, you're more than welcome to help there! (And thank you in advance.)

Mar 30 2021, 8:21 AM · CPAN lister, Archive coverage
joenio added a comment to T2833: cpan.loader - archive Perl modules from CPAN.

Hi SWH devs,

Mar 30 2021, 1:56 AM · CPAN lister, Archive coverage

Mar 17 2021

rdicosmo added a comment to T1724: Maven Central repository support.

After recent exchanges with @hboutemy and Charles Sabourdin, here is a clarification of the scope of this task.
We need a Maven repository lister that addresses the following issues:

Mar 17 2021, 10:40 AM · Maven loader, Maven lister, GSoC 2019, Archive coverage

Mar 15 2021

ardumont added a comment to T3095: Add LIP6 gitlab instance to regular crawling list.

Listing deployed in production:

swhscheduler@saatchi:~$ swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ task add list-gitlab-incremental url="https://gitlab.lip6.fr/api/v4/" instance=lip6
Created 1 tasks
Mar 15 2021, 11:52 AM · Scientific Community Building, Archive coverage
ardumont added a comment to T3095: Add LIP6 gitlab instance to regular crawling list.

Everything went fine:

worker1.internal.staging.swh.network: Mar 15 10:37:01 worker1 python3[2277003]: [2021-03-15 10:37:01,800: INFO/MainProcess] Received task: swh.lister.gitlab.tasks.IncrementalGitLabLister[86f12806-f321-4ea1-8438-83c6fd0c457b]
worker1.internal.staging.swh.network: Mar 15 10:37:06 worker1 python3[2277067]: [2021-03-15 10:37:06,017: INFO/ForkPoolWorker-4] Task swh.lister.gitlab.tasks.IncrementalGitLabLister[86f12806-f321-4ea1-8438-83c6fd0c457b] succeeded in 4.211625785101205s: {'pages': 5, 'origins': 64}
Mar 15 2021, 11:42 AM · Scientific Community Building, Archive coverage
ardumont added a comment to T3095: Add LIP6 gitlab instance to regular crawling list.

Checking in staging first, with:

Mar 15 2021, 11:39 AM · Scientific Community Building, Archive coverage

Mar 11 2021

rdicosmo added a comment to T1724: Maven Central repository support.

@hboutemy : I wonder if you are aware that we now have a grant program in place that can fund the development of listers like this one.
All the information is available at https://www.softwareheritage.org/grants and you can mail me for more info if needed.

Mar 11 2021, 8:32 PM · Maven loader, Maven lister, GSoC 2019, Archive coverage

Mar 8 2021

rdicosmo renamed T3095: Add LIP6 gitlab instance to regular crawling list from Ad LIP6 gitlab instance to regular crawling list to Add LIP6 gitlab instance to regular crawling list.
Mar 8 2021, 7:02 PM · Scientific Community Building, Archive coverage
rdicosmo updated subscribers of T3095: Add LIP6 gitlab instance to regular crawling list.
Mar 8 2021, 7:02 PM · Scientific Community Building, Archive coverage
rdicosmo raised the priority of T3095: Add LIP6 gitlab instance to regular crawling list from Normal to High.

We would like to see this in prod as soon as reasonably possible.

Mar 8 2021, 5:43 PM · Scientific Community Building, Archive coverage
rdicosmo updated the task description for T3098: Save VLC's forge/repositories.
Mar 8 2021, 4:03 PM · Archive coverage
anlambert added a comment to T3098: Save VLC's forge/repositories.

There is also the VideoLAN Gitlab instance (that will replace the cgit forge) to archive located at https://code.videolan.org/.

Mar 8 2021, 3:42 PM · Archive coverage
anlambert added a project to T3098: Save VLC's forge/repositories: Archive coverage.
Mar 8 2021, 3:39 PM · Archive coverage

Mar 7 2021

rdicosmo triaged T3095: Add LIP6 gitlab instance to regular crawling list as Normal priority.
Mar 7 2021, 8:40 AM · Scientific Community Building, Archive coverage

Feb 10 2021

ardumont added a comment to T376: ingest git.eclipse.org repositories.

new listers

Feb 10 2021, 1:38 PM · Archive coverage
rdicosmo added a comment to T376: ingest git.eclipse.org repositories.

Note that does not mean this is or will be ingested anytime soon though.
We are still missing at least the one cog to actually schedule those listed origins.

More details in T2345#58247

Feb 10 2021, 12:31 PM · Archive coverage
ardumont placed T376: ingest git.eclipse.org repositories up for grabs.
Feb 10 2021, 9:20 AM · Archive coverage
ardumont added a comment to T376: ingest git.eclipse.org repositories.

Note that does not mean this is or will be ingested anytime soon though.
We are still missing at least the one cog to actually schedule those listed origins.

Feb 10 2021, 9:20 AM · Archive coverage

Feb 8 2021

olasd added a comment to T2345: Improve handling of recurrent loading tasks in scheduler.

Here's my understanding of the status of the migration to the next generation scheduler as of today:

Feb 8 2021, 12:01 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
vlorentz reassigned T2973: Implement a scheduler simulator from vlorentz to olasd.
Feb 8 2021, 12:00 PM · Sprint 2021 01, Archive coverage, Scheduling utilities

Feb 4 2021

ardumont added a comment to T376: ingest git.eclipse.org repositories.

Instance cgit scheduled [1]

Feb 4 2021, 9:29 AM · Archive coverage

Feb 2 2021

anlambert closed T2442: Provide a unified API for listers to interact with the scheduler, a subtask of T2345: Improve handling of recurrent loading tasks in scheduler, as Resolved.
Feb 2 2021, 4:08 PM · Sprint 2021 01, Archive coverage, Scheduling utilities

Feb 1 2021

rdicosmo added a comment to T376: ingest git.eclipse.org repositories.

Thanks @ardumont , that's great! If you think this does not need any more support on the Eclipse side, may you let them know?

Feb 1 2021, 5:59 PM · Archive coverage
rdicosmo added a comment to T376: ingest git.eclipse.org repositories.

Thanks @ardumont , that's great! If you think this does not need any more support on the Eclipse side, may you let them know?

Feb 1 2021, 5:58 PM · Archive coverage
ardumont added a comment to T376: ingest git.eclipse.org repositories.

With the latest improvement, we listed the instance in one request [1]

Feb 1 2021, 5:25 PM · Archive coverage

Jan 29 2021

ardumont closed T2999: Optimize the number of HTTP requests sent by the cgit lister, a subtask of T376: ingest git.eclipse.org repositories, as Resolved.
Jan 29 2021, 5:36 PM · Archive coverage
ardumont added a revision to T376: ingest git.eclipse.org repositories: D4968: cgit: Compute origin urls out of a base git url when provided..
Jan 29 2021, 12:21 PM · Archive coverage
ardumont added a comment to T376: ingest git.eclipse.org repositories.

The 500 seems normal

Jan 29 2021, 11:51 AM · Archive coverage
ardumont added a comment to T376: ingest git.eclipse.org repositories.

yes, agreed.

Jan 29 2021, 10:34 AM · Archive coverage
rdicosmo added a comment to T376: ingest git.eclipse.org repositories.

Thanks @ardumont for experimenting with this. The 500 seems normal: we need to tell Eclipse about us first, I'll put you in touch. So maybe it's still a no-brainer, and we just need to document the "contact the owner to get whitelisted" human step :-)

Jan 29 2021, 10:04 AM · Archive coverage
ardumont added a subtask for T376: ingest git.eclipse.org repositories: T2999: Optimize the number of HTTP requests sent by the cgit lister.
Jan 29 2021, 9:24 AM · Archive coverage

Jan 28 2021

ardumont added a comment to T376: ingest git.eclipse.org repositories.

In the context of deploying the next-gen lister in staging (T2998), I also tried the Eclipse cgit instance

Jan 28 2021, 5:09 PM · Archive coverage

Jan 25 2021

rdicosmo assigned T376: ingest git.eclipse.org repositories to ardumont.
Jan 25 2021, 9:03 PM · Archive coverage
rdicosmo raised the priority of T376: ingest git.eclipse.org repositories from Low to High.

Now that we have a cgit lister, this should be a no-brainer.
If that's the case, we need it up and running quickly.

Jan 25 2021, 9:03 PM · Archive coverage
ardumont closed T2443: Implement a bulk-queryable cache of latest visits for use by the recurrent visit scheduler, a subtask of T2345: Improve handling of recurrent loading tasks in scheduler, as Resolved.
Jan 25 2021, 8:42 AM · Sprint 2021 01, Archive coverage, Scheduling utilities

Jan 20 2021

vlorentz added a comment to T2974: Define (and implement) scheduler performance metrics.
  • "'outdatedest' origin": excluding disabled origins and origins visited after their last_activity (if any), the min(current_time - last_visit) (lower is better)
Jan 20 2021, 5:33 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
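
For illustration only, the "'outdatedest' origin" metric quoted above could be sketched as follows. The `Origin` record, its field names, and both function names are hypothetical and are not taken from the swh.scheduler code. The sketch follows the comment's wording literally (it aggregates with min); if the intent is the age of the single stalest origin, the aggregate would be max instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass
class Origin:
    # Hypothetical record: field names echo the comment, not a real schema.
    url: str
    enabled: bool
    last_visit: Optional[datetime]
    last_activity: Optional[datetime]


def candidate_ages(origins: Iterable[Origin], now: datetime) -> List[timedelta]:
    """Ages (now - last_visit) of origins that the metric does not exclude."""
    ages = []
    for origin in origins:
        # Exclude disabled origins; never-visited origins are skipped too
        # (an assumption: the comment does not say how to treat them).
        if not origin.enabled or origin.last_visit is None:
            continue
        # Exclude origins already visited after their last activity, if any.
        if origin.last_activity is not None and origin.last_visit > origin.last_activity:
            continue
        ages.append(now - origin.last_visit)
    return ages


def outdatedest_metric(origins: Iterable[Origin], now: datetime) -> Optional[timedelta]:
    """Aggregate the candidate ages with min, as the comment is worded."""
    ages = candidate_ages(origins, now)
    return min(ages) if ages else None
```

Returning None when no origin qualifies keeps "nothing to measure" distinct from an age of zero.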

Jan 18 2021

douardda added a comment to T2974: Define (and implement) scheduler performance metrics.

thanks, looks like a good starting point.

Jan 18 2021, 4:36 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
olasd added a comment to T2974: Define (and implement) scheduler performance metrics.
  • "origins with pending changes": Number of origins where last_visit < last_activity (lower is better)
Jan 18 2021, 2:29 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
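
As a rough sketch of the "origins with pending changes" metric quoted above, the count could be computed like this. The `Origin` record and the function name are hypothetical, not part of swh.scheduler. Origins missing either timestamp are skipped, which is an assumption of this sketch rather than something the comment specifies.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class Origin:
    # Hypothetical record: field names echo the comment, not a real schema.
    url: str
    last_visit: Optional[datetime]
    last_activity: Optional[datetime]


def count_pending_changes(origins: Iterable[Origin]) -> int:
    """Count origins whose last visit predates their last known activity."""
    return sum(
        1
        for origin in origins
        if origin.last_visit is not None
        and origin.last_activity is not None
        and origin.last_visit < origin.last_activity
    )
```

A lower count is better: it means fewer origins have changed since the archive last looked at them.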
olasd added a comment to T2974: Define (and implement) scheduler performance metrics.

Some potentially interesting and "easy" metrics:

Jan 18 2021, 2:27 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
olasd moved T2974: Define (and implement) scheduler performance metrics from Backlog to in-progress on the Sprint 2021 01 board.
Jan 18 2021, 2:17 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
olasd changed the status of T2974: Define (and implement) scheduler performance metrics from Open to Work in Progress.
Jan 18 2021, 2:17 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
olasd added a revision to T2973: Implement a scheduler simulator: D4856: Introduce scaffolding for a scheduler simulator.
Jan 18 2021, 2:13 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
olasd changed the status of T2973: Implement a scheduler simulator, a subtask of T2345: Improve handling of recurrent loading tasks in scheduler, from Open to Work in Progress.
Jan 18 2021, 2:12 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
olasd changed the status of T2973: Implement a scheduler simulator from Open to Work in Progress.
Jan 18 2021, 2:12 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
olasd triaged T2973: Implement a scheduler simulator as High priority.
Jan 18 2021, 2:12 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
vlorentz changed the status of T2444: Implement the scheduling policy for the recurrent visit scheduler, a subtask of T2345: Improve handling of recurrent loading tasks in scheduler, from Open to Work in Progress.
Jan 18 2021, 2:08 PM · Sprint 2021 01, Archive coverage, Scheduling utilities

Jan 11 2021

anarcat added a comment to T1799: ingest Tor git repositories.

great, thanks! :)

Jan 11 2021, 5:30 PM · Archive coverage
ardumont triaged T2954: ingest gitlab.torproject.org as Normal priority.
Jan 11 2021, 5:17 PM · Archive coverage
ardumont closed T1799: ingest Tor git repositories, a subtask of T1798: ingest Tor project source code (meta task), as Resolved.
Jan 11 2021, 5:15 PM · Archive coverage
ardumont closed T1799: ingest Tor git repositories as Resolved.
Jan 11 2021, 5:15 PM · Archive coverage
ardumont added a comment to T1799: ingest Tor git repositories.

It was (and thanks for the ping).

Jan 11 2021, 5:15 PM · Archive coverage
anarcat added a comment to T1799: ingest Tor git repositories.

was this completed? i suspect this issue can be closed if the gitweb is being regularly crawled.

Jan 11 2021, 4:51 PM · Archive coverage
ardumont changed the status of T2443: Implement a bulk-queryable cache of latest visits for use by the recurrent visit scheduler, a subtask of T2345: Improve handling of recurrent loading tasks in scheduler, from Open to Work in Progress.
Jan 11 2021, 2:03 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont added a project to T2345: Improve handling of recurrent loading tasks in scheduler: Sprint 2021 01.
Jan 11 2021, 12:36 PM · Sprint 2021 01, Archive coverage, Scheduling utilities

Dec 11 2020

zack renamed T2793: add notable past events to the archive changelog from add to the journal of archive changes notable past events to add notable past events to the archive changelog.
Dec 11 2020, 4:24 PM · Archive coverage, Documentation
zack closed T2793: add notable past events to the archive changelog as Resolved by committing rDDOCde792f8fd73a: archive changelog: remove WIP mark.
Dec 11 2020, 4:23 PM · Archive coverage, Documentation
zack updated the task description for T2793: add notable past events to the archive changelog.
Dec 11 2020, 4:21 PM · Archive coverage, Documentation
zack updated the task description for T2793: add notable past events to the archive changelog.
Dec 11 2020, 3:57 PM · Archive coverage, Documentation
ardumont closed T1352: ingest Guix (SD) packages as Resolved.
Dec 11 2020, 3:49 PM · Archive coverage
ardumont closed T1991: Implement a Guix/Nix loader, a subtask of T1352: ingest Guix (SD) packages, as Resolved.
Dec 11 2020, 3:47 PM · Archive coverage
zack updated the task description for T2793: add notable past events to the archive changelog.
Dec 11 2020, 3:07 PM · Archive coverage, Documentation
zack closed T1139: ingest major gitlab instances as Resolved.

All instances listed in this task have been added, so I'm closing this. Other instances can be added in the future, with matching tasks submitted for tracking purposes as needed.

Dec 11 2020, 2:35 PM · Archive coverage, Origin-GitLab

Dec 10 2020

zack added a comment to T2793: add notable past events to the archive changelog.

I've completed a bunch of these in 86f8b213e23970feb9f9bda8ab87fc7d6851abf0

Dec 10 2020, 4:55 PM · Archive coverage, Documentation
zack updated the task description for T2793: add notable past events to the archive changelog.
Dec 10 2020, 4:54 PM · Archive coverage, Documentation
zack updated the task description for T2793: add notable past events to the archive changelog.
Dec 10 2020, 4:21 PM · Archive coverage, Documentation
zack updated the task description for T2793: add notable past events to the archive changelog.
Dec 10 2020, 11:30 AM · Archive coverage, Documentation
zack updated the task description for T2793: add notable past events to the archive changelog.
Dec 10 2020, 11:28 AM · Archive coverage, Documentation
zack updated the task description for T2793: add notable past events to the archive changelog.
Dec 10 2020, 11:17 AM · Archive coverage, Documentation
zack updated the task description for T2793: add notable past events to the archive changelog.
Dec 10 2020, 11:14 AM · Archive coverage, Documentation
zack updated the task description for T2793: add notable past events to the archive changelog.
Dec 10 2020, 11:12 AM · Archive coverage, Documentation
zack updated the task description for T2793: add notable past events to the archive changelog.
Dec 10 2020, 11:10 AM · Archive coverage, Documentation