Jun 29 2022

franckbret added a comment to T4104: Ingest crates.io (Rust).

Hello,
The crates lister (stateless) and loader have landed.
I just fixed some issues discovered while running the lister and loader in the Docker environment (D8049).

Jun 29 2022, 3:03 PM · Crates loader, Crates lister, Archive coverage

Jun 21 2022

olasd changed the status of T4335: Archive repo.or.cz from Open to Work in Progress.

I've scheduled the archival of the 7377 repos in one of the leftover one-shot queues.

Jun 21 2022, 10:07 PM · Archive coverage
bchauvet moved T4233: Ingest Arch Linux from Restricted Project Column to Restricted Project Column on the Unknown Object (Project) board.
Jun 21 2022, 2:37 PM · Arch loader, Arch Lister, Archive coverage

Jun 19 2022

vlorentz updated the task description for T4335: Archive repo.or.cz.
Jun 19 2022, 9:25 AM · Archive coverage
vlorentz triaged T4335: Archive repo.or.cz as Unbreak Now! priority.
Jun 19 2022, 9:15 AM · Archive coverage

Jun 17 2022

ardumont updated the task description for T4233: Ingest Arch Linux.
Jun 17 2022, 9:53 AM · Arch loader, Arch Lister, Archive coverage

Jun 16 2022

ardumont updated the task description for T4233: Ingest Arch Linux.
Jun 16 2022, 2:47 PM · Arch loader, Arch Lister, Archive coverage
ardumont updated the task description for T4330: Deploy maven stack in production.
Jun 16 2022, 11:13 AM · System administration, Maven loader, Maven lister, GSoC 2019, Archive coverage
ardumont triaged T4330: Deploy maven stack in production as Normal priority.
Jun 16 2022, 10:21 AM · System administration, Maven loader, Maven lister, GSoC 2019, Archive coverage
ardumont updated the task description for T4233: Ingest Arch Linux.
Jun 16 2022, 10:02 AM · Arch loader, Arch Lister, Archive coverage
franckbret added a revision to T4233: Ingest Arch Linux: D7995: Arch Linux loader.
Jun 16 2022, 9:38 AM · Arch loader, Arch Lister, Archive coverage

Jun 15 2022

bchauvet closed T4326: Archive the pom file additionally to the source folder, a subtask of T3746: staging: Deploy maven indexer/lister/loader, as Invalid.
Jun 15 2022, 5:20 PM · Maven loader, Maven lister, System administration, Archive coverage
bchauvet closed T4326: Archive the pom file additionally to the source folder as Invalid.
Jun 15 2022, 5:20 PM · Maven loader, Maven lister, System administration, Archive coverage
borisbaldassari added a comment to T4326: Archive the pom file additionally to the source folder.

Yesss! \o/

Jun 15 2022, 4:33 PM · Maven loader, Maven lister, System administration, Archive coverage
anlambert added a comment to T4326: Archive the pom file additionally to the source folder.

So in the end, the conclusion is that the loader already does the right thing so it's a noop, right?

Jun 15 2022, 4:26 PM · Maven loader, Maven lister, System administration, Archive coverage
ardumont renamed T4326: Archive the pom file additionally to the source folder from archive the pom file additionally to the source folder to Archive the pom file additionally to the source folder.
Jun 15 2022, 4:09 PM · Maven loader, Maven lister, System administration, Archive coverage
ardumont renamed T4326: Archive the pom file additionally to the source folder from archive the pom file additionnaly to the source folder to archive the pom file additionally to the source folder.
Jun 15 2022, 4:09 PM · Maven loader, Maven lister, System administration, Archive coverage
borisbaldassari added a comment to T4326: Archive the pom file additionally to the source folder.

Good news *can* happen, ahah! Thanks for notifying me.

Jun 15 2022, 3:42 PM · Maven loader, Maven lister, System administration, Archive coverage
ardumont updated subscribers of T4326: Archive the pom file additionally to the source folder.

To summarize, the initial intent was to adapt the jar loaded (as extracted directory) to append the pom.xml so we do not lose that reference.

Jun 15 2022, 3:33 PM · Maven loader, Maven lister, System administration, Archive coverage
bchauvet added a comment to T4326: Archive the pom file additionally to the source folder.

You're right, Boris, it is indeed already stored as extrinsic metadata; we hadn't checked properly :)
Thank you for your answer!

Jun 15 2022, 3:33 PM · Maven loader, Maven lister, System administration, Archive coverage
borisbaldassari added a comment to T4326: Archive the pom file additionally to the source folder.

I'm not sure I understand the intent, as we already keep the pom in the extrinsic metadata (don't we?).
Double-checking in the SWH codebase, I believe you could build upon this: see [1] lines 166-180.

Jun 15 2022, 3:07 PM · Maven loader, Maven lister, System administration, Archive coverage
borisbaldassari added a comment to T4326: Archive the pom file additionally to the source folder.

Congrats on the work done! I think that downloading the pom file from the same folder is indeed the way to go.

Jun 15 2022, 2:54 PM · Maven loader, Maven lister, System administration, Archive coverage
anlambert added a comment to T4326: Archive the pom file additionally to the source folder.

I think the simplest way to get the pom file associated with a specific release of a maven package is to download it from the folder where we can find the source jar.

Jun 15 2022, 1:48 PM · Maven loader, Maven lister, System administration, Archive coverage
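
A minimal sketch of the idea in the comment above, assuming the standard Maven repository layout in which a release directory `<artifactId>/<version>/` holds both `<artifactId>-<version>-sources.jar` and `<artifactId>-<version>.pom`. The function name `pom_url_from_jar_url` and the example coordinates are hypothetical and are not taken from the SWH loader; snapshot versions, whose file names carry timestamps, are not handled.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def pom_url_from_jar_url(jar_url: str) -> str:
    """Derive the pom URL sitting next to a jar in a Maven repository.

    Hypothetical helper: relies only on the directory layout
    .../<artifactId>/<version>/<file>, not on the jar file name.
    """
    parts = urlsplit(jar_url)
    # Directory that contains the jar: .../<artifactId>/<version>
    directory = posixpath.dirname(parts.path)
    version = posixpath.basename(directory)
    artifact_id = posixpath.basename(posixpath.dirname(directory))
    pom_path = posixpath.join(directory, f"{artifact_id}-{version}.pom")
    # Drop any query string or fragment; keep scheme and host unchanged.
    return urlunsplit((parts.scheme, parts.netloc, pom_path, "", ""))


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    print(pom_url_from_jar_url(
        "https://repo1.maven.org/maven2/org/example/demo/1.0/demo-1.0-sources.jar"
    ))
```

Deriving the name from the directory rather than from the jar file name avoids having to strip classifiers such as `-sources`.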
ardumont updated the task description for T4233: Ingest Arch Linux.
Jun 15 2022, 10:47 AM · Arch loader, Arch Lister, Archive coverage

Jun 13 2022

bchauvet updated the task description for T4326: Archive the pom file additionally to the source folder.
Jun 13 2022, 2:21 PM · Maven loader, Maven lister, System administration, Archive coverage
bchauvet added a comment to T4326: Archive the pom file additionally to the source folder.
Jun 13 2022, 2:16 PM · Maven loader, Maven lister, System administration, Archive coverage
bchauvet updated the task description for T4326: Archive the pom file additionally to the source folder.
Jun 13 2022, 1:36 PM · Maven loader, Maven lister, System administration, Archive coverage
bchauvet updated the task description for T4326: Archive the pom file additionally to the source folder.
Jun 13 2022, 1:30 PM · Maven loader, Maven lister, System administration, Archive coverage
bchauvet added a comment to T4326: Archive the pom file additionally to the source folder.

In a source.jar, the pom is not included by default but can be if specified:

Jun 13 2022, 1:27 PM · Maven loader, Maven lister, System administration, Archive coverage
bchauvet added a comment to T4326: Archive the pom file additionally to the source folder.
Jun 13 2022, 1:25 PM · Maven loader, Maven lister, System administration, Archive coverage
bchauvet updated the task description for T4326: Archive the pom file additionally to the source folder.
Jun 13 2022, 12:11 PM · Maven loader, Maven lister, System administration, Archive coverage
bchauvet triaged T4326: Archive the pom file additionally to the source folder as Normal priority.
Jun 13 2022, 12:10 PM · Maven loader, Maven lister, System administration, Archive coverage

Jun 3 2022

ardumont added a comment to T3874: staging: Analyze result of the maven listing and ingestion.

There remain git and other DVCS-typed origins [1] listed by maven, but not github ones [2].

Jun 3 2022, 4:11 PM · Maven loader, Maven lister, Archive coverage
ardumont closed T4232: Listers: Canonicalize listed github origins, a subtask of T3874: staging: Analyze result of the maven listing and ingestion, as Resolved.
Jun 3 2022, 3:19 PM · Maven loader, Maven lister, Archive coverage
ardumont closed T3874: staging: Analyze result of the maven listing and ingestion as Resolved.
Jun 3 2022, 3:18 PM · Maven loader, Maven lister, Archive coverage
ardumont closed T3874: staging: Analyze result of the maven listing and ingestion, a subtask of T3746: staging: Deploy maven indexer/lister/loader, as Resolved.
Jun 3 2022, 3:18 PM · Maven loader, Maven lister, System administration, Archive coverage
ardumont updated the task description for T3874: staging: Analyze result of the maven listing and ingestion.
Jun 3 2022, 3:18 PM · Maven loader, Maven lister, Archive coverage
ardumont added a subtask for T3874: staging: Analyze result of the maven listing and ingestion: T4232: Listers: Canonicalize listed github origins.
Jun 3 2022, 3:18 PM · Maven loader, Maven lister, Archive coverage
ardumont added a comment to T3874: staging: Analyze result of the maven listing and ingestion.

status: triggered 2 full-maven lister runs on maven central and jboss [1]
And no more exotic github urls are popping up [2].

Jun 3 2022, 2:18 PM · Maven loader, Maven lister, Archive coverage
ardumont added a comment to T3874: staging: Analyze result of the maven listing and ingestion.

Yesterday, I fixed, diffed, released and pushed the diff [1] fixing the
canonicalization of the remaining exotic urls, cleaned up the 'git' origins (out of a
maven listing) and triggered a new listing. Today, checking those origins again (staging
scheduler), there was still noise which should no longer have been there...

Jun 3 2022, 9:35 AM · Maven loader, Maven lister, Archive coverage

Jun 2 2022

ardumont added a revision to T3874: staging: Analyze result of the maven listing and ingestion: D7946: github/utils: Deal with exotic urls to canonicalize.
Jun 2 2022, 3:14 PM · Maven loader, Maven lister, Archive coverage
ardumont added a comment to T3874: staging: Analyze result of the maven listing and ingestion.

The full listing is not finished yet, but there still remain origins with exotic starting urls which are not canonicalized.
I'd say the issue lies with the canonicalize implementation in swh.core, which only deals with https:// and git:// urls.
So some improvements are needed there.

Jun 2 2022, 2:08 PM · Maven loader, Maven lister, Archive coverage
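
As an illustration of the kind of canonicalization discussed in the comment above, here is a minimal sketch that accepts more URL shapes than plain https:// and git://. It is not the swh.core implementation or the change in D7946: the function name `canonical_github_url`, the set of accepted prefixes, and the `https://github.com/<user>/<repo>` target form are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical pattern covering a few "exotic" spellings of a github origin.
_GITHUB_RE = re.compile(
    r"""^(?:scm:git:)?                       # optional Maven SCM prefix
        (?:(?:https?|git|ssh|git\+ssh)://)?  # optional scheme
        (?:git@)?                            # optional ssh user
        (?:www\.)?github\.com[:/]            # host, then a slash or scp-style colon
        (?P<user>[^/\s]+)/(?P<repo>[^/\s#?]+?)
        (?:\.git)?/?$                        # optional .git suffix and trailing slash
    """,
    re.VERBOSE | re.IGNORECASE,
)


def canonical_github_url(url: str) -> Optional[str]:
    """Return an assumed canonical form of a github URL, or None if unrecognized."""
    match = _GITHUB_RE.match(url.strip())
    if match is None:
        return None
    return f"https://github.com/{match['user']}/{match['repo']}"


if __name__ == "__main__":
    for raw in (
        "git://github.com/user/repo.git",
        "scm:git:git@github.com:user/repo.git",
        "https://gitlab.com/user/repo",
    ):
        print(raw, "->", canonical_github_url(raw))
```

Returning `None` for anything that does not look like a `user/repo` github URL leaves the caller free to keep such origins untouched rather than guessing.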

Jun 1 2022

ardumont updated the task description for T3874: staging: Analyze result of the maven listing and ingestion.
Jun 1 2022, 3:06 PM · Maven loader, Maven lister, Archive coverage
ardumont added a comment to T3874: staging: Analyze result of the maven listing and ingestion.

Plan:

  • P1369: Listing status after first round listing
  • Clean up maven github origins listing [1]
  • Trigger maven full run [2]
  • Wait for listing to finish
  • Listing status after new maven lister round of listing
  • Ping in mailing list discussion with data!
Jun 1 2022, 3:05 PM · Maven loader, Maven lister, Archive coverage
ardumont updated the task description for T3874: staging: Analyze result of the maven listing and ingestion.
Jun 1 2022, 3:01 PM · Maven loader, Maven lister, Archive coverage
ardumont updated the task description for T3874: staging: Analyze result of the maven listing and ingestion.
Jun 1 2022, 10:50 AM · Maven loader, Maven lister, Archive coverage
ardumont added a comment to T3874: staging: Analyze result of the maven listing and ingestion.

Old maven behavior results in origins like git://github.com, ... [1]
The new maven lister behavior should now result in canonical github urls http://github.com/user/repo.
Analysis ongoing and report will go after that comment.

Jun 1 2022, 10:50 AM · Maven loader, Maven lister, Archive coverage
ardumont updated the task description for T3874: staging: Analyze result of the maven listing and ingestion.
Jun 1 2022, 10:47 AM · Maven loader, Maven lister, Archive coverage

May 25 2022

ardumont added a revision to T4233: Ingest Arch Linux: D7894: Add arch lister module (origins from archives)..
May 25 2022, 3:32 PM · Arch loader, Arch Lister, Archive coverage
ardumont updated the task description for T4233: Ingest Arch Linux.
May 25 2022, 3:32 PM · Arch loader, Arch Lister, Archive coverage

May 13 2022

anlambert added projects to T4143: staging: Deploy maven stack fixes: Maven lister, Maven loader.
May 13 2022, 4:55 PM · Maven loader, Maven lister, System administration, Archive coverage
anlambert added projects to T3746: staging: Deploy maven indexer/lister/loader: Maven lister, Maven loader.
May 13 2022, 4:54 PM · Maven loader, Maven lister, System administration, Archive coverage
anlambert added projects to T1724: Maven Central repository support: Maven lister, Maven loader.
May 13 2022, 4:54 PM · Maven loader, Maven lister, GSoC 2019, Archive coverage
anlambert added projects to T4215: staging: Deploy latest maven stack: Maven lister, Maven loader.
May 13 2022, 4:54 PM · Maven loader, Maven lister, System administration, Archive coverage
anlambert added projects to T3874: staging: Analyze result of the maven listing and ingestion: Maven lister, Maven loader.
May 13 2022, 4:53 PM · Maven loader, Maven lister, Archive coverage
anlambert added a project to T1423: Add .crate (Rust) loader: Crates loader.
May 13 2022, 4:20 PM · Crates loader, Archive coverage, Sprint 2018 12, Restricted Project
anlambert added a project to T1424: Add crates.io (Rust) lister: Crates lister.
May 13 2022, 4:20 PM · Crates lister, Archive coverage, Restricted Project, Sprint 2018 12
anlambert added projects to T4104: Ingest crates.io (Rust): Crates lister, Crates loader.
May 13 2022, 4:20 PM · Crates loader, Crates lister, Archive coverage

May 12 2022

bchauvet added a subtask for T4104: Ingest crates.io (Rust): T4167: Document crates lister.
May 12 2022, 10:32 AM · Crates loader, Crates lister, Archive coverage
bchauvet removed a parent task for T1423: Add .crate (Rust) loader: Unknown Object (Maniphest Task).
May 12 2022, 10:32 AM · Crates loader, Archive coverage, Sprint 2018 12, Restricted Project
bchauvet added a parent task for T4104: Ingest crates.io (Rust): Unknown Object (Maniphest Task).
May 12 2022, 10:32 AM · Crates loader, Crates lister, Archive coverage

May 11 2022

ardumont updated subscribers of T4233: Ingest Arch Linux.
May 11 2022, 3:26 PM · Arch loader, Arch Lister, Archive coverage
ardumont shifted T4233: Ingest Arch Linux from the Restricted Space space to the S1 Public space.
May 11 2022, 3:26 PM · Arch loader, Arch Lister, Archive coverage
ardumont added a revision to T4233: Ingest Arch Linux: D7812: [WIP] Add arch lister module..
May 11 2022, 3:26 PM · Arch loader, Arch Lister, Archive coverage
ardumont triaged T4233: Ingest Arch Linux as Normal priority.
May 11 2022, 3:17 PM · Arch loader, Arch Lister, Archive coverage

May 4 2022

ghenry added a comment to T2833: cpan.loader - archive Perl modules from CPAN.

Sounds good.

May 4 2022, 2:03 PM · CPAN lister, Archive coverage
joenio added a comment to T2833: cpan.loader - archive Perl modules from CPAN.
In T2833#83806, @ghenry wrote:

Hi @joenio

I've just joined SWH as an ambassador and wondered how you are getting on with the cpan.loader? Or maybe metacpan.loader now?

Thanks,
Gavin.

May 4 2022, 11:31 AM · CPAN lister, Archive coverage

May 3 2022

anlambert added a revision to T3874: staging: Analyze result of the maven listing and ingestion: D7730: loader: Handle origin URL redirection in SvnLoaderFromRemoteDump.
May 3 2022, 2:15 PM · Maven loader, Maven lister, Archive coverage

May 2 2022

anlambert added a revision to T3874: staging: Analyze result of the maven listing and ingestion: D7728: maven: Use most recent release of a package as default version.
May 2 2022, 4:10 PM · Maven loader, Maven lister, Archive coverage
anlambert added a revision to T3874: staging: Analyze result of the maven listing and ingestion: D7721: maven: Prevent UnicodeDecodeError when processing pom file.
May 2 2022, 12:13 PM · Maven loader, Maven lister, Archive coverage

Apr 29 2022

ardumont closed T4215: staging: Deploy latest maven stack, a subtask of T3874: staging: Analyze result of the maven listing and ingestion, as Resolved.
Apr 29 2022, 4:17 PM · Maven loader, Maven lister, Archive coverage
ardumont closed T4215: staging: Deploy latest maven stack as Resolved.
Apr 29 2022, 4:17 PM · Maven loader, Maven lister, System administration, Archive coverage
ardumont added a comment to T4215: staging: Deploy latest maven stack.

We can now see new maven origins [1] with multiple lists of artifacts [2]

Apr 29 2022, 4:17 PM · Maven loader, Maven lister, System administration, Archive coverage
ardumont moved T4215: staging: Deploy latest maven stack from in-progress to deployed/landed/monitoring on the System administration board.
Apr 29 2022, 3:42 PM · Maven loader, Maven lister, System administration, Archive coverage
ardumont added a comment to T4215: staging: Deploy latest maven stack.

And ingested:

Apr 29 13:38:51 worker3 python3[215979]: [2022-04-29 13:38:51,458: INFO/ForkPoolWorker-1] Task swh.loader.package.maven.tasks.LoadMaven[4ac12ab2-7749-4152-baa1-6bf06a587cad] succeeded in 74.83338255400304s: {'status': 'eventful', 'snapshot_id': '571e40dbc59f5f5238d981986211bcabde2d73d8'}
Apr 29 2022, 3:39 PM · Maven loader, Maven lister, System administration, Archive coverage
ardumont added a comment to T4215: staging: Deploy latest maven stack.

New schemed maven origins are getting listed now:

Apr 29 2022, 3:37 PM · Maven loader, Maven lister, System administration, Archive coverage
ardumont updated the task description for T4215: staging: Deploy latest maven stack.
Apr 29 2022, 3:29 PM · Maven loader, Maven lister, System administration, Archive coverage
ardumont updated the task description for T4215: staging: Deploy latest maven stack.
Apr 29 2022, 3:12 PM · Maven loader, Maven lister, System administration, Archive coverage
ardumont changed the status of T4215: staging: Deploy latest maven stack from Open to Work in Progress.
Apr 29 2022, 2:53 PM · Maven loader, Maven lister, System administration, Archive coverage
ardumont changed the status of T4215: staging: Deploy latest maven stack, a subtask of T3874: staging: Analyze result of the maven listing and ingestion, from Open to Work in Progress.
Apr 29 2022, 2:53 PM · Maven loader, Maven lister, Archive coverage
ardumont updated the task description for T4215: staging: Deploy latest maven stack.
Apr 29 2022, 2:53 PM · Maven loader, Maven lister, System administration, Archive coverage
ardumont triaged T4215: staging: Deploy latest maven stack as Normal priority.
Apr 29 2022, 2:53 PM · Maven loader, Maven lister, System administration, Archive coverage
ardumont added a parent task for T1724: Maven Central repository support: Unknown Object (Maniphest Task).
Apr 29 2022, 2:26 PM · Maven loader, Maven lister, GSoC 2019, Archive coverage
ardumont added a revision to T3874: staging: Analyze result of the maven listing and ingestion: D7716: maven: Handle null mtime value in index for jar archive.
Apr 29 2022, 1:57 PM · Maven loader, Maven lister, Archive coverage
ardumont added a parent task for T1423: Add .crate (Rust) loader: Unknown Object (Maniphest Task).
Apr 29 2022, 11:17 AM · Crates loader, Archive coverage, Sprint 2018 12, Restricted Project
franckbret added a revision to T4104: Ingest crates.io (Rust): D7713: crates: rework to take advantage of data returned by the crates lister.
Apr 29 2022, 10:14 AM · Crates loader, Crates lister, Archive coverage

Apr 28 2022

anlambert added a revision to T3874: staging: Analyze result of the maven listing and ingestion: D7712: package/maven: Fix jar archive download after changes in lister.
Apr 28 2022, 6:23 PM · Maven loader, Maven lister, Archive coverage
anlambert added a revision to T3874: staging: Analyze result of the maven listing and ingestion: D7710: maven: Create one origin per package instead of one per package version.
Apr 28 2022, 3:54 PM · Maven loader, Maven lister, Archive coverage

Apr 27 2022

ardumont added a revision to T4104: Ingest crates.io (Rust): D7654: crates: create one origin per package instead of per version.
Apr 27 2022, 10:04 AM · Crates loader, Crates lister, Archive coverage

Apr 22 2022

pratyush added a comment to T2833: cpan.loader - archive Perl modules from CPAN.

CPANTS: these indicators might be helpful in categorizing incoming data, so that the mining process is significantly easier.

Apr 22 2022, 1:16 PM · CPAN lister, Archive coverage
zack renamed T2833: cpan.loader - archive Perl modules from CPAN from cpan.loader - preserver Perl modules from CPAN to cpan.loader - archive Perl modules from CPAN.
Apr 22 2022, 11:26 AM · CPAN lister, Archive coverage
ghenry added a comment to T2833: cpan.loader - archive Perl modules from CPAN.

I've just joined SWH as an ambassador and wondered how you are getting on with the cpan.loader? Or maybe metacpan.loader now?

Apr 22 2022, 11:23 AM · CPAN lister, Archive coverage

Apr 15 2022

ardumont changed the status of T3874: staging: Analyze result of the maven listing and ingestion from Open to Work in Progress.
Apr 15 2022, 5:12 PM · Maven loader, Maven lister, Archive coverage
ardumont changed the status of T3874: staging: Analyze result of the maven listing and ingestion, a subtask of T3746: staging: Deploy maven indexer/lister/loader, from Open to Work in Progress.
Apr 15 2022, 5:12 PM · Maven loader, Maven lister, System administration, Archive coverage

Apr 14 2022

ardumont added a comment to T3874: staging: Analyze result of the maven listing and ingestion.

And now we have some origins referenced in the (staging) archive [1]

Apr 14 2022, 5:00 PM · Maven loader, Maven lister, Archive coverage
ardumont added a comment to T3874: staging: Analyze result of the maven listing and ingestion.

Another round of deployment occurred, with swh.lister v2.8.1.
clojars repository got listed again (ongoing) and the lister is no longer crashing for that one.

Apr 14 2022, 4:45 PM · Maven loader, Maven lister, Archive coverage
ardumont updated the task description for T3874: staging: Analyze result of the maven listing and ingestion.
Apr 14 2022, 4:36 PM · Maven loader, Maven lister, Archive coverage
ardumont updated the task description for T3874: staging: Analyze result of the maven listing and ingestion.
Apr 14 2022, 3:36 PM · Maven loader, Maven lister, Archive coverage
ardumont added a revision to T3874: staging: Analyze result of the maven listing and ingestion: D7584: maven: Fix argument of type 'NoneType' is not iterable.
Apr 14 2022, 3:35 PM · Maven loader, Maven lister, Archive coverage
ardumont updated the task description for T3874: staging: Analyze result of the maven listing and ingestion.
Apr 14 2022, 3:26 PM · Maven loader, Maven lister, Archive coverage