ingest git.eclipse.org repositories
Closed, Migrated

Description

We want to ingest all Git repositories of the Eclipse Foundation (and they want us to do that too!).

They are using cgit, and the full repo listing is here: https://git.eclipse.org/c/
It also comes with an "idle" column, which is a good substitute for an actual push feed.

The only annoying thing about that cgit listing is that it's paginated and HTML-based. We might want to fix that and work with them to deploy the change (and push it upstream).
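
To illustrate what working with a paginated, HTML-based listing involves, here is a minimal sketch that extracts repository links and pager links from one cgit index page. It assumes the usual cgit markup (repository cells carrying the `toplevel-repo` or `sublevel-repo` class, and a `ul` with class `pager`); the sample HTML and the `CgitIndexParser` and `parse_index` names are made up for this example and are not the Software Heritage lister.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CgitIndexParser(HTMLParser):
    """Collect repository links and pager links from one cgit index page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.repos = []   # absolute URLs of listed repositories
        self.pages = []   # absolute URLs of the other index pages
        self._in_repo_cell = False
        self._in_pager = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "td" and ("toplevel-repo" in classes or "sublevel-repo" in classes):
            self._in_repo_cell = True
        elif tag == "ul" and "pager" in classes:
            self._in_pager = True
        elif tag == "a" and attrs.get("href"):
            url = urljoin(self.base_url, attrs["href"])
            if self._in_repo_cell:
                self.repos.append(url)
            elif self._in_pager:
                self.pages.append(url)

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_repo_cell = False
        elif tag == "ul":
            self._in_pager = False


def parse_index(html, base_url):
    """Return (repository URLs, pager URLs) found in one index page."""
    parser = CgitIndexParser(base_url)
    parser.feed(html)
    return parser.repos, parser.pages


# Hypothetical sample page, hand-written for this example.
SAMPLE = """
<table class='list'>
<tr><td class='toplevel-repo'><a href='/c/foo.git/'>foo.git</a></td><td>2 days</td></tr>
<tr><td class='sublevel-repo'><a href='/c/osbp/bar.git/'>bar.git</a></td><td>5 weeks</td></tr>
</table>
<ul class='pager'><li><a href='/c/?ofs=0'>[1]</a></li><li><a href='/c/?ofs=50'>[2]</a></li></ul>
"""

if __name__ == "__main__":
    repos, pages = parse_index(SAMPLE, "https://git.eclipse.org/c/")
    print(repos)
    print(pages)
```

A full listing would then have to fetch every pager URL in turn, which is exactly the cost a non-paginated, machine-readable listing would avoid.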

Event Timeline

olasd changed the visibility from "All Users" to "Public (No Login Required)". May 13 2016, 5:09 PM
zack lowered the priority of this task from Normal to Low. Dec 27 2018, 10:43 AM
rdicosmo raised the priority of this task from Low to High. Jan 25 2021, 9:03 PM
rdicosmo added a subscriber: rdicosmo.

Now that we have a cgit lister, this should be a no-brainer.
If that's the case, we need it up and running quickly.

While deploying the next-gen lister in staging (T2998), I also tried the Eclipse cgit instance.

The lister is not ready for that instance yet:

swhworker@worker0:~$ SWH_CONFIG_FILENAME=lister.yml swh lister run --lister cgit url=https://git.eclipse.org/c/ instance=eclipse

WARNING:swh.lister.cgit.lister:Unexpected HTTP status code 500 on https://git.eclipse.org/c/osbp/org.eclipse.osbp.runtime.functionlibrary.validation.git/  # <- on their side
Traceback (most recent call last):
...
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))   # <- most probably the lister is too aggressive

(P931 for the full stacktrace)
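
If the dropped connection really is caused by the lister being too aggressive, one common mitigation is to retry with exponential backoff. The sketch below is only an illustration of that idea, not what the lister does: `fetch_with_backoff` and its parameters are hypothetical names, and `fetch` stands for any callable that performs the HTTP request. With `requests`, `retry_on` would need to include `requests.exceptions.ConnectionError`, since it does not derive from the built-in `ConnectionError`.

```python
import time


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=1.0,
                       retry_on=(ConnectionError,), sleep=time.sleep):
    """Call fetch(url), retrying on connection errors with exponential backoff.

    The delay doubles after each failed attempt: base_delay, 2*base_delay, ...
    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    attempts = []

    def flaky(url):
        # Simulated server: drops the first two connections, then answers.
        attempts.append(url)
        if len(attempts) < 3:
            raise ConnectionError("Remote end closed connection without response")
        return "ok"

    print(fetch_with_backoff(flaky, "https://git.eclipse.org/c/", base_delay=0.1))
```

Passing `sleep` as a parameter keeps the helper easy to test without actually waiting.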

Listed only 900 origins:

swh-scheduler=> select count(*) from listed_origins lo inner join listers l on lo.lister_id=l.id and l.name='cgit' and l.instance_name='eclipse';
 count
-------
   900
(1 row)

(I'm not sold yet on the no-brainer ;)

We also have T2999 to attend to, which should definitely help.

Thanks @ardumont for experimenting with this. The 500 seems normal: we need to tell Eclipse about us first, I'll put you in touch. So maybe it's still a no-brainer, and we just need to document the "contact the owner to get whitelisted" human step :-)

The 500 seems normal

yes, it can happen and the lister is able to deal with it.

Listed only 900 origins:

By the way, after implementing T2999, the test revealed that we had only completed
about 2/3 of the listing (900 origins out of 1340).

So I think that once D4968 is deployed, we should be able to list the whole instance
with a single HTTP request.

So from the lister standpoint, we'll be good.

With the latest improvement, we listed the instance in one request [1].

[1] T3013#57809

Thanks @ardumont, that's great! If you think this does not need any more support on the Eclipse side, could you let them know?

Oh, you did it already, faster than thought :-)

Instance cgit scheduled [1]

And listed:

softwareheritage-scheduler=> \conninfo
You are connected to database "softwareheritage-scheduler" as user "guest" on host "belvedere.internal.softwareheritage.org" (address "192.168.100.210") at port "5432".
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
softwareheritage-scheduler=> select count(*) from listed_origins where lister_id='7a775770-2b2f-4139-aacb-ad715c022b9d';
 count
-------
  1340
(1 row)

Note that this does not mean it is or will be ingested anytime soon, though.
We are still missing at least one cog to actually schedule those listed origins.

[1] T3024#58094

More details in T2345#58247

ardumont added a subscriber: ardumont.

If I understand correctly, everything is ready, and Eclipse will be ingested once the new listers are in production: thanks!

new listers

new scheduler, yes (new listers are already deployed and running in production)

ardumont claimed this task.

It's been done for a while. We can see it appearing in the main archive page under cgit.