Archive repo.or.cz
Started, Work in Progress, Unbreak Now! Public

Description

It looks like repo.or.cz will go down before the end of the month: its maintainers propose to EOL the current hardware by the end of June.

It seems to use a customized cgit, so, e.g., it has forks, which our cgit lister probably does not support.

They have a public project list that should be easy to parse, though:

<mackyle> Is this what you had in mind: https://repo.or.cz/archive/repolist.txt
<mackyle> Or did you want URLs too?
<mackyle> Forks have "/" in the name
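
To illustrate how little parsing the list needs, here is a minimal Python sketch. It is not the lister code. It assumes repolist.txt holds one repository name per line, optionally ending in ".git", and that clone URLs follow the pattern https://repo.or.cz/<name>.git (inferred from the bundle URL quoted in the logs below). The `Repo` type and the `parse_repolist` function are hypothetical names made up for this example.

```python
from typing import Iterable, List, NamedTuple

# Host taken from the task; the URL layout built from it is an assumption.
BASE_URL = "https://repo.or.cz"


class Repo(NamedTuple):
    name: str      # e.g. "git" or "git/somefork"
    url: str       # assumed clone URL, not confirmed by the site admins
    is_fork: bool  # True when the name contains "/"


def parse_repolist(lines: Iterable[str]) -> List[Repo]:
    """Turn repolist.txt lines into Repo entries, skipping blanks and duplicates."""
    repos: List[Repo] = []
    seen = set()
    for raw in lines:
        name = raw.strip()
        if not name:
            continue
        # Assumption: names may or may not carry a ".git" suffix; normalize it away.
        if name.endswith(".git"):
            name = name[: -len(".git")]
        if name in seen:
            continue
        seen.add(name)
        # Per mackyle: forks have "/" in the name.
        repos.append(Repo(name, f"{BASE_URL}/{name}.git", "/" in name))
    return repos
```

The function takes any iterable of lines, so it can be fed the decoded lines of the downloaded file; the resulting URLs could then be handed to whatever schedules the git loader.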

Full logs from their IRC channel:

<pasky> hmm seems like rover is down?
<mackyle> pasky: hmmm, seems that way, from my traces before my connections died, it looks like it was mostly idle at the time too
<pasky> ok so we have a larger outage, or possibly an end of life event - the machine was completely dark without any leds, and couldn't be powered on
<pasky> after power reconnect, we have ethernet LED blinking but still cannot be turned on
<pasky> and as it happens I'm leaving for an 8 day trip just tomorrow morning, so the timing was perfect to maximize outage length; i'll ask the netadmins if they can think of something easy to do themselves, otherwise it's up to me, but frankly I'm not sure if I'm up to some larger resuscitation session - even if I replace the PSU, I suspect we are nearing MTBF for too many other components
<pasky> (*if* the PSU is the culprit)
<mackyle> dang, well, we will just have to wait and see what happens
<mackyle> all those "repo.or.cz is down" emails to the mailing list will just be piling up with nowhere to go
<pasky_> rover is back up (seems like moving it around and uncasing it alone helped...), but producing horrible noises from its fans so it's probably a very short-term matter now
<pasky_> my proposal is to EOL the current hardware by end of June; imho the responsible thing to do is at least semi-orderly controlled phaseout rather than just crossing our fingers until it crashes for good
<pasky_> I don't think we have the volunteer manpower here to get ~sponsored hosting, migrate the service and keep running it, do we? If not, within a couple of days I'd mass-mail all users with an EOL announcement (and a last call for someone to take it over). What do you think?
<pabs3> pasky_: if you can get the hardware up or the hard drives in another machine, it would be nice to have ArchiveTeam do an emergency backup of the repo.or.cz server to archive.org
<pabs3> ah, so the hardware is back up. let me know if a full backup to archive.org would be helpful
<pabs3> a couple of questions from ArchiveTeam folks:
<pabs3>  <JAA> Do we have any idea about the total size of all repos?
<pabs3>  <JAA> And a list of all repos (*including* forks, which are not directly listed on https://repo.or.cz/?a=project_list ) would be great.
<mackyle> pabs3 all repos currently occupy approximately 247 GiB of disk space, there are 7,375 total repositories at this time
<pabs3> thanks, passing that on
<pabs3> mackyle: is there a way to get a list of all repos, including the forks?
<mackyle> yup, all 7377 of them (went up since the report I quoted before)
<mackyle> are you looking for a file?
<pabs3> I think a file would help if it included the forks, project_list doesn't include them directly
<pabs3> aha, found a list with the forks
<pabs3> https://repo.or.cz/?a=project_index
<pabs3> mackyle, pasky_: shall I tell ArchiveTeam folks to go ahead and clone everything?
<mackyle> Is this what you had in mind: https://repo.or.cz/archive/repolist.txt
<mackyle> Or did you want URLs too?
<mackyle> Forks have "/" in the name
<pabs3> that works too
<pabs3> just the names should be enough
<mackyle> We do have bundle download support, but I'm not sure that will substantially speed things up and the bundles are only updated once a week meaning a further clone would be needed to top it off with the latest stuff
<mackyle> For example, fetching https://repo.or.cz/git.git/clone.bundle will get you the latest Git bundle
<mackyle> But it's just as large as a fresh clone, it's just easier to restart if interrupted
<pabs3> I think AT folks have good enough network they will likely go with clones, not sure. passed on the bundle info
--> JAA (JAA from ~JAA@user/jaa) has joined #repo.or.cz
<pabs3> pasky_, mackyle: ^ JAA is from ArchiveTeam. welcome :)
<mackyle> JAA I will be happy to facilitate, let me know what you need
<JAA> Hi, thanks. I think I have all I need, will get this running later today. :-)
<mackyle> Thanks :)
<pabs3> yay :)
<pasky_> very nice! thanks for arranging this!
<pabs3> I'm also asking a couple of Software Heritage folks if they would do a bulk import too
<pabs3> they pointed me at https://www.softwareheritage.org/2019/01/10/save_code_now/
<pabs3> I'm submitting all the URLs there, should take a few days based on current rate limits
<JAA> Apologies for the delay, real life interfered. I have things set up and ready, but I'd like to actively monitor it at least at the start, but I'm very tired right now. Will launch it in the morning.

Event Timeline

vlorentz triaged this task as Unbreak Now! priority. Jun 19 2022, 9:15 AM
vlorentz created this task.
olasd changed the task status from Open to Work in Progress. Jun 21 2022, 10:07 PM
olasd claimed this task.
olasd added subscribers: ardumont, olasd.

I've scheduled the archival of the 7377 repos in one of the leftover one-shot queues.

Progress indicator: https://grafana.softwareheritage.org/d/dI3GLwkZz/rabbitmq-metrics?orgId=1&refresh=1m&var-cluster=rabbit@saatchi&var-node=rabbit@saatchi&var-queue=oneshot3:swh.loader.git.tasks.UpdateGitRepository&viewPanel=2&from=now-6h&to=now

(I'm not quite sure what other processes are still putting tasks in this queue. @ardumont?)