List/Ingest major cgit instances
Closed, Migrated

Description

The number of origins has reached the number of listed repositories for every instance, so ingestion is done.

|--------+---------------------------------------------+-----------------------------------+----------------+----------------+----------------|
| status | url                                         | url-prefix                        | instance       | # repos listed | # origins      |
|--------+---------------------------------------------+-----------------------------------+----------------+----------------+----------------|
| done   | https://git.kernel.org/                     | X                                 | git-kernel     |           1002 |           1002 |
| done   | https://gitweb.torproject.org/              | https://git.torproject.org/       | tor            |            492 |            501 |
| done   | https://fedorapeople.org/cgit/              | X                                 | fedora         |            751 |            841 |
| done   | https://cgit.kde.org/                       | https://anongit.kde.org/          | kde            |           2434 |           2434 |
| done   | https://www.happyassassin.net/cgit/         | X                                 | happyassassin  |              3 |              3 |
| done   | https://git.openembedded.org/               | X                                 | openembedded   |             15 |             15 |
| done   | https://git.zx2c4.com/                      | X                                 | zx2c4          |            140 |            140 |
| done   | http://git.gnu.org.ua/cgit/                 | http://git.gnu.org.ua/repo/       | git.gnu.org.ua |             50 |             50 |
| done   | https://git.alpinelinux.org/                | X                                 | alpinelinux    |            187 |            187 |
| done   | https://git.baserock.org/cgit/              | https://git.baserock.org/git/     | baserock       |           1456 |           1546 |
| done   | https://code.qt.io/cgit/                    | http://code.qt.io/                | qt.io          |            257 |            257 |
| done   | http://git.yoctoproject.org/clean/cgit.cgi/ | https://git.yoctoproject.org/git/ | yoctoproject   |            170 |            170 |
| done   | http://hdiff.luite.com/cgit/                | X                                 | hdiff.luite    |          13722 |          13722 |
|--------+---------------------------------------------+-----------------------------------+----------------+----------------+----------------|

(- https://anonscm.debian.org/cgit/: down, as it was migrated to a GitLab instance; there seems to be an archive at https://alioth-archive.debian.org/)

  • Initialize the db's data model with the new table
  • Schedule the listing of the ready cgit instances [1]

with @nahimilega's support

[1]

SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://git.savannah.gnu.org/cgit/ instance=gnu-savannah
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://gitweb.torproject.org/ url_prefix=https://git.torproject.org/ instance=tor
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://git.kernel.org/ instance=git-kernel
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://fedorapeople.org/cgit/ instance=fedora
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://cgit.kde.org/ url_prefix=https://anongit.kde.org/ instance=kde
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://www.happyassassin.net/cgit/ instance=happyassassin
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://git.openembedded.org/ instance=openembedded
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://git.zx2c4.com/ instance=zx2c4
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=http://git.gnu.org.ua/cgit/ url_prefix=http://git.gnu.org.ua/repo/ instance=git.gnu.org.ua
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://git.alpinelinux.org/ instance=alpinelinux
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=http://hdiff.luite.com/cgit/ instance=hdiff.luite
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://git.baserock.org/cgit/              url_prefix=https://git.baserock.org/git/     instance=baserock
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://code.qt.io/cgit/                    url_prefix=http://code.qt.io/                instance=qt.io
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=http://git.yoctoproject.org/clean/cgit.cgi/ url_prefix=https://git.yoctoproject.org/git/ instance=yoctoproject
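
The command lines in [1] all follow the same pattern, so they could also be generated from the instance table. Below is a minimal sketch, assuming only the `swh scheduler --url ... task add list-cgit` form shown above; the helper names `build_command` and `build_commands` are hypothetical, the instance list is a small subset copied from the table, and the script only prints the commands rather than running them.

```python
import shlex

# Scheduler endpoint, as used in the commands above.
SCHEDULER_API_URL = "http://saatchi.internal.softwareheritage.org:5008/"

# (url, url_prefix or None, instance): a subset copied from the table.
INSTANCES = [
    ("https://git.kernel.org/", None, "git-kernel"),
    ("https://gitweb.torproject.org/", "https://git.torproject.org/", "tor"),
    ("https://cgit.kde.org/", "https://anongit.kde.org/", "kde"),
]


def build_command(url, url_prefix, instance, scheduler_url=SCHEDULER_API_URL):
    """Return the argv list for one `task add list-cgit` invocation."""
    argv = ["swh", "scheduler", "--url", scheduler_url,
            "task", "add", "list-cgit", f"url={url}"]
    # url_prefix is only passed when the clone URL differs from the cgit URL
    # (marked "X" in the table otherwise).
    if url_prefix is not None:
        argv.append(f"url_prefix={url_prefix}")
    argv.append(f"instance={instance}")
    return argv


def build_commands(instances):
    """Build one command per distinct instance name, skipping repeats."""
    seen = set()
    commands = []
    for url, url_prefix, instance in instances:
        if instance in seen:
            continue
        seen.add(instance)
        commands.append(build_command(url, url_prefix, instance))
    return commands


if __name__ == "__main__":
    for argv in build_commands(INSTANCES):
        print(shlex.join(argv))
```

Keying on the instance name is a design choice of this sketch: it keeps the same instance from being scheduled twice when a row is pasted more than once.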

Note:

  • The number of origins may be slightly higher than the number of listed repos, because the count(*) on the origin table used to compute it may not be accurate.
  • Either way, it means that the ingestion is in progress or done.
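
The comparison described in this note can be sketched as a small check. This is only an illustration, assuming the rule stated in the description (ingestion counts as done once the number of origins reaches the number of listed repositories); the function names are hypothetical and the rows are copied from the table above.

```python
# (instance, repos listed, origins): rows copied from the table above.
COUNTS = [
    ("git-kernel", 1002, 1002),
    ("tor", 492, 501),
    ("fedora", 751, 841),
    ("baserock", 1456, 1546),
    ("hdiff.luite", 13722, 13722),
]


def ingestion_status(listed, origins):
    """Apply the rule from the description: done once origins >= listed."""
    return "done" if origins >= listed else "in progress"


def surplus(listed, origins):
    """Number of origins in excess of the listed repositories (never negative)."""
    return max(origins - listed, 0)


def report(counts):
    """Return one (instance, status, surplus) tuple per row."""
    return [
        (instance, ingestion_status(listed, origins), surplus(listed, origins))
        for instance, listed, origins in counts
    ]


if __name__ == "__main__":
    for instance, status, extra in report(COUNTS):
        print(f"{instance:<12} {status:<12} surplus={extra}")
```

A non-zero surplus is expected for some instances, for the reason given in the first bullet.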

Event Timeline

ardumont triaged this task as Normal priority. Jun 20 2019, 1:37 PM
ardumont created this task.

http://git.upsilon.cc/ (@zack's git repositories) might be relevant.

@nahimilega mentioned to me on IRC that the current cgit lister won't work as is, though.

ardumont updated the task description.

Here we go, it's starting...

swh-lister=> select instance, count(*) from cgit_repo group by instance;
   instance   | count
--------------+-------
 git-kernel   |  1002
 gnu-savannah |  1018
 tor          |   492
(3 rows)
ardumont changed the task status from Open to Work in Progress. Jun 28 2019, 7:33 PM
ardumont claimed this task.
ardumont changed the status of subtask T1451: ingest GNU Savannah Git repositories from Open to Work in Progress.
ardumont updated the task description.
nahimilega changed the task status from Work in Progress to Open. Jun 28 2019, 7:36 PM
nahimilega removed ardumont as the assignee of this task.
nahimilega updated the task description.
ardumont updated the task description.
ardumont updated the task description.

All instances except freedesktop have now been listed.

Here is the result of the cache:

20:05:55 swh-lister@belvedere:5432=> select instance, count(*) number_repos from cgit_repo group by instance order by number_repos;
┌────────────────┬──────────────┐
│    instance    │ number_repos │
├────────────────┼──────────────┤
│ happyassassin  │            3 │
│ openembedded   │           15 │
│ git.gnu.org.ua │           50 │
│ zx2c4          │          140 │
│ yoctoproject   │          170 │
│ alpinelinux    │          187 │
│ qt.io          │          257 │
│ tor            │          492 │
│ fedora         │          751 │
│ git-kernel     │         1002 │
│ gnu-savannah   │         1018 │
│ baserock       │         1456 │
│ kde            │         2434 │
│ hdiff.luite    │        13722 │
└────────────────┴──────────────┘
(14 rows)

Time: 17.741 ms
20:06:03 swh-lister@belvedere:5432=>

Here is an excerpt of the logs showing the time needed to list each instance:

Jun 28 17:21:46 worker01 python3[18334]: [2019-06-28 17:21:46,794: INFO/ForkPoolWorker-2] Task swh.lister.cgit.tasks.CGitListerTask[ffa756b0-f4c0-4dee-bd9f-a388902a5347] succeeded in 10.55229873280041s: None
Jun 28 17:32:40 worker01 python3[18333]: [2019-06-28 17:32:40,961: INFO/ForkPoolWorker-1] Task swh.lister.cgit.tasks.CGitListerTask[3a362f22-34f8-4b2c-a308-ef2421086aef] succeeded in 3.041213259799406s: None
Jun 28 18:00:12 worker02 python3[26294]: [2019-06-28 18:00:12,971: INFO/ForkPoolWorker-3] Task swh.lister.cgit.tasks.CGitListerTask[0f0e149a-935b-4b52-acf6-71c6fa0a9fe2] succeeded in 0.7731958050280809s: None
Jun 28 18:00:14 worker03 python3[19965]: [2019-06-28 18:00:14,856: INFO/ForkPoolWorker-5] Task swh.lister.cgit.tasks.CGitListerTask[92468ebe-2427-4d34-a02b-454252233ed3] succeeded in 2.654821573989466s: None
Jun 28 17:46:59 worker04 python3[10950]: [2019-06-28 17:46:59,206: INFO/ForkPoolWorker-5] Task swh.lister.cgit.tasks.CGitListerTask[ee4a36f4-159e-4bed-be51-060d655820e7] succeeded in 20.162375305080786s: None
Jun 28 17:44:45 worker05 python3[7012]: [2019-06-28 17:44:45,081: INFO/ForkPoolWorker-2] Task swh.lister.cgit.tasks.CGitListerTask[387a04e7-4b72-4a5e-b311-5c7630a19864] succeeded in 6.755359191214666s: None
Jun 28 18:00:13 worker06 python3[17966]: [2019-06-28 18:00:13,897: INFO/ForkPoolWorker-1] Task swh.lister.cgit.tasks.CGitListerTask[e557d1fa-0fb1-45dd-b7e3-5362089a0e5f] succeeded in 1.7187734139151871s: None
Jun 28 18:00:13 worker07 python3[9748]: [2019-06-28 18:00:13,166: INFO/ForkPoolWorker-2] Task swh.lister.cgit.tasks.CGitListerTask[e7f054e3-1688-40fe-90c5-867c3e836b68] succeeded in 1.0016695000231266s: None
Jun 28 18:00:25 worker08 python3[2135]: [2019-06-28 18:00:25,706: INFO/ForkPoolWorker-1] Task swh.lister.cgit.tasks.CGitListerTask[981a9835-d144-4aff-aa61-1788fb195a2a] succeeded in 13.58928604517132s: None
Jun 28 18:00:17 worker09 python3[13801]: [2019-06-28 18:00:17,641: INFO/ForkPoolWorker-1] Task swh.lister.cgit.tasks.CGitListerTask[831a97f1-434c-4128-88b6-089d122b6ef5] succeeded in 5.3197371491696686s: None
Jun 28 18:00:19 worker10 python3[20798]: [2019-06-28 18:00:19,929: INFO/ForkPoolWorker-5] Task swh.lister.cgit.tasks.CGitListerTask[f3e7bfab-d356-4f7d-bf45-57d37068d380] succeeded in 7.68826347310096s: None
Jun 28 18:05:09 worker12 python3[5078]: [2019-06-28 18:05:09,401: INFO/ForkPoolWorker-2] Task swh.lister.cgit.tasks.CGitListerTask[8bdce7f7-c99a-40f3-90a1-70cf712d31f7] succeeded in 297.0999980599154s: None
Jun 28 18:01:22 worker14 python3[29109]: [2019-06-28 18:01:22,507: INFO/ForkPoolWorker-2] Task swh.lister.cgit.tasks.CGitListerTask[4b547347-8978-45a8-9a7a-85a655e4899f] succeeded in 70.25172589509748s: None
Jun 28 18:00:18 worker15 python3[23590]: [2019-06-28 18:00:18,018: INFO/ForkPoolWorker-2] Task swh.lister.cgit.tasks.CGitListerTask[f09b13d8-be74-417d-a153-d3289c36a4bb] succeeded in 5.778567451052368s: None
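
To compare listing times across workers, the durations can be pulled out of log lines like the ones above. The following is a rough sketch, assuming the lines keep exactly this format; the regular expression and the helper names (`parse_duration`, `slowest`) are illustrative and not part of swh-lister.

```python
import re

# Matches the worker name, task name, task id and duration in one log line.
LOG_PATTERN = re.compile(
    r"(?P<worker>worker\d+) .*"
    r"Task (?P<task>[\w.]+)\[(?P<task_id>[0-9a-f-]+)\] "
    r"succeeded in (?P<seconds>\d+(?:\.\d+)?)s"
)

# Three of the lines above, copied verbatim.
SAMPLE_LINES = [
    "Jun 28 18:00:12 worker02 python3[26294]: [2019-06-28 18:00:12,971: INFO/ForkPoolWorker-3] Task swh.lister.cgit.tasks.CGitListerTask[0f0e149a-935b-4b52-acf6-71c6fa0a9fe2] succeeded in 0.7731958050280809s: None",
    "Jun 28 18:05:09 worker12 python3[5078]: [2019-06-28 18:05:09,401: INFO/ForkPoolWorker-2] Task swh.lister.cgit.tasks.CGitListerTask[8bdce7f7-c99a-40f3-90a1-70cf712d31f7] succeeded in 297.0999980599154s: None",
    "Jun 28 18:01:22 worker14 python3[29109]: [2019-06-28 18:01:22,507: INFO/ForkPoolWorker-2] Task swh.lister.cgit.tasks.CGitListerTask[4b547347-8978-45a8-9a7a-85a655e4899f] succeeded in 70.25172589509748s: None",
]


def parse_duration(line):
    """Return (worker, task_id, seconds) for a success line, else None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return match.group("worker"), match.group("task_id"), float(match.group("seconds"))


def slowest(lines):
    """Return the parsed entry with the longest duration, or None if no line matches."""
    parsed = [entry for entry in map(parse_duration, lines) if entry is not None]
    return max(parsed, key=lambda entry: entry[2], default=None)


if __name__ == "__main__":
    worker, task_id, seconds = slowest(SAMPLE_LINES)
    print(f"slowest listing: {seconds:.1f}s on {worker} (task {task_id})")
```

The logs do not say which instance each task listed, so the sketch reports the worker and task id only.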

Moving cgit.freedesktop.org out of the scope of this task; see T1861 for it.

ardumont updated the task description.
ardumont updated the task description.
ardumont updated the task description.

Moving the gnu-savannah instance out of this task as well (T1451 already exists for it).

ardumont renamed this task from List major cgit instances to List/Ingest major cgit instances. Jul 1 2019, 4:52 PM
ardumont updated the task description.