
List/Ingest major cgit instances
Closed, Resolved, Public

Description

The number of origins has reached the number of listed repositories, so the ingestion is done.

|--------+---------------------------------------------+-----------------------------------+----------------+----------------+----------------|
| status | url                                         | url-prefix                        | instance       | # repos listed | # origins      |
|--------+---------------------------------------------+-----------------------------------+----------------+----------------+----------------|
| done   | https://git.kernel.org/                     | X                                 | git-kernel     |           1002 |           1002 |
| done   | https://gitweb.torproject.org/              | https://git.torproject.org/       | tor            |            492 |            501 |
| done   | https://fedorapeople.org/cgit/              | X                                 | fedora         |            751 |            841 |
| done   | https://cgit.kde.org/                       | https://anongit.kde.org/          | kde            |           2434 |           2434 |
| done   | https://www.happyassassin.net/cgit/         | X                                 | happyassassin  |              3 |              3 |
| done   | https://git.openembedded.org/               | X                                 | openembedded   |             15 |             15 |
| done   | https://git.zx2c4.com/                      | X                                 | zx2c4          |            140 |            140 |
| done   | http://git.gnu.org.ua/cgit/                 | http://git.gnu.org.ua/repo/       | git.gnu.org.ua |             50 |             50 |
| done   | https://git.alpinelinux.org/                | X                                 | alpinelinux    |            187 |            187 |
| done   | https://git.baserock.org/cgit/              | https://git.baserock.org/git/     | baserock       |           1456 |           1546 |
| done   | https://code.qt.io/cgit/                    | http://code.qt.io/                | qt.io          |            257 |            257 |
| done   | http://git.yoctoproject.org/clean/cgit.cgi/ | https://git.yoctoproject.org/git/ | yoctoproject   |            170 |            170 |
| done   | http://hdiff.luite.com/cgit/                | X                                 | hdiff.luite    |          13722 |          13722 |
|--------+---------------------------------------------+-----------------------------------+----------------+----------------+----------------|

(https://anonscm.debian.org/cgit/ is excluded: it is down because it was migrated to a GitLab instance. There seems to be an archive at https://alioth-archive.debian.org/.)

  • Initialize the db's data model with the new table
  • Schedule the listing of the cgit instances that are ready [1]

with @nahimilega's support

[1]

SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://git.savannah.gnu.org/cgit/ instance=gnu-savannah
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://gitweb.torproject.org/ url_prefix=https://git.torproject.org/ instance=tor
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://git.kernel.org/ instance=git-kernel
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://fedorapeople.org/cgit/ instance=fedora
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://cgit.kde.org/ url_prefix=https://anongit.kde.org/ instance=kde
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://www.happyassassin.net/cgit/ instance=happyassassin
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://git.openembedded.org/ instance=openembedded
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://git.zx2c4.com/ instance=zx2c4
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=http://git.gnu.org.ua/cgit/ url_prefix=http://git.gnu.org.ua/repo/ instance=git.gnu.org.ua
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://git.alpinelinux.org/ instance=alpinelinux
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=http://hdiff.luite.com/cgit/ instance=hdiff.luite
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://git.baserock.org/cgit/              url_prefix=https://git.baserock.org/git/     instance=baserock
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=https://code.qt.io/cgit/                    url_prefix=http://code.qt.io/                instance=qt.io
SCHEDULER_API_URL=http://saatchi.internal.softwareheritage.org:5008/; swh scheduler --url $SCHEDULER_API_URL task add list-cgit url=http://git.yoctoproject.org/clean/cgit.cgi/ url_prefix=https://git.yoctoproject.org/git/ instance=yoctoproject
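
The commands above differ only in their url, url_prefix and instance arguments, so they could be generated from a small table rather than typed by hand. A minimal sketch, assuming the same scheduler URL and list-cgit task type as above; INSTANCES is only a subset of the table, and build_command / build_commands are hypothetical helper names introduced for illustration. The sketch only prints the command lines, it does not run them:

```python
import shlex

SCHEDULER_API_URL = "http://saatchi.internal.softwareheritage.org:5008/"

# (url, url_prefix or None, instance): a subset of the instances listed above
INSTANCES = [
    ("https://git.kernel.org/", None, "git-kernel"),
    ("https://gitweb.torproject.org/", "https://git.torproject.org/", "tor"),
    ("https://cgit.kde.org/", "https://anongit.kde.org/", "kde"),
]


def build_command(url, url_prefix, instance):
    """Return the scheduler command line for one cgit instance."""
    args = ["swh", "scheduler", "--url", SCHEDULER_API_URL,
            "task", "add", "list-cgit", f"url={url}"]
    if url_prefix is not None:
        # Only pass url_prefix when the table gives one (no "X" entry).
        args.append(f"url_prefix={url_prefix}")
    args.append(f"instance={instance}")
    return " ".join(shlex.quote(a) for a in args)


def build_commands(instances):
    """Build one command per distinct instance name, keeping the first entry."""
    seen = set()
    commands = []
    for url, url_prefix, instance in instances:
        if instance in seen:
            continue  # avoid scheduling the same instance twice
        seen.add(instance)
        commands.append(build_command(url, url_prefix, instance))
    return commands


if __name__ == "__main__":
    for command in build_commands(INSTANCES):
        print(command)
```

Deduplicating on the instance name is a design choice of this sketch; it keeps a repeated entry from producing a second listing task.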

Note:

  • The number of origins might be slightly higher than the number of listed repos, because the count(*) on the origin table used to compute it might not be accurate.
  • Either way, that means the ingestion is either in progress or done.
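
The rule used in the description and in this note can be written down as a small check. A sketch only, using three rows copied from the table above; ingestion_status is a hypothetical helper, and treating a surplus of origins as "done" is an assumption based on this note, not something the scheduler reports:

```python
def ingestion_status(repos_listed, origins):
    """Classify an instance by comparing its origin count to its listed repos."""
    if origins >= repos_listed:
        # A surplus is attributed to the imprecise origin count.
        return "done"
    return "in progress"


# (instance, # repos listed, # origins), copied from the table above
ROWS = [
    ("git-kernel", 1002, 1002),
    ("tor", 492, 501),
    ("baserock", 1456, 1546),
]

if __name__ == "__main__":
    for instance, listed, origins in ROWS:
        status = ingestion_status(listed, origins)
        print(f"{instance}: {status} (surplus: {origins - listed})")
```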

Event Timeline

ardumont created this task. Jun 20 2019, 1:37 PM
ardumont triaged this task as Normal priority.
nahimilega updated the task description. Jun 20 2019, 2:20 PM
ardumont added a comment. (Edited) Jun 28 2019, 11:38 AM

http://git.upsilon.cc/ (@zack's git repositories) might be relevant.

@nahimilega mentioned to me on IRC that the current cgit lister won't work as is, though.

anlambert updated the task description. Jun 28 2019, 11:44 AM
ardumont updated the task description. Jun 28 2019, 11:59 AM
ardumont removed a project: CGit lister.
anlambert updated the task description. Jun 28 2019, 12:01 PM
ardumont updated the task description. Jun 28 2019, 12:02 PM
ardumont updated the task description. Jun 28 2019, 6:48 PM
ardumont updated the task description.
ardumont updated the task description. Jun 28 2019, 7:02 PM
nahimilega updated the task description. Jun 28 2019, 7:22 PM
ardumont updated the task description. (Edited) Jun 28 2019, 7:23 PM

Here we go, it's starting...

swh-lister=> select instance, count(*) from cgit_repo group by instance;
   instance   | count
--------------+-------
 git-kernel   |  1002
 gnu-savannah |  1018
 tor          |   492
(3 rows)
ardumont changed the status of subtask T1451: ingest GNU Savannah Git repositories from Open to Work in Progress. Jun 28 2019, 7:33 PM
ardumont changed the task status from Open to Work in Progress.
ardumont claimed this task.
ardumont updated the task description.
nahimilega removed ardumont as the assignee of this task. Jun 28 2019, 7:36 PM
nahimilega changed the task status from Work in Progress to Open.
nahimilega updated the task description.
ardumont claimed this task. Jun 28 2019, 7:37 PM
ardumont updated the task description.
ardumont updated the task description. Jun 28 2019, 7:54 PM
ardumont updated the task description. Jun 28 2019, 8:06 PM
ardumont updated the task description.
ardumont updated the task description.

All instances except freedesktop have now been listed.

Here is the resulting content of the lister cache:

20:05:55 swh-lister@belvedere:5432=> select instance, count(*) number_repos from cgit_repo group by instance order by number_repos;
┌────────────────┬──────────────┐
│    instance    │ number_repos │
├────────────────┼──────────────┤
│ happyassassin  │            3 │
│ openembedded   │           15 │
│ git.gnu.org.ua │           50 │
│ zx2c4          │          140 │
│ yoctoproject   │          170 │
│ alpinelinux    │          187 │
│ qt.io          │          257 │
│ tor            │          492 │
│ fedora         │          751 │
│ git-kernel     │         1002 │
│ gnu-savannah   │         1018 │
│ baserock       │         1456 │
│ kde            │         2434 │
│ hdiff.luite    │        13722 │
└────────────────┴──────────────┘
(14 rows)

Time: 17.741 ms
20:06:03 swh-lister@belvedere:5432=>

Here is a log excerpt showing the time needed to list each instance:

Jun 28 17:21:46 worker01 python3[18334]: [2019-06-28 17:21:46,794: INFO/ForkPoolWorker-2] Task swh.lister.cgit.tasks.CGitListerTask[ffa756b0-f4c0-4dee-bd9f-a388902a5347] succeeded in 10.55229873280041s: None
Jun 28 17:32:40 worker01 python3[18333]: [2019-06-28 17:32:40,961: INFO/ForkPoolWorker-1] Task swh.lister.cgit.tasks.CGitListerTask[3a362f22-34f8-4b2c-a308-ef2421086aef] succeeded in 3.041213259799406s: None
Jun 28 18:00:12 worker02 python3[26294]: [2019-06-28 18:00:12,971: INFO/ForkPoolWorker-3] Task swh.lister.cgit.tasks.CGitListerTask[0f0e149a-935b-4b52-acf6-71c6fa0a9fe2] succeeded in 0.7731958050280809s: None
Jun 28 18:00:14 worker03 python3[19965]: [2019-06-28 18:00:14,856: INFO/ForkPoolWorker-5] Task swh.lister.cgit.tasks.CGitListerTask[92468ebe-2427-4d34-a02b-454252233ed3] succeeded in 2.654821573989466s: None
Jun 28 17:46:59 worker04 python3[10950]: [2019-06-28 17:46:59,206: INFO/ForkPoolWorker-5] Task swh.lister.cgit.tasks.CGitListerTask[ee4a36f4-159e-4bed-be51-060d655820e7] succeeded in 20.162375305080786s: None
Jun 28 17:44:45 worker05 python3[7012]: [2019-06-28 17:44:45,081: INFO/ForkPoolWorker-2] Task swh.lister.cgit.tasks.CGitListerTask[387a04e7-4b72-4a5e-b311-5c7630a19864] succeeded in 6.755359191214666s: None
Jun 28 18:00:13 worker06 python3[17966]: [2019-06-28 18:00:13,897: INFO/ForkPoolWorker-1] Task swh.lister.cgit.tasks.CGitListerTask[e557d1fa-0fb1-45dd-b7e3-5362089a0e5f] succeeded in 1.7187734139151871s: None
Jun 28 18:00:13 worker07 python3[9748]: [2019-06-28 18:00:13,166: INFO/ForkPoolWorker-2] Task swh.lister.cgit.tasks.CGitListerTask[e7f054e3-1688-40fe-90c5-867c3e836b68] succeeded in 1.0016695000231266s: None
Jun 28 18:00:25 worker08 python3[2135]: [2019-06-28 18:00:25,706: INFO/ForkPoolWorker-1] Task swh.lister.cgit.tasks.CGitListerTask[981a9835-d144-4aff-aa61-1788fb195a2a] succeeded in 13.58928604517132s: None
Jun 28 18:00:17 worker09 python3[13801]: [2019-06-28 18:00:17,641: INFO/ForkPoolWorker-1] Task swh.lister.cgit.tasks.CGitListerTask[831a97f1-434c-4128-88b6-089d122b6ef5] succeeded in 5.3197371491696686s: None
Jun 28 18:00:19 worker10 python3[20798]: [2019-06-28 18:00:19,929: INFO/ForkPoolWorker-5] Task swh.lister.cgit.tasks.CGitListerTask[f3e7bfab-d356-4f7d-bf45-57d37068d380] succeeded in 7.68826347310096s: None
Jun 28 18:05:09 worker12 python3[5078]: [2019-06-28 18:05:09,401: INFO/ForkPoolWorker-2] Task swh.lister.cgit.tasks.CGitListerTask[8bdce7f7-c99a-40f3-90a1-70cf712d31f7] succeeded in 297.0999980599154s: None
Jun 28 18:01:22 worker14 python3[29109]: [2019-06-28 18:01:22,507: INFO/ForkPoolWorker-2] Task swh.lister.cgit.tasks.CGitListerTask[4b547347-8978-45a8-9a7a-85a655e4899f] succeeded in 70.25172589509748s: None
Jun 28 18:00:18 worker15 python3[23590]: [2019-06-28 18:00:18,018: INFO/ForkPoolWorker-2] Task swh.lister.cgit.tasks.CGitListerTask[f09b13d8-be74-417d-a153-d3289c36a4bb] succeeded in 5.778567451052368s: None
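
The durations in these log lines can be extracted mechanically rather than read by eye. A minimal sketch, assuming the line format shown above; the SUCCEEDED pattern and the parse_durations helper are hypothetical names, and SAMPLE holds two lines copied from the excerpt. The log lines do not say which cgit instance each task listed, so only the task id is reported:

```python
import re

# Matches the task name, task id and duration in a Celery "succeeded" log line.
SUCCEEDED = re.compile(
    r"Task (?P<name>[\w.]+)\[(?P<task_id>[0-9a-f-]+)\] "
    r"succeeded in (?P<seconds>\d+(?:\.\d+)?)s"
)


def parse_durations(lines):
    """Return (task_id, seconds) pairs for every matching log line."""
    results = []
    for line in lines:
        match = SUCCEEDED.search(line)
        if match:
            results.append((match.group("task_id"), float(match.group("seconds"))))
    return results


SAMPLE = [
    "Jun 28 18:00:12 worker02 python3[26294]: [2019-06-28 18:00:12,971: INFO/ForkPoolWorker-3] Task swh.lister.cgit.tasks.CGitListerTask[0f0e149a-935b-4b52-acf6-71c6fa0a9fe2] succeeded in 0.7731958050280809s: None",
    "Jun 28 18:05:09 worker12 python3[5078]: [2019-06-28 18:05:09,401: INFO/ForkPoolWorker-2] Task swh.lister.cgit.tasks.CGitListerTask[8bdce7f7-c99a-40f3-90a1-70cf712d31f7] succeeded in 297.0999980599154s: None",
]

if __name__ == "__main__":
    # Print tasks from fastest to slowest.
    for task_id, seconds in sorted(parse_durations(SAMPLE), key=lambda p: p[1]):
        print(f"{task_id}: {seconds:.1f}s")
```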
ardumont updated the task description. Jun 28 2019, 8:14 PM

Moving cgit.freedesktop.org out of the scope of this task.
See T1861 for it.

ardumont updated the task description. Jul 1 2019, 4:01 PM
ardumont updated the task description.
ardumont updated the task description. Jul 1 2019, 4:04 PM
ardumont updated the task description.
ardumont updated the task description.

Moving the gnu-savannah instance out of this task (T1451 already exists for it).

ardumont renamed this task from List major cgit instances to List/Ingest major cgit instances. Jul 1 2019, 4:52 PM
ardumont updated the task description.