
cgit lister: Adapt lister to deal with inconsistent `git clone uri` pattern
Closed, Resolved, Public

Description

The first cgit lister implementation is done, and the largest cgit instances have been listed (T1835).

However, some cgit instances [1] expose the repositories they host inconsistently (in terms of git clone uri).
We need to work out the proper way to adapt the cgit lister to handle them.

The options considered so far (excerpt from an IRC discussion):

  1. Make one extra request per repository page found in the listing pages, and parse each response to extract the git clone uri (first implementation from @nahimilega in D1610, which we asked to be changed to avoid extra load on the server we list).
  2. Try to determine the git clone uri pattern per group (some of that information is sometimes available in the main page we parse). That seems to match our case specifically for those instances [1].
  3. Whatever the choice, don't make the implemented solution exclusive: allow a listing policy to be defined per cgit instance. The current default policy is good enough for most instances, so it would be great to keep it and trigger another policy only for the instances that really diverge from it [1].
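
To make option 1 concrete, here is a minimal sketch of the parsing step only. It assumes the repository page advertises its clone URLs as anchors carrying rel="vcs-git"; that markup, and the names CloneLinkParser and extract_clone_urls, are assumptions for illustration and not what D1610 actually did. Fetching the page is left out on purpose.

```python
from html.parser import HTMLParser
from typing import List


class CloneLinkParser(HTMLParser):
    """Collect href values of <a rel="vcs-git"> anchors (assumed markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.clone_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely.
        rel_tokens = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if "vcs-git" in rel_tokens and href:
            self.clone_urls.append(href)


def extract_clone_urls(repo_page_html: str) -> List[str]:
    """Return the clone URLs found in one repository page, in page order."""
    parser = CloneLinkParser()
    parser.feed(repo_page_html)
    return parser.clone_urls
```

A page with no such anchor yields an empty list, so the caller can decide whether to skip the repository or fall back to the default pattern.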

Related T1835
Related T376 (eclipse's cgit)

[1] Cgit instances:

Event Timeline

ardumont triaged this task as Normal priority. Jun 28 2019, 8:11 PM
ardumont created this task.
ardumont updated the task description.
ardumont updated the task description. Jun 28 2019, 8:36 PM
ardumont renamed this task from "cgit lister adaptations to deal with cgit.freedesktop.org" to "cgit lister: Adapt lister to deal with inconsistent `git clone uri` pattern". Jun 29 2019, 9:10 AM
ardumont updated the task description.
ardumont added a comment. (Edited) Jun 29 2019, 9:13 AM

I'd be inclined to use a combination of solutions:

  • have the listing policy determined when the cgit instance lister is initialized (3.)
  • since 1. has already been implemented in the past, use it as a fallback (eclipse and freedesktop might be popular enough to sustain the load of a few more requests than the bare listing we currently do).

I'd avoid 2. for now because I'm not sure the next divergent cgit instances we encounter will group their patterns that way...
And if they don't, we'll have to come up with yet another policy.

With 1., a priori, we won't have to.
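
A rough sketch of what that combination could look like, with the policy picked once at lister initialization. Every name here (default_policy, choose_policy, CgitListerSketch, the divergent set, the lookup callable) is hypothetical, and this is not necessarily how D1929 ended up doing it:

```python
from typing import Callable, Iterable, List, Optional, Set

# A policy maps a repository path (as listed on the index page) to a clone uri.
Policy = Callable[[str], Optional[str]]


def default_policy(base_clone_url: str) -> Policy:
    """Derive the clone uri from a fixed prefix, without any extra request."""
    def resolve(repo_path: str) -> Optional[str]:
        return f"{base_clone_url.rstrip('/')}/{repo_path.strip('/')}"
    return resolve


def choose_policy(instance_url: str, base_clone_url: str,
                  divergent: Set[str], lookup: Policy) -> Policy:
    """Use the per-repository lookup (option 1) only for divergent instances."""
    if instance_url in divergent:
        return lookup
    return default_policy(base_clone_url)


class CgitListerSketch:
    def __init__(self, policy: Policy) -> None:
        # The policy is fixed at initialization (option 3).
        self.policy = policy

    def clone_uris(self, repo_paths: Iterable[str]) -> List[str]:
        # Repositories the policy cannot resolve (None) are dropped.
        return [uri for uri in map(self.policy, repo_paths) if uri]
```

Here lookup stands in for "fetch the repository page and parse the clone uri out of it", so only the instances listed as divergent pay for the extra requests.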

Cheers,

ardumont updated the task description. Jun 29 2019, 9:15 AM
ardumont updated the task description. Jun 29 2019, 3:57 PM

D1929 took care of it ;)