Some objects from the original GitHub import were never actually imported.
For around 20,000 origins covered by the original GitHub run, we failed to import the data but still created occurrences referencing commits, which are now dangling.

When doing incremental updates of these repositories, the git loader assumed that the revisions the occurrences pointed to had indeed been imported, so the gaps were never filled by reimporting the data.
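
The failure mode can be illustrated with a toy model. This is a minimal sketch, not the git loader's actual code: `revisions_to_fetch`, `stored_revisions`, `occurrence_targets` and `remote_heads` are hypothetical names, and it assumes that an incremental update treats every revision referenced by an occurrence as already present.

```python
def revisions_to_fetch(remote_heads, occurrence_targets):
    """Toy incremental update: fetch only heads not referenced by an occurrence.

    Assumption (hypothetical model): anything an occurrence points to is
    trusted to be in the archive, without checking the revision table.
    """
    return set(remote_heads) - set(occurrence_targets)


# Revisions actually present in the archive (made-up ids).
stored_revisions = {"aaa"}
# Occurrences created by the original import; "bbb" was never imported.
occurrence_targets = {"aaa", "bbb"}
# Branch heads currently advertised by the remote repository.
remote_heads = {"aaa", "bbb"}

to_fetch = revisions_to_fetch(remote_heads, occurrence_targets)
dangling = occurrence_targets - stored_revisions

print(to_fetch)  # nothing is scheduled for fetching
print(dangling)  # yet one referenced revision is still missing
```

Under this model, only a check against the stored revisions themselves (or a full reimport) would reveal the gap.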

We should reimport it from the original git clones.

List of revisions whose parent is missing from the revision table (1259):

\copy (select id from revision_history where not exists (select 1 from revision where revision.id = revision_history.parent_id)) to 'revisions_missing_parent';
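
What the query above computes is an anti-join: keep the rows of `revision_history` whose `parent_id` has no matching row in `revision`. A minimal in-memory sketch of the same idea follows; the function name and the sample ids are made up for illustration, and unlike the SQL it de-duplicates the result.

```python
def revisions_missing_parent(revision_history, known_revisions):
    """Return ids of revisions with at least one parent absent from known_revisions.

    revision_history: iterable of (id, parent_id) pairs.
    known_revisions: set of revision ids present in the revision table.
    """
    return sorted(
        {rev for rev, parent in revision_history if parent not in known_revisions}
    )


# Made-up sample data, for illustration only.
history = [("r2", "r1"), ("r3", "r2"), ("r4", "rX"), ("r4", "r3")]
known = {"r1", "r2", "r3", "r4"}

missing = revisions_missing_parent(history, known)
print(missing)
```

Here `r4` is reported because its parent `rX` is not a known revision, even though its other parent `r3` is.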

List of origin sha1s containing orphan revisions, according to swh-graph (255 origins, which feels a bit low):

: > walks
# $url is not defined in this snippet; presumably the swh-graph query for $rev
sort revisions_missing_parent | cut -c 4- | while read rev; do
  (GET $url; echo; echo) >> walks
done
grep swh:1:ori walks | sort | uniq > origins
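
The final filtering step (grep, sort, uniq) can also be sketched in Python. This is only an illustration under assumptions: `unique_origin_lines` is a hypothetical helper, the sample identifiers are made up, and lines are stripped of surrounding whitespace, which `grep` would not do.

```python
def unique_origin_lines(walks_text):
    """Return the sorted, de-duplicated lines that mention an origin SWHID."""
    lines = (line.strip() for line in walks_text.splitlines())
    return sorted({line for line in lines if "swh:1:ori" in line})


# Made-up walk output: two walks that end at origins, separated by blank lines.
sample_walks = "\n".join(
    [
        "swh:1:rev:" + "a" * 40,
        "swh:1:ori:" + "1" * 40,
        "",
        "",
        "swh:1:rev:" + "b" * 40,
        "swh:1:ori:" + "0" * 40,
        "swh:1:ori:" + "1" * 40,
    ]
)

origins = unique_origin_lines(sample_walks)
for origin in origins:
    print(origin)
```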

Origin URLs:

Reloading (by hand) the origins with missing revisions:

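from swh.loader.git.loader import GitLoader

# Reload each origin from scratch; a failure on one origin must not stop the run.
with open('origin_urls') as f:
    for url in f:
        url = url.strip()
        ret = None
        try:
            # ignore_history=True: do not rely on what previous visits recorded
            l = GitLoader(url=url, ignore_history=True)
            ret = l.load()
        except Exception as e:
            ret = e
        print(url, ret)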

... currently in progress