
swh.lister.gnu: Change origin type to tar
Closed, Public

Authored by nahimilega on Jun 19 2019, 1:51 PM.

Details

Summary

Change origin type from 'gnu' to 'tar'

Rationale: there is no need to differentiate origin type 'gnu' from the other origin type 'tar'.
The distinction is already carried by the origin's entry (through its URL).
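To illustrate why a separate 'gnu' type adds no information, here is a minimal Python sketch. It assumes GNU tarball origins are URLs served from ftp.gnu.org. The helper `is_gnu_origin` and the example URL are hypothetical and not part of swh-lister.

```python
from urllib.parse import urlparse

# Assumption for illustration: GNU tarballs are served from this host.
GNU_HOST = "ftp.gnu.org"


def is_gnu_origin(origin_url: str) -> bool:
    """Hypothetical helper: True if the URL alone already identifies a GNU origin."""
    return urlparse(origin_url).hostname == GNU_HOST


# The origin type only says *how* to load ('tar'); the URL says *where* it comes from.
origin = {
    "origin_type": "tar",
    "origin_url": "https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz",
}
print(is_gnu_origin(origin["origin_url"]))  # True
```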

Furthermore, from a historical point of view, we should try not to diverge too much from the original GNU loading,
even though those origins were typed 'ftp' at the time...
(We were young and foolish back then.
Personally, I still am, but I like to believe I'm a little less foolish now ;)

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

nahimilega added inline comments.
swh/lister/gnu/lister.py
34–36

I have only changed origin_type, but if we are using the tarball loader, then I suppose we have to change this accordingly.

ardumont added inline comments.
swh/lister/gnu/lister.py
34–36

> I have only changed origin_type, but if we are using tarball loader, then I suppose we have to change this according to that

Yes, but there is no need for another code change.

In swh-lister, my understanding of the current code is that the task type will now change to load-tar.
Hence the migration I hinted at yesterday on IRC.

13:36:01 +ardumont | archit_agrawal[m: could you change the lister-gnu's origin-type from gnu to tar btw?
13:36:38 +ardumont | it will also change the scheduler's task type from load-gnu to load-tar but i'll deal with that part later (impact is in the scheduler)
13:36:54 +ardumont | (as the output tasks do not load yet, that's not an issue)
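The coupling described above can be sketched as follows, assuming the lister builds the scheduler task type by prefixing the origin type with `load-`. The function name `task_type_for` is hypothetical; swh-lister's base lister is the place to confirm the actual logic. Under that assumption, renaming the origin type necessarily renames the task type, which is why the scheduler data must be migrated.

```python
def task_type_for(origin_type: str) -> str:
    """Hypothetical helper: derive the scheduler task type from the origin type."""
    return "load-%s" % origin_type


# Before this diff the GNU lister emitted origin_type 'gnu'; after it, 'tar'.
print(task_type_for("gnu"))  # load-gnu
print(task_type_for("tar"))  # load-tar
```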

As for the loader-tar in question, it indeed does not currently support the contract.
That will be handled as part of the refactoring in the other task I mentioned on IRC, regarding the "distribution" or "package" loader (T1389).

As this runs in production, I need to:

  • migrate the actual data in swh-lister's db
update gnu_repo set origin_type='tar' where origin_type='gnu';
  • migrate the scheduler's data

Off the top of my head:

begin;
-- insert the new task type 'load-tar' (which does not exist yet, IIRC)
insert into task_type (name, ...) values ('load-tar', ...); -- or some such
-- repoint existing tasks to the new type before dropping the old one
update task set task_type='load-tar' where task_type='load-gnu';
delete from task_type where name='load-gnu';
commit;
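The order of the scheduler steps matters when task rows reference task_type through a foreign key: the new type must exist before tasks are repointed, and the old type can only be deleted once nothing references it. The Python sketch below checks this on a throwaway in-memory SQLite database. The schema is deliberately minimal and hypothetical (a `name` column on `task_type`, as in the insert statement above, and a `task_type` column on `task`); the production databases and their real schemas differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled

# Hypothetical, minimal stand-ins for the lister and scheduler tables.
conn.executescript("""
    create table gnu_repo (origin_url text primary key, origin_type text not null);
    create table task_type (name text primary key);
    create table task (
        id integer primary key,
        task_type text not null references task_type(name)
    );
    insert into gnu_repo values ('https://ftp.gnu.org/gnu/hello/', 'gnu');
    insert into task_type values ('load-gnu');
    insert into task (task_type) values ('load-gnu');
""")

# Lister data: retype existing origins.
conn.execute("update gnu_repo set origin_type='tar' where origin_type='gnu'")

# Deleting the old type first is refused: tasks still reference it.
try:
    conn.execute("delete from task_type where name='load-gnu'")
except sqlite3.IntegrityError as exc:
    print("refused:", exc)

# Scheduler data, in the proposed order: insert, repoint, delete; commit on success.
with conn:
    conn.execute("insert into task_type (name) values ('load-tar')")
    conn.execute("update task set task_type='load-tar' where task_type='load-gnu'")
    conn.execute("delete from task_type where name='load-gnu'")

print(conn.execute("select name from task_type").fetchall())  # [('load-tar',)]
print(conn.execute("select task_type from task").fetchall())  # [('load-tar',)]
```

Only the failing statement is rolled back, so the earlier `gnu_repo` update survives and is committed together with the scheduler changes when the `with` block exits.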
This revision is now accepted and ready to land. (Jun 20 2019, 9:58 AM)
This revision was automatically updated to reflect the committed changes.