Page MenuHomeSoftware Heritage

swh.lister.gitlab: Make the lister instance parametric
ClosedPublic

Authored by ardumont on Jul 6 2018, 4:17 PM.

Details

Reviewers
olasd
Group Reviewers
Reviewers
Summary

This impacts:

  • the credentials reading which needs to be indexed per instance
  • the models since a new instance column needs to referenced and indexed

Related T989

Diff Detail

Repository
rDLS Listers
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 1306
Build 1650: arc lint + arc unit

Event Timeline

olasd requested changes to this revision.Jul 6 2018, 5:16 PM
olasd added a subscriber: olasd.

I'm not convinced about overloading the lister_name attribute to the GitLabLister class to store the name of the instance. I think the lister name should be a generic value, that works across all instances, and therefore it'd be more appropriate to have an explicit instance argument at lister class initialization.

For instance, overloading the lister_name attribute will change the path of the configuration file per instance (which you fix with D370). It also changes the default database.

swh/lister/core/models.py
45–51 ↗(On Diff #1146)

This should be in the GitLab model instantiation rather than here, unless we refactor all listers to have the instance field.

This revision now requires changes to proceed.Jul 6 2018, 5:16 PM

I'm not convinced about overloading the lister_name attribute to the GitLabLister class to store the name of the instance.
I think the lister name should be a generic value, that works across all instances, and therefore it'd be more appropriate to have an explicit instance argument at lister class initialization.

I'm not certain i understood you completely. What do you mean by generic value, do you have an example in mind?

I'm not sure i understand what is the benefit of separating those (lister-name, instance).
The instance column's purpose is to discrimininate between origins coming from different gitlab instances (when reading origins). So it needs to be different.
The lister name will be different by gitlab instances (so far in my manual tests, i named it according to the main domain's instance, gitlab.com, salsa.debian.org, etc...).
So, in my mind they can serve for that purpose.

Note:
I don't know if that helps or clarify my actual train of thought, still.
I have in mind to change the actual primary key from a unique sequential id to a composite key using the instance (as you proposed in D352).
Since, at the moment, the actual primary key won't help too much for existence test purposes.

swh/lister/core/models.py
45–51 ↗(On Diff #1146)

This should be in the GitLab model instantiation rather than here

Quite!

swh/lister/gitlab/lister.py
34

This is wrong!
I did not understand completely your main remark, but that sure helped me find this one.

I have in mind to use the same lister for all gitlab instances, so one db for all instances as well.

(I realized indeed and fixed in D370 what hit me as wrong but i missed that part altogether).

I think i will merge in this commit the content of D370 as well.

Fix according my understanding of the latest exchange

  • swh.lister.gitlab: Make the lister instance parametric
  • swh.lister.gitlab: Use one configuration for the gitlab lister
In D363#7190, @ardumont wrote:

I'm not convinced about overloading the lister_name attribute to the GitLabLister class to store the name of the instance.
I think the lister name should be a generic value, that works across all instances, and therefore it'd be more appropriate to have an explicit instance argument at lister class initialization.

I'm not certain i understood you completely. What do you mean by generic value, do you have an example in mind?

I think the lister_name for the GitLab lister should be gitlab, which allows us to use the default values for the config, the database name, etc. In my mind it should even be a class attribute rather than an instance attribute (that's not the case currently, but I think we should move towards that).

Maybe @fiendish can clarify his thoughts on the matter?

I'm not sure i understand what is the benefit of separating those (lister-name, instance).
The instance column's purpose is to discrimininate between origins coming from different gitlab instances (when reading origins). So it needs to be different.
The lister name will be different by gitlab instances (so far in my manual tests, i named it according to the main domain's instance, gitlab.com, salsa.debian.org, etc...).
So, in my mind they can serve for that purpose.

The need to override the default config and the database name to support several instances make me think that the lister name should really be a class attribute :)

Note:
I don't know if that helps or clarify my actual train of thought, still.
I have in mind to change the actual primary key from a unique sequential id to a composite key using the instance (as you proposed in D352).
Since, at the moment, the actual primary key won't help too much for existence test purposes.

Summary:

  • swh.lister: Make LISTER_NAME a class attribute
  • swh.lister.gitlab: make the 'instance' a constructor parameter

I think the lister_name for the GitLab lister should be gitlab, which allows us to use the default values for the config, the database name, etc. In my mind it should even be a class attribute rather than an instance attribute (that's not the case currently, but I think we should move towards that).

Ok.

LISTER_NAME is now a class attribute of the base lister class:

  • 'github.com' for the github lister (kept for compatibility with the current one)
  • 'bitbucket.com' for the bitbucket lister (kept for compatibility with the current one)
  • 'debian' for the debian lister
  • 'gitlab' for the gitlab lister

What's an instance attribute is the 'instance' property of the gitlab lister (As such, it's now a constructor parameter for it).

The need to override the default config and the database name to support several instances make me think that the lister name should really be a class attribute :)

Sold.


Note that slightly touches the D364 parameter now (due to the new 'instance' constructor parameter of the gitlab lister instance).

Add missing gitlab lister code (stored in stash)

swh/lister/gitlab/lister.py
85–86

'instance' is now an instance attribute initialized in constructor.

This revision is now accepted and ready to land.Jul 16 2018, 11:13 AM