Page MenuHomeSoftware Heritage

Unify loader instantiation
ClosedPublic

Authored by ardumont on Feb 12 2021, 4:11 PM.

Details

Summary

This unifies and centralizes the instantiation the same way the lister does.

This introduces a new base class swh.loader.core.loader.Loader for all loaders whose only
concern for now is to instantiate loaders from either a configuration dict or a
configuration file.

This simplify instantiation in celery task code and avoids duplicating the configuration
load in each loader constructor.

The end goal is to simplify the future refactoring on configuration. With the following,
we will only have to adapt the Loader class when we finally start simplifying uniformly
the configuration.

Also note that I mostly reused the equivalent swh.lister.pattern.Lister.from_config*.
I did not refactor the common behavior (to avoid throwing another dependency in the
mix). That could always be refactored later.

Unfortunately the diff is a "tag" big but we can't really work around it...

(inspired by both the work on listers and the configuration system work)

Impacts:

Related to T1410

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Branch
update-config
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 19263
Build 29874: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 29873: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D5071 (id=18096)

Rebasing onto 7ea69e2b70...

Current branch diff-target is up to date.
Changes applied before test
commit ee61ed44e4a5608d11ce7f7cac34a292625875cc
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Feb 8 09:04:17 2021 +0100

    Unify loader instantiation
    
    This unifies and centralizes the instantiation the same way the lister does.
    
    This introduces a new base class `swh.loader.pattern.Loader` for all loaders whose only
    concern for now is to instantiate loaders from either a configuration dict or a
    configuration file.
    
    This simplify instantiation in celery task code and avoids duplicating the configuration
    load in each loader constructor.
    
    The end goal is to simplify the future refactoring on configuration. Now, we will only
    have to adapt the Loader class when we finally start simplifying uniformly the
    configuration.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/371/ for more details.

This revision is now accepted and ready to land.Feb 15 2021, 4:43 PM
  • Add missing test on Deposit.from_configfile
  • Drop unneeded conditional in cli

Build has FAILED

Patch application report for D5071 (id=18127)

Rebasing onto 7ea69e2b70...

Current branch diff-target is up to date.
Changes applied before test
commit 4dd956dcae9ba861b098e1ceba2e502233ea4057
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Feb 8 09:04:17 2021 +0100

    Unify loader instantiation
    
    This unifies and centralizes the instantiation the same way the lister does.
    
    This introduces a new base class `swh.loader.pattern.Loader` for all loaders whose only
    concern for now is to instantiate loaders from either a configuration dict or a
    configuration file.
    
    This simplify instantiation in celery task code and avoids duplicating the configuration
    load in each loader constructor.
    
    The end goal is to simplify the future refactoring on configuration. Now, we will only
    have to adapt the Loader class when we finally start simplifying uniformly the
    configuration.

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/372/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/372/console

Build is green

Patch application report for D5071 (id=18128)

Rebasing onto 7ea69e2b70...

Current branch diff-target is up to date.
Changes applied before test
commit c0301f5365e01275d6175cdcc2cd540102d2b73a
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Feb 8 09:04:17 2021 +0100

    Unify loader instantiation
    
    This unifies and centralizes the instantiation the same way the lister does.
    
    This introduces a new base class `swh.loader.pattern.Loader` for all loaders whose only
    concern for now is to instantiate loaders from either a configuration dict or a
    configuration file.
    
    This simplify instantiation in celery task code and avoids duplicating the configuration
    load in each loader constructor.
    
    The end goal is to simplify the future refactoring on configuration. Now, we will only
    have to adapt the Loader class when we finally start simplifying uniformly the
    configuration.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/373/ for more details.

Add missing test on cli run edge case

Build is green

Patch application report for D5071 (id=18132)

Rebasing onto 7ea69e2b70...

Current branch diff-target is up to date.
Changes applied before test
commit 0781cdfd9ba4b9432869201299abc4572b7b00c5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Feb 8 09:04:17 2021 +0100

    Unify loader instantiation
    
    This unifies and centralizes the instantiation the same way the lister does.
    
    This introduces a new base class `swh.loader.pattern.Loader` for all loaders whose only
    concern for now is to instantiate loaders from either a configuration dict or a
    configuration file.
    
    This simplify instantiation in celery task code and avoids duplicating the configuration
    load in each loader constructor.
    
    The end goal is to simplify the future refactoring on configuration. Now, we will only
    have to adapt the Loader class when we finally start simplifying uniformly the
    configuration.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/374/ for more details.

This is great, thanks!

Are the new pattern module / pattern.Loader class really needed? It looks like these methods could live in the BaseLoader class directly.

Are the new pattern module / pattern.Loader class really needed? It looks like these methods could live in the BaseLoader class directly.

PackageLoader does not actually inherit from BaseLoader so that's why i did not go that way.

@ardumont points out that the base PackageLoader doesn't inherit from BaseLoader, which explains the new (common) base class. I think the new class could just as well be next to BaseLoader, and doesn't warrant the introduction of a pattern module.

I wonder what would break if the new methods were just put in BaseLoader and the PackageLoader was made to inherit BaseLoader (now that the ABC has been converted to NotImplemented methods) instead of deepening the dependency graph for all loaders.

swh/loader/package/loader.py
128–129

You're dropping the only use of these parameters in tests, so you might as well drop them altogether.

@ardumont points out that the base PackageLoader doesn't inherit from BaseLoader, which explains the new (common) base class. I think the new class could just as well be next to BaseLoader, and doesn't warrant the introduction of a pattern module.

Oh, yes, totally, that could go in core.loader indeed.

I wonder what would break if the new methods were just put in BaseLoader and the PackageLoader was made to inherit BaseLoader

I don't think anything would break. I'm just not sure the "indirection" would be clear in terms of code readability...

(now that the ABC has been converted to NotImplemented methods) instead of deepening the dependency graph for all loaders.

Funny how I did not see it as deepening the graph (i see the point now though ;)

For me it creates indirection instead. The base behavior (BaseLoader) is not shared by the package loader. (Granted, that's already the case for DVCSLoader inheriting from BaseLoader...).
So i thought it was inappropriate to do so.

Like i said, i'll try.

Adapt according to review:

  • Drop swh.loader.pattern and move class Loader in swh.loader.core.loader module
  • Drop unneeded self.create_authorities, self.create_fetchers

Thanks.

(I'll try next the inheritance)

Build is green

Patch application report for D5071 (id=18159)

Rebasing onto 7ea69e2b70...

Current branch diff-target is up to date.
Changes applied before test
commit 0c36690f10b26a9916150a5606d3de2ad2d97cb6
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Feb 8 09:04:17 2021 +0100

    Unify loader instantiation
    
    This unifies and centralizes the instantiation the same way the lister does.
    
    This introduces a new base class `swh.loader.pattern.Loader` for all loaders whose only
    concern for now is to instantiate loaders from either a configuration dict or a
    configuration file.
    
    This simplify instantiation in celery task code and avoids duplicating the configuration
    load in each loader constructor.
    
    The end goal is to simplify the future refactoring on configuration. Now, we will only
    have to adapt the Loader class when we finally start simplifying uniformly the
    configuration.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/375/ for more details.

I wonder what would break if the new methods were just put in BaseLoader and the PackageLoader was made to inherit BaseLoader

I don't think anything would break. I'm just not sure the "indirection" would be clear in terms of code readability...

I was wrong. mypy is not happy.

Some signatures would need changing, notably the load, prepare, prepare_origin_visit to drop the spurious (i think) {*args, **kwargs} we are declaring.
(I don't think they are still used any more now, aside for the extra logging arguments)

But still, i'd be confortable if we go that way to do it in another diff (maybe).

$ tox -e mypy
GLOB sdist-make: /home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/setup.py
...
mypy run-test: commands[0] | mypy swh
swh/loader/package/loader.py:289: error: Signature of "load" incompatible with supertype "BaseLoader"
swh/loader/package/loader.py:381: error: Argument "date" to "OriginVisit" has incompatible type "Optional[datetime]"; expected "datetime"
swh/loader/package/loader.py:570: error: Argument "discovery_date" to "RawExtrinsicMetadata" has incompatible type "Optional[datetime]"; expected "datetime"
swh/loader/package/loader.py:692: error: Argument "discovery_date" to "RawExtrinsicMetadata" has incompatible type "Optional[datetime]"; expected "datetime"
swh/loader/package/loader.py:727: error: Argument "discovery_date" to "RawExtrinsicMetadata" has incompatible type "Optional[datetime]"; expected "datetime"
swh/loader/package/loader.py:756: error: Argument "discovery_date" to "RawExtrinsicMetadata" has incompatible type "Optional[datetime]"; expected "datetime"
swh/loader/package/pypi/loader.py:151: error: Item "None" of "Optional[datetime]" has no attribute "isoformat"
swh/loader/package/npm/loader.py:177: error: Item "None" of "Optional[datetime]" has no attribute "isoformat"
swh/loader/package/nixguix/loader.py:218: error: Item "None" of "Optional[datetime]" has no attribute "isoformat"
swh/loader/package/deposit/loader.py:208: error: Item "None" of "Optional[datetime]" has no attribute "isoformat"
swh/loader/package/deposit/loader.py:244: error: Signature of "load" incompatible with supertype "BaseLoader"
swh/loader/package/debian/loader.py:237: error: Item "None" of "Optional[datetime]" has no attribute "isoformat"
swh/loader/package/cran/loader.py:129: error: Item "None" of "Optional[datetime]" has no attribute "isoformat"
swh/loader/package/archive/loader.py:170: error: Item "None" of "Optional[datetime]" has no attribute "isoformat"
Found 14 errors in 8 files (checked 71 source files)

I wonder what would break if the new methods were just put in BaseLoader and the PackageLoader was made to inherit BaseLoader

I don't think anything would break. I'm just not sure the "indirection" would be clear in terms of code readability...

I was wrong. mypy is not happy.

Some signatures would need changing, notably the load, prepare, prepare_origin_visit to drop the spurious (i think) {*args, **kwargs} we are declaring.
(I don't think they are still used any more now, aside for the extra logging arguments)

But still, i'd be confortable if we go that way to do it in another diff (maybe).

$ tox -e mypy
GLOB sdist-make: /home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/setup.py
...
mypy run-test: commands[0] | mypy swh
swh/loader/package/loader.py:289: error: Signature of "load" incompatible with supertype "BaseLoader"
swh/loader/package/loader.py:381: error: Argument "date" to "OriginVisit" has incompatible type "Optional[datetime]"; expected "datetime"
swh/loader/package/loader.py:570: error: Argument "discovery_date" to "RawExtrinsicMetadata" has incompatible type "Optional[datetime]"; expected "datetime"
swh/loader/package/loader.py:692: error: Argument "discovery_date" to "RawExtrinsicMetadata" has incompatible type "Optional[datetime]"; expected "datetime"
swh/loader/package/loader.py:727: error: Argument "discovery_date" to "RawExtrinsicMetadata" has incompatible type "Optional[datetime]"; expected "datetime"
swh/loader/package/loader.py:756: error: Argument "discovery_date" to "RawExtrinsicMetadata" has incompatible type "Optional[datetime]"; expected "datetime"
swh/loader/package/pypi/loader.py:151: error: Item "None" of "Optional[datetime]" has no attribute "isoformat"
swh/loader/package/npm/loader.py:177: error: Item "None" of "Optional[datetime]" has no attribute "isoformat"
swh/loader/package/nixguix/loader.py:218: error: Item "None" of "Optional[datetime]" has no attribute "isoformat"
swh/loader/package/deposit/loader.py:208: error: Item "None" of "Optional[datetime]" has no attribute "isoformat"
swh/loader/package/deposit/loader.py:244: error: Signature of "load" incompatible with supertype "BaseLoader"
swh/loader/package/debian/loader.py:237: error: Item "None" of "Optional[datetime]" has no attribute "isoformat"
swh/loader/package/cran/loader.py:129: error: Item "None" of "Optional[datetime]" has no attribute "isoformat"
swh/loader/package/archive/loader.py:170: error: Item "None" of "Optional[datetime]" has no attribute "isoformat"
Found 14 errors in 8 files (checked 71 source files)

This is actually manageable in D5088 (kept separate from this).
I'm not sure we want to land this though.
you tell me ;)

I wonder what would break if the new methods were just put in BaseLoader and the PackageLoader was made to inherit BaseLoader

I don't think anything would break. I'm just not sure the "indirection" would be clear in terms of code readability...

I was wrong. mypy is not happy.

Some signatures would need changing, notably the load, prepare, prepare_origin_visit to drop the spurious (i think) {*args, **kwargs} we are declaring.
(I don't think they are still used any more now, aside for the extra logging arguments)

But still, i'd be confortable if we go that way to do it in another diff (maybe).

Yeah, that makes sense.

I'm fine with this standalone change, we can do the base class refactoring in another diff.

Rework commit message (aligns with diff)

Build is green

Patch application report for D5071 (id=18167)

Rebasing onto 7ea69e2b70...

Current branch diff-target is up to date.
Changes applied before test
commit 7116bb75897ae878f68679e86aaa5aec6cc507be
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Feb 8 09:04:17 2021 +0100

    Unify loader instantiation
    
    This unifies and centralizes the instantiation the same way the lister does.
    
    This introduces a new base class swh.loader.core.loader.Loader for all loaders whose
    only concern for now is to instantiate loaders from either a configuration dict or a
    configuration file.
    
    This simplifies instantiation in celery task code and avoids duplicating the
    configuration load in each loader constructor.
    
    The end goal is to simplify the future refactoring on configuration. With the following,
    we will only have to adapt the Loader class when we start simplifying uniformly the
    configuration.
    
    Also note that I mostly reused the equivalent `swh.lister.pattern.Lister.from_config*`.
    I did not refactor the common behavior (to avoid throwing another dependency in the
    mix). That could always be refactored later.
    
    (inspired by both the work on listers and the configuration system work)
    
    Related to T1410

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/378/ for more details.

This revision was automatically updated to reflect the committed changes.