Page MenuHomeSoftware Heritage

core.loader: Adapt according to latest storage api change
ClosedPublic

Authored by ardumont on Apr 3 2019, 2:19 PM.

Details

Summary
  • core.loader: Count only the new objects ingested
  • core.loader: Migrate to latest snapshot_add, origin_visit_update api

Build will fail because storage not tagged yet (133 does not exist yet).

Depends on D1329
Depends on D1321

Test Plan

pytest
(tox won't work as swh-storage is not tagged yet)

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

I wonder if it wouldn't make sense to just turn the counter attribute into a dict compatible with the output of the storage functions...

swh/loader/core/loader.py
286

Why this change? I don't think it will play well with a non-remote storage, because they return iterators.

swh/loader/core/loader.py
286

See D1321 which transformed the generator into a list to unify with other _add endpoints.
That simplifies the metrics extraction (counting length).

In D1337#28947, @olasd wrote:

I wonder if it wouldn't make sense to just turn the counter attribute into a dict compatible with the output of the storage functions...

Yes, i wonder as well, that's why i opened now that diff ;)
To discuss this.

snapshot_add now takes a list of snapshot as input

swh/loader/core/loader.py
487

That should be snapshot['id']

487

also needs an explicit keyword argument

ardumont marked 2 inline comments as done.

Fix api call to origin_visit_update api

It feels a little weird to have different counters in the dict and in the logs, but this is after filtering out the existing objects so the difference should be minimal anyway.

In a further step, it'd be nicer to make the loader-core counters compatible with the parsing done in swh.storage, so we can get a breakdown of operations per loader without jumping through too many hoops

swh/loader/core/loader.py
431

I'm not sure this was intended?

This revision is now accepted and ready to land.Apr 4 2019, 5:23 PM
swh/loader/core/loader.py
431

no, it was not ;)

Your 'weird' feeling about the counter discrepancy was shared and i started something.
But then i stopped myself because it's out of scope (first migration) and did not want to go there.

That's a left over!
Thanks for pointing it out.

Remove badly named counter

In a further step, it'd be nicer to make the loader-core counters compatible with the parsing done in swh.storage, so we can get a breakdown of operations per loader without jumping through too many hoops

i have opened T1626 to track this by the way.

Cheers,

This revision was automatically updated to reflect the committed changes.