
swh.deposit.parsers: Do not lose information during parsing
Closed, Public

Authored by ardumont on Jul 5 2018, 6:55 PM.

Details

Summary

The internal libraries that parse the XML into a dict currently merge entries that share the same name.

In effect, we lose information (e.g. author).
That is incorrect for fields that can legitimately repeat (e.g. author, license, ...), and this diff fixes it.
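
To make the problem concrete, here is a minimal sketch using only the standard library. It is not the code from this diff: `parse_lossy` and `parse_keep_all` are hypothetical names, the sample document is invented, and "last value wins" is used as a stand-in for whatever merge the real parser performs.

```python
import xml.etree.ElementTree as ET

# Invented sample: two <author> entries under the same parent.
SAMPLE = """<entry>
  <title>Awesome project</title>
  <author>Alice</author>
  <author>Bob</author>
</entry>"""


def parse_lossy(xml_text):
    """Naive conversion: a repeated tag overwrites the previous value."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


def parse_keep_all(xml_text):
    """Keep every value: a tag seen more than once becomes a list."""
    result = {}
    for child in ET.fromstring(xml_text):
        if child.tag not in result:
            # First occurrence: store the bare value.
            result[child.tag] = child.text
        elif isinstance(result[child.tag], list):
            # Third or later occurrence: extend the existing list.
            result[child.tag].append(child.text)
        else:
            # Second occurrence: promote the bare value to a list.
            result[child.tag] = [result[child.tag], child.text]
    return result


if __name__ == "__main__":
    print(parse_lossy(SAMPLE))     # only one author survives
    print(parse_keep_all(SAMPLE))  # both authors are kept
```

With this approach a tag that appears once stays a plain value, so consumers have to accept either a string or a list for any key.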

Close T1131

Test Plan

Tests ok

Diff Detail

Repository
rDDEP Push deposit
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 1274
Build 1618: arc lint + arc unit

Event Timeline

swh/deposit/parsers.py
42

I don't like this much, but I think it will do for the moment.

swh/deposit/tests/loader/test_loader.py
278

We apparently miss the data loss here.

swh/deposit/tests/loader/test_loader.py
278

Indeed, the source is ok with me ;)

  • swh.deposit.parsers: Update docstring
swh/deposit/parsers.py
42

I don't like it either.
There are many more entries that can be lists, and I don't know exactly which ones.
If you want, I can check, but I feel that doing so will always leave us at risk of losing other data.
And sometimes there is only one entry per tag, which will be forced into a list element.

swh/deposit/parsers.py
42

Yes, that's one of the reasons I don't like it.

losing other data...

affiliation comes to mind for example.


For information, I'm currently working on another implementation that drops the XMLParser altogether.
It takes care of the current limitation.
It also lets us drop the extra repetitive namespace in keys.

I'll update the diff as soon as possible.
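
As an illustration of the idea only (the updated diff itself is not shown here), the sketch below shortens namespaced keys to a prefix and keeps repeated tags without forcing single values into lists. `PREFIXES`, `short_key`, `parse_short_keys`, and the `https://example.org/ns/meta` namespace are all hypothetical; only the Atom namespace URI is a real one.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from namespace URI to the short prefix used in keys.
# An empty prefix means the key carries no prefix at all.
PREFIXES = {
    "http://www.w3.org/2005/Atom": "",
    "https://example.org/ns/meta": "meta",  # made-up namespace for the example
}

SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:meta="https://example.org/ns/meta">
  <title>Awesome project</title>
  <meta:author>Alice</meta:author>
  <meta:author>Bob</meta:author>
</entry>"""


def short_key(tag):
    """Turn '{uri}local' into 'prefix:local', or 'local' for an empty prefix."""
    if not tag.startswith("{"):
        return tag
    uri, local = tag[1:].split("}", 1)
    prefix = PREFIXES.get(uri)
    if prefix is None:
        return tag  # unknown namespace: keep the full key rather than guess
    return f"{prefix}:{local}" if prefix else local


def parse_short_keys(xml_text):
    """Collect every child value, then unwrap keys that occurred only once."""
    collected = {}
    for child in ET.fromstring(xml_text):
        collected.setdefault(short_key(child.tag), []).append(child.text)
    return {k: v[0] if len(v) == 1 else v for k, v in collected.items()}


if __name__ == "__main__":
    print(parse_short_keys(SAMPLE))
```

Collecting into lists first and unwrapping at the end avoids having to know in advance which tags may repeat, which was the concern raised about the first approach.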

Move away from django_restframework_xml

Improve implementation to be generic and complete

  • swh.deposit.parsers: Update docstring
  • swh.deposit.parsers: Simplify current xml parsing
swh/deposit/tests/api/test_parser.py
42

Namespaces are not lost, they are just new key entries.

This revision is now accepted and ready to land. Jul 6 2018, 3:00 PM
This revision was automatically updated to reflect the committed changes.