server: Use xml.etree.ElementTree instead of nested dicts internally
55ae87b13c39
Description

server: Use xml.etree.ElementTree instead of nested dicts internally

This commit does not touch the external API, though; i.e., metadata_dict
is still present in the JSON API, and the equivalent jsonb field remains
in the database. They will probably be removed in a future commit,
because they are not very useful.

Rationale:

I find xmltodict's approach of translating an XML tree into native structures
to be intrinsically flawed for non-trivial handling of XML, because the
resulting data structure:

* is implementation-defined (by xmltodict, which is Python-only) and may change across versions
* does not intrinsically store namespaces, and relies on an internal prefix map (though this isn't much of an issue right now, as we do not need composability and all the changed APIs are private)
* is not stable; for example, <a><b>foo</b></a> and <a><b>foo</b><b>bar</b></a> are encoded completely differently (the former is a Dict[str, str], the latter is a Dict[str, list]).
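
To illustrate the stability point, here is a minimal sketch that relies only on the standard library. The two dict literals are written out by hand to mirror what xmltodict returns by default for the two documents above; the helper names texts_from_dict and texts_from_tree are hypothetical and do not appear in the commit.

```python
import xml.etree.ElementTree as ET

# Shapes xmltodict produces by default for the two example documents,
# written out by hand so that this sketch has no third-party dependency.
one_child = {"a": {"b": "foo"}}
two_children = {"a": {"b": ["foo", "bar"]}}


def texts_from_dict(doc):
    """Hypothetical helper: has to branch on the type of the value."""
    value = doc["a"]["b"]
    return value if isinstance(value, list) else [value]


def texts_from_tree(xml_text):
    """Hypothetical helper: findall() always returns a list of elements."""
    root = ET.fromstring(xml_text)
    return [b.text for b in root.findall("b")]


print(texts_from_dict(one_child), texts_from_dict(two_children))
print(texts_from_tree("<a><b>foo</b></a>"))
print(texts_from_tree("<a><b>foo</b><b>bar</b></a>"))
```

With ElementTree the caller gets the same shape whether the element occurs once or several times, so no type check is needed.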

As a result, every operation manipulating the structure produced by
xmltodict needs to check presence, number *and* type on every access.
Consider this part of this commit, for example:

-    swh_deposit = metadata.get("swh:deposit")
-    if not swh_deposit:
-        return None
-
-    swh_reference = swh_deposit.get("swh:reference")
-    if not swh_reference:
-        return None
-
-    swh_origin = swh_reference.get("swh:origin")
-    if swh_origin:
-        url = swh_origin.get("@url")
-        if url:
-            return url
+    ref_origin = metadata.find(
+        "swh:deposit/swh:reference/swh:origin[@url]", namespaces=NAMESPACES
+    )
+    if ref_origin is not None:
+        return ref_origin.attrib["url"]

The use of XPath makes it considerably shorter, and the original version
did not even check number or type (i.e., it would crash if an element was
duplicated).
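
The lookup from the new version can be tried in isolation with a sketch like the one below. The contents of NAMESPACES, the function name get_reference_origin_url, and the sample document are assumptions made for illustration; the commit message does not show the real NAMESPACES constant or the enclosing function.

```python
import xml.etree.ElementTree as ET

# Assumed prefix map; the actual NAMESPACES constant is not shown in the
# commit message.
NAMESPACES = {
    "atom": "http://www.w3.org/2005/Atom",
    "swh": "https://www.softwareheritage.org/schema/2018/deposit",
}

# Made-up sample document for illustration only.
SAMPLE = """\
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
  <swh:deposit>
    <swh:reference>
      <swh:origin url="https://example.org/repo"/>
    </swh:reference>
  </swh:deposit>
</entry>
"""


def get_reference_origin_url(metadata):
    """Hypothetical wrapper around the lookup shown in the diff."""
    # The [@url] predicate only matches swh:origin elements that carry a
    # url attribute, so attrib["url"] cannot raise KeyError here.
    ref_origin = metadata.find(
        "swh:deposit/swh:reference/swh:origin[@url]", namespaces=NAMESPACES
    )
    if ref_origin is not None:
        return ref_origin.attrib["url"]
    return None


print(get_reference_origin_url(ET.fromstring(SAMPLE)))
print(get_reference_origin_url(ET.fromstring("<entry/>")))
```

Because find() returns the first matching element or None, a missing or duplicated intermediate element does not raise an exception in this sketch.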

Details

Provenance

vlorentz	Authored on Feb 21 2022, 5:55 PM
vlorentz	Pushed on Feb 22 2022, 3:40 PM

Differential Revision

D7215: server: Use xml.etree.ElementTree instead of nested dicts internally

Parents

rDDEPb9f565aaa34c: deposit.cli.client: Allow user to define the metadata provenance url

Build Status

Buildable 27051
Build 42300: test-and-build	Jenkins console · Jenkins

Event Timeline

vlorentz committed rDDEP55ae87b13c39: server: Use xml.etree.ElementTree instead of nested dicts internally (authored by vlorentz). Feb 22 2022, 3:25 PM

vlorentz added an edge: D7215: server: Use xml.etree.ElementTree instead of nested dicts internally. Feb 22 2022, 3:40 PM

Harbormaster completed building B27051: rDDEP55ae87b13c39: server: Use xml.etree.ElementTree instead of nested dicts internally. Feb 22 2022, 3:43 PM

swh-public-ci mentioned this in D7222: Deduplicate parsing of add_to_origin/create_to_origin + add error logging in the client. Feb 22 2022, 3:46 PM

swh-public-ci mentioned this in D7225: Add a 'py3-clientonly' tox environment. Feb 22 2022, 5:49 PM

swh-public-ci mentioned this in D7226: deposit-list: Allow listing of deposit with their raw metadata if any. Feb 22 2022, 6:48 PM

swh-public-ci mentioned this in D7227: cli: Warn when metadata-only deposit without metadata provenance. Feb 23 2022, 10:28 AM

swh-public-ci mentioned this in D7174: Specify a new element to describe the provenance of deposit metadata. Feb 23 2022, 10:55 AM

Changes (14)

swh/deposit/api/checks.py
swh/deposit/api/common.py
swh/deposit/api/edit.py
swh/deposit/api/private/__init__.py
swh/deposit/api/private/deposit_check.py
swh/deposit/api/private/deposit_read.py
swh/deposit/cli/client.py
swh/deposit/tests/api/test_checks.py
swh/deposit/tests/api/test_collection_add_to_origin.py
swh/deposit/tests/api/test_collection_post_multipart.py
swh/deposit/tests/api/test_deposit_private_check.py
swh/deposit/tests/cli/test_client.py
swh/deposit/tests/test_utils.py
swh/deposit/utils.py