HomeSoftware Heritage

server: Use xml.etree.ElementTree instead of nested dicts internally

This commit no longer exists in the repository. It may have been part of a branch which was deleted.

Description

server: Use xml.etree.ElementTree instead of nested dicts internally

This commit does not touch the external API though; ie. metadata_dict
is still present in the JSON API, and the equivalent jsonb field remains
in the database. They will probably be removed in a future commit
because they are not very useful, though.

Rationale:

I find xmltodict's approach of translating XML tree to native structures
to be intrinsically flawed for non-trivial handling of XML, because the
data structure is:

  • implementation-defined (by xmltodict, which is python-only) and it may change across versions
  • does not intrinsically store namespaces, and relies on an internal prefix map (though it isn't much of an issue right now, as we do not need composability and all the changed APIs are private)
  • not stable; for example, <a><b>foo</b></a> and <a><b>foo</b><b>bar</b></a> are encoded completely differently (the former is a Dict[str, str], the latter is Dict[str, list].

And every operation manipulating this data structure needs to check
presence, number *and* type on every access. Consider this part of this
commit for example:

-    swh_deposit = metadata.get("swh:deposit")
-    if not swh_deposit:
-        return None
-
-    swh_reference = swh_deposit.get("swh:reference")
-    if not swh_reference:
-        return None
-
-    swh_origin = swh_reference.get("swh:origin")
-    if swh_origin:
-        url = swh_origin.get("@url")
-        if url:
-            return url
+    ref_origin = metadata.find(
+        "swh:deposit/swh:reference/swh:origin[@url]", namespaces=NAMESPACES
+    )
+    if ref_origin is not None:
+        return ref_origin.attrib["url"]

the use of XPath makes it considerably shorter; and the original version
did not even check number/type (ie. it would crash if an element was
duplicated).

Details

Provenance
vlorentzAuthored on Feb 21 2022, 5:55 PM
vlorentzPushed on Feb 22 2022, 3:40 PM
Differential Revision
D7215: server: Use xml.etree.ElementTree instead of nested dicts internally
Build Status
Buildable 27051
Build 42300: test-and-buildJenkins console · Jenkins

Commit No Longer Exists

This commit no longer exists in the repository.