I ran some queries on Spark to export tables I had already exported on Amazon, and I found a lot of strange discrepancies. The data exported from Spark seems to be systematically missing some rows compared to the corresponding dataset exported from Amazon.
First example: exporting all the nodes with a single query that does a UNION of all the relevant tables yields, on Spark:
4671443206 cnt
4422303776 dir
9907464 rel
1125083793 rev
57144153 snp
The counts are correct for every node type except content, which has exactly 410820000 rows missing.
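A minimal sketch of the per-type comparison, in Python. The Amazon counts here are not measured values: they are an assumption derived from the statement above (identical to Spark for every type except `cnt`, which has exactly 410820000 more rows). The helper name `missing_rows` is hypothetical.

```python
# Spark node counts per type, as reported above.
spark = {
    "cnt": 4671443206,
    "dir": 4422303776,
    "rel": 9907464,
    "rev": 1125083793,
    "snp": 57144153,
}

# Assumption: Amazon matches Spark for every type except "cnt",
# which has exactly 410820000 more rows (per the text above).
amazon = dict(spark, cnt=spark["cnt"] + 410820000)


def missing_rows(reference, candidate):
    """Return {node_type: reference count - candidate count} for every
    node type whose counts differ between the two exports."""
    return {
        node_type: reference[node_type] - candidate.get(node_type, 0)
        for node_type in reference
        if reference[node_type] != candidate.get(node_type, 0)
    }


print(missing_rows(amazon, spark))
print(amazon["cnt"])
```

Under that assumption, the only reported difference is on `cnt`, and the implied Amazon content count is 5082263206.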
Second example: exporting the "edges" by unnesting the directory layer:
dir_to_rev   Spark: 434459032     Amazon: 481829426     (47370394 missing, 9.831%)
dir_to_dir   Spark: 45488558029   Amazon: 48316253919   (2827695890 missing, 5.852%)
dir_to_file  Spark: 91186016707   Amazon: 112363058067  (21177041360 missing, 18.847%)
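The "missing" figures and percentages can be recomputed from the raw counts. A small Python sketch, assuming (as the figures above are consistent with) that the percentage is the gap divided by the Amazon count; the function name `shortfall` is hypothetical:

```python
# Edge counts per type as (Spark, Amazon), taken from the figures above.
edge_counts = {
    "dir_to_rev": (434459032, 481829426),
    "dir_to_dir": (45488558029, 48316253919),
    "dir_to_file": (91186016707, 112363058067),
}


def shortfall(spark_count, amazon_count):
    """Return (rows missing from the Spark export, that gap as a
    percentage of the Amazon count used as the reference)."""
    missing = amazon_count - spark_count
    return missing, 100 * missing / amazon_count


for edge_type, (spark_count, amazon_count) in edge_counts.items():
    missing, pct = shortfall(spark_count, amazon_count)
    print(f"{edge_type:<12} {missing:>12} missing ({pct:.3f}%)")
```

Python integers are arbitrary-precision, so the subtraction is exact even at tens of billions of rows; only the percentage goes through floating point.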