Page MenuHomeSoftware Heritage

Improve test coverage and type coverage for copy_to
ClosedPublic

Authored by olasd on Jul 2 2020, 1:32 PM.

Details

Summary

Instead of the brittle csv we used to use, use postgresql text mode to transfer
data in the copy operation.

To check that this change makes sense, add test coverage for bytea[], bytea[][]
and timestamptz.

Test Plan

tox tests extended

Diff Detail

Repository
rDCORE Foundations and core functionalities
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D3394 (id=12037)

Rebasing onto 3a612cca29...

Current branch diff-target is up to date.
Changes applied before test
commit 89211fbd61a49a488555f770c3e503e7e9d6fdce
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jul 2 13:30:33 2020 +0200

    Improve test coverage and type coverage for copy_to
    
    Instead of the brittle csv we used to use, use postgresql text mode to transfer
    data in the `copy` operation.
    
    To check that this change makes sense, add test coverage for bytea[], bytea[][]
    and timestamptz.

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/15/ for more details.

douardda requested changes to this revision.Jul 2 2020, 2:17 PM

Looks globally fine to me, but I have a few comments/requests.

Also, not sure I can "see" that every path in escape() is actually tested here. Maybe via the strategies?

Are there edge cases that need attention? (like "patterns" (e.g. \N), embedded in string or bytes, or so; more or less any combination of calls to escape being used as raw input data, typically).

swh/core/db/__init__.py
73–74

while at it, this should also accept tuples (tuples are about to be the norm here with T2421).

78

Not sure I understand this. When do we expect a "NULL" value to reach this function? Is it only coming from a former call to escape?

In any case, this worth a comment IMHO.

swh/core/db/tests/test_db.py
42

why a tuple? or more precisely, why not make COLUMNS and ROW_OUT tuples also?

This revision now requires changes to proceed.Jul 2 2020, 2:17 PM
olasd marked 2 inline comments as done.

Reimplement copy_to escaping from the ground up.

  • Reference the PostgreSQL docs for every choice made
  • Add test coverage for more data types
  • Apply @douardda's comments on variable types and corner cases
olasd marked an inline comment as done.Jul 3 2020, 1:25 AM

Build is green

Patch application report for D3394 (id=12048)

Rebasing onto 3a612cca29...

Current branch diff-target is up to date.
Changes applied before test
commit 7124063e497a30b181b7b85dae4b61f9e4fe5120
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Jul 2 13:30:33 2020 +0200

    Reimplement PostgreSQL COPY escaping from first principles
    
    Instead of the brittle csv we used to use, use postgresql text mode to transfer
    data in the `copy` operation. Reference all relevant bits of the PostgreSQL
    documentation inline with the code.
    
    This also adds test coverage for most supported types, as well as making sure
    all escaping corner cases are covered.

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/16/ for more details.

Thx a lot, LGTM (can't say I've reviewed the tests very carefully though.)

This revision is now accepted and ready to land.Jul 3 2020, 10:41 AM