Page MenuHomeSoftware Heritage

Add a schema type for handling binary strings
ClosedPublic

Authored by jayeshv on May 20 2022, 3:32 PM.

Details

Summary

A type with two fields, text and base64, is returned for every binary string.
Any decode error in the text field will be ignored.

Diff Detail

Repository
rDGQL GraphQL API
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D7874 (id=28423)

Rebasing onto d5010fd75f...

Current branch diff-target is up to date.
Changes applied before test
commit e03f4cac16f004ee03920aae032c429ef0b208e1
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon May 16 12:22:49 2022 +0200

    Add a schema type for handling binary strings
    
    A type with two fields, text and base64, is returned for every binary string.
    Any decode error in the text field will be ignored.

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/25/ for more details.

Build is green

Patch application report for D7874 (id=28424)

Rebasing onto d5010fd75f...

Current branch diff-target is up to date.
Changes applied before test
commit cb1f34271867484b8c25fe5d5dfdcb624d6ea369
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon May 16 12:22:49 2022 +0200

    Add a schema type for handling binary strings
    
    A type with two fields, text and base64, is returned for every binary string.
    Any decode error in the text field will be ignored.

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/26/ for more details.

vlorentz added inline comments.
swh/graphql/schema/schema.graphql
51
113

btw, they are actually IRIs, not URLs; but we can fix that later

379

how does it work to filter on non-unicode strings?

swh/graphql/schema/schema.graphql
379

@vlorentz We can accept the same object here. User can either use 'text' or a 'base64' string.
Is that really needed, how often we get such filters, should we add complexity for a rather rare case?

anlambert added a subscriber: anlambert.
anlambert added inline comments.
swh/graphql/resolvers/resolvers.py
251

Use swh.core.utils.decode_with_escape instead.

This is what is currently used in REST API to attempt decoding UTF-8 strings from bytes.

It has the advantage to escape bytes of invalid utf-8 sequences instead of discarding them so users can see
something went wrong while attempting decoding such strings.

swh/graphql/utils/utils.py
11

I do not think you need a variable for this, just put the "utf-8" literal directly in the code.

17

.decode("ascii")

This revision now requires changes to proceed.May 20 2022, 4:35 PM
swh/graphql/schema/schema.graphql
379

I don't know; probably not.

On the other hand, it forces users to think about that case.

swh/graphql/resolvers/resolvers.py
251

so users can see something went wrong

we might as well use .decode(utils.ENCODING, "replace") then; that's what it is for

swh/graphql/resolvers/resolvers.py
251

correct me if I am wrong but that error handler replaces invalid Unicode character by the U+FFFD character, it will not backslash escape the bytes

251

oh OK got it, it is more visual with that character that something went wrong

Addressing review comments

jayeshv added inline comments.
swh/graphql/utils/utils.py
11

This is used in resolvers.py as well.

Build is green

Patch application report for D7874 (id=28435)

Rebasing onto d5010fd75f...

Current branch diff-target is up to date.
Changes applied before test
commit 0cabd8ee94b7100b8ea3d3949f81f8529d34d971
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon May 16 12:22:49 2022 +0200

    Add a schema type for handling binary strings
    
    A type with two fields, text and base64, is returned for every binary string.
    Any decode error in the text field will be ignored.

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/27/ for more details.

This revision was not accepted when it landed; it landed in state Needs Review.May 24 2022, 1:55 PM
This revision was automatically updated to reflect the committed changes.