Page MenuHomeSoftware Heritage

client: Do not depend on how the server chunks the response
ClosedPublic

Authored by vlorentz on Mar 1 2022, 3:14 PM.

Details

Reviewers
olasd
Group Reviewers
Reviewers
Maniphest Tasks
Restricted Maniphest Task
Summary

Currently, the server uses HTTP chunked encoding to send object ids
one by one, each in its own HTTP frame.

I think this is a mistake to rely on such a detail of the HTTP protocol
in a high-level API like this.

Additionally, a future commit will rewrite the server to use Flask
instead of aiohttp, which does not allow this kind of fine-grained
control about HTTP chunks.

Diff Detail

Event Timeline

Build is green

Patch application report for D7268 (id=26327)

Could not rebase; Attempt merge onto 76a5f36b41...

Updating 76a5f36..ff319b6
Fast-forward
 swh/objstorage/api/client.py              | 15 ++++++-------
 swh/objstorage/api/server.py              | 36 -------------------------------
 swh/objstorage/backends/pathslicing.py    | 16 ++++++++++++++
 swh/objstorage/backends/seaweedfs/http.py |  4 +---
 4 files changed, 24 insertions(+), 47 deletions(-)
Changes applied before test
commit ff319b60b2221333074d16659f8222a0d19bfe66
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Mar 1 15:13:43 2022 +0100

    client: Do not depend on how the server chunks the response
    
    Currently, the server uses HTTP chunked encoding to send object ids
    one by one, each in its own HTTP frame.
    
    I think this is a mistake to rely on such a detail of the HTTP protocol
    in a high-level API like this.
    
    Additionally, a future commit will rewrite the server to use Flask
    instead of aiohttp, which does not allow this kind of fine-grained
    control about HTTP chunks.

commit b159f04b8d591947003cf3a4cd15fbcf0f3cb31e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Mar 1 13:36:02 2022 +0100

    Remove method add_stream from the RPC API.
    
    Rationale:
    
    1. Only the pathslicing backend implements it
    2. No other package uses it (besides a dead code path in the vault)
    3. A future commit will rewrite the RPC server to use Flask instead of
       aiohttp, and rewriting this view correctly is going to be hard
       (though possible)

See https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/120/ for more details.

Does the get_stream method guarantee the size of the received chunks? By default I would assume that it chunks contents using the passed size as a maximum.

Build is green

Patch application report for D7268 (id=26343)

Rebasing onto 76a5f36b41...

First, rewinding head to replay your work on top of it...
Fast-forwarded diff-target to base-revision-127-D7268.
Changes applied before test

See https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/127/ for more details.

@vlorentz looks like you didn't push the right commit in this diff update

ahahah what

looks like arc diff --update D7268 -m "fix chunking" HEAD^ while resolving a git-rebase todo error doesn't do what I thought it would do

Build has FAILED

Patch application report for D7268 (id=26344)

Could not rebase; Attempt merge onto 76a5f36b41...

Updating 76a5f36..a2424b7
Fast-forward
 swh/objstorage/api/client.py              | 16 +++++++-------
 swh/objstorage/api/server.py              | 36 -------------------------------
 swh/objstorage/backends/pathslicing.py    | 16 ++++++++++++++
 swh/objstorage/backends/seaweedfs/http.py |  4 +---
 4 files changed, 25 insertions(+), 47 deletions(-)
Changes applied before test
commit a2424b7a13e9a696329a9e9efa6c86f9864dd60b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Mar 1 15:13:43 2022 +0100

    client: Do not depend on how the server chunks the response
    
    Currently, the server uses HTTP chunked encoding to send object ids
    one by one, each in its own HTTP frame.
    
    I think this is a mistake to rely on such a detail of the HTTP protocol
    in a high-level API like this.
    
    Additionally, a future commit will rewrite the server to use Flask
    instead of aiohttp, which does not allow this kind of fine-grained
    control about HTTP chunks.

commit b159f04b8d591947003cf3a4cd15fbcf0f3cb31e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Mar 1 13:36:02 2022 +0100

    Remove method add_stream from the RPC API.
    
    Rationale:
    
    1. Only the pathslicing backend implements it
    2. No other package uses it (besides a dead code path in the vault)
    3. A future commit will rewrite the RPC server to use Flask instead of
       aiohttp, and rewriting this view correctly is going to be hard
       (though possible)

Link to build: https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/128/
See console output for more information: https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/128/console

Build has FAILED

Patch application report for D7268 (id=26345)

Could not rebase; Attempt merge onto 76a5f36b41...

Updating 76a5f36..ebfdab5
Fast-forward
 requirements-swh.txt                      |  2 +-
 swh/objstorage/api/client.py              | 16 +++++++-------
 swh/objstorage/api/server.py              | 36 -------------------------------
 swh/objstorage/backends/pathslicing.py    | 16 ++++++++++++++
 swh/objstorage/backends/seaweedfs/http.py |  4 +---
 5 files changed, 26 insertions(+), 48 deletions(-)
Changes applied before test
commit ebfdab59fd28a18ebef09d398b128ea043eaf0e9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Mar 1 15:13:43 2022 +0100

    client: Do not depend on how the server chunks the response
    
    Currently, the server uses HTTP chunked encoding to send object ids
    one by one, each in its own HTTP frame.
    
    I think this is a mistake to rely on such a detail of the HTTP protocol
    in a high-level API like this.
    
    Additionally, a future commit will rewrite the server to use Flask
    instead of aiohttp, which does not allow this kind of fine-grained
    control about HTTP chunks.

commit b159f04b8d591947003cf3a4cd15fbcf0f3cb31e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Mar 1 13:36:02 2022 +0100

    Remove method add_stream from the RPC API.
    
    Rationale:
    
    1. Only the pathslicing backend implements it
    2. No other package uses it (besides a dead code path in the vault)
    3. A future commit will rewrite the RPC server to use Flask instead of
       aiohttp, and rewriting this view correctly is going to be hard
       (though possible)

Link to build: https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/129/
See console output for more information: https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/129/console

(build is failing because of the dep on D7275)

Build has FAILED

Patch application report for D7268 (id=26345)

Could not rebase; Attempt merge onto 76a5f36b41...

Updating 76a5f36..ebfdab5
Fast-forward
 requirements-swh.txt                      |  2 +-
 swh/objstorage/api/client.py              | 16 +++++++-------
 swh/objstorage/api/server.py              | 36 -------------------------------
 swh/objstorage/backends/pathslicing.py    | 16 ++++++++++++++
 swh/objstorage/backends/seaweedfs/http.py |  4 +---
 5 files changed, 26 insertions(+), 48 deletions(-)
Changes applied before test
commit ebfdab59fd28a18ebef09d398b128ea043eaf0e9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Mar 1 15:13:43 2022 +0100

    client: Do not depend on how the server chunks the response
    
    Currently, the server uses HTTP chunked encoding to send object ids
    one by one, each in its own HTTP frame.
    
    I think this is a mistake to rely on such a detail of the HTTP protocol
    in a high-level API like this.
    
    Additionally, a future commit will rewrite the server to use Flask
    instead of aiohttp, which does not allow this kind of fine-grained
    control about HTTP chunks.

commit b159f04b8d591947003cf3a4cd15fbcf0f3cb31e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Mar 1 13:36:02 2022 +0100

    Remove method add_stream from the RPC API.
    
    Rationale:
    
    1. Only the pathslicing backend implements it
    2. No other package uses it (besides a dead code path in the vault)
    3. A future commit will rewrite the RPC server to use Flask instead of
       aiohttp, and rewriting this view correctly is going to be hard
       (though possible)

Link to build: https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/130/
See console output for more information: https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/130/console

Build is green

Patch application report for D7268 (id=26345)

Could not rebase; Attempt merge onto 76a5f36b41...

Updating 76a5f36..ebfdab5
Fast-forward
 requirements-swh.txt                      |  2 +-
 swh/objstorage/api/client.py              | 16 +++++++-------
 swh/objstorage/api/server.py              | 36 -------------------------------
 swh/objstorage/backends/pathslicing.py    | 16 ++++++++++++++
 swh/objstorage/backends/seaweedfs/http.py |  4 +---
 5 files changed, 26 insertions(+), 48 deletions(-)
Changes applied before test
commit ebfdab59fd28a18ebef09d398b128ea043eaf0e9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Mar 1 15:13:43 2022 +0100

    client: Do not depend on how the server chunks the response
    
    Currently, the server uses HTTP chunked encoding to send object ids
    one by one, each in its own HTTP frame.
    
    I think this is a mistake to rely on such a detail of the HTTP protocol
    in a high-level API like this.
    
    Additionally, a future commit will rewrite the server to use Flask
    instead of aiohttp, which does not allow this kind of fine-grained
    control about HTTP chunks.

commit b159f04b8d591947003cf3a4cd15fbcf0f3cb31e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Mar 1 13:36:02 2022 +0100

    Remove method add_stream from the RPC API.
    
    Rationale:
    
    1. Only the pathslicing backend implements it
    2. No other package uses it (besides a dead code path in the vault)
    3. A future commit will rewrite the RPC server to use Flask instead of
       aiohttp, and rewriting this view correctly is going to be hard
       (though possible)

See https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/134/ for more details.

Cool, thanks (see inline comment).

swh/objstorage/api/client.py
12–13

This should probably be derived from swh.objstorage.objstorage.ID_HASH_LENGTH (or at least set there)

This revision is now accepted and ready to land.Mar 2 2022, 12:59 PM
swh/objstorage/api/client.py
12–13

Done in D7277

a2424b7a13e9a696329a9e9efa6c86f9864dd60b

vlorentz added a task: Restricted Maniphest Task.Mar 7 2022, 9:25 AM