
Create a production read-only objstorage
Closed, ResolvedPublic

Description

As with T2682, a read-only objstorage is needed in production in order to retrieve contents from mirrors or for other specific needs.

The same configuration as the one declared for the webapp's read-only access will be used, with the backends in this order of priority:

  • azure
  • banco
  • saam
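
To illustrate what "order of priority" means for reads, here is a minimal Python sketch. It is not the swh.objstorage implementation: the class `PriorityReadOnlyStore`, the exception `ReadOnlyError`, and the in-memory dicts standing in for the azure, banco and saam backends are all hypothetical. It only shows the idea of trying each backend in turn and refusing writes.

```python
from typing import List, Mapping, Sequence, Tuple


class ReadOnlyError(Exception):
    """Raised when a write is attempted on the read-only front end."""


class PriorityReadOnlyStore:
    """Hypothetical read-only front end over several backends.

    Backends are (name, mapping) pairs and are queried in the order given.
    """

    def __init__(self, backends: Sequence[Tuple[str, Mapping[bytes, bytes]]]):
        self.backends: List[Tuple[str, Mapping[bytes, bytes]]] = list(backends)

    def get(self, obj_id: bytes) -> bytes:
        # Try each backend in priority order; the first hit wins.
        for _name, backend in self.backends:
            try:
                return backend[obj_id]
            except KeyError:
                continue
        # No backend holds the object.
        raise KeyError(obj_id)

    def add(self, obj_id: bytes, content: bytes) -> None:
        # Writes are always rejected: this front end is read-only.
        raise ReadOnlyError("this objstorage is read-only")


# In-memory stand-ins for the three backends (assumption, for illustration only).
azure = {b"a": b"from azure"}
banco = {b"a": b"from banco", b"b": b"from banco"}
saam = {b"c": b"from saam"}

store = PriorityReadOnlyStore([("azure", azure), ("banco", banco), ("saam", saam)])
```

With these stand-ins, `store.get(b"a")` is served by the first backend even though the second one also holds the object, and `store.add(...)` raises `ReadOnlyError`.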

Event Timeline

vsellier triaged this task as Normal priority. Sep 29 2021, 5:39 PM
vsellier created this task.
vsellier changed the task status from Open to Work in Progress. Oct 6 2021, 6:02 PM
vsellier moved this task from Backlog to in-progress on the System administration board.

rSENV646f62805ef564bceed4d3a4d84d8fb6890f2d19 declares the new certificate for the vagrant tests (wrong task on the commit message)

Hmm, do we really want this to be open to the world with no authentication whatsoever? (which is what D6448 seems to be doing)

Sure, we should have authentication / rate limit on this.
But if I'm not wrong, the target is to test the mirroring with ENEA.
If we add authentication, we need to improve the objstorage-replayer / objstorage to support it.

One improvement could also be to move it off moma, to avoid possible impacts on the webapp in case of a request burst (which also means configuring a new public IP for this service).

@douardda could you explain the future milestones with ENEA?

For ENEA I'd like to test different scenarios for the source objstorage:

  • from S3 using unauth access (for the sake of knowing how it goes),
  • from S3 using the libcloud-based backend (i.e. with auth),
  • from a read-only objstorage RPC server within our infra (which is what this ticket is about).

Adding support for authentication in objstorage should not be a problem. I don't think objstorage-replayer is involved in this.

As for the timing: for now I am blocked by ENEA's IT (port 9093 is blocked), but I expect this to be fixed soon.

Aside from the specific needs of the mirroring stack, the question at hand is whether the read-only object storage should be open to the public by default or not.

As this is the first time we open it up, I think we should start with a conservative approach: not opening it to the public without authentication and/or rate limiting, which would put it on par with the Web API.
Later on we can consider relaxing these requirements, if our resources permit.

So the main practical question is whether we already have a mechanism in place for authentication (and/or rate limiting), ideally relying on Keycloak, and whether that mechanism is compatible with the needs of the various use cases (including mirroring). Do we?

The diff (D6448) has been updated to support basic authentication for the public part. Internal access will remain possible without any authentication.
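
For reference, here is a minimal Python sketch of what HTTP basic authentication looks like from a client's side, using only the standard library. The URL `https://objstorage.example.org/` and the `user`/`pass` credentials are placeholders, not the real service URL or real accounts, and the helper names are hypothetical. No request is actually sent; the sketch only builds one with the `Authorization` header set.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the HTTP Authorization header for basic auth."""
    # Basic auth is "Basic " followed by base64("username:password").
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a request carrying basic auth credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Placeholder URL and credentials (assumptions, for illustration only).
request = build_request("https://objstorage.example.org/", "user", "pass")
```

Since the credentials are only base64-encoded, not encrypted, this scheme is only reasonable over HTTPS.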

@zack we have chosen the easy way, to have the solution deployed quickly for this POC. It will probably be possible to implement authentication through Keycloak later, but it will need some development in the objstorage code base (I don't know how complex).

@vsellier sure, and thanks! "Basic auth" is in the HTTP sense, right? So username/password pairs that we can add on demand, correct?

That's fine as a start.
Please file a separate task (for the future) about Keycloak integration.

vsellier moved this task from in-progress to done on the System administration board.

T3702 and T3703 were created for the basic auth and Keycloak authentication support.

Thanks!

Where is the documentation on how to access the new read-only object storage?
(hint hint :-))

Good point :) I forgot to update the service URLs.