
Deploy deposit v0.13 in staging
Closed, Migrated, Edits Locked

Description

New deposit authentication is ready (v0.13.*)

A couple of remaining issues with exception rendering, on both the server and the client
side, were found and fixed. The fixes have now landed.

So it's time to deploy it. This will likely reveal some other paper cuts, which will be
fixed along the way.

Plan:

  • Package v0.13.1
  • staging: Install existing users into keycloak
  • Deploy configuration change (staging: authentication scheme with keycloak, prod: basic authentication as before for now)
  • staging: Ensure integration test is still happy
  • Further checks (service document access with auth, refuse access, ...)
  • Deploy memcached
  • Checks
  • Wait a bit for complaints if any (there should be none as it's transparent for deposit clients)

Event Timeline

ardumont triaged this task as Normal priority. Mar 23 2021, 11:55 AM
ardumont created this task.

New Package built [1]

staging keycloak: Install users [2]

$ cd snippets/ardumont/keycloak
$ python -m migrate_users_to_keycloak --server-url https://auth.softwareheritage.org/auth/ --realm-name SoftwareHeritageStaging --admin-user ardumont --admin-pass $(pass ls ardumont/swh/keycloak-staging | head -1) --credentials-path ./staging-users.yaml
INFO:__main__:keycloak server: https://auth.softwareheritage.org/auth/ ; realm: SoftwareHeritageStaging
INFO:__main__:Client: swh-deposit ; client role: swh.deposit.api
INFO:__main__:User 'hal' installed.
INFO:__main__:User 'hal-preprod' installed.
INFO:__main__:User 'hal-test' installed.
INFO:__main__:User 'intel' installed.
INFO:__main__:User 'ipol' installed.
INFO:__main__:User 'swh' installed.
INFO:__main__:User 'cottagelabs-alexsdutton' installed.

[1] https://jenkins.softwareheritage.org/job/debian/job/packages/job/DDEP/job/gbp-buildpackage/107/

[2] staging-users.yaml: P983

  • Deploy configuration change (staging: authentication scheme with keycloak, prod: basic authentication as before for now)

Ensure puppet has run, either by forcing it manually with puppet agent --test or by waiting for its next scheduled run.

  • staging: Ensure integration test is still happy

Connect to icinga as the icinga user (the admin user of sorts) and trigger the staging check [1].
It is still green after the run:

Plugin Output
DEPOSIT OK - Deposit took 39.34s and succeeded.
DEPOSIT OK - Deposit Metadata update took 2.35s and succeeded.

[1] https://icinga.softwareheritage.org/monitoring/list/services?service_state=0&(service=%2Adeposit%2A%7Cservice_display_name%3D%2Adeposit%2A)&sort=service_last_state_change&dir=desc#!/monitoring/service/show?host=pergamon.softwareheritage.org&service=staging%20Check%20deposit%20end-to-end

  • Further checks (service document access with auth, refuse access, ...)
# Reading the service document for the swh user
$ USER=swh
$ PASS=$(swhpass ls operations/deposit.softwareheritage.org/http-auth/swh | head -1)
$ curl -u "$USER:$PASS" https://deposit.staging.swh.network/1/servicedocument/
<?xml version="1.0" ?>
<service xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:sword="http://purl.org/net/sword/terms/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns="http://www.w3.org/2007/app">

    <sword:version>2.0</sword:version>
    <sword:maxUploadSize>209715200</sword:maxUploadSize>

    <workspace>
        <atom:title>The Software Heritage (SWH) Archive</atom:title>
        <collection href="https://deposit.staging.swh.network/1/swh/">
            <atom:title>swh Software Collection</atom:title>
            <accept>application/zip</accept>
            <accept>application/x-tar</accept>
            <sword:collectionPolicy>Collection Policy</sword:collectionPolicy>
            <dcterms:abstract>Software Heritage Archive</dcterms:abstract>
            <sword:treatment>Collect, Preserve, Share</sword:treatment>
            <sword:mediation>false</sword:mediation>
            <sword:metadataRelevantHeader>false</sword:metadataRelevantHeader>
            <sword:acceptPackaging>http://purl.org/net/sword/package/SimpleZip</sword:acceptPackaging>
            <sword:service>https://deposit.staging.swh.network/1/swh/</sword:service>
            <sword:name>swh</sword:name>
        </collection>
    </workspace>
</service>
# Check the status of a deposit that is already done
$ swh deposit status --url https://deposit.staging.swh.network/1 \
  --username $USER \
  --password $PASS \
  --deposit-id 126 \
  --format json | jq .
{
  "deposit_id": "126",
  "deposit_status": "done",
  "deposit_status_detail": "The deposit has been successfully loaded into the Software Heritage archive",
  "deposit_swh_id": "swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9",
  "deposit_swh_id_context": "swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;origin=https://www.softwareheritage.org/check-deposit-2021-03-23T13:07:20.601407;visit=swh:1:snp:3d44f9d26449f17f02a3f7f6af02edf3d304ae30;anchor=swh:1:rev:146696115c907ff9a24f6f457aef6c2d127d0b33;path=/",
  "deposit_external_id": "check-deposit-2021-03-23T13:07:20.601407"
}
# or of a nonexistent one
$ swh deposit status --url https://deposit.staging.swh.network/1 \
  --username $USER \
  --password $PASS \
  --deposit-id 128 \
  --format json | jq .
{
  "summary": "Deposit 128 does not exist",
  "detail": "",
  "sword:verboseDescription": "",
  "deposit_status": null,
  "deposit_status_detail": null,
  "deposit_swh_id": null,
  "status": 404
}
# And, as expected, failing to authenticate through keycloak raises an error
$ swh deposit status --url https://deposit.staging.swh.network/1 \
  --username $USER-inexistent \
  --password $PASS \
  --deposit-id 128
ERROR:swh.deposit.cli.client:Problem during parsing options: Service document retrieval: invalid_grant: Invalid user credentials

Note: The commands above use the credential store to retrieve the HTTP auth credentials for the given user.
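
The service document check above can also be done programmatically. Below is a minimal sketch using only the Python standard library. The function name list_collections and the trimmed SAMPLE document are illustrative assumptions and not part of swh-deposit.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespaces as declared in the service document returned by the server.
NS = {
    "app": "http://www.w3.org/2007/app",
    "atom": "http://www.w3.org/2005/Atom",
    "sword": "http://purl.org/net/sword/terms/",
}


def list_collections(service_document: str) -> List[Dict[str, object]]:
    """Return name, href and accepted content types for each collection."""
    root = ET.fromstring(service_document)
    collections = []
    for coll in root.findall("app:workspace/app:collection", NS):
        collections.append(
            {
                "name": coll.findtext("sword:name", namespaces=NS),
                "href": coll.get("href"),
                "accept": [a.text for a in coll.findall("app:accept", NS)],
            }
        )
    return collections


# Trimmed copy of the response shown in this comment (illustrative sample).
SAMPLE = """<?xml version="1.0" ?>
<service xmlns:sword="http://purl.org/net/sword/terms/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns="http://www.w3.org/2007/app">
    <workspace>
        <collection href="https://deposit.staging.swh.network/1/swh/">
            <accept>application/zip</accept>
            <accept>application/x-tar</accept>
            <sword:name>swh</sword:name>
        </collection>
    </workspace>
</service>"""

if __name__ == "__main__":
    for collection in list_collections(SAMPLE):
        print(collection["name"], collection["href"], collection["accept"])
```

An authenticated user should see exactly the collection matching their account, which is what the check asserts by eye in the curl output.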

The next step is to deploy memcached so that the cache is shared among the deposit server threads.
In the current deployment configuration that is not the case, so the cache is not shared.
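
For illustration only, this is what a shared memcached cache looks like in a generic Django settings module. It is a sketch under assumptions: the backend class shown is the one shipped with recent Django versions, the address is the local memcached instance on its default port, and the way swh-deposit actually wires its cache configuration may differ.

```python
# Hypothetical Django settings fragment, not the actual swh-deposit configuration.
# All server threads and processes pointing at the same memcached instance
# share cached entries, unlike a per-process in-memory cache.
CACHES = {
    "default": {
        # Assumption: pymemcache-based backend available in recent Django versions.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        # Assumption: memcached listening locally on its default port.
        "LOCATION": "127.0.0.1:11211",
    }
}
```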

ardumont renamed this task from Deploy deposit v0.13.1 to Deploy deposit v0.13. Mar 23 2021, 5:45 PM
ardumont changed the task status from Open to Work in Progress. Mar 23 2021, 6:41 PM
ardumont moved this task from Backlog to in-progress on the System administration board.
ardumont renamed this task from Deploy deposit v0.13 to Deploy deposit v0.13 in staging. Mar 24 2021, 10:41 AM
ardumont updated the task description.
  • Deploy memcached

It seems to be working, though it is a bit flaky.

Checks are fine and the cache is used:

root@deposit:~# echo "stats" | nc 127.0.0.1 11211 | grep hits
STAT get_hits 37
STAT delete_hits 0
STAT incr_hits 0
STAT decr_hits 0
STAT cas_hits 0
STAT touch_hits 0
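
The same check can be scripted. Below is a minimal sketch, assuming only the Python standard library and the memcached text protocol's stats command. The function names are illustrative, and fetch_stats needs a reachable memcached instance.

```python
import socket
from typing import Dict


def fetch_stats(host: str = "127.0.0.1", port: int = 11211, timeout: float = 2.0) -> str:
    """Send the text-protocol `stats` command and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
            # The reply is terminated by a line containing only END.
            if b"".join(chunks).endswith(b"END\r\n"):
                break
    return b"".join(chunks).decode("ascii")


def parse_stats(raw: str) -> Dict[str, str]:
    """Turn `STAT <name> <value>` lines into a dict, ignoring other lines."""
    stats = {}
    for line in raw.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def hit_counters(stats: Dict[str, str]) -> Dict[str, int]:
    """Keep only the *_hits counters, as the grep in the transcript does."""
    return {name: int(value) for name, value in stats.items() if name.endswith("_hits")}


if __name__ == "__main__":
    # Requires a memcached instance listening on 127.0.0.1:11211.
    print(hit_counters(parse_stats(fetch_stats())))
```

A get_hits value that grows between two runs of the checks indicates that the cache is actually being read.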

Checking in the keycloak UI for the staging realm SoftwareHeritageStaging, we can see the session logged in there.

$ curl -u $USER:$PASS https://deposit.staging.swh.network/1/servicedocument/
<?xml version="1.0" ?>
<service xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:sword="http://purl.org/net/sword/terms/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns="http://www.w3.org/2007/app">

    <sword:version>2.0</sword:version>
    <sword:maxUploadSize>209715200</sword:maxUploadSize>

    <workspace>
        <atom:title>The Software Heritage (SWH) Archive</atom:title>
        <collection href="https://deposit.staging.swh.network/1/hal-preprod/">
            <atom:title>hal-preprod Software Collection</atom:title>
            <accept>application/zip</accept>
            <accept>application/x-tar</accept>
            <sword:collectionPolicy>Collection Policy</sword:collectionPolicy>
            <dcterms:abstract>Software Heritage Archive</dcterms:abstract>
            <sword:treatment>Collect, Preserve, Share</sword:treatment>
            <sword:mediation>false</sword:mediation>
            <sword:metadataRelevantHeader>false</sword:metadataRelevantHeader>
            <sword:acceptPackaging>http://purl.org/net/sword/package/SimpleZip</sword:acceptPackaging>
            <sword:service>https://deposit.staging.swh.network/1/hal-preprod/</sword:service>
            <sword:name>hal-preprod</sword:name>
        </collection>
    </workspace>
</service>

But from time to time, issues arise on the keycloak side [1]:

Mar 24 11:37:30 kelvingrove standalone.sh[946]: 11:37:30,352 ERROR [org.keycloak.services.error.KeycloakErrorHandler] (default task-216) Uncaught server error: org.keycloak.authentication.AuthenticationFlowException: authenticator: direct-grant-validate-password

[1] P985
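
The authenticator named in the log, direct-grant-validate-password, belongs to keycloak's direct grant flow, where a username and password are exchanged for a token at the realm's OpenID Connect token endpoint. Below is a minimal sketch of such a request, useful for reproducing the error outside the deposit. Assumptions: the function name is illustrative, the client id swh-deposit is taken from the migration log above, and the client is treated as a public client that needs no client secret.

```python
import urllib.parse
import urllib.request


def build_token_request(
    server_url: str, realm: str, client_id: str, username: str, password: str
) -> urllib.request.Request:
    """Build (but do not send) a direct grant token request."""
    token_url = (
        server_url.rstrip("/")
        + "/realms/"
        + urllib.parse.quote(realm, safe="")
        + "/protocol/openid-connect/token"
    )
    form = urllib.parse.urlencode(
        {
            "grant_type": "password",  # direct grant, i.e. password grant
            "client_id": client_id,  # assumption: public client, no secret
            "username": username,
            "password": password,
        }
    ).encode("ascii")
    return urllib.request.Request(token_url, data=form, method="POST")


if __name__ == "__main__":
    request = build_token_request(
        "https://auth.softwareheritage.org/auth/",
        "SoftwareHeritageStaging",
        "swh-deposit",
        "some-user",  # placeholder credentials
        "some-password",
    )
    print(request.get_method(), request.full_url)
    # Sending it would be: urllib.request.urlopen(request)
    # which raises urllib.error.HTTPError on a non-2xx response.
```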

  • Checks (see comment from yesterday)

Also, I see a staging deposit user (besides our swh bot) who has successfully deposited since the new deployment, without complaint ;)
So I think we are getting there.

ardumont updated the task description.

No complaints, checks green.
Time to do the same on production.