
staging infra: Reproduce existing production setup in a compact way
Closed, Migrated (Edits Locked)

Description

Dependency order to follow

  • gateway (routing)
  • swh-storage (db, swh-storage service)
  • swh-objstorage (disks, swh-objstorage service)
  • swh-indexer storage (db, service)
  • swh-web
  • swh-scheduler (db, rabbitmq, swh-scheduler services)
  • swh-deposit (db, service)
  • 1 worker with at least 1 loader (worker0 & worker1 with loader-git)
  • swh-vault
  • update workers with the deposit checker/loader
  • update workers with 1 lister (forge=gitlab, instance=inria)
  • update workers with 1 indexer
  • Make icinga checks green on the expected origins (parmap, cpython, etc.)

Note:

  • Should be a matter of creating the right branch ('staging' for example) in the swh-site repository
  • and selecting that environment when running the agent: `puppet agent --test --noop --environment=staging` (see the sketch after this list)
  • production code should only be modified if issues are identified (there were some ;)
  • in the end, the staging branch should be merged back into production (without any impact on production; that has been my implicit plan)
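
A minimal sketch of that workflow, assuming the usual swh-site layout with a 'production' branch and shell access to a staging node (remote and node names are illustrative):

# in the swh-site repository: create the staging puppet environment from production
$ git checkout -b staging production
$ git push origin staging

# on a staging node: dry-run the agent against that environment, then apply for real
$ puppet agent --test --noop --environment=staging
$ puppet agent --test --environment=staging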

Plan:
From provisioning the node (creation on our hypervisor) to delegating the configuration/installation of the services to puppet.
For some of our services, this is ambitious:

  • postgres: db and user creation puppetized (done); see the sketch after this list
  • postgres: bootstrap the db schema (this one needs upfront work in our different services to expose a common interface) [1]
  • rabbitmq: server installation with users setup
  • rabbitmq: improve the configuration options to be in sync with our current instance (P493)
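
For reference, a rough manual equivalent of what puppet automates for the postgres and rabbitmq user setup, using stock tooling (role, database, user, and vhost names below are illustrative, not the actual staging values):

# postgres: create the service role and its database (e.g. for swh-storage)
$ sudo -u postgres createuser --pwprompt swh-storage
$ sudo -u postgres createdb -O swh-storage swh-storage

# rabbitmq: create a user and grant it full permissions on the default vhost
$ sudo rabbitmqctl add_user swhconsumer <pass>
$ sudo rabbitmqctl set_permissions -p / swhconsumer '.*' '.*' '.*'

The schema bootstrap itself is left to each service's own db initialization tooling, which is precisely the part flagged in [1] as needing a common interface.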

Tests:

  • from webapp: save code now -> browsing ok (see the sketch after this list)
  • push a deposit via the deposit client cli -> browsing in webapp ok
  • from webapp: request vault cooking -> download in webapp ok
  • list one forge and see new origins from that forge [2]
  • origin-content-metadata indexer running
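
For the 'save code now' check, a hedged command-line equivalent, assuming the staging webapp exposes the same public API as the production archive (the origin url is an illustrative placeholder):

# request a 'save code now' on a git origin
$ curl -X POST https://webapp.internal.staging.swh.network/api/1/origin/save/git/url/https://gitlab.inria.fr/some/repo/

# once the visit completed, check the origin is known and browsable
$ curl https://webapp.internal.staging.swh.network/api/1/origin/https://gitlab.inria.fr/some/repo/get/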

[1] I consider this out of scope (I need to draw a line somewhere ;). Also, we started this as a team; it needs to be finalized, IIRC.

[2] https://webapp.internal.staging.swh.network/browse/search/?q=gitlab.inria.fr&with_visit&with_content
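
The same check can be done from the command line rather than the browser; a sketch assuming the staging webapp exposes the standard origin search API:

$ curl 'https://webapp.internal.staging.swh.network/api/1/origin/search/gitlab.inria.fr/?with_visit=true'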

Event Timeline


Progress on the deposit part:

$ swh deposit upload --url http://deposit.internal.staging.swh.network \
    --username hal-preprod \
    --password <pass> \
    --archive jesuisgpl.tgz \
    --name jesuisgpl \
    --author zack
INFO:swh.deposit.cli.client:{'deposit_id': '2', 'deposit_status': 'deposited', 'deposit_status_detail': None, 'deposit_date': 'Aug. 30, 2019, 2:08 p.m.'}

# deposit-id 1 fails because the metadata were incomplete
$ swh deposit status --url http://deposit.internal.staging.swh.network \
    --username hal-preprod \
    --password <pass> \
    --deposit-id 1
INFO:swh.deposit.cli.client:{'deposit_id': '1', 'deposit_status': 'rejected', 'deposit_status_detail': '- Mandatory fields are missing (author)\n- Mandatory alternate fields are missing (name or title)', 'deposit_swh_id': None, 'deposit_swh_id_context': None, 'deposit_swh_anchor_id': None, 'deposit_swh_anchor_id_context': None, 'deposit_external_id': '8740a2d3-d11c-4daf-8968-a6554dacbbc2'}

# deposit-id 2 is complete (so should work)
$ swh deposit status --url http://deposit.internal.staging.swh.network \
    --username hal-preprod \
    --password <pass> \
    --deposit-id 2
INFO:swh.deposit.cli.client:{'deposit_id': '2', 'deposit_status': 'verified', 'deposit_status_detail': 'Deposit is fully received, checked, and ready for loading', 'deposit_swh_id': None, 'deposit_swh_id_context': None, 'deposit_swh_anchor_id': None, 'deposit_swh_anchor_id_context': None, 'deposit_external_id': '8a60afdb-f424-46ec-abc6-49a03ccefce3'}

# but the loading then fails for storage migration reasons
$ swh deposit status --url http://deposit.internal.staging.swh.network \
    --username hal-preprod \
    --password <pass> \
    --deposit-id 2
INFO:swh.deposit.cli.client:{'deposit_id': '2', 'deposit_status': 'failed', 'deposit_status_detail': 'The deposit loading into the Software Heritage archive failed', 'deposit_swh_id': None, 'deposit_swh_id_context': None, 'deposit_swh_anchor_id': None, 'deposit_swh_anchor_id_context': None, 'deposit_external_id': '8a60afdb-f424-46ec-abc6-49a03ccefce3'}

Now the deposit loading fails because of changes in the storage layer (origins are no longer identified by an id but by their url):

Aug 30 14:08:34 worker1 python3[11461]: [2019-08-30 14:08:34,917: ERROR/ForkPoolWorker-1] Loading failure, updating to `partial` status
                                        Traceback (most recent call last):
                                          File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 876, in load
                                            self.store_metadata()
                                          File "/usr/lib/python3/dist-packages/swh/deposit/loader/loader.py", line 93, in store_metadata
                                            tool_id, metadata)
                                          File "/usr/lib/python3/dist-packages/retrying.py", line 49, in wrapped_f
                                            return Retrying(*dargs, **dkw).call(f, *args, **kw)
                                          File "/usr/lib/python3/dist-packages/retrying.py", line 206, in call
                                            return attempt.get(self._wrap_exception)
                                          File "/usr/lib/python3/dist-packages/retrying.py", line 247, in get
                                            six.reraise(self.value[0], self.value[1], self.value[2])
                                          File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise
                                            raise value
                                          File "/usr/lib/python3/dist-packages/retrying.py", line 200, in call
                                            attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
                                          File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 352, in send_origin_metadata
                                            self.origin['url'], visit_date, provider_id, tool_id, metadata)
                                          File "/usr/lib/python3/dist-packages/swh/storage/api/client.py", line 241, in origin_metadata_add
                                            'metadata': metadata})
                                          File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 205, in post
                                            return self._decode_response(response)
                                          File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 237, in _decode_response
                                            raise pickle.loads(decode_response(response))
                                        psycopg2.DataError: invalid input syntax for integer: "https://inria.halpreprod.archives-ouvertes.fr/8a60afdb-f424-46ec-abc6-49a03ccefce3"
                                        LINE 2: ...          provider_id, tool_id, metadata) values ('https://i...

@vlorentz (stacktrace) ^

I think we missed the storage origin_metadata_add endpoint during the migration away from origin ids (both in-memory and pg storages).
The loader-core (0.0.44) has been changed to provide the url instead of the id, but the storage endpoint still expects the origin id.
That's my understanding, do you concur?

TIA

Indeed, I did not change it because I expected to do it at the same time as a refactoring of origin_metadata_* (aka implementing D1614).
I'll fix it ASAP.

Finally, after updating plenty of our modules (model, storage, ...), the deposit loader is fixed, so I can finally check that box! \m/

Sep 05 08:31:03 worker1 python3[11272]: [2019-09-05 08:31:03,260: INFO/ForkPoolWorker-1] Task swh.deposit.loader.tasks.LoadDepositArchiveTsk[da9f066e-4c44-4053-81a1-e38db35d6149] succeeded in 2.102082097902894s: {'status': 'eventful'}

Cheers,

On stand-by, as the package-loader priority (T1389) took over.
The staging infra will be used when the package loaders land (which will need some work: debian packaging + configuration updates).

Then work will resume to finalize it (one indexer which pulls from kafka).


It's fairly complete already.

For the remaining indexer, which pulls from kafka, we can always improve on this later.