
staging infra: Reproduce existing production setup in a compact way
Started, Work in Progress, High, Public

Description

Dependency order to follow

  • gateway (routing)
  • swh-storage (db, swh-storage service)
  • swh-objstorage (disks, swh-objstorage service)
  • swh-indexer storage (db, service)
  • swh-web
  • swh-scheduler (db, rabbitmq, swh-scheduler services)
  • swh-deposit (db, service)
  • 1 worker with at least 1 loader (worker0 & worker1 with loader-git)
  • swh-vault
  • update workers with checker/loader deposit
  • update workers with 1 lister (forge=gitlab, instance=inria)
  • update workers with 1 indexer
  • Make icinga checks green on the expected origins (parmap, cpython, etc.)

Note:

  • This should be a matter of creating the right branch ('staging' for example) in the swh-site repository
  • then running the puppet agent against that environment: `puppet agent --test --noop --environment=staging` (see the sketch below)
  • production code should only be modified when issues are identified (and there were some ;)
  • in the end, the staging branch should be merged back into production (without any impact on production; that has somehow been my implicit plan)
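
For reference, a dry run against the staging environment could look like this (the sudo usage is illustrative; the environment name matches the branch in swh-site):

# on a staging node: show what the 'staging' environment would change,
# without applying anything
$ sudo puppet agent --test --noop --environment=staging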

Plan:
From provisioning the node (creation in our hypervisor) to delegating the configuration/installation of the services to puppet.
For some of our services, this is ambitious:

  • postgres: db and user creation puppetized (done; the manual equivalents are sketched below)
  • postgres: bootstrap the db schema (this one needs upfront work in our different services to expose a common interface) [1]
  • rabbitmq: server installation with user setup
  • rabbitmq: improve the configuration options to be in sync with our current instance (P493)
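
For reference, a minimal sketch of the manual commands these puppet modules are meant to automate; the database, user and permission values below are placeholders, not the actual staging configuration:

# postgres: create a role and a database owned by it (illustrative names)
$ sudo -u postgres createuser --pwprompt swh-storage
$ sudo -u postgres createdb --owner=swh-storage swh-storage

# rabbitmq: declare a user and grant it full permissions on the default
# vhost (illustrative user and placeholder password)
$ sudo rabbitmqctl add_user swh-scheduler <pass>
$ sudo rabbitmqctl set_permissions -p / swh-scheduler '.*' '.*' '.*'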

Tests:

  • from webapp: save code now -> browsing ok
  • push a deposit via deposit client cli -> browsing in webapp ok
  • from webapp: request vault cooking -> download in webapp ok
  • listing one forge and seeing new origins from that forge [2]
  • origin-intrinsic-metadata running

[1] I consider this out of scope (I need to draw a line somewhere ;).
Also, we started this as a team; it needs to be finalized, IIRC.

[2] https://webapp.internal.staging.swh.network/browse/search/?q=gitlab.inria.fr&with_visit&with_content
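
As a command-line variant of [2], the same search can be run against the webapp's REST API; this is a sketch assuming the staging webapp exposes the standard /api/1/origin/search/ endpoint:

# look up origins matching gitlab.inria.fr that have at least one visit
$ curl -s 'https://webapp.internal.staging.swh.network/api/1/origin/search/gitlab.inria.fr/?limit=10&with_visit=true'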

Event Timeline

ardumont changed the task status from Open to Work in Progress. Aug 1 2019, 12:56 PM
ardumont claimed this task. Aug 8 2019, 11:52 PM

Progress on the deposit part:

$ swh deposit upload --url http://deposit.internal.staging.swh.network \
    --username hal-preprod \
    --password <pass> \
    --archive jesuisgpl.tgz \
    --name jesuisgpl \
    --author zack
INFO:swh.deposit.cli.client:{'deposit_id': '2', 'deposit_status': 'deposited', 'deposit_status_detail': None, 'deposit_date': 'Aug. 30, 2019, 2:08 p.m.'}

# deposit-id 1 fails because the metadata were incomplete
$ swh deposit status --url http://deposit.internal.staging.swh.network \
    --username hal-preprod \
    --password <pass> \
    --deposit-id 1
INFO:swh.deposit.cli.client:{'deposit_id': '1', 'deposit_status': 'rejected', 'deposit_status_detail': '- Mandatory fields are missing (author)\n- Mandatory alternate fields are missing (name or title)', 'deposit_swh_id': None, 'deposit_swh_id_context': None, 'deposit_swh_anchor_id': None, 'deposit_swh_anchor_id_context': None, 'deposit_external_id': '8740a2d3-d11c-4daf-8968-a6554dacbbc2'}

# deposit-id 2 is complete (so should work)
$ swh deposit status --url http://deposit.internal.staging.swh.network \
    --username hal-preprod \
    --password <pass> \
    --deposit-id 2
INFO:swh.deposit.cli.client:{'deposit_id': '2', 'deposit_status': 'verified', 'deposit_status_detail': 'Deposit is fully received, checked, and ready for loading', 'deposit_swh_id': None, 'deposit_swh_id_context': None, 'deposit_swh_anchor_id': None, 'deposit_swh_anchor_id_context': None, 'deposit_external_id': '8a60afdb-f424-46ec-abc6-49a03ccefce3'}

# but it fails for storage migration reasons
$ swh deposit status --url http://deposit.internal.staging.swh.network \
    --username hal-preprod \
    --password <pass> \
    --deposit-id 2
INFO:swh.deposit.cli.client:{'deposit_id': '2', 'deposit_status': 'failed', 'deposit_status_detail': 'The deposit loading into the Software Heritage archive failed', 'deposit_swh_id': None, 'deposit_swh_id_context': None, 'deposit_swh_anchor_id': None, 'deposit_swh_anchor_id_context': None, 'deposit_external_id': '8a60afdb-f424-46ec-abc6-49a03ccefce3'}

Now the deposit loading fails because of changes in the storage layer (origins are no longer identified by an id but by their url):

Aug 30 14:08:34 worker1 python3[11461]: [2019-08-30 14:08:34,917: ERROR/ForkPoolWorker-1] Loading failure, updating to `partial` status
                                        Traceback (most recent call last):
                                          File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 876, in load
                                            self.store_metadata()
                                          File "/usr/lib/python3/dist-packages/swh/deposit/loader/loader.py", line 93, in store_metadata
                                            tool_id, metadata)
                                          File "/usr/lib/python3/dist-packages/retrying.py", line 49, in wrapped_f
                                            return Retrying(*dargs, **dkw).call(f, *args, **kw)
                                          File "/usr/lib/python3/dist-packages/retrying.py", line 206, in call
                                            return attempt.get(self._wrap_exception)
                                          File "/usr/lib/python3/dist-packages/retrying.py", line 247, in get
                                            six.reraise(self.value[0], self.value[1], self.value[2])
                                          File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise
                                            raise value
                                          File "/usr/lib/python3/dist-packages/retrying.py", line 200, in call
                                            attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
                                          File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 352, in send_origin_metadata
                                            self.origin['url'], visit_date, provider_id, tool_id, metadata)
                                          File "/usr/lib/python3/dist-packages/swh/storage/api/client.py", line 241, in origin_metadata_add
                                            'metadata': metadata})
                                          File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 205, in post
                                            return self._decode_response(response)
                                          File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 237, in _decode_response
                                            raise pickle.loads(decode_response(response))
                                        psycopg2.DataError: invalid input syntax for integer: "https://inria.halpreprod.archives-ouvertes.fr/8a60afdb-f424-46ec-abc6-49a03ccefce3"
                                        LINE 2: ...          provider_id, tool_id, metadata) values ('https://i...

@vlorentz (stacktrace) ^

I think we missed the storage origin_metadata_add endpoint during the migration out of the origin id (both in-memory and pg storages).
The loader-core (0.0.44) has been changed to provide the url instead of the id, but the storage endpoint still expects the origin id.
That's my understanding, do you concur?

TIA

Indeed, I did not change it because I expected to do it at the same time as a refactoring of origin_metadata_* (aka implementing D1614).
I'll fix it ASAP.

Finally, after updating plenty of our modules (model, storage, ...), the deposit loader is fixed, so I can cross that box! \m/

Sep 05 08:31:03 worker1 python3[11272]: [2019-09-05 08:31:03,260: INFO/ForkPoolWorker-1] Task swh.deposit.loader.tasks.LoadDepositArchiveTsk[da9f066e-4c44-4053-81a1-e38db35d6149] succeeded in 2.102082097902894s: {'status': 'eventful'}

Cheers,


On stand-by, as the package-loader priority (T1389) took over.
The staging infra will be used when the package loaders land (which will need some work: debian packaging + configuration updates).

And then the work to finalize it will resume (one indexer which pulls from kafka).