The ClearlyDefined[1] project could help generate extrinsic metadata for the archive content.
The service can be accessed via a REST API[2], but rate limiting is in place.
To avoid this limit, a mirror can be installed on our infrastructure and kept in sync with an ingestion proxy[3].
An intern will work on this subject. We should provide a VM with enough disk space and memory to host it.
A preconfigured PostgreSQL database could be useful too.
The disk space needed is estimated at 2 TB (cf. the homepage of the GitHub project[3]).
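To illustrate why the rate limit matters when querying the public REST API[2] directly, here is a minimal sketch of a client that fetches one definition and backs off when throttled. It assumes the service signals throttling with HTTP 429 and an optional `Retry-After` header; that behaviour, the helper names, and the retry parameters are illustrative assumptions, not something taken from the ClearlyDefined documentation.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

API_ROOT = "https://api.clearlydefined.io"


def definition_url(type_, provider, namespace, name, revision):
    """Build the URL of a definition from its coordinates.

    A missing namespace is written as '-' in the coordinates.
    """
    parts = [type_, provider, namespace or "-", name, revision]
    quoted = [urllib.parse.quote(part, safe="") for part in parts]
    return API_ROOT + "/definitions/" + "/".join(quoted)


def fetch_definition(url, opener=urllib.request.urlopen,
                     max_tries=5, sleep=time.sleep):
    """Fetch a JSON definition, retrying when the service throttles us.

    Assumption: throttling is reported as HTTP 429, possibly with a
    Retry-After header giving a delay in seconds. Without that header
    the delay starts at one second and doubles after each retry.
    """
    delay = 1.0
    for attempt in range(max_tries):
        try:
            with opener(url) as response:
                return json.load(response)
        except urllib.error.HTTPError as err:
            # Give up on non-throttling errors or on the last attempt.
            if err.code != 429 or attempt == max_tries - 1:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                wait = float(retry_after)
            else:
                wait = delay
            sleep(wait)
            delay *= 2
```

For example, `fetch_definition(definition_url("npm", "npmjs", None, "lodash", "4.17.21"))` would request a single npm package definition. Walking the whole archive this way would spend most of its time waiting, which is the motivation for a local mirror.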
Steps:
- T2890: Onboard tg19999
- D4745: Create DB instance on staging for now[4]
- D4753, D4754: Create VM instance with DB access
[1] https://clearlydefined.io/
[2] https://api.clearlydefined.io/api-docs/#/definitions/get_definitions
[3] https://github.com/nexB/clearcode-toolkit
[4] uffizi turned out to have some disk space limitations