Prepare an environment to test the ClearlyDefined integration
Closed, Migrated

Description

The ClearlyDefined[1] project could help generate extrinsic metadata for the archive content.
The service can be accessed via a REST API[2], but rate limiting is in place.
To get rid of this limit, a mirror can be installed on our infrastructure and kept in sync with an ingestion proxy [3].
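
To make the rate-limiting concern concrete, here is a minimal sketch of a client that backs off when the public API throttles it. It is not taken from the ClearlyDefined tooling: the `/definitions/<type>/<provider>/<namespace>/<name>/<revision>` path layout, the use of `-` for an empty namespace, and HTTP 429 as the throttling signal are assumptions to be checked against the API docs[2]. `definition_url` and `fetch_with_backoff` are hypothetical helper names, and `fetch` is any callable returning `(status, body)`.

```python
import time
from urllib.parse import quote

API_ROOT = "https://api.clearlydefined.io"  # public endpoint, rate limited


def definition_url(type_, provider, namespace, name, revision, root=API_ROOT):
    """Build the URL of one definition from its coordinates.

    Assumption: a missing namespace is written as "-".
    """
    parts = [type_, provider, namespace or "-", name, revision]
    return root + "/definitions/" + "/".join(quote(p, safe="") for p in parts)


def fetch_with_backoff(fetch, url, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) -> (status, body), retrying while the status is 429.

    The delay doubles after each throttled attempt (1s, 2s, 4s, ...).
    The last response is returned once the retries are exhausted.
    """
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != 429:  # assumption: 429 signals rate limiting
            return status, body
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

A local mirror removes the need for this kind of throttling logic, which is the motivation for the setup requested here.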

An intern will work on this subject. We should provide a VM with enough disk space and memory to create the mirror.
A preconfigured PostgreSQL instance could be useful too.

The disk space needed is estimated at 2 TB (cf. the homepage of the GitHub project[3]).
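
As a quick sanity check before starting a sync, the free space on the target filesystem can be compared against that estimate. This is a sketch only: `free_space_report` is a hypothetical helper, and the 2 TB figure is the rough estimate quoted above, not a measured requirement.

```python
import shutil

REQUIRED_BYTES = 2 * 10**12  # about 2 TB, the estimate quoted above


def free_space_report(path, required=REQUIRED_BYTES, usage=shutil.disk_usage):
    """Return (enough, free_bytes, missing_bytes) for the filesystem holding path."""
    free = usage(path).free
    missing = max(0, required - free)
    return missing == 0, free, missing
```

Calling `free_space_report("/")` on the VM would show how far the root filesystem is from the estimate.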

Steps:

  • T2890: Onboard tg19999
  • D4745: Create db instance on staging for now [4]
  • D4753, D4754: Create VM instance with DB access

[1] https://clearlydefined.io/
[2] https://api.clearlydefined.io/api-docs/#/definitions/get_definitions
[3] https://github.com/nexB/clearcode-toolkit
[4] uffizi has some disk limitations in the end

Event Timeline

vsellier triaged this task as Normal priority (Dec 8 2020, 5:57 PM).
vsellier created this task.
ardumont changed the task status from Open to Work in Progress (Dec 16 2020, 6:38 PM).
ardumont updated the task description.
  • vm access with db access:
root@clearly-defined:~# su - tg1999
tg1999@clearly-defined:~$ psql service=clearly-defined
psql (13.1 (Debian 13.1-1.pgdg100+1), server 12.4 (Debian 12.4-1.pgdg100+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

clearly-defined=> \conninfo
You are connected to database "clearly-defined" as user "guest" on host "db1.internal.staging.swh.network" (address "192.168.130.11") at port "5432".
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
clearly-defined=> \q
tg1999@clearly-defined:~$ psql service=admin-clearly-defined
psql (13.1 (Debian 13.1-1.pgdg100+1), server 12.4 (Debian 12.4-1.pgdg100+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

clearly-defined=> \conninfo
You are connected to database "clearly-defined" as user "guest" on host "db1.internal.staging.swh.network" (address "192.168.130.11") at port "5432".
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)

Following the onboarding task of tg1999, we checked he could connect to the node clearly-defined and access the db.
So it's all good now.
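
For reference, `psql service=<name>` resolves the name through a libpq connection service file (typically `~/.pg_service.conf`). The actual file deployed on the VM is not shown in this task, so the following is only a sketch of how such an entry could be rendered, using the values reported by `\conninfo` above. `render_pg_service` is a hypothetical helper, and no password is included.

```python
import configparser
import io


def render_pg_service(name, host, port, dbname, user):
    """Render one pg_service.conf section as text (no password stored)."""
    config = configparser.ConfigParser()
    config[name] = {"host": host, "port": str(port), "dbname": dbname, "user": user}
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


print(render_pg_service(
    "clearly-defined",
    "db1.internal.staging.swh.network",
    5432,
    "clearly-defined",
    "guest",
))
```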

The main part of this task is done.

Note that:

  • the disk on the VM clearly-defined [1] may need to be enlarged [2]
  • there is currently no size limitation on the db instance, though (we have more than 2 TB of space there, shared with the other staging db instances)

[1] clearly-defined.internal.staging.swh.network

[2] Looking at the clearsync tool, as far as we understand it there is a way to sync directly to the db, so it should be fine:

clearsync --save-to-db  --verbose -n3
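
If the sync ends up being launched from a script, the same invocation could be wrapped as below. This is a sketch under assumptions: the flags are copied verbatim from the command above without interpreting them, `clearsync` is assumed to be on `PATH` once the toolkit[3] is installed, and `clearsync_command` and `run_clearsync` are hypothetical names.

```python
import shutil
import subprocess


def clearsync_command(extra_args=("--save-to-db", "--verbose", "-n3")):
    """Return the argv list for the invocation quoted above."""
    return ["clearsync", *extra_args]


def run_clearsync(argv=None):
    """Run clearsync if it is on PATH; return its exit code, or None if missing."""
    argv = argv or clearsync_command()
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, check=False).returncode
```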
ardumont claimed this task.