Collect metadata about software from ScanR
Closed, Migrated; Edits Locked

Description

ScanR is a platform that indexes outputs from French research. It has recently started to create links between research articles and software archived in Software Heritage.

We plan to collect the metadata generated by ScanR about software origins archived in Software Heritage, as software origin metadata.

This can be done in two ways:

[preferred] as a deposit from ScanR into SWH (like the metadata-only deposit planned for HAL, see T1021)
[fragile] as a pull operation, with SWH extracting information from the ScanR API

Related Objects

Status	Assigned	Task
Migrated	gitlab-migration	T2201 Indexing / mining
Migrated	gitlab-migration	T2202 Collect extrinsic metadata
Migrated	gitlab-migration	T2328 Collect metadata about software from ScanR
Migrated	gitlab-migration	T2306 Generic storage for extrinsic, qualified metadata related to any node of the swh archive

Event Timeline

moranegg triaged this task as Normal priority. Mar 20 2020, 3:28 PM

moranegg created this task.

rdicosmo renamed this task from ScanR -SWH Collaboration to Collect metadata about software from ScanR. Mar 20 2020, 6:56 PM

rdicosmo updated the task description.

rdicosmo added projects: Metadata workflow, SWORD deposit.

rdicosmo updated the task description.

rdicosmo added a subtask: T2306: Generic storage for extrinsic, qualified metadata related to any node of the swh archive. Mar 20 2020, 6:59 PM

rdicosmo added a parent task: T2202: Collect extrinsic metadata. Mar 20 2020, 7:02 PM

Here is a sample request, provided by the ScanR developers, that retrieves all the metadata entries about software that can be found in ScanR:

import requests

# ScanR publication search endpoint
url = "https://scanr-api.enseignementsup-recherche.gouv.fr/api/v2/publications/search"
# Return only publications that carry at least one link of type "software_heritage"
params = {
    "query": "",
    "sourceFields": ["authors", "oaEvidence", "id", "title", "summary", "domains",
                     "affiliations", "links", "productionType", "publicationDate", "isOa"],
    "pageSize": 10000,
    "filters": {"links.type": {"type": "MultiValueSearchFilter", "op": "any",
                               "values": ["software_heritage"]}},
}
r = requests.post(url, json=params)

And this is an excerpt from the result, which is returned in JSON format:

"id": "doi10.1002/bimj.201500098",
"links": [{"type": "software_heritage",
"url": "https://archive.softwareheritage.org/browse/origin/https://github.com/masedki/MHTrajectoryR"}],
"title": {"default": "Bayesian model selection in logistic regression for the detection of adverse drug reactions"},
"summary": {"default": "Spontaneous adverse event reports have a high potential for detecting adverse drug reactions. However, due to their dimension, the analysis of such databases requires statistical methods. In this context, disproportionality measures can be used. Their main idea is to project the data onto contingency tables in order to measure the strength of associations between drugs and adverse events. However, due to the data projection, these methods are sensitive to the problem of coprescriptions and masking effects. Recently, logistic regressions have been used with a Lasso type penalty to perform the detection of associations between drugs and adverse events. On different examples, this approach limits the drawbacks of the disproportionality methods, but the choice of the penalty value is open to criticism while it strongly influences the results. In this paper, we propose to use a logistic regression whose sparsity is viewed as a model selection challenge. Since the model space is huge, a Metropolis-Hastings algorithm carries out the model selection by maximizing the BIC criterion. Thus, we avoid the calibration of penalty or threshold. During our application on the French pharmacovigilance database, the proposed method is compared to well-established approaches on a reference dataset, and obtains better rates of positive and negative controls. However, many signals (i.e., specific drug-event associations) are not detected by the proposed method. So, we conclude that this method should be used in parallel to existing measures in pharmacovigilance. Code implementing the proposed method is available at the following url: https://github.com/masedki/MHTrajectoryR."}
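To attach such an entry to a software origin, the origin URL first has to be recovered from the "links" field. The following is only a rough sketch of how that could look, not an agreed design: the helper names origin_urls and origin_metadata are hypothetical, and it assumes every software_heritage link uses the plain browse/origin/ prefix seen in the excerpt, with nothing appended after the origin URL.

```python
# Prefix of the Software Heritage browse links seen in the excerpt.
# Assumption: the origin URL is everything that follows this prefix.
BROWSE_ORIGIN_PREFIX = "https://archive.softwareheritage.org/browse/origin/"


def origin_urls(record):
    """Return the origin URLs referenced by the software_heritage links of one record."""
    urls = []
    for link in record.get("links", []):
        if link.get("type") != "software_heritage":
            continue  # ignore other kinds of links
        url = link.get("url", "")
        if url.startswith(BROWSE_ORIGIN_PREFIX):
            urls.append(url[len(BROWSE_ORIGIN_PREFIX):])
    return urls


def origin_metadata(record):
    """Map each origin URL to the publication fields that describe it (hypothetical shape)."""
    publication = {
        "id": record.get("id"),
        "title": record.get("title", {}).get("default"),
        "summary": record.get("summary", {}).get("default"),
    }
    return {origin: publication for origin in origin_urls(record)}


# Record built from the excerpt (summary omitted for brevity).
record = {
    "id": "doi10.1002/bimj.201500098",
    "links": [
        {
            "type": "software_heritage",
            "url": "https://archive.softwareheritage.org/browse/origin/"
                   "https://github.com/masedki/MHTrajectoryR",
        }
    ],
    "title": {
        "default": "Bayesian model selection in logistic regression "
                   "for the detection of adverse drug reactions"
    },
}

print(origin_urls(record))
```

Keying the result by origin URL matches the goal stated in the description (storing this as software origin metadata), but the actual format to store would depend on the outcome of T2306.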

ardumont changed the status of subtask T2306: Generic storage for extrinsic, qualified metadata related to any node of the swh archive from Open to Work in Progress. Jul 1 2020, 3:50 PM

vlorentz closed subtask T2306: Generic storage for extrinsic, qualified metadata related to any node of the swh archive as Resolved. Jul 29 2020, 7:33 PM

This task has been migrated to GitLab.

gitlab-migration changed the status of subtask T2306: Generic storage for extrinsic, qualified metadata related to any node of the swh archive from Resolved to Migrated. Jan 8 2023, 10:00 PM
