
rdicosmo (Roberto Di Cosmo)
UserAdministrator

User Details

User Since
Sep 9 2015, 9:17 PM (307 w, 2 d)
Roles
Administrator

Recent Activity

Mon, Jul 26

rdicosmo added a comment to T3444: 26/07/2021: Unstuck infrastructure outage then post-mortem.

Thanks for looking into this.
What about sending logs to a separate dedicated logging machine instead of storing them locally?

Mon, Jul 26, 5:57 PM · System administration

Thu, Jul 22

rdicosmo added a comment to T3127: Compute and display distribution of origins by forge.

I am a bit puzzled by the numbers shown: do we really have only 200k origins for GitLab.com?

Indeed there is something weird here as we have more than one million gitlab.com origins in database.

softwareheritage=> select count(*) from origin where url like 'https://gitlab.com/%';
  count  
---------
 1023499
(1 row)

Looks like something was missed when computing lister metrics from the scheduler database; this needs further investigation.
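
As a cross-check of the per-forge numbers, origins can also be grouped by the hostname of their URL. The following is only a minimal sketch, not how the lister metrics are actually computed; the function name `count_origins_by_forge` and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_origins_by_forge(origin_urls):
    """Count origins per forge, keyed by the hostname of each origin URL."""
    counts = Counter()
    for url in origin_urls:
        # hostname is None when the string has no network location
        host = urlsplit(url).hostname
        counts[host or "unknown"] += 1
    return counts


# Hypothetical sample data, for illustration only.
sample = [
    "https://gitlab.com/foo/bar",
    "https://gitlab.com/baz/qux",
    "https://github.com/foo/bar",
]
print(count_origins_by_forge(sample))
```

Grouping by hostname rather than by a `like` prefix avoids writing one query per forge, at the cost of treating every distinct hostname as its own forge.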

Thu, Jul 22, 9:01 AM · Metrics/monitoring, Web app, Roadmap 2021, meta-task

Wed, Jul 21

rdicosmo added a comment to T3127: Compute and display distribution of origins by forge.

I am a bit puzzled by the numbers shown: do we really have only 200k origins for GitLab.com?
And we know we had some 1.5m origins for Google code, why only 700k shown here?

Wed, Jul 21, 3:40 PM · Metrics/monitoring, Web app, Roadmap 2021, meta-task

Tue, Jul 20

rdicosmo moved T3162: Services page iconography from Restricted Project Column to Restricted Project Column on the Unknown Object (Project) board.
Tue, Jul 20, 11:27 AM · Unknown Object (Project)

Jun 26 2021

rdicosmo committed rMSLD246f33a20d92: Fix spacing and images for TAUmus (authored by rdicosmo).
Fix spacing and images for TAUmus
Jun 26 2021, 11:09 AM

Jun 22 2021

rdicosmo added a comment to T3127: Compute and display distribution of origins by forge.

Nice to see this moving forward!

Jun 22 2021, 1:59 PM · Metrics/monitoring, Web app, Roadmap 2021, meta-task

Jun 18 2021

rdicosmo committed rMSLD84f0c2a244f4: Add tech bits (authored by rdicosmo).
Add tech bits
Jun 18 2021, 5:15 PM
rdicosmo committed rMSLD6d01fa8cd328: DigiHum lecture (authored by rdicosmo).
DigiHum lecture
Jun 18 2021, 5:01 PM
rdicosmo committed rMSLD33ff1dd30e98: Fix date for RESAW talk (authored by rdicosmo).
Fix date for RESAW talk
Jun 18 2021, 5:01 PM

Jun 14 2021

rdicosmo committed rMSLD2680082474ff: RESAW (authored by rdicosmo).
RESAW
Jun 14 2021, 7:39 PM
rdicosmo committed rMSLD9b4a3fde97cb: Update list of disappearing forges (authored by rdicosmo).
Update list of disappearing forges
Jun 14 2021, 7:39 PM

Jun 11 2021

rdicosmo added a comment to T3365: save code now: Failure to ingest new 'archives' type when head response is incomplete.

Great, it seems we are getting there :-)

Jun 11 2021, 5:40 PM · Save Code Now
rdicosmo committed rMSLD074721362556: Add CACM 2021 02 image (authored by rdicosmo).
Add CACM 2021 02 image
Jun 11 2021, 11:50 AM
rdicosmo added a comment to T3365: save code now: Failure to ingest new 'archives' type when head response is incomplete.

@ardumont, @rdicosmo, I just figured out that the data we are missing (Content-Length, Last-Modified) from tarballs archived by the Internet Archive are in fact available in x-archive-orig-* HTTP response headers,
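
The fallback described here could be sketched as follows. This is an illustration under assumptions, not the loader's actual code: the helper name `original_header` is hypothetical, and the exact header names are inferred from the `x-archive-orig-*` pattern mentioned above.

```python
def original_header(headers, name):
    """Return header `name`, falling back to `x-archive-orig-<name>`.

    `headers` is any mapping of header names to values; lookups are
    case-insensitive. Returns None when neither header is present.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    name = name.lower()
    if name in lowered:
        return lowered[name]
    # Assumed naming: the original header name prefixed with x-archive-orig-
    return lowered.get("x-archive-orig-" + name)


# Hypothetical response headers, for illustration only.
response_headers = {
    "X-Archive-Orig-Content-Length": "58335",
    "X-Archive-Orig-Last-Modified": "Tue, 12 Dec 1995 08:00:00 GMT",
}
print(original_header(response_headers, "Last-Modified"))
```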

Jun 11 2021, 11:39 AM · Save Code Now

Jun 7 2021

rdicosmo added a comment to T3365: save code now: Failure to ingest new 'archives' type when head response is incomplete.

Thanks @ardumont for investigating this. The fact that the IA does not provide the Last-Modified information may make sense for their specific case (it is possible that they have not kept the Last-Modified info from the original location).

Jun 7 2021, 11:08 PM · Save Code Now

May 29 2021

rdicosmo committed rMSLD9b248f35fa75: SWHID animation images (authored by rdicosmo).
SWHID animation images
May 29 2021, 4:40 PM

May 28 2021

rdicosmo added a comment to T3213: Enable save code now of software source code archives for specific users.

The feature has been implemented and looks ready for production use.

I just tested it using the Web API and the docker environment for a real world example: the Kermit Software Source Code Archive.

May 28 2021, 1:55 PM · Save Code Now, Web app

May 25 2021

rdicosmo added a comment to T3313: Web API: per-user accounting.

That will be helpful in general (to answer questions like: which endpoint is over/underused for specific use cases) and also in view of seeing who over/underuses rate limits (e.g., to identify the need for more generous rate limits for specific use cases).

May 25 2021, 7:25 PM · System administration, Web app
rdicosmo created T3342: Collect material for software stories.
May 25 2021, 9:14 AM · Unknown Object (Project)

May 20 2021

rdicosmo committed rMSLD0aea320d43f7: Final SIF (authored by rdicosmo).
Final SIF
May 20 2021, 12:32 PM
rdicosmo committed rMSLD1fdbf56e9431: Biden EO (authored by rdicosmo).
Biden EO
May 20 2021, 12:32 PM

May 19 2021

rdicosmo added a comment to T3202: Help new users discover the features available in the archive browsing view.

If we want non-staff users to try the tour before the official release, we could take advantage of authentication here and activate the guided tour only for users with a dedicated permission.

May 19 2021, 5:53 PM · Web app
rdicosmo added a comment to T3202: Help new users discover the features available in the archive browsing view.

Is the Help page linked from some other place? (i.e.: are we risking 404s if we dump it?)

I mean dumping the link, not the page, but I could move it to the footer so the page is still reachable.

May 19 2021, 4:04 PM · Web app
rdicosmo added a comment to T3202: Help new users discover the features available in the archive browsing view.

After some brainstorming on the subject, I was thinking of launching the guided tour through the Help link in the left sidebar and thus dumping the Help page.

May 19 2021, 3:31 PM · Web app

May 12 2021

rdicosmo added a comment to T3202: Help new users discover the features available in the archive browsing view.

So we have a winner here.

May 12 2021, 12:13 PM · Web app

May 11 2021

rdicosmo merged task T3321: Add section "meet our ambassadors" on the ambassadors page into Unknown Object (Maniphest Task).
May 11 2021, 10:46 PM · Ambassadors, Website

May 10 2021

rdicosmo added a comment to T1226: Save code now email notification for submitter.

Is this feature still needed?

I think so: some origins can take a long time to load into the archive (a huge svn repo, for instance),
so having a mail notification would be of interest here.

If yes, is it easy to implement it now?

Not at the moment, we need to resolve T3286 first.

May 10 2021, 6:31 PM · Save Code Now, Web app
rdicosmo added a comment to T1226: Save code now email notification for submitter.

A lot has changed since this was opened:

May 10 2021, 6:19 PM · Save Code Now, Web app

May 8 2021

rdicosmo moved T2194: Archive Integration (Web API) from Backlog to Work in progress on the Roadmap 2021 board.
May 8 2021, 11:14 AM · Roadmap 2021, meta-task
rdicosmo moved T3118: Documentation for users and ambassadors from Backlog to Work in progress on the Roadmap 2021 board.
May 8 2021, 11:14 AM · Scientific Community Building, Community Building, Roadmap 2021, meta-task
rdicosmo moved T2912: Next generation archive counters from Pending validation to Done on the Roadmap 2021 board.
May 8 2021, 11:13 AM · Roadmap 2021, System administration, Monitoring, Web app
rdicosmo moved T3082: Improve Save Code Now handling from Backlog to Work in progress on the Roadmap 2021 board.
May 8 2021, 11:12 AM · System administration, Save Code Now, meta-task, Roadmap 2021, Web app

May 7 2021

rdicosmo added a comment to T3312: web API rate limit: 10x more quota for authenticated users.

If we need to tune the rate limit for specific types of users, this could easily be added in the new throttling
code I am currently working on.

May 7 2021, 11:45 AM · Web app
rdicosmo committed rMSLD71dc9ed81cfb: Add eLife logo (authored by rdicosmo).
Add eLife logo
May 7 2021, 11:16 AM
rdicosmo committed rMSLD0ad820cf9a76: Update growth in SIF slides (authored by rdicosmo).
Update growth in SIF slides
May 7 2021, 11:09 AM
rdicosmo committed rMSLDa38474327e31: Update growth figures (authored by rdicosmo).
Update growth figures
May 7 2021, 11:08 AM
rdicosmo added a comment to T3312: web API rate limit: 10x more quota for authenticated users.

@zack, @rdicosmo yes this is totally feasible by adding a dedicated Django REST Framework throttling handler for authenticated users.

Let's work on that then.
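
To illustrate the idea of a larger quota for authenticated users, here is a standard-library-only sketch of a sliding-window limiter. It is not the Django REST Framework handler discussed above and does not use its API; the class name `SlidingWindowThrottle` and its parameters are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowThrottle:
    """Allow `anon_limit` requests per window, and 10x that when authenticated."""

    def __init__(self, anon_limit, window_seconds, clock=time.monotonic):
        self.anon_limit = anon_limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self.history = defaultdict(deque)  # client key -> request timestamps

    def allow(self, client_key, authenticated):
        limit = self.anon_limit * 10 if authenticated else self.anon_limit
        now = self.clock()
        stamps = self.history[client_key]
        # Drop timestamps that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= limit:
            return False
        stamps.append(now)
        return True
```

Keeping the multiplier inside `allow` means the per-user-type quota is decided at request time, so further user categories could be handled by changing how `limit` is chosen.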

May 7 2021, 11:05 AM · Web app
rdicosmo committed rMSLD04062c8f1869: Structure (authored by rdicosmo).
Structure
May 7 2021, 10:28 AM
rdicosmo committed rMSLDbd56776e3d4d: Slides SIF reproductibilite (authored by rdicosmo).
Slides SIF reproductibilite
May 7 2021, 10:20 AM
rdicosmo committed rMSLDd4b4e016ea3c: Add pillar of OS entry in ARDC module (authored by rdicosmo).
Add pillar of OS entry in ARDC module
May 7 2021, 10:20 AM
rdicosmo added a comment to T3312: web API rate limit: 10x more quota for authenticated users.

@anlambert ; ping me when this is done, so we can answer some pending requests :-)

May 7 2021, 9:44 AM · Web app

Apr 29 2021

rdicosmo added a comment to T3298: Consider making SWHID handling case insensitive.

So for SWHID v1, the resolver should turn the core part into lowercase, am I right?
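
A minimal sketch of that normalization, assuming the core part is everything before the first `;` and that qualifiers (which may contain case-sensitive paths or URLs) must be left untouched; the function name `lowercase_swhid_core` is hypothetical and this is not the resolver's actual code.

```python
def lowercase_swhid_core(swhid):
    """Lowercase the core part of a SWHID, leaving any qualifiers as-is."""
    # partition() returns empty strings for sep and qualifiers when there is no ';'
    core, sep, qualifiers = swhid.partition(";")
    return core.lower() + sep + qualifiers


print(lowercase_swhid_core(
    "swh:1:cnt:94A9ED024D3859793618152EA559A168BBCBB5E2;path=/README"
))
```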

Apr 29 2021, 1:16 PM · Data Model, Web app
rdicosmo added a comment to T3298: Consider making SWHID handling case insensitive.
In T3298#64426, @zack wrote:

This is going to be an interesting challenge/trade-off for SWHIDv2. Because I was considering there to use more compact encodings than hex, in order to shorten the SWHID length, like base58, but those are case-sensitive in order to be more dense.

So, as a counter argument above the "SHOULD" idea, we need to be careful about promoting a practice now that might change when switching from SWHIDv1 to SWHIDv2.

Apr 29 2021, 12:19 PM · Data Model, Web app
rdicosmo updated the task description for T3298: Consider making SWHID handling case insensitive.
Apr 29 2021, 12:03 PM · Data Model, Web app
rdicosmo triaged T3298: Consider making SWHID handling case insensitive as Normal priority.
Apr 29 2021, 12:02 PM · Data Model, Web app
rdicosmo created T3298: Consider making SWHID handling case insensitive.
Apr 29 2021, 12:02 PM · Data Model, Web app

Apr 28 2021

rdicosmo added a comment to T2912: Next generation archive counters.

> I also recall now that vincent added a graph [1] recently enough.

This is to try to compare the counter approaches with each other.

So that's still using the old plumbing at least for that part.

[1] https://grafana.softwareheritage.org/goto/BlkwHorMz

Apr 28 2021, 5:23 PM · Roadmap 2021, System administration, Monitoring, Web app

Apr 27 2021

rdicosmo created T3295: Archive the Kermit historical source code collection.
Apr 27 2021, 10:41 AM · Community Building

Apr 26 2021

rdicosmo added a comment to T2912: Next generation archive counters.

Last bits deployed on archive.s.o (including the author counters).

Apr 26 2021, 1:33 PM · Roadmap 2021, System administration, Monitoring, Web app
rdicosmo moved T2912: Next generation archive counters from Work in progress to Pending validation on the Roadmap 2021 board.
Apr 26 2021, 10:50 AM · Roadmap 2021, System administration, Monitoring, Web app
rdicosmo closed T3163: Call For Participation Grants as Resolved.
Apr 26 2021, 9:23 AM · Unknown Object (Project)

Apr 24 2021

rdicosmo added a comment to T3213: Enable save code now of software source code archives for specific users.

I recall it's part of creating a primary key (of sorts) composed of all the properties mentioned
above (when the artifact does not already provide some hashes).
This is to avoid fetching again things that were already fetched.

Apr 24 2021, 3:20 PM · Save Code Now, Web app
rdicosmo added a comment to T3213: Enable save code now of software source code archives for specific users.

Currently users only provide a URL in the save code now form, while the loader expects a bit more
[1] (recall it's the lister which actually provides those).

The loader expects to be provided with a list of artifacts (could be only 1 in our
case). Still, such artifacts are described through the following:

  • artifact url
  • time
  • length (could be derived from the url when talking to the server, but not all servers provide it...)
  • version (could be derived with heuristic from the url as well but that's regexp-hell-ish and prone to error)
  • filename (could be derived from the url without too much risk i think...)

I gather the save code now ui could be enriched (and displayed according to chosen visit
type) but that becomes more involved for people in general.

Another road would be to make some of those properties optional...

Thoughts?

[1]

 "url": "https://ftp.gnu.org/old-gnu/emacs/",
 "artifacts": [{"url": "https://ftp.gnu.org/old-gnu/emacs/elib-1.0.tar.gz",
                "time": "1995-12-12T08:00:00+00:00",
                "length": 58335,
                "version": "1.0",
                "filename": "elib-1.0.tar.gz",
                },
                ...
               ]
...
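
As an illustration of the "optional properties" road, the artifact description could be built from the URL alone, with the filename derived from the URL path and the other fields included only when known. This is a sketch under assumptions, not the loader's interface; the function name `artifact_from_url` is hypothetical.

```python
from posixpath import basename
from urllib.parse import urlsplit


def artifact_from_url(url, time=None, length=None, version=None):
    """Build an artifact description, deriving the filename from the URL path."""
    artifact = {"url": url, "filename": basename(urlsplit(url).path)}
    # Only include the properties the caller actually knows.
    optional = {"time": time, "length": length, "version": version}
    artifact.update({k: v for k, v in optional.items() if v is not None})
    return artifact


print(artifact_from_url("https://ftp.gnu.org/old-gnu/emacs/elib-1.0.tar.gz"))
```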
Apr 24 2021, 9:53 AM · Save Code Now, Web app

Apr 21 2021

rdicosmo added a comment to T3213: Enable save code now of software source code archives for specific users.

Thanks @ardumont ... so it appears that adapting the logic is easy... could you do it?
@anlambert could you look into the needed modification of the UI, to enable the new type of save code now payloads for selected authenticated users?

Apr 21 2021, 6:58 PM · Save Code Now, Web app
rdicosmo added a comment to T3087: Implement support for takedown notices (infra, admin tools, workflow).

So what about exports of the archive available on git-annex?

Apr 21 2021, 6:53 PM · meta-task, Roadmap 2021, Web app

Apr 20 2021

rdicosmo added a comment to T3278: Check older pending save code now requests apparently stuck and reschedule those.

Thanks, this is quite useful indeed.

Apr 20 2021, 7:28 PM · System administration, Save Code Now
rdicosmo added a comment to T3278: Check older pending save code now requests apparently stuck and reschedule those.

Thanks for looking into this. If I look at https://grafana.softwareheritage.org/d/WXRVVc_Mz/save-code-now?viewPanel=4&orgId=1&from=1617954242247&to=1617975842247&var-environment=production&var-instance=moma.internal.softwareheritage.org&var-status=All&var-load_task_status=All&var-visit_type=All it seems there are also some 255 requests "not yet scheduled". Maybe it's the same issue?

Apr 20 2021, 11:00 AM · System administration, Save Code Now

Apr 19 2021

rdicosmo committed rMSLDd525b5e493d9: RDA Data granularity WG presentation (authored by rdicosmo).
RDA Data granularity WG presentation
Apr 19 2021, 8:17 PM
rdicosmo added a comment to T3234: Handle gracefully trailing slashes when resolving SWHID in search box.

Thanks, it is indeed an urgent matter, as various journals depend on this!

Apr 19 2021, 6:46 PM · Web app
rdicosmo reopened T3234: Handle gracefully trailing slashes when resolving SWHID in search box as "Open".

Well, it seems we have been hit by this again, in a different form:

Apr 19 2021, 6:10 PM · Web app
rdicosmo added a comment to T3247: Implement SWHID validation in frontend.

Cool!

Apr 19 2021, 3:58 PM · Web app
rdicosmo moved T3246: Document takedown request processing workflow from Backlog to Work in progress on the Roadmap 2021 board.
Apr 19 2021, 11:53 AM · Archive content
rdicosmo moved T3077: Ease integration of fundraising campaigns from Pending validation to Done on the Roadmap 2021 board.
Apr 19 2021, 11:53 AM · Community Building, Roadmap 2021, Website

Apr 16 2021

rdicosmo added a comment to T3252: Better handling of erroneous origins submitted to save code now.

Thanks to all of you for this discussion and proposals.

Apr 16 2021, 1:39 PM · System administration, Save Code Now, Web app
rdicosmo added a comment to T3256: Propose reason for rejecting a save code now.

Great. In addition to the content of the free form field, the standard answer should contain proper boilerplate reminding what is expected in a Save Code Now request, along the lines of what is written in the "Help" tab of https://archive.softwareheritage.org/save/

Apr 16 2021, 1:24 PM · Save Code Now, Easy hack, Web app
rdicosmo added a comment to T2117: Save Code Now: End to End monitoring.

On a related note, it may be useful to regularly report requests that did not complete (either as success or failure) in a reasonable amount of time after being scheduled.

Apr 16 2021, 9:06 AM · System administration, Monitoring, Roadmap 2021

Apr 15 2021

rdicosmo triaged T3252: Better handling of erroneous origins submitted to save code now as Normal priority.
Apr 15 2021, 10:47 PM · System administration, Save Code Now, Web app
rdicosmo added a comment to T2912: Next generation archive counters.

This kind of journal client will be necessary in any case if we want to extend the usage of the counters for other perimeters (metadata count, origin per forge, ...)

Apr 15 2021, 3:35 PM · Roadmap 2021, System administration, Monitoring, Web app
rdicosmo added a comment to T3084: Fast track save code now requests.

Pushed, packaged, deployed.

The scheduler runner happily continues to schedule existing tasks, plus some new tasks with priority

Apr 15 13:12:51 saatchi swh[234257]: INFO:swh.scheduler.celery_backend.runner:Grabbed 2084 tasks load-git
Apr 15 13:12:54 saatchi swh[234257]: INFO:swh.scheduler.cli.admin.runner:Scheduled 4128 tasks
Apr 15 13:14:06 saatchi swh[234257]: INFO:swh.scheduler.celery_backend.runner:Grabbed 1 tasks load-pypi
Apr 15 13:14:06 saatchi swh[234257]: INFO:swh.scheduler.celery_backend.runner:Grabbed 1 tasks load-git (priority)
...

That task got done almost immediately...
So there you go ;)

Apr 15 2021, 3:30 PM · System administration, Web app
rdicosmo added a comment to T2912: Next generation archive counters.

Staging webapp[1] and webapp1 on production [2] are now configured to use swh-counters to display the historical values and the live object counts.

Apr 15 2021, 12:09 PM · Roadmap 2021, System administration, Monitoring, Web app

Apr 14 2021

rdicosmo added a comment to T3084: Fast track save code now requests.

Great news :-)

Apr 14 2021, 7:01 PM · System administration, Web app

Apr 13 2021

rdicosmo committed rMSLD5beda4268f79: Slides for RDA SSC IG (authored by rdicosmo).
Slides for RDA SSC IG
Apr 13 2021, 7:32 PM
rdicosmo committed R238:9cf42fd2074c: variant for renater gforge (authored by rdicosmo).
variant for renater gforge
Apr 13 2021, 6:54 PM
rdicosmo added a comment to D267: [WIP] add first implementation of FusionForge lister.

What would be left to do to make this lister work? It already seems to be in a good state, and it would be useful to index gforge.inria.fr since it will be closed soon (https://gforge.inria.fr/forum/forum.php?forum_id=11543). For the gforge.inria.fr case specifically, it is worth noting that project creation is already closed, so a one-shot listing could be an option if it is lighter to set up: I wrote a small script to do that, but after a few requests to https://archive.softwareheritage.org/save/, requests are throttled. I would be happy to send you a listing of the public projects hosted on gforge.inria.fr if it could help.

Apr 13 2021, 5:00 PM
rdicosmo committed R238:4e2907a3f4fa: Prepare to generalize (authored by rdicosmo).
Prepare to generalize
Apr 13 2021, 4:06 PM
rdicosmo raised the priority of T3087: Implement support for takedown notices (infra, admin tools, workflow) from Normal to High.
Apr 13 2021, 2:53 PM · meta-task, Roadmap 2021, Web app
rdicosmo added a comment to T3247: Implement SWHID validation in frontend.

Ok, this is converging with the discussion in T3234: we fully agree that having proper errors reported to the user is the way to go, so let's forget about the "sanitization" approach.

Apr 13 2021, 12:38 PM · Web app
rdicosmo added a comment to T3234: Handle gracefully trailing slashes when resolving SWHID in search box.
Apr 13 2021, 12:23 PM · Web app
rdicosmo added a comment to D5485: docs/persistent-identifiers: Add guidelines for fixing invalid SWHIDs..

Ok, so no need to change the specification document for SWHIDs.

Apr 13 2021, 12:06 PM
rdicosmo added a comment to T3234: Handle gracefully trailing slashes when resolving SWHID in search box.

@vlorentz , @anlambert : thanks for progressing the discussion on this issue.
After mulling over your inputs, here is my current understanding:

Apr 13 2021, 12:00 PM · Web app
rdicosmo added a comment to T3247: Implement SWHID validation in frontend.

I wonder if this is not overkill: SWHID may evolve in the future, and maintaining two implementations (one of them in JS!) may be a source of headaches down the line.
A simple "sanitization" phase in the frontend catching the most common issues (trailing slashes, leading or trailing tabs or spaces, etc.) would probably be enough for our purpose.
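
The kind of sanitization meant here could look like the sketch below. It is shown in Python for illustration only (the frontend itself is in JS), and the function name `sanitize_swhid_input` is hypothetical.

```python
def sanitize_swhid_input(text):
    """Strip surrounding whitespace (spaces, tabs, newlines), then trailing slashes."""
    # Caveat: this would also strip a trailing slash that belongs to a qualifier value.
    return text.strip().rstrip("/")


print(sanitize_swhid_input(" \tswh:1:dir:94a9ed024d3859793618152ea559a168bbcbb5e2/ \n"))
```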

Apr 13 2021, 11:34 AM · Web app

Apr 10 2021

rdicosmo added a comment to T3234: Handle gracefully trailing slashes when resolving SWHID in search box.

As a compromise, we could accept this trailing slash, but show a warning on the interface and/or codify in the SWHID specification an exhaustive list of "fixes" that user interfaces can/should do.

Apr 10 2021, 1:12 PM · Web app
rdicosmo added a comment to T3234: Handle gracefully trailing slashes when resolving SWHID in search box.

There are already many URLs in the open, so even if we remove the trailing slash now, that does not solve the problem.

Apr 10 2021, 11:44 AM · Web app
rdicosmo triaged T3234: Handle gracefully trailing slashes when resolving SWHID in search box as Normal priority.
Apr 10 2021, 11:16 AM · Web app
rdicosmo triaged T3233: Missing Apollo 11 virtual AGC repository from Google Code as Normal priority.
Apr 10 2021, 9:32 AM · SVN Loader

Apr 9 2021

rdicosmo raised the priority of T3230: Add various markdown variants to list of intrinsic metadata files to be indexed from Low to Normal.
Apr 9 2021, 4:45 PM · Intrinsic metadata, Easy hack, Indexer
rdicosmo committed rMSLD7b622587a441: Added general Open Science presentation (authored by rdicosmo).
Added general Open Science presentation
Apr 9 2021, 2:14 PM
rdicosmo updated the task description for T3230: Add various markdown variants to list of intrinsic metadata files to be indexed.
Apr 9 2021, 1:33 PM · Intrinsic metadata, Easy hack, Indexer
rdicosmo created T3230: Add various markdown variants to list of intrinsic metadata files to be indexed.
Apr 9 2021, 1:32 PM · Intrinsic metadata, Easy hack, Indexer

Apr 6 2021

rdicosmo triaged T3213: Enable save code now of software source code archives for specific users as Normal priority.
Apr 6 2021, 9:33 PM · Save Code Now, Web app
rdicosmo assigned T3213: Enable save code now of software source code archives for specific users to anlambert.
Apr 6 2021, 9:32 PM · Save Code Now, Web app
rdicosmo added a subtask for T3082: Improve Save Code Now handling: T3213: Enable save code now of software source code archives for specific users.
Apr 6 2021, 9:32 PM · System administration, Save Code Now, meta-task, Roadmap 2021, Web app
rdicosmo added a parent task for T3213: Enable save code now of software source code archives for specific users: T3082: Improve Save Code Now handling.
Apr 6 2021, 9:32 PM · Save Code Now, Web app
rdicosmo created T3213: Enable save code now of software source code archives for specific users.
Apr 6 2021, 9:31 PM · Save Code Now, Web app
rdicosmo moved T3077: Ease integration of fundraising campaigns from Work in progress to Pending validation on the Roadmap 2021 board.
Apr 6 2021, 11:45 AM · Community Building, Roadmap 2021, Website
rdicosmo committed rMSLD4570184c6adf: Update growth (authored by rdicosmo).
Update growth
Apr 6 2021, 11:29 AM
rdicosmo committed rMSLD280818490775: Update to LLW talk (authored by rdicosmo).
Update to LLW talk
Apr 6 2021, 11:16 AM
rdicosmo committed rMSLD994029b254db: Module deck (authored by rdicosmo).
Module deck
Apr 6 2021, 11:05 AM
rdicosmo committed rMSLD9affd6c0f92f: DIG (authored by rdicosmo).
DIG
Apr 6 2021, 11:05 AM

Apr 5 2021

rdicosmo assigned T3128: Improve deposit integration, management and display to moranegg.
Apr 5 2021, 12:13 PM · meta-task, Roadmap 2021, Monitoring, SWORD deposit, Web app