Software Heritage

rdicosmo (Roberto Di Cosmo)
User, Administrator

User Details

User Since
Sep 9 2015, 9:17 PM (318 w, 2 d)
Roles
Administrator

Recent Activity

Wed, Oct 6

rdicosmo added a comment to T3627: Consider dropping pull request references from the git loader ingestion.

Yes, we must filter this stuff out (we discussed this issue with @zack some time ago; see also Torvalds' opinion: https://www.zdnet.com/article/linux-boosts-microsoft-ntfs-support-as-linus-torvalds-complains-about-github-merges/)

Wed, Oct 6, 10:23 PM · Git loader

Tue, Oct 5

rdicosmo committed rMSLD2f4e164a2bda: IRD (authored by rdicosmo).
Tue, Oct 5, 8:05 PM
rdicosmo committed rMSLDa6b355cb0494: Final SU FSI (authored by rdicosmo).
Tue, Oct 5, 7:15 PM
rdicosmo committed rMSLD6cced4db069c: Add missing KYSW + PNSO2 images (authored by rdicosmo).
Tue, Oct 5, 7:15 PM

Tue, Sep 21

rdicosmo committed rMSLDf1fe1dc4329f: SU-SWH better call to action (authored by rdicosmo).
Tue, Sep 21, 3:46 PM
rdicosmo committed rMSLD8cfb064934ba: Add Free Software prize (authored by rdicosmo).
Tue, Sep 21, 3:35 PM
rdicosmo committed rMSLD5e0bc2e2f763: SU-SWH updates (authored by rdicosmo).
Tue, Sep 21, 3:35 PM
rdicosmo committed rMSLD5163859caa16: Better biblio EdP (authored by rdicosmo).
Tue, Sep 21, 3:35 PM
rdicosmo committed rMSLD40ff045da51d: Final version of DigiHum talk (authored by rdicosmo).
Tue, Sep 21, 3:35 PM
rdicosmo committed rMSLD892702c91f21: Typo (authored by rdicosmo).
Tue, Sep 21, 3:35 PM
rdicosmo committed rMSLDc02fe4ac53c7: Update shared modules (authored by rdicosmo).
Tue, Sep 21, 3:35 PM
rdicosmo committed rMSLD61fd5a1fa6dc: Add PNSO2 (authored by rdicosmo).
Tue, Sep 21, 3:35 PM
rdicosmo committed rMSLDd5acde4b060c: SU FSI presentation (authored by rdicosmo).
Tue, Sep 21, 3:35 PM

Mon, Sep 20

rdicosmo moved T3537: Blog post Obsidian NLNet from Restricted Project Column to Restricted Project Column on the Unknown Object (Project) board.
Mon, Sep 20, 12:21 PM · Unknown Object (Project)

Sun, Sep 19

rdicosmo moved T3162: Services page iconography from Restricted Project Column to Restricted Project Column on the Unknown Object (Project) board.
Sun, Sep 19, 11:28 AM · Unknown Object (Project)
rdicosmo moved T3536: Blog post Easter Eggs NLNet from Restricted Project Column to Restricted Project Column on the Unknown Object (Project) board.
Sun, Sep 19, 11:26 AM · Unknown Object (Project)
rdicosmo added a comment to T3536: Blog post Easter Eggs NLNet.

The agreement has been signed, so we can move forward and publish the blog post.

Sun, Sep 19, 11:24 AM · Unknown Object (Project)

Sep 14 2021

rdicosmo moved T3536: Blog post Easter Eggs NLNet from Restricted Project Column to Restricted Project Column on the Unknown Object (Project) board.
Sep 14 2021, 9:29 AM · Unknown Object (Project)
rdicosmo moved T3535: Blog post Octobus NLNet from Restricted Project Column to Restricted Project Column on the Unknown Object (Project) board.
Sep 14 2021, 9:29 AM · Unknown Object (Project)

Sep 5 2021

rdicosmo moved T3534: Update team page with John Hsieh from Restricted Project Column to Restricted Project Column on the Unknown Object (Project) board.
Sep 5 2021, 9:33 PM · Unknown Object (Project)
rdicosmo committed rMSLDb17758ca1b46: EDP 2021 (authored by rdicosmo).
Sep 5 2021, 7:45 PM
rdicosmo committed rMSLDaededa4271fc: Reorganised motivations (authored by rdicosmo).
Sep 5 2021, 7:45 PM

Aug 31 2021

rdicosmo added a comment to T3489: Implement iframe view for content and directory elements.

Cool! LGTM

Aug 31 2021, 3:38 PM · Software Stories, Web app

Aug 30 2021

rdicosmo added a comment to T3489: Implement iframe view for content and directory elements.

In the future we will also need to address a more complex use case, where the iframe behaves more like an embedded copy of the archive.

Do you have an example of such a complex use case, to get a better idea of your needs?

Aug 30 2021, 7:34 PM · Software Stories, Web app

Aug 28 2021

rdicosmo added a comment to T3489: Implement iframe view for content and directory elements.

Thanks for all this great work...
A few questions/remarks:

  • where is the Permalinks tab? I do not see it in the images
  • for the URL, what about "/embed/" instead of "/iframe/"? "embed" seems to be the canonical term for this kind of thing (see YouTube, etc.)
  • instead of "go to the archive", why not "View in the Software Heritage Archive"?
  • it's great that the iframe can use the width and height attributes! How do you plan to handle strange values (e.g., width=10px height=200%)?
Aug 28 2021, 10:02 AM · Software Stories, Web app
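
One possible way to handle the strange width/height values raised in the last question, sketched under assumptions: accept only a plain integer with an optional `px` or `%` unit, clamp it to a range, and fall back to a default for anything else. The bounds `MIN_PX`, `MAX_PX`, `MAX_PERCENT` and the helper `sanitize_dimension` are hypothetical and not part of the Software Heritage web app.

```python
import re

# Hypothetical bounds: the actual embed view may choose different limits.
MIN_PX, MAX_PX = 100, 4000
MAX_PERCENT = 100

# An integer followed by an optional "px" or "%" unit, nothing else.
_DIMENSION = re.compile(r"^\s*(\d+)\s*(px|%)?\s*$")


def sanitize_dimension(value, default="100%"):
    """Normalize an iframe width/height value, or return `default` if invalid."""
    match = _DIMENSION.match(value or "")
    if not match:
        return default
    number = int(match.group(1))
    unit = match.group(2) or "px"  # bare numbers are treated as pixels
    if unit == "%":
        return f"{min(max(number, 1), MAX_PERCENT)}%"
    return f"{min(max(number, MIN_PX), MAX_PX)}px"
```

With these bounds, `width=10px height=200%` would be rendered as `100px` and `100%`, while unparsable values fall back to the default.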

Aug 23 2021

rdicosmo moved T3162: Services page iconography from Restricted Project Column to Restricted Project Column on the Unknown Object (Project) board.
Aug 23 2021, 9:08 AM · Unknown Object (Project)
rdicosmo moved T3409: Prepare ambassador's blog post from Restricted Project Column to Restricted Project Column on the Unknown Object (Project) board.
Aug 23 2021, 9:07 AM · Unknown Object (Project)

Jul 26 2021

rdicosmo added a comment to T3444: 26/07/2021: Unstuck infrastructure outage then post-mortem.

Thanks for looking into this.
What about sending logs to a separate dedicated logging machine instead of storing them locally?

Jul 26 2021, 5:57 PM · System administration

Jul 22 2021

rdicosmo added a comment to T3127: Compute and display distribution of origins by forge.

I am a bit puzzled by the numbers shown: do we really have only 200k origins for GitLab.com?

Indeed there is something weird here as we have more than one million gitlab.com origins in database.

softwareheritage=> select count(*) from origin where url like 'https://gitlab.com/%';
  count  
---------
 1023499
(1 row)

Looks like something was missed when computing lister metrics from the scheduler database; this needs further investigation.

Jul 22 2021, 9:01 AM · Metrics/monitoring, Web app, Roadmap 2021, meta-task
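
As an independent cross-check of the lister metrics, origins can be grouped by host directly from their URLs, in the spirit of the SQL query above. This is a minimal sketch: `origins_by_forge` is a hypothetical helper, and it assumes the origin URLs have already been fetched (for instance from the `origin` table); it is not how the T3127 metrics are actually computed.

```python
from collections import Counter
from urllib.parse import urlsplit


def origins_by_forge(origin_urls):
    """Count origin URLs per host name (e.g. "gitlab.com", "github.com")."""
    counts = Counter()
    for url in origin_urls:
        host = urlsplit(url).hostname  # lowercased, port stripped; None if absent
        if host:
            counts[host] += 1
    return counts
```

Unlike a `LIKE 'https://gitlab.com/%'` prefix filter, grouping on the parsed host name also counts `http://` origins and mixed-case host names.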

Jul 21 2021

rdicosmo added a comment to T3127: Compute and display distribution of origins by forge.

I am a bit puzzled by the numbers shown: do we really have only 200k origins for GitLab.com?
And we know we had some 1.5M origins for Google Code, so why are only 700k shown here?

Jul 21 2021, 3:40 PM · Metrics/monitoring, Web app, Roadmap 2021, meta-task

Jul 20 2021

rdicosmo moved T3162: Services page iconography from Restricted Project Column to Restricted Project Column on the Unknown Object (Project) board.
Jul 20 2021, 11:27 AM · Unknown Object (Project)

Jun 26 2021

rdicosmo committed rMSLD246f33a20d92: Fix spacing and images for TAUmus (authored by rdicosmo).
Jun 26 2021, 11:09 AM

Jun 22 2021

rdicosmo added a comment to T3127: Compute and display distribution of origins by forge.

Nice to see this moving forward!

Jun 22 2021, 1:59 PM · Metrics/monitoring, Web app, Roadmap 2021, meta-task

Jun 18 2021

rdicosmo committed rMSLD84f0c2a244f4: Add tech bits (authored by rdicosmo).
Jun 18 2021, 5:15 PM
rdicosmo committed rMSLD6d01fa8cd328: DigiHum lecture (authored by rdicosmo).
Jun 18 2021, 5:01 PM
rdicosmo committed rMSLD33ff1dd30e98: Fix date for RESAW talk (authored by rdicosmo).
Jun 18 2021, 5:01 PM

Jun 14 2021

rdicosmo committed rMSLD2680082474ff: RESAW (authored by rdicosmo).
Jun 14 2021, 7:39 PM
rdicosmo committed rMSLD9b4a3fde97cb: Update list of disappearing forges (authored by rdicosmo).
Jun 14 2021, 7:39 PM

Jun 11 2021

rdicosmo added a comment to T3365: save code now: Failure to ingest new 'archives' type when head response is incomplete.

Great, it seems we are getting there :-)

Jun 11 2021, 5:40 PM · Save Code Now
rdicosmo committed rMSLD074721362556: Add CACM 2021 02 image (authored by rdicosmo).
Jun 11 2021, 11:50 AM
rdicosmo added a comment to T3365: save code now: Failure to ingest new 'archives' type when head response is incomplete.

@ardumont, @rdicosmo, I just figured out that the data we are missing (Content-Length, Last-Modified) for tarballs archived by the Internet Archive is in fact available in the x-archive-orig-* HTTP response headers,

Jun 11 2021, 11:39 AM · Save Code Now
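
The observation about `x-archive-orig-*` headers can be sketched as a small lookup. This is an illustration under assumptions, not the loader's code: `artifact_metadata` is a hypothetical helper taking a plain dict of response headers, and it prefers the `x-archive-orig-*` copy (which describes the file at its original location) over the plain header.

```python
def artifact_metadata(headers):
    """Extract length and last-modified date from HTTP response headers.

    Prefers the x-archive-orig-* copies, which the Internet Archive uses to
    expose the original server's headers, and falls back to the plain ones.
    """
    # HTTP header names are case-insensitive, so compare them lowercased.
    lowered = {name.lower(): value for name, value in headers.items()}

    def lookup(field):
        return lowered.get(f"x-archive-orig-{field}") or lowered.get(field)

    result = {}
    length = lookup("content-length")
    if length is not None:
        result["length"] = int(length)
    last_modified = lookup("last-modified")
    if last_modified is not None:
        result["last_modified"] = last_modified
    return result
```

For example, a response carrying `X-Archive-Orig-Content-Length: 58335` (an illustrative value) would yield `{"length": 58335}` even when the plain `Content-Length` header is absent.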

Jun 7 2021

rdicosmo added a comment to T3365: save code now: Failure to ingest new 'archives' type when head response is incomplete.

Thanks @ardumont for investigating this. The fact that the IA does not provide the Last-Modified information may make sense in their specific case (they may not have kept the Last-Modified information from the original location).

Jun 7 2021, 11:08 PM · Save Code Now

May 29 2021

rdicosmo committed rMSLD9b248f35fa75: SWHID animation images (authored by rdicosmo).
May 29 2021, 4:40 PM

May 28 2021

rdicosmo added a comment to T3213: Enable save code now of software source code archives for specific users.

The feature has been implemented and looks ready for production use.

I just tested it using the Web API and the Docker environment on a real-world example: the Kermit Software Source Code Archive.

May 28 2021, 1:55 PM · Save Code Now, Web app

May 25 2021

rdicosmo added a comment to T3313: Web API: per-user accounting.

That will be helpful in general (to answer questions like: which endpoint is over/underused for specific use cases) and also to see who over/underuses rate limits (e.g., to identify the need for more generous rate limits for specific use cases).

May 25 2021, 7:25 PM · System administration, Web app
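
The two questions above (endpoint usage and who approaches the rate limits) can both be answered from per-user, per-endpoint counts. A minimal sketch, assuming hypothetical parsed access-log records with `user` (or `None` for anonymous) and `endpoint` keys; the helpers and the 80% threshold are illustrative only.

```python
from collections import Counter


def usage_per_user_and_endpoint(records):
    """Count requests per (user, endpoint); anonymous requests are grouped together."""
    return Counter((rec.get("user") or "anonymous", rec["endpoint"]) for rec in records)


def users_near_quota(usage, quota, ratio=0.8):
    """Return users whose total request count reaches `ratio` of their quota."""
    totals = Counter()
    for (user, _endpoint), count in usage.items():
        totals[user] += count
    return sorted(user for user, total in totals.items() if total >= ratio * quota)
```

Summing the same counter over users instead of endpoints would answer the "which endpoint is over/underused" question.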
rdicosmo created T3342: Collect material for software stories.
May 25 2021, 9:14 AM · Unknown Object (Project)

May 20 2021

rdicosmo committed rMSLD0aea320d43f7: Final SIF (authored by rdicosmo).
May 20 2021, 12:32 PM
rdicosmo committed rMSLD1fdbf56e9431: Biden EO (authored by rdicosmo).
May 20 2021, 12:32 PM

May 19 2021

rdicosmo added a comment to T3202: Help new users discover the features available in the archive browsing view.

If we want non-staff users to try the tour before the official release, we could take advantage of authentication here and activate the guided tour only for users with a dedicated permission.

May 19 2021, 5:53 PM · Web app
rdicosmo added a comment to T3202: Help new users discover the features available in the archive browsing view.

Is the Help page linked from some other place? (i.e.: are we risking 404s if we drop it?)

I meant dropping the link, not the page; I could move the link to the footer so the page can still be reached.

May 19 2021, 4:04 PM · Web app
rdicosmo added a comment to T3202: Help new users discover the features available in the archive browsing view.

After some brainstorming on the subject, I was thinking of launching the guided tour from the Help link in the left sidebar, and thus dropping the Help page.

May 19 2021, 3:31 PM · Web app

May 12 2021

rdicosmo added a comment to T3202: Help new users discover the features available in the archive browsing view.

So we have a winner here.

May 12 2021, 12:13 PM · Web app

May 11 2021

rdicosmo merged task T3321: Add section "meet our ambassadors" on the ambassadors page into Unknown Object (Maniphest Task).
May 11 2021, 10:46 PM · Ambassadors, Website

May 10 2021

rdicosmo added a comment to T1226: Save code now email notification for submitter.

Is this feature still needed?

I think so; some origins can take a long time to load into the archive (huge SVN repositories, for instance),
so an email notification would be useful here.

If yes, is it easy to implement it now?

Not at the moment, we need to resolve T3286 first.

May 10 2021, 6:31 PM · Save Code Now, Web app
rdicosmo added a comment to T1226: Save code now email notification for submitter.

A lot has changed since this was opened:

May 10 2021, 6:19 PM · Save Code Now, Web app

May 8 2021

rdicosmo moved T2194: Archive Integration (Web API) from Backlog to Work in progress on the Roadmap 2021 board.
May 8 2021, 11:14 AM · Roadmap 2021, meta-task
rdicosmo moved T3118: Documentation for users and ambassadors from Backlog to Work in progress on the Roadmap 2021 board.
May 8 2021, 11:14 AM · Scientific Community Building, Community Building, Roadmap 2021, meta-task
rdicosmo moved T2912: Next generation archive counters from Pending validation to Done on the Roadmap 2021 board.
May 8 2021, 11:13 AM · Roadmap 2021, System administration, Monitoring, Web app
rdicosmo moved T3082: Improve Save Code Now handling from Backlog to Work in progress on the Roadmap 2021 board.
May 8 2021, 11:12 AM · System administration, Save Code Now, meta-task, Roadmap 2021, Web app

May 7 2021

rdicosmo added a comment to T3312: web API rate limit: 10x more quota for authenticated users.

If we need to tune rate limits for specific types of users, this could easily be added to the new throttling
code I am currently working on.

May 7 2021, 11:45 AM · Web app
rdicosmo committed rMSLD71dc9ed81cfb: Add eLife logo (authored by rdicosmo).
May 7 2021, 11:16 AM
rdicosmo committed rMSLD0ad820cf9a76: Update growth in SIF slides (authored by rdicosmo).
May 7 2021, 11:09 AM
rdicosmo committed rMSLDa38474327e31: Update growth figures (authored by rdicosmo).
May 7 2021, 11:08 AM
rdicosmo added a comment to T3312: web API rate limit: 10x more quota for authenticated users.

@zack, @rdicosmo yes this is totally feasible by adding a dedicated Django REST Framework throttling handler for authenticated users.

Let's work on that then.

May 7 2021, 11:05 AM · Web app
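
Django REST Framework ships `AnonRateThrottle` and `UserRateThrottle`, configured through the `DEFAULT_THROTTLE_RATES` setting, and a dedicated handler would most likely build on those. The sketch below does not use DRF; it only illustrates the "10x more quota for authenticated users" rule with a plain fixed-window counter. The class name and all numbers are hypothetical.

```python
import time

# Hypothetical numbers: the real quotas live in the web app configuration.
ANON_LIMIT = 120        # requests per window for anonymous clients
AUTH_MULTIPLIER = 10    # "10x more quota for authenticated users"
WINDOW_SECONDS = 3600


class FixedWindowThrottle:
    """Fixed-window rate limiter keyed by user id (or client IP if anonymous)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._windows = {}  # key -> (window start, requests counted so far)

    def allow(self, key, authenticated):
        limit = ANON_LIMIT * (AUTH_MULTIPLIER if authenticated else 1)
        now = self._clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= WINDOW_SECONDS:
            start, count = now, 0  # the previous window expired: start a new one
        if count >= limit:
            return False
        self._windows[key] = (start, count + 1)
        return True
```

A fixed window is the simplest scheme to reason about; its known drawback is that a client can send up to twice the quota around a window boundary.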
rdicosmo committed rMSLD04062c8f1869: Structure (authored by rdicosmo).
May 7 2021, 10:28 AM
rdicosmo committed rMSLDbd56776e3d4d: Slides SIF reproductibilite (authored by rdicosmo).
May 7 2021, 10:20 AM
rdicosmo committed rMSLDd4b4e016ea3c: Add pillar of OS entry in ARDC module (authored by rdicosmo).
May 7 2021, 10:20 AM
rdicosmo added a comment to T3312: web API rate limit: 10x more quota for authenticated users.

@anlambert: ping me when this is done, so we can answer some pending requests :-)

May 7 2021, 9:44 AM · Web app

Apr 29 2021

rdicosmo added a comment to T3298: Consider making SWHID handling case insensitive.

So for SWHID v1, the resolver should convert the core part to lowercase, am I right?

Apr 29 2021, 1:16 PM · Data Model, Web app
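
A minimal sketch of what "lowercase the core part" could mean for a v1 resolver, assuming a hypothetical helper `normalize_swhid`: the `swh:1:<type>:<40 hex digits>` core is lowercased and validated, while qualifiers after `;` are left untouched, since values such as origin URLs may be case-sensitive. It deliberately accepts only version 1, given the concern that a denser SWHIDv2 encoding may be case-sensitive.

```python
OBJECT_TYPES = {"cnt", "dir", "rev", "rel", "snp"}
HEX_DIGITS = set("0123456789abcdef")


def normalize_swhid(swhid):
    """Lowercase the core of a SWHID v1, keeping qualifiers exactly as given."""
    core, sep, qualifiers = swhid.strip().partition(";")
    parts = core.lower().split(":")
    if (
        len(parts) != 4
        or parts[0] != "swh"
        or parts[1] != "1"  # only v1: a denser v2 encoding may be case-sensitive
        or parts[2] not in OBJECT_TYPES
        or len(parts[3]) != 40
        or not set(parts[3]) <= HEX_DIGITS
    ):
        raise ValueError(f"not a valid SWHID v1 core: {core!r}")
    return ":".join(parts) + sep + qualifiers
```

For example, `SWH:1:CNT:94A9ED024D3859793618152EA559A168BBCBB5E2` normalizes to `swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2`.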
rdicosmo added a comment to T3298: Consider making SWHID handling case insensitive.
In T3298#64426, @zack wrote:

This is going to be an interesting challenge/trade-off for SWHIDv2, because I was considering using more compact encodings than hex there, like base58, in order to shorten SWHIDs, but those are case-sensitive in order to be denser.

So, as a counterargument to the "SHOULD" idea, we need to be careful about promoting a practice now that might change when switching from SWHIDv1 to SWHIDv2.

Apr 29 2021, 12:19 PM · Data Model, Web app
rdicosmo updated the task description for T3298: Consider making SWHID handling case insensitive.
Apr 29 2021, 12:03 PM · Data Model, Web app
rdicosmo triaged T3298: Consider making SWHID handling case insensitive as Normal priority.
Apr 29 2021, 12:02 PM · Data Model, Web app
rdicosmo created T3298: Consider making SWHID handling case insensitive.
Apr 29 2021, 12:02 PM · Data Model, Web app

Apr 28 2021

rdicosmo added a comment to T2912: Next generation archive counters.

> I also recall now that vincent added a graph [1] recently enough.

This was to try and compare the counter approaches.

So that's still using the old plumbing at least for that part.

[1] https://grafana.softwareheritage.org/goto/BlkwHorMz

Apr 28 2021, 5:23 PM · Roadmap 2021, System administration, Monitoring, Web app

Apr 27 2021

rdicosmo created T3295: Archive the Kermit historical source code collection.
Apr 27 2021, 10:41 AM · Community Building

Apr 26 2021

rdicosmo added a comment to T2912: Next generation archive counters.

Last bits deployed on archive.s.o (including the author counters).

Apr 26 2021, 1:33 PM · Roadmap 2021, System administration, Monitoring, Web app
rdicosmo moved T2912: Next generation archive counters from Work in progress to Pending validation on the Roadmap 2021 board.
Apr 26 2021, 10:50 AM · Roadmap 2021, System administration, Monitoring, Web app
rdicosmo closed T3163: Call For Participation Grants as Resolved.
Apr 26 2021, 9:23 AM · Unknown Object (Project)

Apr 24 2021

rdicosmo added a comment to T3213: Enable save code now of software source code archives for specific users.

I recall it's part of creating a (sort of) primary key composed of all the properties mentioned
above (when the artifact does not already provide hashes).
This is to avoid fetching again things that were already fetched.

Apr 24 2021, 3:20 PM · Save Code Now, Web app
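
The "(sort of) primary key" idea can be sketched as follows, under assumptions: the key is built from the properties listed in the earlier comment (url, time, length, version, filename), or from checksums when the artifact provides them, and artifacts whose key was already recorded are skipped. The field name `checksums` and both helpers are hypothetical, not the package loader's actual code.

```python
# Properties listed in the earlier comment; hypothetical choice of key fields.
KEY_FIELDS = ("url", "time", "length", "version", "filename")


def artifact_key(artifact):
    """Identify an artifact by its checksums if any, else by its metadata."""
    checksums = artifact.get("checksums")  # hypothetical field name
    if checksums:
        return ("checksums",) + tuple(sorted(checksums.items()))
    return tuple(artifact.get(field) for field in KEY_FIELDS)


def artifacts_to_fetch(artifacts, known_keys):
    """Keep only artifacts whose key was not recorded by a previous visit."""
    return [a for a in artifacts if artifact_key(a) not in known_keys]
```

Because the key is a plain tuple, the keys of previous visits can be stored in a set and checked in constant time.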
rdicosmo added a comment to T3213: Enable save code now of software source code archives for specific users.

Currently users only provide a URL in Save Code Now, but the loader expects a bit more
[1] (recall that it's the lister which actually provides those).

The loader expects to be provided with a list of artifacts (possibly only one in our
case). Such artifacts are described through the following:

  • artifact url
  • time
  • length (could be derived from the URL by querying the server, but not all servers provide it...)
  • version (could be derived heuristically from the URL as well, but that's regexp-hell-ish and error-prone)
  • filename (could be derived from the URL without too much risk, I think...)

I gather the Save Code Now UI could be enriched (with fields displayed according to the chosen visit
type), but that makes it more involved for people in general.

Another road would be to make some of those properties optional...

Thoughts?

[1]

 "url": "https://ftp.gnu.org/old-gnu/emacs/",
 "artifacts": [{"url": "https://ftp.gnu.org/old-gnu/emacs/elib-1.0.tar.gz",
                "time": "1995-12-12T08:00:00+00:00",
                "length": 58335,
                "version": "1.0",
                "filename": "elib-1.0.tar.gz",
                },
                ...
               ]
...
Apr 24 2021, 9:53 AM · Save Code Now, Web app
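
Deriving the filename and guessing the version from the artifact URL, as discussed above, might look like this minimal sketch. Both helpers are hypothetical, and the version regex is exactly the kind of heuristic the comment warns is error-prone: it only recognizes `<name>-<version>.<archive extension>` names. The length cannot be derived this way and still has to come from the server.

```python
import posixpath
import re
from urllib.parse import unquote, urlsplit

# Heuristic only: "<name>-<version>.<archive extension>", e.g. "elib-1.0.tar.gz".
_VERSION_RE = re.compile(
    r"[-_]v?(\d+(?:\.\d+)*[a-z0-9]*)\.(?:tar\.gz|tgz|tar\.bz2|tar\.xz|zip)$"
)


def filename_from_url(url):
    """Last path segment of the URL, percent-decoded."""
    return posixpath.basename(unquote(urlsplit(url).path))


def guess_version(filename):
    """Guess a version string from an archive file name, or return None."""
    match = _VERSION_RE.search(filename)
    return match.group(1) if match else None
```

On the artifact in [1], `filename_from_url` gives `elib-1.0.tar.gz` and `guess_version` gives `1.0`; names outside the expected pattern yield `None` rather than a wrong guess.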

Apr 21 2021

rdicosmo added a comment to T3213: Enable save code now of software source code archives for specific users.

Thanks @ardumont ... so it appears that adapting the logic is easy... could you do it?
@anlambert could you look into the needed modification of the UI, to enable the new type of Save Code Now payloads for selected authenticated users?

Apr 21 2021, 6:58 PM · Save Code Now, Web app
rdicosmo added a comment to T3087: Implement support for takedown notices (infra, admin tools, workflow).

So what about exports of the archive available on git-annex?

Apr 21 2021, 6:53 PM · meta-task, Roadmap 2021, Web app

Apr 20 2021

rdicosmo added a comment to T3278: Check older pending save code now requests apparently stuck and reschedule those.

Thanks, this is quite useful indeed.

Apr 20 2021, 7:28 PM · System administration, Save Code Now
rdicosmo added a comment to T3278: Check older pending save code now requests apparently stuck and reschedule those.

Thanks for looking into this. If I look at https://grafana.softwareheritage.org/d/WXRVVc_Mz/save-code-now?viewPanel=4&orgId=1&from=1617954242247&to=1617975842247&var-environment=production&var-instance=moma.internal.softwareheritage.org&var-status=All&var-load_task_status=All&var-visit_type=All it seems there are also some 255 requests "not yet scheduled". Maybe it's the same issue?

Apr 20 2021, 11:00 AM · System administration, Save Code Now

Apr 19 2021

rdicosmo committed rMSLDd525b5e493d9: RDA Data granularity WG presentation (authored by rdicosmo).
Apr 19 2021, 8:17 PM
rdicosmo added a comment to T3234: Handle gracefully trailing slashes when resolving SWHID in search box.

Thanks, it is indeed an urgent matter, as various journals depend on this!

Apr 19 2021, 6:46 PM · Web app
rdicosmo reopened T3234: Handle gracefully trailing slashes when resolving SWHID in search box as "Open".

Well, it seems we have been hit by this again, in a different form:

Apr 19 2021, 6:10 PM · Web app
rdicosmo added a comment to T3247: Implement SWHID validation in frontend.

Cool!

Apr 19 2021, 3:58 PM · Web app
rdicosmo moved T3246: Document takedown request processing workflow from Backlog to Work in progress on the Roadmap 2021 board.
Apr 19 2021, 11:53 AM · Archive content
rdicosmo moved T3077: Ease integration of fundraising campaigns from Pending validation to Done on the Roadmap 2021 board.
Apr 19 2021, 11:53 AM · Community Building, Roadmap 2021, Website

Apr 16 2021

rdicosmo added a comment to T3252: Better handling of erroneous origins submitted to save code now.

Thanks to all of you for this discussion and the proposals.

Apr 16 2021, 1:39 PM · System administration, Save Code Now, Web app
rdicosmo added a comment to T3256: Propose reason for rejecting a save code now.

Great. In addition to the content of the free form field, the standard answer should contain proper boilerplate reminding what is expected in a Save Code Now request, along the lines of what is written in the "Help" tab of https://archive.softwareheritage.org/save/

Apr 16 2021, 1:24 PM · Save Code Now, Easy hack, Web app
rdicosmo added a comment to T2117: Save Code Now: End to End monitoring.

On a related note, it may be useful to regularly report requests that did not complete (either as success or failure) in a reasonable amount of time after being scheduled.

Apr 16 2021, 9:06 AM · System administration, Monitoring, Roadmap 2021
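
The periodic report suggested above boils down to a filter over scheduled requests. A minimal sketch, assuming hypothetical request records with `scheduled_at` (a timezone-aware datetime, or `None` if not yet scheduled) and `load_task_status` fields; the "finished" status names and the 24-hour threshold are illustrative choices, not values from the actual Save Code Now backend.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical status names for a load that has completed either way.
FINISHED = {"succeeded", "failed"}


def stale_requests(requests, now=None, max_age=timedelta(hours=24)):
    """Requests scheduled more than `max_age` ago that have not finished yet."""
    now = now or datetime.now(timezone.utc)
    return [
        req
        for req in requests
        if req["scheduled_at"] is not None
        and req["load_task_status"] not in FINISHED
        and now - req["scheduled_at"] > max_age
    ]
```

Requests that were never scheduled are excluded here; they would deserve a separate check (such as the "not yet scheduled" count mentioned in T3278).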

Apr 15 2021

rdicosmo triaged T3252: Better handling of erroneous origins submitted to save code now as Normal priority.
Apr 15 2021, 10:47 PM · System administration, Save Code Now, Web app
rdicosmo added a comment to T2912: Next generation archive counters.

This kind of journal client will be necessary in any case if we want to extend the use of the counters to other scopes (metadata counts, origins per forge, ...)

Apr 15 2021, 3:35 PM · Roadmap 2021, System administration, Monitoring, Web app
rdicosmo added a comment to T3084: Fast track save code now requests.

Pushed, packaged, deployed.

The scheduler runner happily continues to schedule existing tasks, as well as a new task with priority:

Apr 15 13:12:51 saatchi swh[234257]: INFO:swh.scheduler.celery_backend.runner:Grabbed 2084 tasks load-git
Apr 15 13:12:54 saatchi swh[234257]: INFO:swh.scheduler.cli.admin.runner:Scheduled 4128 tasks
Apr 15 13:14:06 saatchi swh[234257]: INFO:swh.scheduler.celery_backend.runner:Grabbed 1 tasks load-pypi
Apr 15 13:14:06 saatchi swh[234257]: INFO:swh.scheduler.celery_backend.runner:Grabbed 1 tasks load-git (priority)
...

That task got done almost immediately...
So there you go ;)

Apr 15 2021, 3:30 PM · System administration, Web app
rdicosmo added a comment to T2912: Next generation archive counters.

Staging webapp[1] and webapp1 on production [2] are now configured to use swh-counters to display the historical values and the live object counts.

Apr 15 2021, 12:09 PM · Roadmap 2021, System administration, Monitoring, Web app

Apr 14 2021

rdicosmo added a comment to T3084: Fast track save code now requests.

Great news :-)

Apr 14 2021, 7:01 PM · System administration, Web app

Apr 13 2021

rdicosmo committed rMSLD5beda4268f79: Slides for RDA SSC IG (authored by rdicosmo).
Apr 13 2021, 7:32 PM
rdicosmo committed R238:9cf42fd2074c: variant for renater gforge (authored by rdicosmo).
Apr 13 2021, 6:54 PM