olasd (Nicolas Dandrimont)
User · Administrator

Projects (7)

User Details

User Since
Sep 7 2015, 3:25 PM (197 w, 2 d)
Roles
Administrator

Recent Activity

Today

olasd committed rMSLDa3eaca673001: Add Ubuntu Party slides (authored by olasd).
Add Ubuntu Party slides
Thu, Jun 20, 2:00 PM
olasd committed rDOBJSf8624812c381: Remove :py3 qualifier from the tox test environment configuration (authored by olasd).
Remove :py3 qualifier from the tox test environment configuration
Thu, Jun 20, 1:34 PM
olasd closed D1616: Remove :py3 qualifier from the tox test environment configuration.
Thu, Jun 20, 1:34 PM

Yesterday

olasd committed rCJSWH5ee86422e255: Use a more reliable command to clean up dangling docker volumes (authored by olasd).
Use a more reliable command to clean up dangling docker volumes
Wed, Jun 19, 4:26 PM
olasd committed rDOBJS67197802d5aa: pathslicing: Make sure data is flushed to disk before renaming the tempfile (authored by olasd).
pathslicing: Make sure data is flushed to disk before renaming the tempfile
Wed, Jun 19, 4:11 PM
olasd closed D1611: pathslicing: Make sure data is flushed to disk before renaming the tempfile.
Wed, Jun 19, 4:11 PM
olasd closed T1823: make DB/FS transactions nest properly as Resolved by committing rDOBJS67197802d5aa: pathslicing: Make sure data is flushed to disk before renaming the tempfile.
Wed, Jun 19, 4:11 PM · Object storage, Storage manager
olasd created D1616: Remove :py3 qualifier from the tox test environment configuration.
Wed, Jun 19, 4:08 PM
olasd updated the diff for D1611: pathslicing: Make sure data is flushed to disk before renaming the tempfile.

Support old unittest.mock

Wed, Jun 19, 4:06 PM
olasd added inline comments to D1611: pathslicing: Make sure data is flushed to disk before renaming the tempfile.
Wed, Jun 19, 3:39 PM
olasd updated the diff for D1611: pathslicing: Make sure data is flushed to disk before renaming the tempfile.

Fallback to fsync if fdatasync isn't available

Wed, Jun 19, 3:38 PM
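
The D1611 updates above describe flushing data to disk before renaming the tempfile, with a fallback to fsync when fdatasync isn't available. A minimal Python sketch of that general pattern follows; the helper names `sync_to_disk` and `atomic_write` are hypothetical and this is not the actual pathslicing code, only an illustration under those assumptions.

```python
import os
import tempfile


def sync_to_disk(fd):
    """Ask the OS to push a file's data to stable storage.

    Hypothetical helper: os.fdatasync is missing on some platforms,
    so fall back to os.fsync when it is not there.
    """
    sync = getattr(os, "fdatasync", os.fsync)
    sync(fd)


def atomic_write(path, data):
    """Write bytes to `path` through a tempfile in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()                 # Python buffer -> OS page cache
            sync_to_disk(f.fileno())  # OS page cache -> disk
        # Rename only once the data has been synced, so a crash cannot
        # leave an empty or truncated file under the final name.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the tempfile if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Creating the tempfile in the destination directory keeps the rename on a single filesystem, which is what lets it behave atomically on POSIX systems.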
olasd added a comment to D1586: Don't use a join in origin_visit_get_latest..

sounds good to me
@olasd any thoughts on this?

Wed, Jun 19, 2:26 PM
olasd accepted D1608: client: allow to specify any cfg parameter to KafkaConsumer().
Wed, Jun 19, 2:15 PM
olasd created D1611: pathslicing: Make sure data is flushed to disk before renaming the tempfile.
Wed, Jun 19, 1:46 PM
olasd added a revision to T1823: make DB/FS transactions nest properly: D1611: pathslicing: Make sure data is flushed to disk before renaming the tempfile.
Wed, Jun 19, 1:46 PM · Object storage, Storage manager
olasd closed T1825: Deploy kafka direct journal_writer to main storage as Resolved by committing rSPSITEe225060c2ff1: Add direct journal writer to uffizi.
Wed, Jun 19, 12:25 PM · Mirror
olasd committed rSPSITEe225060c2ff1: Add direct journal writer to uffizi (authored by olasd).
Add direct journal writer to uffizi
Wed, Jun 19, 12:25 PM
olasd closed D1601: Add direct journal writer to uffizi.
Wed, Jun 19, 12:25 PM
olasd committed rSPSITE46f441f08c5d: Stop hardcoding journal brokers everywhere (authored by olasd).
Stop hardcoding journal brokers everywhere
Wed, Jun 19, 12:25 PM
olasd closed D1599: Stop hardcoding journal brokers everywhere.
Wed, Jun 19, 12:25 PM
olasd updated the diff for D1601: Add direct journal writer to uffizi.

Rebase; update client_id to match class hierarchy

Wed, Jun 19, 12:25 PM
olasd updated the diff for D1599: Stop hardcoding journal brokers everywhere.

Rebase

Wed, Jun 19, 12:23 PM

Tue, Jun 18

olasd added a comment to T1755: Create artifact release when 'releaseNotes' is in metadata .

(I'll pass on the underlying limitation of being forced to link to a release object from wikidata, which feels a bit artificial but is out of scope for this task)

Tue, Jun 18, 5:09 PM · SWORD deposit
olasd accepted D1605: Remove ReCAPTCHA configuration for swh-web.
Tue, Jun 18, 4:57 PM
olasd triaged T1829: Find a way to properly open the kafka brokers to the internet as High priority.
Tue, Jun 18, 4:02 PM · System administration, Mirror
olasd triaged T1828: Improve directory journal backfill performance as High priority.
Tue, Jun 18, 3:57 PM · Mirror, Journal
olasd triaged T1827: Tweak content backfill order to help content replayer as High priority.
Tue, Jun 18, 3:44 PM · Mirror, Journal
olasd added inline comments to D1595: Make the journal client create tasks for multiple origins instead of one at a time..
Tue, Jun 18, 3:17 PM
olasd added a revision to T1825: Deploy kafka direct journal_writer to main storage: D1601: Add direct journal writer to uffizi.
Tue, Jun 18, 3:12 PM · Mirror
olasd created D1601: Add direct journal writer to uffizi.
Tue, Jun 18, 3:12 PM
olasd accepted D1574: Added pyblake2 in py3 test dependency.

Thanks!

Tue, Jun 18, 3:11 PM
olasd created D1599: Stop hardcoding journal brokers everywhere.
Tue, Jun 18, 3:10 PM
olasd triaged T1825: Deploy kafka direct journal_writer to main storage as High priority.
Tue, Jun 18, 2:56 PM · Mirror
olasd closed T1600: Write a storage backend that writes to kafka as Resolved.
Tue, Jun 18, 2:54 PM · Sprint 2019 03
olasd requested changes to D1574: Added pyblake2 in py3 test dependency.

I don't think that extras_require is going to work.

Tue, Jun 18, 2:53 PM
olasd accepted D1596: Remove dependency on swh-core..
Tue, Jun 18, 1:43 PM

Mon, Jun 17

olasd triaged T1817: À la recherche du content perdu as Normal priority.
Mon, Jun 17, 5:59 PM · Archive content
olasd changed the visibility for F3540729: missing_contents.csv.
Mon, Jun 17, 5:50 PM
olasd accepted D1588: phabricator.lister: Use credentials setup from config file.
Mon, Jun 17, 5:44 PM · Lister
olasd closed T691: complete object storage mirror on Azure (meta task) as Resolved.

After processing the logs of the backfilling process to make sure to redo all the ranges that were interrupted in various database migrations, I'm now confident that this task is complete: we have a full mirror of all contents on Azure, which is kept up to date by the main archive storage backend writing synchronously to it.

Mon, Jun 17, 4:25 PM · General
olasd closed T691: complete object storage mirror on Azure (meta task), a subtask of T239: preserve at least 2 copies of each content object, as Resolved.
Mon, Jun 17, 4:25 PM · General
olasd added a comment to T1815: Use a FOSS alternative or drop Google ReCAPTCHA use.

I should have read the django-simple-captcha documentation; indeed, its integration is not really straightforward for swh-web.
Currently, only the API endpoint for creating save requests is rate limited, while the Save Code Now form is submitted using JavaScript
(validating input, then setting the appropriate Django CSRF token before sending the POST request).
So even without a captcha, it will still be difficult for a dumb bot to spam us.
I would go for removing the captcha but adding rate limiting to the form submission, just in case.

Mon, Jun 17, 4:22 PM · Web app
olasd added a comment to T1389: Implement a base loader for package managers.
In T1389#33215, @zack wrote:

Thanks @olasd, @ardumont, and @anlambert for this, it's a great plan and I like it a lot!
Just a few comments on the sidelines:

The lister will generate a one-shot task to load each package for the given repository, with the full information needed to do the data fetching.

This seemed clear from a different part of the description, but fwiw here I'm assuming the plan is to only load the version of the packages not already known/ingested in the past.

Mon, Jun 17, 4:19 PM · Origin-npm, Origin-Pypi, Archive coverage
olasd added a comment to T693: public page showing the amount of objects (content count et al.) in 3rd party mirrors.

FWIW https://grafana.softwareheritage.org/d/jScG7g6mk/objstorage-object-counts?orgId=1 is a (not great) implementation of this.

Mon, Jun 17, 4:12 PM · Web app, Website
olasd added a comment to T1815: Use a FOSS alternative or drop Google ReCAPTCHA use.

As an alternative, we could just set the Django CSRF token on the form using a bit of Javascript code rather than the view sending it directly in the form, which would thwart most dumb bots (that's the "ReCAPTCHA alternatives for uncustomized spam > Javascript" section of the aforementioned article).

Mon, Jun 17, 2:29 PM · Web app
olasd added a comment to T1815: Use a FOSS alternative or drop Google ReCAPTCHA use.

With the post-hoc moderation of Save Code Now requests, do we really need a captcha? Isn't the base rate limiting enough?

Mon, Jun 17, 2:24 PM · Web app

Fri, Jun 14

olasd committed rSPSITE5569db2b8b16: Remove pglogical apt config (authored by olasd).
Remove pglogical apt config
Fri, Jun 14, 10:38 AM
olasd committed rSPSITE310f75602ed2: Add missing prometheus sql exporter configs to belvedere (authored by olasd).
Add missing prometheus sql exporter configs to belvedere
Fri, Jun 14, 1:18 AM

Thu, Jun 13

olasd committed rSPSITE2dae288a1efe: Add link-local route to private routing table (authored by olasd).
Add link-local route to private routing table
Thu, Jun 13, 4:46 PM

Fri, Jun 7

olasd added a comment to T691: complete object storage mirror on Azure (meta task).
  • The main archive currently synchronously writes all contents to Azure as well as the local storage (the gap is strictly closing)
  • All partitions from uffizi have been copied to Azure and mass-injected (except for partition 8, which only got partially mass-injected)
  • After this process, it looks like Azure is missing 10% of all objects (excluding partition 8), which are all on banco
    • I've started a procedure to copy the missing objects from banco directly. Estimated time to completion ~ 1 month
    • The same procedure has been started to copy the missing objects from partition 8 on uffizi. Estimated time to completion ~ 15 days
Fri, Jun 7, 7:30 PM · General

Thu, Jun 6

olasd added a comment to T1776: packagist (PHP) Lister.

OK. Out of curiosity, would you be able to look at the location of dists instead? Just to get a sense of how much overlap there is with archives from GitHub.

I made a short script to analyse the location of dists for packages; here is the result.
Packages iterated - 15336 (~ 6% of total packages)
Number of packages whose dists were not hosted on GitHub, Bitbucket or GitLab: 2
VCS for dists: zip, git

Thu, Jun 6, 3:39 PM · Lister, Archive coverage

Tue, Jun 4

olasd added a comment to T1776: packagist (PHP) Lister.
In T1776#32892, @olasd wrote:

There is a total of 220570 packages.
I made a short script to analyse the VCS and code hosting platform for packages, here is the result.
Packages iterated - 15126 (~ 6% of total packages)
VCS found - git, hg
Number of packages that were not hosted on GitHub, Bitbucket or GitLab: 51
Extrapolating from the above data with the unitary method, around 744 packages are expected not to be hosted on GitHub, Bitbucket or GitLab.

In your analysis, did you look at the source key (which seems to represent the upstream version control repository) or the dist key (which points at the tarball/zipfile that's actually downloaded by the package manager when installing that version of the package)?

I looked at the source key in my analysis.

Tue, Jun 4, 6:35 PM · Lister, Archive coverage
olasd empowered rdicosmo as an administrator.
Tue, Jun 4, 5:44 PM
olasd added a comment to T1776: packagist (PHP) Lister.

Once we have the (released) dists figured out, we can also consider, as a second step, having the packagist lister submit tasks for the upstream repositories mentioned in the source key as well, as an extra data source.

Tue, Jun 4, 2:54 PM · Lister, Archive coverage
olasd added a comment to T1776: packagist (PHP) Lister.

There is a total of 220570 packages.
I made a short script to analyse the VCS and code hosting platform for packages, here is the result.
Packages iterated - 15126 (~ 6% of total packages)
VCS found - git, hg
Number of packages that were not hosted on GitHub, Bitbucket or GitLab: 51
Extrapolating from the above data with the unitary method, around 744 packages are expected not to be hosted on GitHub, Bitbucket or GitLab.

Tue, Jun 4, 2:49 PM · Lister, Archive coverage
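
The estimate of around 744 packages in the comment above comes from scaling the count seen in the sample up to the full package total. A short Python sketch of that arithmetic, using only the figures quoted in the comment (the function name `extrapolate` is hypothetical):

```python
def extrapolate(sample_hits, sample_size, population_size):
    """Scale a count observed in a sample up to the whole population."""
    return sample_hits / sample_size * population_size


# Figures from the comment: 51 packages hosted elsewhere among the
# 15126 packages iterated, out of 220570 packages in total.
estimate = extrapolate(sample_hits=51, sample_size=15126, population_size=220570)
print(round(estimate))  # prints 744
```

This assumes the iterated packages are representative of the whole index, which the comment does not establish.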
olasd accepted D1530: archiver: Remove archiver profile from swh_storage role.
Tue, Jun 4, 9:31 AM · Puppet recipes, Software Heritage Archiver
olasd accepted D1520: Fix origin_search in the in-mem storage to search for sub-strings..
Tue, Jun 4, 9:30 AM
olasd accepted D1521: Unify argument names of the pg and in-mem storage..

Some part of me wonders if it wouldn't make sense to unify the other way around, as the argument names you picked in the in-memory storage are way more sensible, but we've not succeeded in making a change in this area without breaking some (or all) clients...

Tue, Jun 4, 9:29 AM

Mon, Jun 3

olasd added a comment to T1734: Create a Lister for launchpad.net.

Hello, we are a group of M1 computer science students of the University of Montpellier, France.

Mon, Jun 3, 6:34 PM · Archive coverage
olasd accepted D1532: requirements.txt: Remove swh-archiver project from document build.

kthxbye

Mon, Jun 3, 6:12 PM
olasd accepted D1525: Prevent generation of empty branch names..
Mon, Jun 3, 5:07 PM
olasd added a comment to T1724: Maven Central repository Lister.

Thanks a lot to @hboutemy for your valuable insights on sources in the Maven central repository, and for the pointer to Reproducible Builds on the JVM.

Mon, Jun 3, 4:18 PM · GSoC 2019, Archive coverage

Tue, May 28

olasd added a comment to T1234: Allow simple read-only connections to db from swh nodes.

As mentioned in D1516 I don't think DNS provides the proper granularity for this; somerset is a "database replica" server only coincidentally; all its databases that aren't the replica of the main database are actually primaries.

Tue, May 28, 4:07 PM · System administration
olasd added a comment to D1516: Add dbreplica CNAME.

I'm not convinced this is such a good idea; this machine is way more than a "db replica" server (it only has one replica; most of its databases are actually primaries) and I don't think DNS provides the appropriate granularity level to record this information.

Tue, May 28, 3:33 PM · Staff
D1512: Exempt DINSIC from swh-web rate limiting is now accepted and ready to land.
Tue, May 28, 2:48 PM
olasd accepted D1512: Exempt DINSIC from swh-web rate limiting.
Tue, May 28, 2:47 PM
olasd accepted D1512: Exempt DINSIC from swh-web rate limiting.
Tue, May 28, 2:45 PM

Wed, May 22

olasd accepted D1499: vault: Setup new vangogh server.
Wed, May 22, 12:00 PM · Puppet recipes, Vault
olasd requested changes to D1499: vault: Setup new vangogh server.

Thanks for this change!

Wed, May 22, 11:06 AM · Puppet recipes, Vault
olasd requested changes to D1497: Maven Lister.

Thanks for this first pass at a Maven lister.

Wed, May 22, 10:18 AM

May 20 2019

olasd updated subscribers of T1389: Implement a base loader for package managers.

We've discussed a plausible plan for a "base package manager loader" with @ardumont and, to some extent, @anlambert.

May 20 2019, 6:03 PM · Origin-npm, Origin-Pypi, Archive coverage
olasd updated subscribers of D1487: Update scheduler task names to new ones.

I've got two questions that @ardumont or @moranegg probably have an opinion about :)

May 20 2019, 4:22 PM

May 16 2019

olasd accepted D1480: cli: rename the api-server command as rpc-serve.

Obviously needs a core release, but looks good to me.

May 16 2019, 6:08 PM
olasd added a comment to D1479: cli: add support for aliases in click command groups.

Please add click in requirements-test.txt :-)

May 16 2019, 6:07 PM
olasd accepted D1479: cli: add support for aliases in click command groups.
May 16 2019, 6:07 PM
olasd added a comment to T1716: Vault: Migrate vault infrastructure to azure.

Considering the size of that database, and the fact that we don't have any provisions to automatically spin up a new database server, I think it would make more sense to repatriate it on our main postgres setup, rather than moving it to a new machine on azure.

May 16 2019, 4:46 PM · Vault

May 15 2019

olasd added a comment to T1709: implement an R-cran lister.

Expanding on what Dirk Eddelbuettel posted on IRC when we talked about that, a minimal R script to fetch the current package information would be:

May 15 2019, 3:37 PM · GSoC 2019, Archive coverage
olasd added a comment to T1709: implement an R-cran lister.

Here is an implementation plan for the R-CRAN lister.
I have taken inspiration from the pypi lister.
To write lister.py for R-CRAN, we need to subclass SimpleLister and override ingest_data(), changing its first line (where safely_issue_request() is called) to call a function that runs an R script and returns a JSON response.
After that, it is handled like any normal response; we just need to implement the following functions: list_packages, compute url, get_model_from_repo, task_dict and transport_response_simplified.

May 15 2019, 11:14 AM · GSoC 2019, Archive coverage

May 14 2019

olasd closed T1604: Improve kafka deployment as Resolved.

So that took a few tries in puppet, but adding new brokers to the kafka deployment should now be seamless.

May 14 2019, 7:49 PM · System administration, Sprint 2019 03
olasd committed rSPSITE5e38eacfeb65: Figure out the zookeeper id from the zookeeper::servers hash (authored by olasd).
Figure out the zookeeper id from the zookeeper::servers hash
May 14 2019, 6:52 PM
olasd committed rSPSITE520d9be9c05c: Properly set the zookeeper config on kafka brokers (authored by olasd).
Properly set the zookeeper config on kafka brokers
May 14 2019, 6:52 PM
olasd committed rSPSITE1fb8fddc0f1d: add ddouard to swhscheduler (authored by olasd).
add ddouard to swhscheduler
May 14 2019, 6:03 PM
olasd committed rSPSITE64b4a67efca6: Create kafka log directories (authored by olasd).
Create kafka log directories
May 14 2019, 5:41 PM
olasd committed rSPSITEca01c0e02614: Add esnode1,2,3 as kafka brokers (authored by olasd).
Add esnode1,2,3 as kafka brokers
May 14 2019, 5:34 PM
olasd committed rSPSITEa5c4cb85dca6: Add kafka broker profile to elasticsearch nodes (authored by olasd).
Add kafka broker profile to elasticsearch nodes
May 14 2019, 5:34 PM
olasd committed rSPSITEb05231022ea8: Switch the elasticsearch node matching to a regexp (authored by olasd).
Switch the elasticsearch node matching to a regexp
May 14 2019, 5:34 PM
olasd committed rSPSITE7fcfb862c203: Finish kafka upgrade (authored by olasd).
Finish kafka upgrade
May 14 2019, 5:29 PM
olasd committed rSPSITE8a8e6cc25040: Finish journal_publisher undeployment (authored by olasd).
Finish journal_publisher undeployment
May 14 2019, 5:27 PM
olasd committed rSPSITE96011d108947: Finish storage_listener undeployment (authored by olasd).
Finish storage_listener undeployment
May 14 2019, 5:27 PM
olasd committed rSPSITEabb17a997f80: Undeploy the journal publisher (authored by olasd).
Undeploy the journal publisher
May 14 2019, 5:20 PM
olasd committed rSPSITE78d3e593e3e8: Undeploy the storage listener (authored by olasd).
Undeploy the storage listener
May 14 2019, 5:20 PM
olasd committed rSPSITE9ab89dcfa8c9: Upgrade kafka to 2.2.0 (authored by olasd).
Upgrade kafka to 2.2.0
May 14 2019, 5:09 PM
olasd committed rSPSITE472e579169d9: Set the kafka broker id from the kafka::brokers hash rather than duplicate it (authored by olasd).
Set the kafka broker id from the kafka::brokers hash rather than duplicate it
May 14 2019, 5:09 PM
olasd accepted D1466: Remove donwloaded package files once they have been processed.

Looks good, thanks!

May 14 2019, 5:08 PM
olasd added a comment to T1689: enable landing patches via the web UI for all repos.
In T1689#31539, @zack wrote:

I'm not a fan of automated merging when the test suite passes. I think merging code from third parties should always be a manual step by someone. If it can be made a Web UI button, even better, if not we will just standardize to the local arc workflow, which I've recently documented in the wiki.

May 14 2019, 4:39 PM · Development environment, Phabricator
olasd added a comment to T1689: enable landing patches via the web UI for all repos.

(looks like Herald can't easily trigger on a comment, because that'd be too easy)

May 14 2019, 2:55 PM · Development environment, Phabricator
olasd changed the visibility for F3520956: PartialDescriptiveArctichare.webm.
May 14 2019, 2:41 PM
olasd added a comment to T1689: enable landing patches via the web UI for all repos.

So, according to the fine manual (https://secure.phabricator.com/book/phabricator/article/differential_land/) the "Land Revision" button on the Phabricator user interface depends on setting up:

May 14 2019, 2:37 PM · Development environment, Phabricator
olasd added a comment to T1685: move main website from www.s.o to s.o.

We're already using LiveDNS for the softwareheritage.org zone, so DNS switchover time should not be an issue (all the more so considering the softwareheritage.org -> www.softwareheritage.org redirect is half-broken already).

May 14 2019, 2:18 PM · System administration, Website

May 13 2019

olasd closed T1698: Make sure Grafana dashboards are backed up as Resolved.

The grafana dashboards are stored in the postgresql database on pergamon, which is backed up through the full system backups.

May 13 2019, 1:58 PM · Sprint 2018 12, System administration