Mirror
Folder · Active · Public

Members

  • This project does not have any members.

Watchers

  • This project does not have any watchers.

Details

Description

Stuff related to the mirroring infrastructure, protocol, and tooling used to maintain the Software Heritage mirror network.

Recent Activity

Fri, Oct 16

douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Same as before but with 1M (fresh) sha1s:

Fri, Oct 16, 1:02 PM · Object storage, Mirror
douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Since the results on uffizi above did suffer from a few caveats, I've made a few more tests:

  • a first result has been obtained with a dataset that had only objects stored on the XFS part of the objstorage
  • a second dataset has been created (with the order by sha256 part to spread the sha1s)
  • but the results are a mix of hot and cold cache tests
Fri, Oct 16, 11:59 AM · Object storage, Mirror

Thu, Oct 15

douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Some results:

Thu, Oct 15, 1:02 PM · Object storage, Mirror
zack added projects to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3): Mirror, Object storage.
Thu, Oct 15, 12:44 PM · Object storage, Mirror

Sep 22 2020

olasd added a comment to T1828: Improve directory journal backfill performance.

(the backfill had, in fact, completed within a month)

Sep 22 2020, 6:14 PM · Mirror, Journal
olasd closed T1828: Improve directory journal backfill performance as Resolved.

At this point, I don't think we'll make it much better with postgres as source.

Sep 22 2020, 6:14 PM · Mirror, Journal
moranegg moved T1576: document the typical cost(s) of hosting an archive mirror from Backlog to sponsors/clients on the Documentation board.
Sep 22 2020, 3:08 PM · Documentation, Mirror

Apr 28 2020

olasd closed T2350: Support large messages in swh.journal / kafka, a subtask of T2348: swh.journal silently loses large objects instead of rejecting them, as Resolved.
Apr 28 2020, 11:28 AM · Mirror, Journal
olasd closed T2348: swh.journal silently loses large objects instead of rejecting them as Resolved.

The kafka producer in swh.journal now reads message receipts and fails if they're negative, or if they didn't arrive within two minutes.
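
The behaviour described in this comment (fail on a negative receipt, or on a receipt that is more than two minutes late) can be sketched as follows. This is a minimal illustration only, not the swh.journal implementation: `ReceiptTracker`, `DeliveryError`, and all method names are hypothetical, and no real Kafka client is involved.

```python
import time


class DeliveryError(Exception):
    """Raised when a message was rejected or never acknowledged."""


class ReceiptTracker:
    """Hypothetical helper that tracks per-message delivery receipts."""

    def __init__(self, timeout=120.0, clock=time.monotonic):
        self.timeout = timeout  # two minutes, as in the comment above
        self.clock = clock      # injectable so the timeout can be tested
        self.pending = {}       # message key -> time the message was sent
        self.failed = {}        # message key -> error reported in the receipt

    def sent(self, key):
        """Remember that a message was handed to the producer."""
        self.pending[key] = self.clock()

    def receipt(self, key, error=None):
        """Record a receipt; error is None for a positive acknowledgement."""
        self.pending.pop(key, None)
        if error is not None:
            self.failed[key] = error

    def check(self):
        """Raise if any receipt was negative or is overdue."""
        if self.failed:
            raise DeliveryError(f"rejected messages: {sorted(self.failed)}")
        now = self.clock()
        overdue = [k for k, t in self.pending.items() if now - t > self.timeout]
        if overdue:
            raise DeliveryError(
                f"no receipt within {self.timeout}s: {sorted(overdue)}"
            )
```

In this sketch a writer would call `sent()` for each message, `receipt()` from the producer's delivery callback, and `check()` before reporting success to its caller, so that a lost object surfaces as an error instead of disappearing silently.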

Apr 28 2020, 11:27 AM · Mirror, Journal
olasd closed T2351: Consider backfilling mistakenly rejected large objects from PostgreSQL, a subtask of T2348: swh.journal silently loses large objects instead of rejecting them, as Resolved.
Apr 28 2020, 11:24 AM · Mirror, Journal

Apr 15 2020

olasd changed the status of T2351: Consider backfilling mistakenly rejected large objects from PostgreSQL, a subtask of T2348: swh.journal silently loses large objects instead of rejecting them, from Open to Work in Progress.
Apr 15 2020, 10:27 AM · Mirror, Journal
olasd closed T2349: Make the journal writer reliable, a subtask of T2348: swh.journal silently loses large objects instead of rejecting them, as Resolved.
Apr 15 2020, 10:15 AM · Mirror, Journal

Apr 6 2020

olasd triaged T2348: swh.journal silently loses large objects instead of rejecting them as High priority.
Apr 6 2020, 10:22 PM · Mirror, Journal

Feb 10 2020

olasd added a comment to T1829: Find a way to properly open the kafka brokers to the internet.
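
import swh.journal.client

# Connect to the six Azure-hosted brokers (kafka01 to kafka06) on port 9093.
c = swh.journal.client.JournalClient(**{
    'group_id': 'olasd-test-sasl-1',
    'brokers': ['kafka%02d.euwest.azure.softwareheritage.org:9093' % i for i in range(1, 7)],
    # SASL/SCRAM authentication over TLS; replace the placeholders with real credentials.
    'security.protocol': 'SASL_SSL',
    'sasl.mechanisms': 'SCRAM-SHA-512',
    'sasl.username': '<username>',
    'sasl.password': '<password>',
    # Verbose consumer-side debug output.
    'debug': 'consumer',
})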

(yes, passing dotted config parameters in kwargs is... not the cleanest)
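
Two details of the snippet above are easy to miss: the `%02d` format yields zero-padded hostnames, and dotted option names can only reach a function through `**` unpacking of a dict, since they are not valid Python identifiers. The sketch below illustrates both with the standard library only; `broker_addresses` and `collect_options` are hypothetical helpers, not part of swh.journal.

```python
def broker_addresses(count=6, port=9093):
    """Expand the zero-padded broker hostnames used in the snippet above."""
    template = "kafka%02d.euwest.azure.softwareheritage.org:%d"
    return [template % (i, port) for i in range(1, count + 1)]


def collect_options(**kwargs):
    """Stand-in for a constructor that forwards arbitrary keyword options."""
    return kwargs


# Dotted keys cannot be written as literal keyword arguments
# (collect_options(security.protocol=...) is a syntax error),
# but they pass through ** unpacking unchanged.
options = collect_options(**{
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "SCRAM-SHA-512",
})

brokers = broker_addresses()
```

Here `brokers` holds six addresses, from `kafka01.euwest.azure.softwareheritage.org:9093` to `kafka06.euwest.azure.softwareheritage.org:9093`.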

Feb 10 2020, 3:54 PM · System administration, Mirror

Feb 7 2020

olasd added a comment to T1829: Find a way to properly open the kafka brokers to the internet.

Following documentation with the following links:

Feb 7 2020, 7:08 PM · System administration, Mirror

Jan 23 2020

olasd added a comment to T1914: Keep mirror of contents on S3 up to date.

We've now hit T2003 hard as the client caught up with the head of the local kafka cluster. That's why the curve is flattening out currently, as I stopped the replayers until the queue is implemented.

Jan 23 2020, 2:17 PM · Mirror, Datasets

Jan 22 2020

vlorentz added a project to T2209: At least 2 full mirrors up and running: Mirror.
Jan 22 2020, 4:39 PM · Mirror, Restricted Project

Dec 7 2019

olasd added a comment to T1914: Keep mirror of contents on S3 up to date.

We'll need to address T2003 before this can be closed (if we go the journal client route), so marking accordingly.

Dec 7 2019, 6:35 PM · Mirror, Datasets
olasd added a subtask for T1914: Keep mirror of contents on S3 up to date: T2003: Content replayer may try to copy objects before they are available from an objstorage.
Dec 7 2019, 6:35 PM · Mirror, Datasets
olasd renamed T1914: Keep mirror of contents on S3 up to date from "synchronously write content objects to AWS during ingestion" to "Keep mirror of contents on S3 up to date".
Dec 7 2019, 6:35 PM · Mirror, Datasets
olasd closed T1827: Tweak content backfill order to help content replayer as Resolved.

I've launched 16 content backfillers in parallel, one for each hex digit prefix, which should help with this.
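
One way to picture the split by hex digit prefix is to divide the space of 40-character hex sha1 identifiers into 16 contiguous ranges, one per leading digit. The sketch below is an assumption about how such a partition could look; `prefix_ranges` and `range_index` are hypothetical names, and the actual backfiller options are not shown in this thread.

```python
import hashlib

HEX_DIGITS = "0123456789abcdef"


def prefix_ranges(id_length=40):
    """Return 16 inclusive (start, end) hex ranges, one per leading hex digit."""
    tail = id_length - 1
    return [(d + "0" * tail, d + "f" * tail) for d in HEX_DIGITS]


def range_index(hex_id):
    """Index of the range (and thus the worker) responsible for an identifier."""
    return int(hex_id[0], 16)


ranges = prefix_ranges()
sample = hashlib.sha1(b"example content").hexdigest()
start, end = ranges[range_index(sample)]
```

Because lowercase hex strings of equal length sort in the same order as the numbers they encode, `start <= sample <= end` holds for exactly one of the 16 ranges, so each worker handles a disjoint slice.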

Dec 7 2019, 6:33 PM · Mirror, Journal
olasd added a comment to T1914: Keep mirror of contents on S3 up to date.

I don't think we're going to do this but rather use the journal client approach. (Even more so considering that writing to S3 takes 500ms for each object, which sounds like a silly artificial limit to put on a synchronous process).

Dec 7 2019, 6:32 PM · Mirror, Datasets
olasd merged task T1899: complete object storage mirror on AWS into T1954: Up-to-date objstorage mirror on S3.
Dec 7 2019, 6:30 PM · Mirror, Datasets

Aug 26 2019

olasd added a comment to T1829: Find a way to properly open the kafka brokers to the internet.

The content topic has fully replicated to the new cluster over the weekend.

Aug 26 2019, 8:57 AM · System administration, Mirror

Aug 23 2019

olasd changed the status of T1829: Find a way to properly open the kafka brokers to the internet from Open to Work in Progress.

A new Kafka cluster has been spun up on azure virtual machines, with 6 machines each with 8TB of storage available.

Aug 23 2019, 6:45 PM · System administration, Mirror

Jul 14 2019

zack renamed T1914: Keep mirror of contents on S3 up to date from "synchronously write content objects to AWS" to "synchronously write content objects to AWS during ingestion".
Jul 14 2019, 4:48 PM · Mirror, Datasets
zack triaged T1914: Keep mirror of contents on S3 up to date as High priority.
Jul 14 2019, 4:47 PM · Mirror, Datasets

Jul 9 2019

zack triaged T1899: complete object storage mirror on AWS as Normal priority.
Jul 9 2019, 10:59 AM · Mirror, Datasets

Jun 28 2019

douardda added a comment to T1828: Improve directory journal backfill performance.

1 month is good enough. Let's stick to this.

Jun 28 2019, 10:13 AM · Mirror, Journal

Jun 25 2019

olasd changed the status of T1828: Improve directory journal backfill performance from Open to Work in Progress.

Still with 16 processes in parallel, adding more CPUs gives an ETA of ~1 month, which is still pretty bad.

Jun 25 2019, 6:31 PM · Mirror, Journal
olasd added a comment to T1828: Improve directory journal backfill performance.

Running the directory backfiller (single instance) against belvedere yields an ETA of 250 days, which is around a 3x speedup from somerset.
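
The figures in this comment can be checked with a little arithmetic. This is only a sketch: it assumes the 3x speedup applies uniformly, and the 16-way figure assumes perfectly linear scaling across parallel workers, which the comment does not claim.

```python
def scaled_eta(days, speedup):
    """ETA in days after applying a speedup factor (speedup > 1 means faster)."""
    return days / speedup


# Single-instance ETA against belvedere, in days (from the comment above).
belvedere_single = 250

# "Around a 3x speedup from somerset" implies roughly this ETA on somerset.
somerset_single = belvedere_single * 3

# Hypothetical best case if 16 parallel processes scaled linearly.
ideal_16_way = scaled_eta(belvedere_single, 16)
```

Under these assumptions the somerset ETA comes to about 750 days and the ideal 16-way ETA to about 15.6 days. The ~1 month ETA reported later that day for 16 parallel processes is roughly twice that ideal figure, which suggests the scaling was not linear.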

Jun 25 2019, 2:54 PM · Mirror, Journal
douardda added a comment to T1828: Improve directory journal backfill performance.

Do we now have any insight into the behavior of the backfiller against belvedere?

Jun 25 2019, 9:27 AM · Mirror, Journal
douardda added a comment to T1827: Tweak content backfill order to help content replayer.

I'm inclined to prefer option 2, since performance is an issue we cannot underestimate...

Jun 25 2019, 9:25 AM · Mirror, Journal

Jun 19 2019

olasd closed T1825: Deploy kafka direct journal_writer to main storage as Resolved by committing rSPSITEe225060c2ff1: Add direct journal writer to uffizi.
Jun 19 2019, 12:25 PM · Mirror

Jun 18 2019

olasd triaged T1829: Find a way to properly open the kafka brokers to the internet as High priority.
Jun 18 2019, 4:02 PM · System administration, Mirror
olasd triaged T1828: Improve directory journal backfill performance as High priority.
Jun 18 2019, 3:57 PM · Mirror, Journal
olasd triaged T1827: Tweak content backfill order to help content replayer as High priority.
Jun 18 2019, 3:44 PM · Mirror, Journal
olasd added a revision to T1825: Deploy kafka direct journal_writer to main storage: D1601: Add direct journal writer to uffizi.
Jun 18 2019, 3:12 PM · Mirror
olasd triaged T1825: Deploy kafka direct journal_writer to main storage as High priority.
Jun 18 2019, 2:56 PM · Mirror

Mar 11 2019

zack renamed T1576: document the typical cost(s) of hosting an archive mirror from "document the typical cost(s) of hosting a mirror" to "document the typical cost(s) of hosting an archive mirror".
Mar 11 2019, 6:12 PM · Documentation, Mirror
zack triaged T1576: document the typical cost(s) of hosting an archive mirror as Normal priority.
Mar 11 2019, 6:10 PM · Documentation, Mirror
zack renamed Mirror from "Mirror tooling" to "Mirror".
Mar 11 2019, 6:07 PM
zack created Mirror.
Mar 11 2019, 6:06 PM