Apr 22 2021

douardda added a comment to T3200: Mirror: year is out of range.

Ah fun, one of the revisions with this problem on staging (ba3343bc4fa403a8dfbfcab7fc1a8c29ee34bd69) seems to have been crafted by https://gitlab.com/gitlab-org/gitlab-foss/-/blob/staging-26-fix_add_deploy_key_spec/spec/models/merge_request_diff_commit_spec.rb

Apr 22 2021, 12:05 PM · Mirror

Apr 21 2021

douardda added a comment to T3200: Mirror: year is out of range.

See T3170 (error generated by the same invalid kafka messages).

Apr 21 2021, 6:58 PM · Mirror

Apr 19 2021

olasd closed T2003: Content replayer may try to copy objects before they are available from an objstorage, a subtask of T1914: Keep mirror of contents on S3 up to date, as Resolved.
Apr 19 2021, 12:06 PM · Mirror, Datasets

Apr 8 2021

douardda added a comment to T3198: Mirror: unexpected closed connection to the pg server.

Just got the one below. Note that this occurred just as the replayer actually started to insert objects into the storage (before that, since the start of the replayer process, only kafka scaffolding had taken place for quite some time, around 30 minutes!)

Apr 8 2021, 12:02 PM · Mirror
douardda triaged T3218: The graph replayer generates REQTMOUT Timeout errors as High priority.
Apr 8 2021, 11:44 AM · Mirror

Apr 6 2021

douardda closed T3201: Mirror: unsupported Unicode escape sequence as Resolved by committing rDSTO39507b24d0f4: Make the replayer drop the Revision.metadata.
Apr 6 2021, 4:42 PM · Mirror
douardda closed T3201: Mirror: unsupported Unicode escape sequence, a subtask of T3197: Mirror: fix common issues of a replayer session, as Resolved.
Apr 6 2021, 4:42 PM · Mirror
vlorentz added a revision to T3201: Mirror: unsupported Unicode escape sequence: D5414: Make the replayer drop the Revision.metadata.
Apr 6 2021, 4:25 PM · Mirror
vlorentz added a subtask for T3201: Mirror: unsupported Unicode escape sequence: T3089: Remove the 'metadata' column of the 'revision' table.
Apr 6 2021, 2:20 PM · Mirror

Apr 2 2021

douardda added a comment to T3197: Mirror: fix common issues of a replayer session.

Currently, the mirror test session is running with:

Apr 2 2021, 10:15 AM · Mirror
douardda added a comment to T3201: Mirror: unsupported Unicode escape sequence.

easy fix: modify the replayer to ignore this 'metadata' column while inserting revisions

Apr 2 2021, 10:05 AM · Mirror
douardda added a comment to T3201: Mirror: unsupported Unicode escape sequence.
09:45 <+vlorentz> douardda: yes and the only way around it (short of dropping data) is T3089
09:46 -swhbot:#swh-devel- T3089 (submitter: vlorentz, owner: vlorentz, status: Open): Remove the 'metadata' column of the 'revision' table <https://forge.softwareheritage.org/T3089>
09:46 <+vlorentz> or switching to cassandra
09:46 <+vlorentz> the good news is, they couldn't be inserted in the storage either, so you can safely drop them for now
Apr 2 2021, 9:59 AM · Mirror
douardda triaged T3201: Mirror: unsupported Unicode escape sequence as High priority.
Apr 2 2021, 9:54 AM · Mirror
douardda triaged T3200: Mirror: year is out of range as High priority.
Apr 2 2021, 9:51 AM · Mirror
douardda triaged T3199: Mirror: key value violates unique constraint "person_fullname_idx" as High priority.
Apr 2 2021, 9:48 AM · Mirror
douardda triaged T3198: Mirror: unexpected closed connection to the pg server as High priority.
Apr 2 2021, 9:47 AM · Mirror
douardda triaged T3197: Mirror: fix common issues of a replayer session as High priority.
Apr 2 2021, 9:41 AM · Mirror

Mar 15 2021

vlorentz triaged T3116: Roll out at least one operational mirror as Normal priority.
Mar 15 2021, 12:28 PM · Roadmap 2022, Unknown Object (Project), Mirror, Roadmap 2021, meta-task

Mar 11 2021

rdicosmo updated the task description for T3116: Roll out at least one operational mirror.
Mar 11 2021, 7:59 PM · Roadmap 2022, Unknown Object (Project), Mirror, Roadmap 2021, meta-task
rdicosmo merged task T2209: At least 2 full mirrors up and running into T3116: Roll out at least one operational mirror.
Mar 11 2021, 7:57 PM · Mirror, Roadmap 2020
rdicosmo merged T2209: At least 2 full mirrors up and running into T3116: Roll out at least one operational mirror.
Mar 11 2021, 7:57 PM · Roadmap 2022, Unknown Object (Project), Mirror, Roadmap 2021, meta-task
rdicosmo added a parent task for T2914: mirror documentation: add ballpark storage/infra requirements: T3116: Roll out at least one operational mirror.
Mar 11 2021, 7:56 PM · Mirror, Documentation
rdicosmo added a parent task for T1576: document the typical cost(s) of hosting an archive mirror: T3116: Roll out at least one operational mirror.
Mar 11 2021, 7:56 PM · Documentation, Mirror
rdicosmo added subtasks for T3116: Roll out at least one operational mirror: T2914: mirror documentation: add ballpark storage/infra requirements, T1576: document the typical cost(s) of hosting an archive mirror.
Mar 11 2021, 7:56 PM · Roadmap 2022, Unknown Object (Project), Mirror, Roadmap 2021, meta-task
rdicosmo added a subtask for T3116: Roll out at least one operational mirror: T3054: Scale out object storage design.
Mar 11 2021, 7:55 PM · Roadmap 2022, Unknown Object (Project), Mirror, Roadmap 2021, meta-task
rdicosmo created T3116: Roll out at least one operational mirror.
Mar 11 2021, 7:54 PM · Roadmap 2022, Unknown Object (Project), Mirror, Roadmap 2021, meta-task

Mar 4 2021

rdicosmo merged task T1914: Keep mirror of contents on S3 up to date into T1954: Up-to-date objstorage mirror on S3.
Mar 4 2021, 5:44 PM · Mirror, Datasets

Dec 23 2020

zack triaged T2914: mirror documentation: add ballpark storage/infra requirements as Normal priority.
Dec 23 2020, 1:55 PM · Mirror, Documentation

Nov 17 2020

olasd closed T1829: Find a way to properly open the kafka brokers to the internet as Resolved.

The new cluster in rocquencourt is using the built-in Kafka ACLs now (9993a81ffc7a1c8bd519b33ae63ac1145105f624).

Nov 17 2020, 6:33 PM · System administration, Mirror

Oct 16 2020

douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Same as before but with 1M (fresh) sha1s:

Oct 16 2020, 1:02 PM · Object storage, Mirror
douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Since the results on uffizi above did suffer from a few caveats, I've made a few more tests:

  • a first result has been obtained with a dataset that had only objects stored on the XFS part of the objstorage
  • a second dataset has been created (with the order by sha256 part to spread the sha1s)
  • but the results are a mix of hot-cache and cold-cache tests
Oct 16 2020, 11:59 AM · Object storage, Mirror

Oct 15 2020

douardda added a comment to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3).

Some results:

Oct 15 2020, 1:02 PM · Object storage, Mirror
zack added projects to T2706: Benchmark objstorage for mirror (uffizi vs. azure vs. s3): Mirror, Object storage.
Oct 15 2020, 12:44 PM · Object storage, Mirror

Sep 22 2020

olasd added a comment to T1828: Improve directory journal backfill performance.

(the backfill had, in fact, completed within a month)

Sep 22 2020, 6:14 PM · Mirror, Journal
olasd closed T1828: Improve directory journal backfill performance as Resolved.

At this point, I don't think we'll make it much better with postgres as source.

Sep 22 2020, 6:14 PM · Mirror, Journal
moranegg moved T1576: document the typical cost(s) of hosting an archive mirror from Backlog to sponsors/clients on the Documentation board.
Sep 22 2020, 3:08 PM · Documentation, Mirror

Apr 28 2020

olasd closed T2350: Support large messages in swh.journal / kafka, a subtask of T2348: swh.journal silently loses large objects instead of rejecting them, as Resolved.
Apr 28 2020, 11:28 AM · Mirror, Journal
olasd closed T2348: swh.journal silently loses large objects instead of rejecting them as Resolved.

The kafka producer in swh.journal now reads message receipts and fails if they're negative, or if they didn't arrive within two minutes.

Apr 28 2020, 11:27 AM · Mirror, Journal
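
The receipt check described in the T2348 closing comment above can be illustrated with a minimal sketch. It is not the swh.journal implementation: `wait_for_receipts`, `DeliveryError` and the `poll` callable are hypothetical names, and the only details taken from the comment are that a negative receipt is a failure and that receipts must arrive within two minutes.

```python
import time


class DeliveryError(Exception):
    """Raised when a message is rejected or never acknowledged."""


def wait_for_receipts(pending, poll, timeout=120.0, clock=time.monotonic):
    """Return True once every key in `pending` has a positive receipt.

    `poll` is a hypothetical callable that returns a list of
    (key, error) tuples for the receipts received since the last call;
    `error` is None for a positive receipt. A real `poll` would be
    expected to block briefly rather than return immediately.
    """
    pending = set(pending)
    deadline = clock() + timeout
    while pending:
        for key, error in poll():
            if error is not None:
                # Negative receipt: fail loudly instead of losing the object.
                raise DeliveryError(f"{key!r} rejected: {error}")
            pending.discard(key)
        if pending and clock() >= deadline:
            # Some receipts never arrived within the allowed time.
            raise DeliveryError(
                f"no receipt for {sorted(pending)} after {timeout}s"
            )
    return True
```

Passing `clock` as a parameter is only there so the timeout path can be exercised without actually waiting two minutes.
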
olasd closed T2351: Consider backfilling mistakenly rejected large objects from PostgreSQL, a subtask of T2348: swh.journal silently loses large objects instead of rejecting them, as Resolved.
Apr 28 2020, 11:24 AM · Mirror, Journal

Apr 15 2020

olasd changed the status of T2351: Consider backfilling mistakenly rejected large objects from PostgreSQL, a subtask of T2348: swh.journal silently loses large objects instead of rejecting them, from Open to Work in Progress.
Apr 15 2020, 10:27 AM · Mirror, Journal
olasd closed T2349: Make the journal writer reliable, a subtask of T2348: swh.journal silently loses large objects instead of rejecting them, as Resolved.
Apr 15 2020, 10:15 AM · Mirror, Journal

Apr 6 2020

olasd triaged T2348: swh.journal silently loses large objects instead of rejecting them as High priority.
Apr 6 2020, 10:22 PM · Mirror, Journal

Feb 10 2020

olasd added a comment to T1829: Find a way to properly open the kafka brokers to the internet.
import swh.journal.client

# Connect to the Azure brokers over SASL_SSL with SCRAM credentials.
c = swh.journal.client.JournalClient(**{
    'group_id': 'olasd-test-sasl-1',
    'brokers': ['kafka%02d.euwest.azure.softwareheritage.org:9093' % i for i in range(1, 7)],
    'security.protocol': 'SASL_SSL',
    'sasl.mechanisms': 'SCRAM-SHA-512',
    'sasl.username': '<username>',
    'sasl.password': '<password>',
    'debug': 'consumer',
})

(yes, passing dotted config parameters in kwargs is... not the cleanest)

Feb 10 2020, 3:54 PM · System administration, Mirror
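
The remark about dotted parameters in the comment above comes down to a Python detail: a key such as 'security.protocol' is not a valid identifier, so it cannot be written as a keyword argument and can only reach a `**kwargs` function through dictionary unpacking. The sketch below shows this with a hypothetical `make_client` stand-in (it is not the JournalClient API) and placeholder `example.org` broker names.

```python
def make_client(group_id, brokers, **kafka_options):
    """Hypothetical stand-in for a client constructor (not swh.journal's API)."""
    settings = {
        "group.id": group_id,
        "bootstrap.servers": ",".join(brokers),
    }
    # Dotted keys such as 'security.protocol' are not valid Python
    # identifiers, so they can only arrive here through ** unpacking.
    settings.update(kafka_options)
    return settings


# Placeholder broker names, numbered 01 to 06.
brokers = ["kafka%02d.example.org:9093" % i for i in range(1, 7)]
config = {
    "group_id": "test-sasl-1",
    "brokers": brokers,
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "SCRAM-SHA-512",
}
settings = make_client(**config)
```

Writing `make_client(security.protocol="SASL_SSL")` directly would be a syntax error, which is why the whole configuration is passed as an unpacked dict.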

Feb 7 2020

olasd added a comment to T1829: Find a way to properly open the kafka brokers to the internet.

Following documentation with the following links:

Feb 7 2020, 7:08 PM · System administration, Mirror

Jan 23 2020

olasd added a comment to T1914: Keep mirror of contents on S3 up to date.

We've now hit T2003 hard as the client caught up with the head of the local kafka cluster. That's why the curve is flattening out currently, as I stopped the replayers until the queue is implemented.

Jan 23 2020, 2:17 PM · Mirror, Datasets

Jan 22 2020

vlorentz added a project to T2209: At least 2 full mirrors up and running: Mirror.
Jan 22 2020, 4:39 PM · Mirror, Roadmap 2020

Dec 7 2019

olasd added a comment to T1914: Keep mirror of contents on S3 up to date.

We'll need to address T2003 before this can be closed (if we go the journal client route), so marking accordingly.

Dec 7 2019, 6:35 PM · Mirror, Datasets
olasd added a subtask for T1914: Keep mirror of contents on S3 up to date: T2003: Content replayer may try to copy objects before they are available from an objstorage.
Dec 7 2019, 6:35 PM · Mirror, Datasets
olasd renamed T1914: Keep mirror of contents on S3 up to date from synchronously write content objects to AWS during ingestion to Keep mirror of contents on S3 up to date.
Dec 7 2019, 6:35 PM · Mirror, Datasets
olasd closed T1827: Tweak content backfill order to help content replayer as Resolved.

I've launched 16 content backfillers in parallel, one for each hex digit prefix, which should help with this.

Dec 7 2019, 6:33 PM · Mirror, Journal
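
Splitting a backfill by hex digit prefix, as in the T1827 comment above, can be sketched as follows. This assumes the contents are keyed by a 40-character hexadecimal sha1 and that each backfiller takes one leading digit; `hex_prefix_ranges` is a hypothetical helper, not part of the backfiller.

```python
def hex_prefix_ranges(digits=1, width=40):
    """Yield (prefix, start, end) tuples that partition the key space.

    `width` is the hex length of the key (40 for a sha1, an assumption
    here); `start` and `end` are the inclusive bounds of each slice.
    """
    for n in range(16 ** digits):
        prefix = format(n, f"0{digits}x")
        start = prefix + "0" * (width - digits)
        end = prefix + "f" * (width - digits)
        yield prefix, start, end


# One slice per worker: 16 slices for a single leading hex digit.
ranges = list(hex_prefix_ranges())
```

Each of the 16 tuples could then be handed to a separate worker process, so the workers cover disjoint slices of the key space.
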
olasd added a comment to T1914: Keep mirror of contents on S3 up to date.

I don't think we're going to do this but rather use the journal client approach. (Even more so considering that writing to S3 takes 500ms for each object, which sounds like a silly artificial limit to put on a synchronous process).

Dec 7 2019, 6:32 PM · Mirror, Datasets
olasd merged task T1899: complete object storage mirror on AWS into T1954: Up-to-date objstorage mirror on S3.
Dec 7 2019, 6:30 PM · Mirror, Datasets

Aug 26 2019

olasd added a comment to T1829: Find a way to properly open the kafka brokers to the internet.

The content topic has fully replicated to the new cluster over the weekend.

Aug 26 2019, 8:57 AM · System administration, Mirror

Aug 23 2019

olasd changed the status of T1829: Find a way to properly open the kafka brokers to the internet from Open to Work in Progress.

A new Kafka cluster has been spun up on azure virtual machines, with 6 machines each with 8TB of storage available.

Aug 23 2019, 6:45 PM · System administration, Mirror

Jul 14 2019

zack renamed T1914: Keep mirror of contents on S3 up to date from synchronously write content objects to AWS to synchronously write content objects to AWS during ingestion.
Jul 14 2019, 4:48 PM · Mirror, Datasets
zack triaged T1914: Keep mirror of contents on S3 up to date as High priority.
Jul 14 2019, 4:47 PM · Mirror, Datasets

Jul 9 2019

zack triaged T1899: complete object storage mirror on AWS as Normal priority.
Jul 9 2019, 10:59 AM · Mirror, Datasets

Jun 28 2019

douardda added a comment to T1828: Improve directory journal backfill performance.

1 month is good enough. Let's stick to this.

Jun 28 2019, 10:13 AM · Mirror, Journal

Jun 25 2019

olasd changed the status of T1828: Improve directory journal backfill performance from Open to Work in Progress.

With 16 processes still running in parallel, adding more CPUs gives an ETA of ~1 month, which is still pretty bad.

Jun 25 2019, 6:31 PM · Mirror, Journal
olasd added a comment to T1828: Improve directory journal backfill performance.

Running the directory backfiller (single instance) against belvedere yields an ETA of 250 days, which is around a 3x speedup from somerset.

Jun 25 2019, 2:54 PM · Mirror, Journal
douardda added a comment to T1828: Improve directory journal backfill performance.

Do we now have any insight into the behavior of the backfiller against belvedere?

Jun 25 2019, 9:27 AM · Mirror, Journal
douardda added a comment to T1827: Tweak content backfill order to help content replayer.

I'm inclined to prefer option 2, since performance is an issue we cannot underestimate...

Jun 25 2019, 9:25 AM · Mirror, Journal

Jun 19 2019

olasd closed T1825: Deploy kafka direct journal_writer to main storage as Resolved by committing rSPSITEe225060c2ff1: Add direct journal writer to uffizi.
Jun 19 2019, 12:25 PM · Mirror

Jun 18 2019

olasd triaged T1829: Find a way to properly open the kafka brokers to the internet as High priority.
Jun 18 2019, 4:02 PM · System administration, Mirror
olasd triaged T1828: Improve directory journal backfill performance as High priority.
Jun 18 2019, 3:57 PM · Mirror, Journal
olasd triaged T1827: Tweak content backfill order to help content replayer as High priority.
Jun 18 2019, 3:44 PM · Mirror, Journal
olasd added a revision to T1825: Deploy kafka direct journal_writer to main storage: D1601: Add direct journal writer to uffizi.
Jun 18 2019, 3:12 PM · Mirror
olasd triaged T1825: Deploy kafka direct journal_writer to main storage as High priority.
Jun 18 2019, 2:56 PM · Mirror

Mar 11 2019

zack renamed T1576: document the typical cost(s) of hosting an archive mirror from document the typical cost(s) of hosting a mirror to document the typical cost(s) of hosting an archive mirror.
Mar 11 2019, 6:12 PM · Documentation, Mirror
zack triaged T1576: document the typical cost(s) of hosting an archive mirror as Normal priority.
Mar 11 2019, 6:10 PM · Documentation, Mirror
zack renamed Mirror from Mirror tooling to Mirror.
Mar 11 2019, 6:07 PM
zack created Mirror.
Mar 11 2019, 6:06 PM