Feed All Stories

Jan 28 2020

ardumont closed D2592: webapp0: Switch to cassandra storage.

Landed.

Jan 28 2020, 5:29 PM
ardumont updated the test plan for D2592: webapp0: Switch to cassandra storage.
Jan 28 2020, 5:29 PM
ardumont updated the diff for D2592: webapp0: Switch to cassandra storage.

Move the cassandra storage's dependencies to a profile

Jan 28 2020, 5:28 PM
ardumont abandoned D2595: @danseraf: change public key.
Jan 28 2020, 5:15 PM
ardumont commandeered D2595: @danseraf: change public key.

landed

Jan 28 2020, 5:14 PM
ardumont accepted D2596: Use Cassandra 4.0 (alpha) instead of 3.11..
Jan 28 2020, 4:25 PM
ardumont accepted D2595: @danseraf: change public key.

accepted, will land it soon.

Jan 28 2020, 4:25 PM
ardumont added inline comments to D2592: webapp0: Switch to cassandra storage.
Jan 28 2020, 4:19 PM
olasd added a comment to T2003: Content replayer may try to copy objects before they are available from an objstorage.

Now that I think of it, we can decompose this in stages in the storage pipeline:

Jan 28 2020, 3:38 PM · Journal
olasd added a comment to T2003: Content replayer may try to copy objects before they are available from an objstorage.

@olasd I'm worried that implementing your idea would result in some complex piece of code.

Jan 28 2020, 3:30 PM · Journal
vlorentz created D2596: Use Cassandra 4.0 (alpha) instead of 3.11..
Jan 28 2020, 3:26 PM
DanSeraf created D2595: @danseraf: change public key.
Jan 28 2020, 3:16 PM
olasd accepted D2592: webapp0: Switch to cassandra storage.
Jan 28 2020, 3:08 PM
swh-public-ci added a comment to D2566: Add Cassandra backend..

Build has FAILED

Jan 28 2020, 3:06 PM
ardumont added a comment to D2592: webapp0: Switch to cassandra storage.

you're making the cassandra driver run in the same process as the webapp. Are you sure you want to do that?

Jan 28 2020, 3:03 PM
vlorentz added a comment to D2594: Empty Cassandra's jvm.options, in order for it to work on Java >=9..

Fix crash of Cassandra on Java >=9

ugh

I checked, and we do use JVM 8 on the Cassandra nodes.

Still, it sounds like migrating to Cassandra will raise quite a few deployment issues.

Jan 28 2020, 3:00 PM
vlorentz committed rCDFJc11c6cee946f: Empty Cassandra's jvm.options, in order for it to work on Java >=9. (authored by vlorentz).
Empty Cassandra's jvm.options, in order for it to work on Java >=9.
Jan 28 2020, 2:56 PM
vlorentz closed D2594: Empty Cassandra's jvm.options, in order for it to work on Java >=9..
Jan 28 2020, 2:56 PM
vlorentz updated the diff for D2594: Empty Cassandra's jvm.options, in order for it to work on Java >=9..

fix commit message

Jan 28 2020, 2:55 PM
ardumont accepted D2594: Empty Cassandra's jvm.options, in order for it to work on Java >=9..

Fix crash of Cassandra on Java >=9

Jan 28 2020, 2:55 PM
vlorentz retitled D2594: Empty Cassandra's jvm.options, in order for it to work on Java >=9. from Empty Cassandra's jvm.options, in order for it to work on Java 8. to Empty Cassandra's jvm.options, in order for it to work on Java >=9..
Jan 28 2020, 2:55 PM
vlorentz added a comment to D2592: webapp0: Switch to cassandra storage.

you're making the cassandra driver run in the same process as the webapp. Are you sure you want to do that?

Jan 28 2020, 2:50 PM
vlorentz created D2594: Empty Cassandra's jvm.options, in order for it to work on Java >=9..
Jan 28 2020, 2:45 PM
vlorentz added inline comments to D2589: Add return type to get_storage..
Jan 28 2020, 2:42 PM
ardumont added inline comments to D2585: Add tests for db_transaction and db_transaction_generator..
Jan 28 2020, 2:42 PM
vlorentz added inline comments to D2585: Add tests for db_transaction and db_transaction_generator..
Jan 28 2020, 2:41 PM
ardumont updated the diff for D2592: webapp0: Switch to cassandra storage.

Fix python3-cassandra package name

Jan 28 2020, 2:32 PM
ardumont updated the diff for D2592: webapp0: Switch to cassandra storage.
  • Properly switch webapp0's storage to cassandra storage
  • Add a conditional to install python3-cassandra when needed
Jan 28 2020, 2:32 PM
ardumont added a comment to D2592: webapp0: Switch to cassandra storage.

We should also make sure that the optional cassandra dependencies are pulled in by the storage deployment (unless swh.storage grew hardcoded dependencies on the cassandra stuff, in which case we're fine).

Jan 28 2020, 2:18 PM
ardumont updated the test plan for D2592: webapp0: Switch to cassandra storage.
Jan 28 2020, 2:17 PM
ardumont updated the test plan for D2592: webapp0: Switch to cassandra storage.
Jan 28 2020, 2:15 PM
ardumont requested changes to D2589: Add return type to get_storage..
Jan 28 2020, 2:13 PM
ardumont accepted D2585: Add tests for db_transaction and db_transaction_generator..
Jan 28 2020, 2:10 PM
swh-public-ci added a comment to D2586: Make db_transaction* remove db/cur from the signature..

Build is green
See https://jenkins.softwareheritage.org/job/DCORE/job/tox/398/ for more details.

Jan 28 2020, 2:09 PM
vlorentz updated the diff for D2586: Make db_transaction* remove db/cur from the signature..

fix wrapped function name

Jan 28 2020, 2:07 PM
Harbormaster failed to build B10318: rDWAPPS42f4b086a393: lookup missing hashes in the storage for rDWAPPS42f4b086a393: lookup missing hashes in the storage!
Jan 28 2020, 2:03 PM
swh-public-ci added a comment to D2566: Add Cassandra backend..

Build has FAILED

Jan 28 2020, 2:00 PM
vlorentz renamed T2253: Install DynamicPageList extension on the public wiki from Install DynamicPageList on the public wiki to Install DynamicPageList extension on the public wiki.
Jan 28 2020, 1:56 PM · System administration
vlorentz triaged T2253: Install DynamicPageList extension on the public wiki as Normal priority.
Jan 28 2020, 1:56 PM · System administration
Harbormaster failed remote builds in B10317: Diff 9258 for D2566: Add Cassandra backend.!
Jan 28 2020, 1:53 PM
swh-public-ci added a comment to D2566: Add Cassandra backend..

Build was aborted

Jan 28 2020, 1:53 PM
DanSeraf committed rDWAPPS773ee1a21b9c: /known/ api endpoint (authored by DanSeraf).
/known/ api endpoint
Jan 28 2020, 1:52 PM
DanSeraf committed rDWAPPS42f4b086a393: lookup missing hashes in the storage (authored by DanSeraf).
lookup missing hashes in the storage
Jan 28 2020, 1:52 PM
DanSeraf committed rDWAPPS7765946353da: group persistent identifiers by their type (authored by DanSeraf).
group persistent identifiers by their type
Jan 28 2020, 1:52 PM
DanSeraf closed D2582: Web API endpoint /known/.
Jan 28 2020, 1:52 PM
vlorentz updated the diff for D2566: Add Cassandra backend..

fix command

Jan 28 2020, 1:52 PM
swh-public-ci added a comment to D2566: Add Cassandra backend..

Build has FAILED

Jan 28 2020, 1:48 PM
swh-public-ci added a comment to D2566: Add Cassandra backend..

Build was aborted

Jan 28 2020, 1:43 PM
Harbormaster failed remote builds in B10316: Diff 9257 for D2566: Add Cassandra backend.!
Jan 28 2020, 1:42 PM
vlorentz updated the diff for D2566: Add Cassandra backend..

more logs

Jan 28 2020, 1:42 PM
swh-public-ci added a comment to D2582: Web API endpoint /known/.

Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/cypress-diff/514/ for more details.

Jan 28 2020, 1:42 PM
vlorentz accepted D2582: Web API endpoint /known/.
Jan 28 2020, 1:37 PM
swh-public-ci added a comment to D2582: Web API endpoint /known/.

Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tox/906/ for more details.

Jan 28 2020, 1:34 PM
ardumont updated the task description for T2244: Use swh-model for passing objects instead of dicts.
Jan 28 2020, 1:32 PM · Core & foundations
swh-public-ci added a comment to D2582: Web API endpoint /known/.

Build has FAILED

Jan 28 2020, 1:22 PM
Harbormaster failed remote builds in B10315: Diff 9256 for D2582: Web API endpoint /known/!
Jan 28 2020, 1:19 PM
swh-public-ci added a comment to D2582: Web API endpoint /known/.

Build has FAILED

Jan 28 2020, 1:19 PM
DanSeraf updated the diff for D2582: Web API endpoint /known/.
rebase
Jan 28 2020, 1:17 PM
ardumont committed rDLDBASEae17e430d5d0: npm.loader: Skip artifacts with no intrinsic metadata (authored by ardumont).
npm.loader: Skip artifacts with no intrinsic metadata
Jan 28 2020, 12:28 PM
ardumont committed rDLDBASE6f3d6446fd0a: pypi.loader: Skip artifacts with no intrinsic metadata (authored by ardumont).
pypi.loader: Skip artifacts with no intrinsic metadata
Jan 28 2020, 12:28 PM
ardumont closed D2579: package.loader: Skip artifacts with no intrinsic metadata.
Jan 28 2020, 12:27 PM
vlorentz added a comment to D2579: package.loader: Skip artifacts with no intrinsic metadata.

My bad, you don't need that

Jan 28 2020, 12:27 PM
moranegg committed rMSLD5b8071b840cc: Delete bf on authors (authored by moranegg).
Delete bf on authors
Jan 28 2020, 12:26 PM
vlorentz accepted D2579: package.loader: Skip artifacts with no intrinsic metadata.

You should replace "intrinsic_metadata" with "extrinsic_metadata". But other than that, ok for me

Jan 28 2020, 12:22 PM
douardda added inline comments to D2582: Web API endpoint /known/.
Jan 28 2020, 12:12 PM
swh-public-ci added a comment to D2579: package.loader: Skip artifacts with no intrinsic metadata.

Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/319/ for more details.

Jan 28 2020, 12:08 PM
ardumont updated the diff for D2579: package.loader: Skip artifacts with no intrinsic metadata.

Add tests on both loaders

Jan 28 2020, 12:06 PM
vlorentz added a comment to T2003: Content replayer may try to copy objects before they are available from an objstorage.

@olasd I'm worried that implementing your idea would result in some complex piece of code. It also adds a new PostgreSQL database and new Kafka topics, which will need extra resources and management. And if at some point that queue database becomes too large, the retrier will become slower, causing the queue to grow even more.

Jan 28 2020, 11:39 AM · Journal
moranegg committed rMSLD86ba73aee283: Fix typo in SWH-ID diagram (authored by moranegg).
Fix typo in SWH-ID diagram
Jan 28 2020, 11:34 AM
ardumont added a comment to D2592: webapp0: Switch to cassandra storage.

For reasons still unknown, I don't see python-cassandra-driver (buster) uploaded to pergamon.
I don't know whether it's related, but it's marked as source; I need to continue the work on that [2]

Jan 28 2020, 11:34 AM
vlorentz added a comment to D2582: Web API endpoint /known/.

Then you don't need if request.method == 'POST':, right?

Jan 28 2020, 11:01 AM
ardumont retitled D2592: webapp0: Switch to cassandra storage from storage0: Switch to cassandra storage to webapp0: Switch to cassandra storage.
Jan 28 2020, 11:01 AM
ardumont updated the diff for D2592: webapp0: Switch to cassandra storage.

Switch to cassandra storage only for webapp0

Jan 28 2020, 11:00 AM
ardumont added a comment to D2592: webapp0: Switch to cassandra storage.

We should also make sure that the optional cassandra dependencies are pulled in by the storage deployment (unless swh.storage grew hardcoded dependencies on the cassandra stuff, in which case we're fine).

Jan 28 2020, 10:35 AM
ardumont added a comment to D2592: webapp0: Switch to cassandra storage.

I also wanted to make sure I had not misconfigured anything. Nobody shouted at it, so it seemed fine from that standpoint ;)

Jan 28 2020, 10:17 AM
douardda added a comment to T2003: Content replayer may try to copy objects before they are available from an objstorage.
In T2003#41428, @olasd wrote:

This component would centralize the "has this object already appeared?" logic, as well as the queueing+retry logic, and would replace the current kafka mirror component.

How does that sound?
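
A minimal sketch of what such a buffering/filtering component could look like, purely as an illustration. The class name, the `is_available` and `forward` callbacks, and the `max_attempts` limit are all hypothetical; the in-memory deque stands in for whatever persistent queue the real component would use.

```python
from collections import deque


class BufferingFilter:
    """Hypothetical filter: forward a message only once its object is available."""

    def __init__(self, is_available, forward, max_attempts=3):
        self.is_available = is_available  # callback: has this object appeared yet?
        self.forward = forward            # callback: pass the message downstream
        self.max_attempts = max_attempts  # assumed retry limit, not from the task
        self.pending = deque()            # (message, attempts made so far)
        self.given_up = []                # messages abandoned after max_attempts

    def process(self, message):
        """Handle a freshly consumed message."""
        self._attempt(message, attempts=0)

    def _attempt(self, message, attempts):
        if self.is_available(message):
            self.forward(message)
        elif attempts + 1 >= self.max_attempts:
            self.given_up.append(message)
        else:
            self.pending.append((message, attempts + 1))

    def retry_pending(self):
        """Re-examine every buffered message exactly once."""
        for _ in range(len(self.pending)):
            message, attempts = self.pending.popleft()
            self._attempt(message, attempts)
```

Keeping the availability check and the retry bookkeeping in one place is the point of the proposal; consumers downstream of `forward` would then only ever see messages whose objects can be fetched.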

Jan 28 2020, 9:37 AM · Journal
douardda added a comment to T2003: Content replayer may try to copy objects before they are available from an objstorage.
In T2003#41429, @olasd wrote:

Key metrics for the filter component:

  • kafka consumer offset
  • min(latest_attempt) where in_flight = true (time it takes for a message from submission in the buffer to (re-)processing by the filter; should stay close to the current time)
  • count(*) where given_up = false group by topic (number of objects pending a retry, should be small)
  • count(*) where in_flight = true group by topic (number of objects buffered for reprocessing, should be small)
  • max(latest_attempt) (last processing time by the requeuing process)
  • count(*) where given_up = true (checks whether the housekeeping process)
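
As a rough illustration, the database-backed metrics in this list could be computed as below. This is a sketch under assumptions: only the column names (`topic`, `latest_attempt`, `in_flight`, `given_up`) come from the list; the `retry_queue` table name, the `object_id` column, the column types, and the use of SQLite instead of PostgreSQL are hypothetical. The Kafka consumer offset is not stored in this table and is left out.

```python
import sqlite3

# Hypothetical schema: table name, object_id and column types are assumptions.
SCHEMA = """
CREATE TABLE retry_queue (
    object_id TEXT PRIMARY KEY,
    topic TEXT NOT NULL,
    latest_attempt REAL NOT NULL,  -- Unix timestamp of the latest attempt
    in_flight INTEGER NOT NULL,    -- 1 if buffered for reprocessing
    given_up INTEGER NOT NULL      -- 1 if retries were abandoned
);
"""


def queue_metrics(conn):
    """Compute the queue metrics listed above from the retry table."""
    cur = conn.cursor()

    def scalar(query):
        return cur.execute(query).fetchone()[0]

    def by_topic(query):
        return dict(cur.execute(query).fetchall())

    return {
        "oldest_in_flight": scalar(
            "SELECT min(latest_attempt) FROM retry_queue WHERE in_flight = 1"),
        "pending_by_topic": by_topic(
            "SELECT topic, count(*) FROM retry_queue "
            "WHERE given_up = 0 GROUP BY topic"),
        "in_flight_by_topic": by_topic(
            "SELECT topic, count(*) FROM retry_queue "
            "WHERE in_flight = 1 GROUP BY topic"),
        "last_attempt": scalar("SELECT max(latest_attempt) FROM retry_queue"),
        "given_up_total": scalar(
            "SELECT count(*) FROM retry_queue WHERE given_up = 1"),
    }


def demo():
    """Build a tiny in-memory table and return its metrics."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO retry_queue VALUES (?, ?, ?, ?, ?)",
        [
            ("a", "content", 100.0, 1, 0),
            ("b", "content", 200.0, 0, 0),
            ("c", "revision", 150.0, 1, 0),
            ("d", "revision", 50.0, 0, 1),
        ],
    )
    return queue_metrics(conn)
```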
Jan 28 2020, 9:30 AM · Journal
douardda added a comment to T2003: Content replayer may try to copy objects before they are available from an objstorage.

Note: I haven't read the other comment below; I'm just reacting to this one as I read it.

Jan 28 2020, 9:28 AM · Journal

Jan 27 2020

DanSeraf added a comment to D2582: Web API endpoint /known/.
Jan 27 2020, 8:34 PM
vlorentz added a comment to D2582: Web API endpoint /known/.

I think you missed this comment: https://forge.softwareheritage.org/D2582?id=9215#inline-17178

Jan 27 2020, 7:49 PM
zack accepted D2582: Web API endpoint /known/.
Jan 27 2020, 7:47 PM
Harbormaster failed remote builds in B10310: Diff 9251 for D2566: Add Cassandra backend.!
Jan 27 2020, 7:45 PM
swh-public-ci added a comment to D2566: Add Cassandra backend..

Build has FAILED

Jan 27 2020, 7:45 PM
swh-public-ci added a comment to D2566: Add Cassandra backend..

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/907/ for more details.

Jan 27 2020, 7:39 PM
Iamshankhadeep added a watcher for Web app: Iamshankhadeep.
Jan 27 2020, 7:38 PM
vlorentz updated the diff for D2566: Add Cassandra backend..

retrigger tests

Jan 27 2020, 7:31 PM
Harbormaster failed remote builds in B10309: Diff 9250 for D2587: Move Storage documentation and endpoint paths to a new StorageInterface class!
Jan 27 2020, 7:31 PM
swh-public-ci added a comment to D2587: Move Storage documentation and endpoint paths to a new StorageInterface class.

Build has FAILED

Jan 27 2020, 7:31 PM
vlorentz added a comment to D2592: webapp0: Switch to cassandra storage.

I agree with @olasd. Another possibility is to run an swh-storage instance on webapp0.

Jan 27 2020, 7:29 PM
swh-public-ci added a comment to D2566: Add Cassandra backend..

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/905/ for more details.

Jan 27 2020, 7:28 PM
vlorentz added inline comments to D2587: Move Storage documentation and endpoint paths to a new StorageInterface class.
Jan 27 2020, 7:25 PM
vlorentz updated the diff for D2587: Move Storage documentation and endpoint paths to a new StorageInterface class.

apply comment on content_add.

Jan 27 2020, 7:25 PM
vlorentz updated the diff for D2566: Add Cassandra backend..

timeout in wait_for_peer.

Jan 27 2020, 7:20 PM
olasd requested changes to D2592: webapp0: Switch to cassandra storage.

I believe the vault and the indexers also use storage0.euwest.azure as a read-only archive backend. We don't want to switch them over to cassandra (at least not until the replay has completed, if ever).

Jan 27 2020, 6:56 PM
olasd added a comment to T2003: Content replayer may try to copy objects before they are available from an objstorage.

As for implementing the queue / retry behavior in the filter component:

Jan 27 2020, 6:46 PM · Journal
swh-public-ci added a comment to D2582: Web API endpoint /known/.

Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/cypress-diff/512/ for more details.

Jan 27 2020, 6:27 PM
swh-public-ci added a comment to D2582: Web API endpoint /known/.

Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/cypress-diff/511/ for more details.

Jan 27 2020, 6:16 PM
olasd added a comment to T2003: Content replayer may try to copy objects before they are available from an objstorage.

So, now that T1914 is stuck, I'm giving this a harder think, and I'm wondering whether we shouldn't have a generic buffering/filtering component in the journal instead:

Jan 27 2020, 6:09 PM · Journal
swh-public-ci added a comment to D2566: Add Cassandra backend..

Build was aborted

Jan 27 2020, 6:05 PM
swh-public-ci added a comment to D2582: Web API endpoint /known/.

Build is green
See https://jenkins.softwareheritage.org/job/DWAPPS/job/cypress-diff/510/ for more details.

Jan 27 2020, 6:05 PM