
Mar 10 2021

rdicosmo renamed T2214: Scale-out graph and database storage in production from Scale-out storage (graph) in production to Scale-out graph and database storage in production.
Mar 10 2021, 4:36 PM · meta-task, Roadmap 2022, Roadmap 2021, Storage manager
rdicosmo renamed T2214: Scale-out graph and database storage in production from Scale-out storage in production to Scale-out storage (graph) in production.
Mar 10 2021, 4:35 PM · meta-task, Roadmap 2022, Roadmap 2021, Storage manager

Mar 9 2021

vlorentz added a comment to T3092: Define the requirements for an on-premise Cassandra cluster.

In terms of RAM, we currently have a 1/20 ratio to the storage for the postgresql storage. If we want to keep the same ratio, that's 1.5TB of RAM for the cache. We also need at least 32+8GB/server for Cassandra itself, which is negligible. So that's 1.5TB of RAM total, which is more reasonable; so assuming 64GB sticks (because cheaper), that's 24 sticks, so we only need two servers to hold that much RAM.
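
The arithmetic in the comment above can be checked with a short sketch. It assumes a storage size of 30 TB (not stated in the comment, only implied by 1.5TB of cache at a 1/20 ratio) and binary units (1 TB = 1024 GB). The function name and parameters are illustrative. The 32+8GB/server for Cassandra itself is left out, since the comment treats it as negligible.

```python
def sticks_needed(storage_gb: int, ratio_denominator: int, stick_gb: int) -> int:
    """Number of RAM sticks needed to hold a cache of storage_gb / ratio_denominator."""
    cache_gb = storage_gb // ratio_denominator
    # Integer ceiling division: round up to a whole number of sticks.
    return -(-cache_gb // stick_gb)


# Assumption: 30 TB of storage, implied by "1.5TB of RAM" at a 1/20 ratio.
STORAGE_GB = 30 * 1024

sticks = sticks_needed(STORAGE_GB, ratio_denominator=20, stick_gb=64)
print(sticks)  # 24 sticks of 64GB, i.e. 12 per server when split over two servers
```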

Mar 9 2021, 10:24 AM · System administration, Storage manager

Mar 8 2021

vlorentz updated subscribers of T3092: Define the requirements for an on-premise Cassandra cluster.

Summary of a discussion on 2021-01-05, on using "HDD+fully loaded in RAM" vs "SSD":

Mar 8 2021, 1:55 PM · System administration, Storage manager

Mar 5 2021

KShivendu added a comment to T1377: in-memory storage: compute all counters.

Is this task still valid?

Mar 5 2021, 4:17 PM · Easy hack, Storage manager
vlorentz added a revision to T3020: Add an "index" for raw_extrinsic_metadata.id in swh.storage.cassandra: D5030: raw_extrinsic_metadata: Make (target, authority_id, discovery_date, fetcher_id) non-unique.
Mar 5 2021, 3:52 PM · Storage manager, Extrinsic metadata
vlorentz added a revision to T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql: D5029: Add raw_extrinsic_metadata.id column in postgresql..
Mar 5 2021, 3:52 PM · Storage manager, Extrinsic metadata
vlorentz added a subtask for T3089: Remove the 'metadata' column of the 'revision' table: T2513: Copy metadata on revisions to the extrinsic metadata storage.
Mar 5 2021, 3:51 PM · Storage manager, Archive content
vlorentz closed T2304: Cassandra storage: Reduce the size of the "secondary lookup tables" for contents as Resolved.
Mar 5 2021, 3:50 PM · Storage manager
vlorentz triaged T3092: Define the requirements for an on-premise Cassandra cluster as Normal priority.
Mar 5 2021, 12:35 PM · System administration, Storage manager
vlorentz triaged T3091: Order hardware for an on-premise Cassandra cluster as Normal priority.
Mar 5 2021, 12:34 PM · System administration, Storage manager
vlorentz closed T3074: Migrate all packages away from the old SWHID class, a subtask of T3018: Allow querying raw_extrinsic_metadata by hash in swh-storage, as Resolved.
Mar 5 2021, 12:31 PM · Storage manager, Extrinsic metadata
vlorentz closed T3074: Migrate all packages away from the old SWHID class, a subtask of T3017: Use hashes as keys in swh.journal.objects.raw_extrinsic_metadata, as Resolved.
Mar 5 2021, 12:31 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz closed T3074: Migrate all packages away from the old SWHID class, a subtask of T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql, as Resolved.
Mar 5 2021, 12:31 PM · Storage manager, Extrinsic metadata
vlorentz closed T3074: Migrate all packages away from the old SWHID class, a subtask of T3020: Add an "index" for raw_extrinsic_metadata.id in swh.storage.cassandra, as Resolved.
Mar 5 2021, 12:31 PM · Storage manager, Extrinsic metadata
vlorentz added a parent task for T3089: Remove the 'metadata' column of the 'revision' table: T2059: Generate (swh) releases from all git tags.
Mar 5 2021, 12:30 PM · Storage manager, Archive content
vlorentz added a parent task for T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders: T3090: Make loaders not rely on the 'metadata' column of the 'revision' table.
Mar 5 2021, 12:29 PM · Storage manager
vlorentz triaged T3089: Remove the 'metadata' column of the 'revision' table as Normal priority.
Mar 5 2021, 12:27 PM · Storage manager, Archive content

Mar 2 2021

vlorentz closed T2291: Implement metadata_provider endpoints in swh/storage/cassandra/, a subtask of T2290: Implement origin_metadata endpoints in swh/storage/cassandra/, as Invalid.
Mar 2 2021, 4:18 PM · Easy hack, Storage manager
vlorentz closed T2291: Implement metadata_provider endpoints in swh/storage/cassandra/ as Invalid.

Closing as it's not relevant anymore

Mar 2 2021, 4:18 PM · Easy hack, Storage manager

Feb 26 2021

vlorentz added a subtask for T3017: Use hashes as keys in swh.journal.objects.raw_extrinsic_metadata: T3074: Migrate all packages away from the old SWHID class.
Feb 26 2021, 6:58 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz added a subtask for T3018: Allow querying raw_extrinsic_metadata by hash in swh-storage: T3074: Migrate all packages away from the old SWHID class.
Feb 26 2021, 6:58 PM · Storage manager, Extrinsic metadata
vlorentz added a subtask for T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql: T3074: Migrate all packages away from the old SWHID class.
Feb 26 2021, 6:58 PM · Storage manager, Extrinsic metadata
vlorentz added a subtask for T3020: Add an "index" for raw_extrinsic_metadata.id in swh.storage.cassandra: T3074: Migrate all packages away from the old SWHID class.
Feb 26 2021, 6:57 PM · Storage manager, Extrinsic metadata

Feb 16 2021

ardumont closed T3053: Unstuck swh.storage debian build as Resolved.
Feb 16 2021, 5:57 PM · System administration, Packagers, Storage manager
ardumont added a comment to T3053: Unstuck swh.storage debian build.

Package swh.storage 0.23.1 built both for stable and unstable.

Feb 16 2021, 5:42 PM · System administration, Packagers, Storage manager
ardumont added a comment to T3053: Unstuck swh.storage debian build.

Now stable build is stuck on storage tests, the one using a journal.

Feb 16 2021, 5:15 PM · System administration, Packagers, Storage manager
ardumont added a comment to T3053: Unstuck swh.storage debian build.

Status: unstable build is now ok.

Feb 16 2021, 3:45 PM · System administration, Packagers, Storage manager
ardumont added a comment to T3053: Unstuck swh.storage debian build.

For the previous tryout to work, we need to exclude all unwanted jres.

Feb 16 2021, 2:28 PM · System administration, Packagers, Storage manager
ardumont added a comment to T3053: Unstuck swh.storage debian build.

Tryout in progress

Feb 16 2021, 2:16 PM · System administration, Packagers, Storage manager
ardumont added a comment to T3053: Unstuck swh.storage debian build.

Another suggestion which sounds more standard, debian build wise:

13:05 <+olasd> I think you can just add a `Build-Conflicts: openjdk-17-jre-headless`
13:06 <+olasd> which should make the sbuild dependency resolver avoid it altogether
13:10 <+ardumont> ack, i'll try
Feb 16 2021, 1:19 PM · System administration, Packagers, Storage manager
ardumont added a comment to T3053: Unstuck swh.storage debian build.

Another solution (to prevent hard-coding JAVA_HOME) is to invert the dependency order
currently defined in debian/rules.

Feb 16 2021, 1:09 PM · System administration, Packagers, Storage manager
ardumont updated the task description for T3053: Unstuck swh.storage debian build.
Feb 16 2021, 12:49 PM · System administration, Packagers, Storage manager
ardumont added a comment to T3053: Unstuck swh.storage debian build.

Forcing JAVA_HOME to a jdk11

Feb 16 2021, 12:17 PM · System administration, Packagers, Storage manager
ardumont added a comment to T3053: Unstuck swh.storage debian build.

What changed between the last successful unstable build [1] and the new failing one [2]:
the jdk versions pulled for the build.

Feb 16 2021, 12:16 PM · System administration, Packagers, Storage manager
ardumont changed the status of T3053: Unstuck swh.storage debian build from Open to Work in Progress.
Feb 16 2021, 12:09 PM · System administration, Packagers, Storage manager
ardumont added projects to T3053: Unstuck swh.storage debian build: Storage manager, Packagers, System administration.
Feb 16 2021, 12:09 PM · System administration, Packagers, Storage manager

Feb 14 2021

dachary added a watcher for Storage manager: dachary.
Feb 14 2021, 4:56 PM

Feb 11 2021

vsellier added a project to T2182: Switch production swh-web to use swh-search instead of postgresql search.: System administration.
Feb 11 2021, 12:14 PM · System administration, Archive search, Storage manager
vsellier closed T2182: Switch production swh-web to use swh-search instead of postgresql search., a subtask of T1910: Redesign origin search using a dedicated component (swh-search), as Resolved.
Feb 11 2021, 12:10 PM · Archive search, Storage manager
vsellier closed T2182: Switch production swh-web to use swh-search instead of postgresql search. as Resolved.

D5063 is applied, the main webapp is now using swh-search by default.

Feb 11 2021, 12:10 PM · System administration, Archive search, Storage manager
vsellier added a revision to T2182: Switch production swh-web to use swh-search instead of postgresql search.: D5063: webapp: use swh-search as main search engine in production.
Feb 11 2021, 11:27 AM · System administration, Archive search, Storage manager
vlorentz changed the status of T2590: Finish the indexer -> swh-search pipeline, a subtask of T2182: Switch production swh-web to use swh-search instead of postgresql search., from Open to Work in Progress.
Feb 11 2021, 11:01 AM · System administration, Archive search, Storage manager
vsellier changed the status of T2182: Switch production swh-web to use swh-search instead of postgresql search., a subtask of T1910: Redesign origin search using a dedicated component (swh-search), from Open to Work in Progress.
Feb 11 2021, 9:24 AM · Archive search, Storage manager
vsellier changed the status of T2182: Switch production swh-web to use swh-search instead of postgresql search. from Open to Work in Progress.

The main webapp search can be switched from the sql search to swh-search, as all the tests performed on staging and https://webapp1.internal.softwareheritage.org are ok

Feb 11 2021, 9:24 AM · System administration, Archive search, Storage manager

Feb 10 2021

vlorentz removed a project from T2602: Investigate how to upgrade the schema of the Cassandra storage: Roadmap 2020.
Feb 10 2021, 3:51 PM · Storage manager

Feb 8 2021

vlorentz added a revision to T2076: Add tests for SQL migrations: D5014: [RFC] Add basic migration tests for postgresql.
Feb 8 2021, 2:25 PM · Storage manager

Feb 5 2021

ardumont added a comment to P941 storage build stuck.

storage-ok and storage-ko are computed from the "py3 installed:" lines in the outputs above:

$ diff storage-ok.txt storage-ko.txt
12c12
< confluent-kafka==1.5.0
---
> confluent-kafka==1.6.0
21c21
< idna==2.10
---
> idna==3.1

storage-ok:

py3 installed: aiohttp==3.7.3
aiohttp-utils==3.1.1
apipkg==1.5
async-timeout==3.0.1
attrs==20.3.0
attrs-strict==0.2.0
blinker==1.4
cassandra-driver==3.24.0
certifi==2020.12.5
chardet==3.0.4
click==7.1.2
confluent-kafka==1.5.0
coverage==5.4
decorator==4.4.2
Deprecated==1.2.11
execnet==1.8.0
Flask==1.1.2
geomet==0.2.1.post1
gunicorn==20.0.4
hypothesis==5.49.0
idna==2.10
importlib-metadata==3.4.0
iniconfig==1.1.1
iso8601==0.1.13
itsdangerous==1.1.0
Jinja2==2.11.3
MarkupSafe==1.1.1
mirakuru==2.3.0
msgpack==1.0.2
multidict==5.1.0
mypy==0.800
mypy-extensions==0.4.3
packaging==20.9
pluggy==0.13.1
port-for==0.4
psutil==5.8.0
psycopg2==2.8.6
py==1.10.0
pyparsing==2.4.7
pytest==6.2.2
pytest-cov==2.11.1
pytest-forked==1.3.0
pytest-mock==3.5.1
pytest-postgresql==2.5.3
pytest-xdist==2.2.0
python-dateutil==2.8.1
python-mimeparse==1.6.0
pytz==2021.1
PyYAML==5.4.1
requests==2.25.1
sentry-sdk==0.19.5
six==1.15.0
sortedcontainers==2.3.0
sqlalchemy-stubs==0.4
swh.core==0.11.0
swh.journal==0.7.0
swh.model==0.12.0
swh.objstorage==0.2.2
swh.storage @ file:///home/tony/work/inria/repo/swh/swh-environment/swh-storage/.tox/.tmp/package/1/swh.storage-0.22.1.dev7%2Bg89cf1e51.zip
tenacity==6.3.1
toml==0.10.2
typed-ast==1.4.2
typing-extensions==3.7.4.3
urllib3==1.26.3
Werkzeug==1.0.1
wrapt==1.12.1
yarl==1.6.3
zipp==3.4.0
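
The comparison above could also be done programmatically. The following is a minimal sketch, assuming the two listings are available as strings in the "name==version" format shown above. The function names are illustrative and are not part of any swh tooling.

```python
def parse_pins(listing: str) -> dict:
    """Map package name to pinned version for every 'name==version' line."""
    pins = {}
    for line in listing.splitlines():
        line = line.strip()
        # The first entry of the listing carries a "py3 installed: " prefix.
        if line.startswith("py3 installed: "):
            line = line[len("py3 installed: "):]
        if "==" not in line:
            continue  # skip blank lines and "name @ file://..." entries
        name, version = line.split("==", 1)
        pins[name] = version
    return pins


def changed_pins(ok_listing: str, ko_listing: str) -> dict:
    """Return {name: (ok_version, ko_version)} for packages whose pins differ."""
    ok, ko = parse_pins(ok_listing), parse_pins(ko_listing)
    return {
        name: (ok.get(name), ko.get(name))
        for name in sorted(ok.keys() | ko.keys())
        if ok.get(name) != ko.get(name)
    }


ok_listing = "py3 installed: aiohttp==3.7.3\nconfluent-kafka==1.5.0\nidna==2.10\n"
ko_listing = "py3 installed: aiohttp==3.7.3\nconfluent-kafka==1.6.0\nidna==3.1\n"
print(changed_pins(ok_listing, ko_listing))
```

A package present in only one listing is reported with None on the other side.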
Feb 5 2021, 4:04 PM · Storage manager
ardumont added a comment to P941 storage build stuck.

diff so far: confluent-kafka is 1.5.0 vs 1.6.0 where it's stuck

Feb 5 2021, 3:55 PM · Storage manager
ardumont created P941 storage build stuck.
Feb 5 2021, 3:48 PM · Storage manager

Feb 4 2021

vlorentz added a parent task for T3010: Enable the validating storage proxy in production: T399: (Re-)Compute data checksums before insertion.
Feb 4 2021, 6:15 PM · Storage manager, System administration
vlorentz added a subtask for T399: (Re-)Compute data checksums before insertion: T3010: Enable the validating storage proxy in production.
Feb 4 2021, 6:15 PM · Storage manager
vlorentz added a subtask for T3010: Enable the validating storage proxy in production: T75: Check integrity of directories, revisions, and releases.
Feb 4 2021, 6:13 PM · Storage manager, System administration
vlorentz merged task T3012: Check all objects in the production storage/journal have a correct hash into T75: Check integrity of directories, revisions, and releases.
Feb 4 2021, 6:13 PM · Journal, Storage manager
olasd added a comment to T3012: Check all objects in the production storage/journal have a correct hash.

This is a duplicate of T75, the history of which would probably be useful to take into account (I suspect it can be closed).

Feb 4 2021, 6:11 PM · Journal, Storage manager
ardumont added a revision to T2968: Migrate origin_visit_status records to add the type value: D5019: storage.postgresql: Use origin_visit_status.type value as source.
Feb 4 2021, 5:41 PM · System administration, Storage manager
ardumont closed T2968: Migrate origin_visit_status records to add the type value as Resolved.
Feb 4 2021, 12:42 PM · System administration, Storage manager
ardumont moved T2968: Migrate origin_visit_status records to add the type value from in-progress to deployed/landed/monitoring on the System administration board.
Feb 4 2021, 10:22 AM · System administration, Storage manager
ardumont updated the task description for T2968: Migrate origin_visit_status records to add the type value.
Feb 4 2021, 10:22 AM · System administration, Storage manager
ardumont updated the task description for T2968: Migrate origin_visit_status records to add the type value.
Feb 4 2021, 9:43 AM · System administration, Storage manager
ardumont added a comment to T2968: Migrate origin_visit_status records to add the type value.

Migration ran to the end:

### range: [151800000, 151900000]
Timing is on.
UPDATE 0
Time: 3085.767 ms (00:03.086)
### range: [151900000, 152000000]
Timing is on.
UPDATE 0
Time: 6.366 ms
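
The log above suggests the migration ran in fixed-size id ranges. The actual script is not shown in this feed, so the following is only a sketch of one way to generate such batches. It reuses the naive query quoted elsewhere in this feed and assumes, hypothetically, that the ranges apply to ovs.origin and are half-open. The helper names are illustrative.

```python
from typing import Iterator, Tuple

# The naive update from the task, restricted to one id range.
# Assumption: the ranged column is ovs.origin; the real script may differ.
UPDATE_TEMPLATE = """\
update origin_visit_status as ovs
set type = ov.type
from origin_visit ov
where ov.visit = ovs.visit and ov.origin = ovs.origin
  and ovs.type is null
  and ovs.origin >= {lo} and ovs.origin < {hi};"""


def batch_ranges(start: int, stop: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield consecutive (lo, hi) bounds covering [start, stop)."""
    for lo in range(start, stop, step):
        yield lo, min(lo + step, stop)


def batched_updates(start: int, stop: int, step: int) -> Iterator[str]:
    """Yield one commented SQL statement per range."""
    for lo, hi in batch_ranges(start, stop, step):
        yield f"-- range: [{lo}, {hi}]\n" + UPDATE_TEMPLATE.format(lo=lo, hi=hi)


for statement in batched_updates(151_800_000, 152_000_000, 100_000):
    print(statement)
```

Keeping each statement to a bounded range limits how many rows a single update touches.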
Feb 4 2021, 9:41 AM · System administration, Storage manager

Feb 3 2021

ardumont added a comment to T2968: Migrate origin_visit_status records to add the type value.

tl;dr: ETA: 12 hours [3]

Feb 3 2021, 4:15 PM · System administration, Storage manager
ardumont updated the task description for T2968: Migrate origin_visit_status records to add the type value.
Feb 3 2021, 11:16 AM · System administration, Storage manager
ardumont added a comment to T2968: Migrate origin_visit_status records to add the type value.

So the "improved" query keeps taking more and more time, from 15s initially to
around 2 min now (after running since yesterday).

Feb 3 2021, 10:43 AM · System administration, Storage manager
vlorentz added a revision to T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects: D4970: model: Add 'id' field to RawExtrinsicMetadata.
Feb 3 2021, 9:13 AM · Data Model, Storage manager, Extrinsic metadata

Feb 2 2021

vlorentz lowered the priority of T3018: Allow querying raw_extrinsic_metadata by hash in swh-storage from High to Low.

Actually, this is probably not needed. (I may close this task later)

Feb 2 2021, 8:50 PM · Storage manager, Extrinsic metadata
ardumont added a comment to T2968: Migrate origin_visit_status records to add the type value.

Improved version (moving the join part within the update part of the query):

Feb 2 2021, 2:56 PM · System administration, Storage manager
vlorentz added a parent task for T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql: T3022: Deduplicate RawExtrinsicMetadata by hash instead of a subset of their fields.
Feb 2 2021, 2:22 PM · Storage manager, Extrinsic metadata
vlorentz added a parent task for T3020: Add an "index" for raw_extrinsic_metadata.id in swh.storage.cassandra: T3022: Deduplicate RawExtrinsicMetadata by hash instead of a subset of their fields.
Feb 2 2021, 2:22 PM · Storage manager, Extrinsic metadata
vlorentz added subtasks for T3022: Deduplicate RawExtrinsicMetadata by hash instead of a subset of their fields: T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql, T3020: Add an "index" for raw_extrinsic_metadata.id in swh.storage.cassandra.
Feb 2 2021, 2:22 PM · Storage manager, Extrinsic metadata
vlorentz renamed T3020: Add an "index" for raw_extrinsic_metadata.id in swh.storage.cassandra from Allow querying raw_extrinsic_metadata by hash in swh.storage.cassandra to Add an "index" for raw_extrinsic_metadata.id in swh.storage.cassandra.
Feb 2 2021, 2:21 PM · Storage manager, Extrinsic metadata
vlorentz renamed T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql from Allow querying raw_extrinsic_metadata by hash in swh.storage.postgresql to Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql.
Feb 2 2021, 2:21 PM · Storage manager, Extrinsic metadata
vlorentz triaged T3022: Deduplicate RawExtrinsicMetadata by hash instead of a subset of their fields as High priority.
Feb 2 2021, 2:15 PM · Storage manager, Extrinsic metadata
vlorentz updated subscribers of T3020: Add an "index" for raw_extrinsic_metadata.id in swh.storage.cassandra.
Feb 2 2021, 1:40 PM · Storage manager, Extrinsic metadata
vlorentz updated subscribers of T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql.
Feb 2 2021, 1:40 PM · Storage manager, Extrinsic metadata
vlorentz updated subscribers of T3018: Allow querying raw_extrinsic_metadata by hash in swh-storage.
Feb 2 2021, 1:40 PM · Storage manager, Extrinsic metadata
vlorentz updated subscribers of T3017: Use hashes as keys in swh.journal.objects.raw_extrinsic_metadata.
Feb 2 2021, 1:40 PM · Data Model, Storage manager, Extrinsic metadata
vlorentz triaged T3020: Add an "index" for raw_extrinsic_metadata.id in swh.storage.cassandra as High priority.
Feb 2 2021, 1:40 PM · Storage manager, Extrinsic metadata
vlorentz triaged T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql as High priority.
Feb 2 2021, 1:37 PM · Storage manager, Extrinsic metadata
vlorentz triaged T3018: Allow querying raw_extrinsic_metadata by hash in swh-storage as High priority.
Feb 2 2021, 1:35 PM · Storage manager, Extrinsic metadata
vlorentz triaged T3017: Use hashes as keys in swh.journal.objects.raw_extrinsic_metadata as High priority.
Feb 2 2021, 1:34 PM · Data Model, Storage manager, Extrinsic metadata
ardumont added a comment to T2968: Migrate origin_visit_status records to add the type value.

Run ok in 20 min for the staging db [1]

Feb 2 2021, 12:15 PM · System administration, Storage manager
ardumont added a comment to T2968: Migrate origin_visit_status records to add the type value.

Script resulting from the prior analysis:

Feb 2 2021, 11:56 AM · System administration, Storage manager
ardumont updated the task description for T2968: Migrate origin_visit_status records to add the type value.
Feb 2 2021, 11:55 AM · System administration, Storage manager
ardumont added a comment to T2968: Migrate origin_visit_status records to add the type value.

The naive need is:

explain update origin_visit_status as ovs
set type=ov.type
from origin_visit ov
where ov.visit=ovs.visit and ov.origin=ovs.origin
and ovs.type is null;
Feb 2 2021, 11:17 AM · System administration, Storage manager
ardumont updated the task description for T2968: Migrate origin_visit_status records to add the type value.
Feb 2 2021, 11:13 AM · System administration, Storage manager
ardumont changed the status of T2968: Migrate origin_visit_status records to add the type value from Open to Work in Progress.
Feb 2 2021, 10:26 AM · System administration, Storage manager
ardumont moved T2968: Migrate origin_visit_status records to add the type value from Backlog to Weekly backlog on the System administration board.
Feb 2 2021, 10:26 AM · System administration, Storage manager
ardumont updated the task description for T2968: Migrate origin_visit_status records to add the type value.
Feb 2 2021, 10:23 AM · System administration, Storage manager
ardumont updated the task description for T2968: Migrate origin_visit_status records to add the type value.
Feb 2 2021, 10:23 AM · System administration, Storage manager

Feb 1 2021

vlorentz triaged T3012: Check all objects in the production storage/journal have a correct hash as Normal priority.
Feb 1 2021, 12:38 PM · Journal, Storage manager
vlorentz triaged T3011: Enable the validating storage proxy on staging as Normal priority.
Feb 1 2021, 12:36 PM · Storage manager, System administration
vlorentz updated the task description for T3010: Enable the validating storage proxy in production.
Feb 1 2021, 12:35 PM · Storage manager, System administration
vlorentz triaged T3010: Enable the validating storage proxy in production as Normal priority.
Feb 1 2021, 12:35 PM · Storage manager, System administration
vlorentz closed T3004: swh-storage documentation needs a better introduction as Resolved.
Feb 1 2021, 12:03 PM · Documentation, Storage manager

Jan 29 2021

vlorentz added a revision to T3004: swh-storage documentation needs a better introduction: D4971: Write introduction to swh-storage..
Jan 29 2021, 4:18 PM · Documentation, Storage manager

Jan 28 2021

vlorentz updated the task description for T3004: swh-storage documentation needs a better introduction.
Jan 28 2021, 6:02 PM · Documentation, Storage manager
vlorentz updated the task description for T3004: swh-storage documentation needs a better introduction.
Jan 28 2021, 6:02 PM · Documentation, Storage manager
vlorentz updated the task description for T3004: swh-storage documentation needs a better introduction.
Jan 28 2021, 5:12 PM · Documentation, Storage manager
vlorentz updated the task description for T3004: swh-storage documentation needs a better introduction.
Jan 28 2021, 5:12 PM · Documentation, Storage manager