Archive integrity (Tag)
Active · Public

Members

  • This project does not have any members.

Watchers

  • This project does not have any watchers.

Recent Activity

Jan 8 2023

gitlab-migration closed T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml') as Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 10:25 PM · Archive integrity, Object storage, Data Model
gitlab-migration changed the status of T4284: scrubber does not comply to what's expected by the swh db tooling from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 10:03 PM · Archive integrity

Dec 22 2022

vlorentz added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

We have a new one that went unnoticed until 19 days ago: b'superduper/super/sub/bye.txt' is not a valid directory entry name.
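
Both names reported on this task contain a path separator, which is what makes them invalid as a single directory entry name. A minimal sketch of such a check follows, assuming the only rule is "no b'/' in the name". The helpers `is_valid_entry_name` and `split_invalid_names` are hypothetical names used for illustration, not the replayer's or the data model's actual API.

```python
def is_valid_entry_name(name: bytes) -> bool:
    """Return True if `name` can be a single directory entry name.

    Assumption: the only rule checked here is "no path separator".
    Hypothetical helper, not the actual data model API.
    """
    return b"/" not in name


def split_invalid_names(names):
    """Partition entry names into (valid, invalid) lists, preserving order."""
    valid, invalid = [], []
    for name in names:
        (valid if is_valid_entry_name(name) else invalid).append(name)
    return valid, invalid


# The two offending names come from this task; b"README" is an invented control.
valid, invalid = split_invalid_names(
    [b"README", b"gitter/gitter.xml", b"superduper/super/sub/bye.txt"]
)
print(invalid)
```

A check like this could be run over a batch of entries before replaying them, so that offending objects are reported rather than crashing the consumer.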

Dec 22 2022, 12:40 PM · Archive integrity, Object storage, Data Model

Nov 22 2022

olasd added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

To force kafka compaction to run I've done the following:

Nov 22 2022, 5:19 PM · Archive integrity, Object storage, Data Model

Oct 25 2022

olasd added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

This is now done, the objects are fixed in the production DB and kafka.

Oct 25 2022, 8:10 PM · Archive integrity, Object storage, Data Model
olasd added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

@vlorentz I'm running the following adaptation to your script:

Oct 25 2022, 7:03 PM · Archive integrity, Object storage, Data Model
seirl added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

Oh yeah, I was thinking of just removing the entire project, but your solution also works.

Oct 25 2022, 6:15 PM · Archive integrity, Object storage, Data Model
vlorentz updated subscribers of T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

Holes are bad. And I just opened a diff to make the git loader apply the same transformation, as @olasd made the same comment: D8776

Oct 25 2022, 6:10 PM · Archive integrity, Object storage, Data Model
seirl added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

Do you actually want to keep these objects? This would be inconsistent with the fixed loader behavior that would just reject those objects, and not load the repository at all.

Oct 25 2022, 6:06 PM · Archive integrity, Object storage, Data Model
vlorentz changed the visibility for T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').
Oct 25 2022, 5:57 PM · Archive integrity, Object storage, Data Model
vlorentz updated subscribers of T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

I tried to add a workaround in the backfiller, but it is incredibly hard to do properly, especially as entries are disordered, so raw_manifest needs to be fixed in two different ways.

Oct 25 2022, 5:41 PM · Archive integrity, Object storage, Data Model

Oct 19 2022

gitlab-migration changed the status of T4371: Deploy swh-scrubber on all storage instances from Resolved to Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:07 PM · System administration, Archive integrity, Storage manager
gitlab-migration changed the status of T4228: scrubber: Investigate the apparent lock (staging) from Resolved to Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:06 PM · Archive integrity, System administration

Oct 18 2022

vlorentz added a parent task for T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml'): T2033: Run Cassandra storage backend with production data.
Oct 18 2022, 3:40 PM · Archive integrity, Object storage, Data Model
vlorentz triaged T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml') as High priority.
Oct 18 2022, 3:39 PM · Archive integrity, Object storage, Data Model
swh-sentry-integration assigned T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml') to vlorentz.
Oct 18 2022, 3:39 PM · Archive integrity, Object storage, Data Model

Aug 5 2022

ardumont added a parent task for T4371: Deploy swh-scrubber on all storage instances: T3841: regularly scrub all the data stores of swh.
Aug 5 2022, 4:31 PM · System administration, Archive integrity, Storage manager

Aug 4 2022

ardumont closed T4371: Deploy swh-scrubber on all storage instances as Resolved.
Aug 4 2022, 3:57 PM · System administration, Archive integrity, Storage manager
ardumont moved T4371: Deploy swh-scrubber on all storage instances from in-progress to deployed/landed/monitoring on the System administration board.
Aug 4 2022, 3:49 PM · System administration, Archive integrity, Storage manager
ardumont added a comment to T4371: Deploy swh-scrubber on all storage instances.

Deployed both in staging and production [1]:

Aug 4 2022, 3:49 PM · System administration, Archive integrity, Storage manager
ardumont changed the status of T4371: Deploy swh-scrubber on all storage instances from Open to Work in Progress.
Aug 4 2022, 3:03 PM · System administration, Archive integrity, Storage manager
ardumont added a revision to T4371: Deploy swh-scrubber on all storage instances: D8181: scrubber: Make service parametric on the db instance to scrub.
Aug 4 2022, 3:03 PM · System administration, Archive integrity, Storage manager
ardumont moved T4371: Deploy swh-scrubber on all storage instances from Backlog to Weekly backlog on the System administration board.
Aug 4 2022, 11:34 AM · System administration, Archive integrity, Storage manager

Jul 4 2022

ardumont added a project to T4371: Deploy swh-scrubber on all storage instances: System administration.
Jul 4 2022, 10:47 AM · System administration, Archive integrity, Storage manager
ardumont added projects to T4371: Deploy swh-scrubber on all storage instances: Storage manager, Archive integrity.
Jul 4 2022, 10:47 AM · System administration, Archive integrity, Storage manager

Jun 8 2022

ardumont added a revision to T4228: scrubber: Investigate the apparent lock (staging): D7968: Add missing coverage on `swh db version` cli.
Jun 8 2022, 10:46 AM · Archive integrity, System administration

Jun 7 2022

ardumont added a revision to T4228: scrubber: Investigate the apparent lock (staging): D7965: postgres db: Create guest user at db initialization time.
Jun 7 2022, 2:28 PM · Archive integrity, System administration

Jun 2 2022

ardumont added a revision to T4228: scrubber: Investigate the apparent lock (staging): D7949: db.BaseDb: Propose default get_current_version method implementation.
Jun 2 2022, 5:21 PM · Archive integrity, System administration
ardumont added a revision to T4284: scrubber does not comply to what's expected by the swh db tooling: D7943: Revert "cli.db: Use attribute current_version instead of undeclared getter".
Jun 2 2022, 11:42 AM · Archive integrity

May 31 2022

ardumont closed T4228: scrubber: Investigate the apparent lock (staging) as Resolved.
May 31 2022, 2:22 PM · Archive integrity, System administration
ardumont added a comment to T4228: scrubber: Investigate the apparent lock (staging).

No more "idle in transaction" queries, and corrupt_object is still growing [1].
We can close this.

May 31 2022, 2:22 PM · Archive integrity, System administration
ardumont closed T4284: scrubber does not comply to what's expected by the swh db tooling as Resolved.
May 31 2022, 2:18 PM · Archive integrity
ardumont added a comment to T4284: scrubber does not comply to what's expected by the swh db tooling.

The scrubber finally got adapted, so the migration script is now OK [1].
Closing this.

May 31 2022, 2:18 PM · Archive integrity
ardumont added a project to T4284: scrubber does not comply to what's expected by the swh db tooling: Archive integrity.
May 31 2022, 2:17 PM · Archive integrity
ardumont added a comment to T4228: scrubber: Investigate the apparent lock (staging).

We now have corrupt objects being stored in the db:

11:26:40 swh-scrubber@db1:5432=> select count(*) from corrupt_object ;
+-------+
| count |
+-------+
| 59238 |
+-------+
(1 row)
May 31 2022, 11:29 AM · Archive integrity, System administration
ardumont added a comment to T4228: scrubber: Investigate the apparent lock (staging).

And v0.0.6 tagged and deployed.

May 31 2022, 11:26 AM · Archive integrity, System administration
ardumont added a comment to T4228: scrubber: Investigate the apparent lock (staging).

With [1] applied, the insert query got executed!

May 31 2022, 10:58 AM · Archive integrity, System administration
ardumont added a comment to T4228: scrubber: Investigate the apparent lock (staging).

Stopping all services and letting only 1 checker service run.

May 31 2022, 10:58 AM · Archive integrity, System administration
ardumont added a revision to T4228: scrubber: Investigate the apparent lock (staging): D7914: Wrap query in transaction.
May 31 2022, 10:36 AM · Archive integrity, System administration
ardumont added a revision to T4228: scrubber: Investigate the apparent lock (staging): D7913: db: Grant read access to guest user on all tables of the schema.
May 31 2022, 9:19 AM · Archive integrity, System administration
ardumont added a comment to T4228: scrubber: Investigate the apparent lock (staging).

Nothing budged, same process ids...
(Well, new ones appeared, but the old ones are still waiting for something)

May 31 2022, 9:05 AM · Archive integrity, System administration

May 30 2022

ardumont added a comment to T4228: scrubber: Investigate the apparent lock (staging).

We have plenty of processes stuck with "idle in transaction". According to [1], this means
"waiting for client inside a BEGIN block", so there might be issues in the scrubber code
[2]?
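
The "idle in transaction" state described above can be illustrated with a small sketch. This is an analogy only: it uses Python's built-in sqlite3 module as a stand-in, not PostgreSQL and not the scrubber's code, and the one-column `corrupt_object` schema is invented for the example. It shows a write leaving a transaction open until the client commits, and a connection context manager closing it.

```python
import sqlite3

# Stand-in database: in-memory sqlite3, NOT the scrubber's PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE corrupt_object (id TEXT)")  # hypothetical schema

# The INSERT implicitly opens a transaction. Until the client commits or
# rolls back, the transaction stays open while the connection sits idle.
conn.execute("INSERT INTO corrupt_object VALUES ('a')")
open_after_insert = conn.in_transaction

conn.commit()
open_after_commit = conn.in_transaction

# Using the connection as a context manager commits on success
# (and rolls back on an exception), so no transaction is left dangling.
with conn:
    conn.execute("INSERT INTO corrupt_object VALUES ('b')")
open_after_with = conn.in_transaction

count = conn.execute("SELECT count(*) FROM corrupt_object").fetchone()[0]
print(open_after_insert, open_after_commit, open_after_with, count)
```

If the scrubber issued writes without ever committing, its sessions would look like the first case: a transaction opened, then nothing further from the client.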

May 30 2022, 6:48 PM · Archive integrity, System administration
ardumont added a comment to T4228: scrubber: Investigate the apparent lock (staging).

There still seem to be stalled queries [1], and nothing gets written to the db [2]:

May 30 2022, 6:35 PM · Archive integrity, System administration
ardumont added a comment to T4228: scrubber: Investigate the apparent lock (staging).

Finally, after installing postgresql-client-11 and stopping scrubber services [1]:

May 30 2022, 6:10 PM · Archive integrity, System administration
ardumont added a comment to T4228: scrubber: Investigate the apparent lock (staging).

It was still not OK, as the upgrade tooling also expects sql/upgrades/<final-version>.sql to exist and be packaged.
There was nothing there, hence [1].

May 30 2022, 6:00 PM · Archive integrity, System administration
ardumont added a revision to T4228: scrubber: Investigate the apparent lock (staging): D7912: Reference the scrubber db model upgrade (from version 1 to 2).
May 30 2022, 5:54 PM · Archive integrity, System administration
ardumont added a revision to T4228: scrubber: Investigate the apparent lock (staging): D7906: db: Provide a default get_current_version method to db classes.
May 30 2022, 4:14 PM · Archive integrity, System administration
ardumont added a revision to T4228: scrubber: Investigate the apparent lock (staging): D7905: Unify factory to use keyword 'postgresql' over deprecated 'local'.
May 30 2022, 3:18 PM · Archive integrity, System administration
ardumont added a revision to T4228: scrubber: Investigate the apparent lock (staging): D7904: db: Bump to version 2.
May 30 2022, 3:12 PM · Archive integrity, System administration
ardumont added a comment to T4228: scrubber: Investigate the apparent lock (staging).

And now time to unstick the Debian build...

May 30 2022, 2:56 PM · Archive integrity, System administration