
Jun 13 2022

ardumont added a comment to T4064: Test GitLab migration scripts.

Ah the migration scripts are still relying on the rails console to actually create merge requests... so we need that setup to work.
So i dug into why my laptop was ok but not my desktop.

Jun 13 2022, 4:07 PM · System administration, GitLab migration, Roadmap 2020
ardumont updated the title for P1381 rails console shenanigans from untitled to rails console shenanigans.
Jun 13 2022, 4:05 PM
ardumont added a comment to T4064: Test GitLab migration scripts.

And now it fails on snippet creation which needs access to the rails console (in the current state of the migration script) [1]
I've got no proper setup which allows this to work now [2].
The only way i see is to comment out that part for now... [3]

Jun 13 2022, 3:47 PM · System administration, GitLab migration, Roadmap 2020
ardumont created P1381 rails console shenanigans.
Jun 13 2022, 3:42 PM
ardumont accepted D7982: docker: update the storage.yml config file a bit.
Jun 13 2022, 3:10 PM
ardumont updated the title for P1371 still github origins listed from maven with exotic urls from still github origins listed from maven with exotic urls `git@`, ssh://... to still github origins listed from maven with exotic urls.
Jun 13 2022, 2:53 PM
ardumont moved T4064: Test GitLab migration scripts from code-review/await-feedback/pause to in-progress on the System administration board.
Jun 13 2022, 1:11 PM · System administration, GitLab migration, Roadmap 2020
ardumont closed T4319: Deploy indexer v2.0 as Resolved.
Jun 13 2022, 1:11 PM · System administration, Indexer
ardumont moved T4320: Upgrade gitlab instance to v15 from in-progress to deployed/landed/monitoring on the System administration board.
Jun 13 2022, 1:11 PM · System administration, GitLab migration, Roadmap 2020
ardumont added a comment to T4064: Test GitLab migration scripts.

Migration failed again with a problem when creating a snippet [1]

Jun 13 2022, 1:10 PM · System administration, GitLab migration, Roadmap 2020
ardumont added a comment to T4064: Test GitLab migration scripts.

The source of that issue was the ssh connection failing to be established [1]
Fixing with ssh-keygen [2]

Jun 13 2022, 10:35 AM · System administration, GitLab migration, Roadmap 2020
ardumont added a comment to T4064: Test GitLab migration scripts.

Failed [1] (full logs [2])

Jun 13 2022, 10:02 AM · System administration, GitLab migration, Roadmap 2020

Jun 10 2022

ardumont updated the task description for T4064: Test GitLab migration scripts.
Jun 10 2022, 7:02 PM · System administration, GitLab migration, Roadmap 2020
ardumont updated the task description for T4319: Deploy indexer v2.0.
Jun 10 2022, 5:33 PM · System administration, Indexer
ardumont added a comment to T4319: Deploy indexer v2.0.

workers restarting.
At least one is done and stuff is being written accordingly.

Jun 10 2022, 5:33 PM · System administration, Indexer
ardumont moved T4319: Deploy indexer v2.0 from in-progress to deployed/landed/monitoring on the System administration board.
Jun 10 2022, 5:32 PM · System administration, Indexer
ardumont updated the task description for T4319: Deploy indexer v2.0.
Jun 10 2022, 5:27 PM · System administration, Indexer
ardumont added a comment to T4319: Deploy indexer v2.0.

Migrate schema:

swhstorage@saam:~$ swh db --config-file indexer.yml upgrade indexer --to-version=134 --module-config-key=indexer_storage
INFO:swh.core.db.db_utils:Executing migration script '/usr/lib/python3/dist-packages/swh/indexer/sql/upgrades/134.sql'
Migration to version 134 done
Jun 10 2022, 5:23 PM · System administration, Indexer
ardumont updated the task description for T4319: Deploy indexer v2.0.
Jun 10 2022, 5:21 PM · System administration, Indexer
ardumont added a comment to T4319: Deploy indexer v2.0.
softwareheritage-indexer=# create table origin_intrinsic_metadata_backup as table origin_intrinsic_metadata;
SELECT 22359694
softwareheritage-indexer=# create table revision_intrinsic_metadata_backup as table revision_intrinsic_metadata;
SELECT 16955557
softwareheritage-indexer=# alter table origin_intrinsic_metadata_backup owner to swhstorage;
ALTER TABLE
softwareheritage-indexer=# alter table revision_intrinsic_metadata_backup owner to swhstorage;
ALTER TABLE
Jun 10 2022, 5:20 PM · System administration, Indexer
ardumont committed R263:cde188463674: gitlab: Allow gitlab rails command to run without ssh (authored by ardumont).
gitlab: Allow gitlab rails command to run without ssh
Jun 10 2022, 5:01 PM
ardumont committed R263:fd27626b4c22: docker: Install required ssh/config so migration can push to gitlab (authored by ardumont).
docker: Install required ssh/config so migration can push to gitlab
Jun 10 2022, 5:01 PM
ardumont committed R263:89ea6bfd9f2f: Allow main script executions within container (authored by ardumont).
Allow main script executions within container
Jun 10 2022, 5:01 PM
ardumont committed R263:59ff8d8126a7: Allow remote mysql connection (authored by ardumont).
Allow remote mysql connection
Jun 10 2022, 5:01 PM
ardumont committed R263:7d55c183835d: Install Dockerfile to allow containerized script execution (authored by ardumont).
Install Dockerfile to allow containerized script execution
Jun 10 2022, 5:01 PM
ardumont committed R263:b8fca3223505: docs: Fix typos and clean up whitespace (authored by ardumont).
docs: Fix typos and clean up whitespace
Jun 10 2022, 5:01 PM
ardumont updated the task description for T4320: Upgrade gitlab instance to v15.
Jun 10 2022, 4:53 PM · System administration, GitLab migration, Roadmap 2020
ardumont added a comment to T4320: Upgrade gitlab instance to v15.
  • Upgrade gitlab operator
Jun 10 2022, 4:53 PM · System administration, GitLab migration, Roadmap 2020
ardumont updated the task description for T4320: Upgrade gitlab instance to v15.
Jun 10 2022, 4:50 PM · System administration, GitLab migration, Roadmap 2020
ardumont updated the task description for T4320: Upgrade gitlab instance to v15.
Jun 10 2022, 2:02 PM · System administration, GitLab migration, Roadmap 2020
ardumont added a comment to T4320: Upgrade gitlab instance to v15.

Backup done:

Jun 10 2022, 11:08 AM · System administration, GitLab migration, Roadmap 2020
ardumont closed D7978: upgrades/134: Add missing index creation.
Jun 10 2022, 10:35 AM
ardumont committed rDCIDX710467138ac9: upgrades/134: Add missing index creation (authored by ardumont).
upgrades/134: Add missing index creation
Jun 10 2022, 10:35 AM
ardumont requested review of D7978: upgrades/134: Add missing index creation.
Jun 10 2022, 10:29 AM
ardumont added a comment to T4319: Deploy indexer v2.0.

With those applied ^, workers are happier now.

Jun 10 2022, 10:24 AM · System administration, Indexer
ardumont added a revision to T4319: Deploy indexer v2.0: D7978: upgrades/134: Add missing index creation.
Jun 10 2022, 10:23 AM · System administration, Indexer
ardumont added a comment to T4319: Deploy indexer v2.0.
Jun 10 2022, 10:17 AM · System administration, Indexer
ardumont added a comment to T4319: Deploy indexer v2.0.

staging
...

  • Checks
Jun 10 2022, 10:14 AM · System administration, Indexer
ardumont changed the status of T4320: Upgrade gitlab instance to v15, a subtask of T4064: Test GitLab migration scripts, from Open to Work in Progress.
Jun 10 2022, 10:06 AM · System administration, GitLab migration, Roadmap 2020
ardumont changed the status of T4320: Upgrade gitlab instance to v15 from Open to Work in Progress.
Jun 10 2022, 10:06 AM · System administration, GitLab migration, Roadmap 2020
ardumont changed the status of T4319: Deploy indexer v2.0 from Open to Work in Progress.
Jun 10 2022, 10:06 AM · System administration, Indexer
ardumont updated the task description for T4319: Deploy indexer v2.0.
Jun 10 2022, 10:06 AM · System administration, Indexer
ardumont added a comment to T4319: Deploy indexer v2.0.
  • Backup tables that will get dropped [1]
  • current deployed db version: 133 [2]
  • current version to deploy: 134
  • Upgrade db version [3]
Jun 10 2022, 10:04 AM · System administration, Indexer
ardumont updated the task description for T4319: Deploy indexer v2.0.
Jun 10 2022, 10:04 AM · System administration, Indexer
ardumont updated the task description for T4320: Upgrade gitlab instance to v15.
Jun 10 2022, 9:55 AM · System administration, GitLab migration, Roadmap 2020
ardumont updated the task description for T4319: Deploy indexer v2.0.
Jun 10 2022, 9:28 AM · System administration, Indexer
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

Let's do this the other way around, closing this as i'm done.
Please reopen if you need something else.

Jun 10 2022, 9:05 AM · System administration, Git loader
ardumont closed T4283: Load https://github.com/chromium/chromium with a higher packfile size limit as Resolved.
Jun 10 2022, 9:05 AM · System administration, Git loader

Jun 9 2022

ardumont added a comment to D7971: common/archive: Ensure backward compatibility with swh-indexer 1.x.

Sorry for the late reply. I thought swh-indexer v2 was already deployed; but @ardumont is working on it now

Jun 9 2022, 6:38 PM
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

@vlorentz I don't have anything left to do, can i close it now?

Jun 9 2022, 6:10 PM · System administration, Git loader
ardumont updated the task description for T4320: Upgrade gitlab instance to v15.
Jun 9 2022, 6:08 PM · System administration, GitLab migration, Roadmap 2020
ardumont updated the task description for T4320: Upgrade gitlab instance to v15.
Jun 9 2022, 6:04 PM · System administration, GitLab migration, Roadmap 2020
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

And the 2nd fork ingestion is done as well:

swhworker@worker17:~$ url=https://github.com/Tomahawkd/chromium; /usr/bin/time -v swh loader run git $url lister_name=github lister_instance_name=github pack_size_bytes=34359738368 | tee chromium-20220607-05-pack-size-limit-32g-fork2.txt
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/Tomahawkd/chromium' with type 'git'
Enumerating objects: 12661350, done.
Counting objects: 100% (191/191), done.
Compressing objects: 100% (56/56), done.
Total 12661350 (delta 140), reused 135 (delta 135), pack-reused 12661159
INFO:swh.loader.git.loader:Listed 15230 refs for repo https://github.com/Tomahawkd/chromium
INFO:swh.loader.git.loader.GitLoader:Fetched 12661351 objects; 2 are new
self.statsd.constant_tags: {'visit_type': 'git', 'incremental_enabled': True, 'has_parent_origins': True, 'has_parent_snapshot': True, 'has_previous_snapshot': False}
self.parent_origins: [Origin(url='https://github.com/chromium/chromium', id=b'\xa9\xf66\xa1/\\\xc3\\\xa4\x18+\r\xe7L\x91\x94\xe9\x00\x96J')]
{'status': 'eventful'} for origin 'https://github.com/Tomahawkd/chromium'
        Command being timed: "swh loader run git https://github.com/Tomahawkd/chromium lister_name=github lister_instance_name=github pack_size_bytes=34359738368"
        User time (seconds): 62323.33
        System time (seconds): 3001.76
        Percent of CPU this job got: 72%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 25:03:29
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 29352136
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 8
        Minor (reclaiming a frame) page faults: 10355329
        Voluntary context switches: 265156
        Involuntary context switches: 265330
        Swaps: 0
        File system inputs: 2048
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
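
A quick sanity check on the figures in the report above, as a minimal sketch: the numbers are copied from the log, while the helper name `elapsed_to_seconds` is made up for the illustration. It converts the `pack_size_bytes` value to GiB and recomputes the CPU share from user time, system time and wall-clock time.

```python
# Values copied from the /usr/bin/time report above.
PACK_SIZE_BYTES = 34359738368
USER_SECONDS = 62323.33
SYSTEM_SECONDS = 3001.76
ELAPSED = "25:03:29"  # h:mm:ss


def elapsed_to_seconds(hms: str) -> int:
    """Convert an h:mm:ss (or m:ss) string to a number of seconds."""
    seconds = 0
    for part in hms.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


pack_size_gib = PACK_SIZE_BYTES / 2**30
wall_seconds = elapsed_to_seconds(ELAPSED)
# CPU share = (user + system) / wall clock, truncated to a whole percent.
cpu_percent = int(100 * (USER_SECONDS + SYSTEM_SECONDS) / wall_seconds)

print(f"pack size limit: {pack_size_gib:g} GiB")
print(f"wall clock: {wall_seconds} s, CPU share: {cpu_percent}%")
```

The pack size limit comes out at 32 GiB (consistent with the `32g` in the output file name), and the recomputed CPU share agrees with the 72% in the report.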
Jun 9 2022, 6:01 PM · System administration, Git loader
ardumont updated the task description for T4320: Upgrade gitlab instance to v15.
Jun 9 2022, 5:52 PM · System administration, GitLab migration, Roadmap 2020
ardumont added a comment to D7977: Increase runtime of origin_visit_find_by_date.

index created and now the query plans are the same on both dbs.

Jun 9 2022, 5:32 PM
ardumont updated the task description for T4064: Test GitLab migration scripts.
Jun 9 2022, 5:19 PM · System administration, GitLab migration, Roadmap 2020
ardumont added a comment to D7977: Increase runtime of origin_visit_find_by_date.

It seems @anlambert is right: an index [3] is missing on the replica [2] compared to the
main db [1], hence the diverging query plans ([2] is more costly). The main query plan [1]
uses the index that the replica lacks. So that missing index is currently being created on
somerset [3].

Jun 9 2022, 5:04 PM
ardumont updated the task description for T4320: Upgrade gitlab instance to v15.
Jun 9 2022, 4:14 PM · System administration, GitLab migration, Roadmap 2020
ardumont triaged T4320: Upgrade gitlab instance to v15 as High priority.
Jun 9 2022, 4:13 PM · System administration, GitLab migration, Roadmap 2020
ardumont accepted D7976: tarball: Use standard Python module zipfile to extract jar archive.

Thanks!

Jun 9 2022, 4:11 PM
ardumont added a comment to T4318: Consider using jar command to extract jar archives.

This is better as we will not have to install any new runtime dependencies in workers.
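
For context, a jar file is a zip archive, so Python's standard `zipfile` module can read it with no JDK installed. The following is a minimal sketch of that idea and not the code from D7976; the `extract_jar` helper and the file names are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_jar(jar_path: Path, dest: Path) -> list:
    """Extract a jar (a zip archive) into dest and return its member names."""
    with zipfile.ZipFile(jar_path) as jar:
        jar.extractall(dest)
        return jar.namelist()


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    # Build a tiny throwaway jar so the example is self-contained.
    jar_path = tmp_path / "example.jar"
    with zipfile.ZipFile(jar_path, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")

    members = extract_jar(jar_path, tmp_path / "out")
    manifest_text = (tmp_path / "out" / "META-INF" / "MANIFEST.MF").read_text()
    print(members, manifest_text.strip())
```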

Jun 9 2022, 3:04 PM · Maven loader
ardumont added a comment to T4318: Consider using jar command to extract jar archives.

fwiw, this makes sense ;)

Jun 9 2022, 3:01 PM · Maven loader
ardumont closed D7968: Add missing coverage on `swh db version` cli.
Jun 9 2022, 2:55 PM
ardumont committed rDCORE5644f9cd33da: Add missing coverage on `swh db version` cli (authored by ardumont).
Add missing coverage on `swh db version` cli
Jun 9 2022, 2:55 PM
ardumont triaged T4319: Deploy indexer v2.0 as Normal priority.
Jun 9 2022, 2:52 PM · System administration, Indexer
ardumont updated the task description for T4064: Test GitLab migration scripts.
Jun 9 2022, 2:37 PM · System administration, GitLab migration, Roadmap 2020
ardumont updated the task description for T4064: Test GitLab migration scripts.
Jun 9 2022, 2:37 PM · System administration, GitLab migration, Roadmap 2020
ardumont committed R260:4c8a3a64e725: Make workers report their statsd metrics to prometheus exporter (authored by ardumont).
Make workers report their statsd metrics to prometheus exporter
Jun 9 2022, 2:36 PM
ardumont committed R260:d175eb2dda1f: Install prometheus exporter into cluster (authored by ardumont).
Install prometheus exporter into cluster
Jun 9 2022, 2:36 PM
ardumont committed rSPRE7b90b55041cc: Align worker17 hardware spec to worker18 (authored by ardumont).
Align worker17 hardware spec to worker18
Jun 9 2022, 2:36 PM
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

status: second fork ingestion done (before the other one, which is still ongoing) [1]

Jun 9 2022, 2:28 PM · System administration, Git loader
ardumont updated the task description for T4144: Elastic worker infrastructure.
Jun 9 2022, 11:14 AM · meta-task, System administration, Roadmap 2022

Jun 8 2022

ardumont added a comment to D7894: Add arch lister module (origins from archives)..

Interesting exchange pasted here because i think it's somehow relevant:

Jun 8 2022, 4:36 PM
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

That's still a pretty big packfile ~12.6G [1]... I'm pondering whether i should stop it,
install the new python3-dulwich that olasd packaged and trigger it again...

Jun 8 2022, 4:32 PM · System administration, Git loader
ardumont added a comment to T4311: Package and deploy dulwich 0.20.43 in production.

fwiw, jenkins is python3-dulwich aware.

I don't see the point of that for packages that can be backported with no changes, which is what I had done before, so I admit I hadn't even looked.

Jun 8 2022, 4:25 PM · System administration, Git loader
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

So the first fork ingestion finished and took less time.

Looks like either the loader didn't detect it is a fork, or github sent a large packfile anyway.

In swh/loader/git/loader.py at the end of the prepare function, could you print self.statsd.constant_tags and self.parent_origins, to see which it is?

jsyk, I've edited the file accordingly and triggered another fork ingestion:

swhworker@worker17:~$ url=https://github.com/Tomahawkd/chromium; /usr/bin/time -v swh loader run git $url lister_name=github lister_instance_name=github pack_size_bytes=34359738368 | tee chromium-20220607-05-pack-size-limit-32g-fork2.txt
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/Tomahawkd/chromium' with type 'git'
Jun 8 2022, 4:02 PM · System administration, Git loader
ardumont added a comment to T4311: Package and deploy dulwich 0.20.43 in production.

fwiw, jenkins is python3-dulwich aware.

Jun 8 2022, 3:52 PM · System administration, Git loader
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

@vlorentz I also encountered [1] this morning which might explain the large packfile...

Jun 8 2022, 3:30 PM · System administration, Git loader
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

So the first fork ingestion finished and took less time.

Jun 8 2022, 3:20 PM · System administration, Git loader
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

Note that the first repo run took 134:40:21 (after multiple iterations so maybe more than that actually), so even if the fork ingestion takes like ~10h, that'd be much quicker already ¯\_(ツ)_/¯ (been ongoing for ~52min now)

Jun 8 2022, 3:10 PM · System administration, Git loader
ardumont accepted D7972: Support empty response from swh-graph.

lg "enough a workaround" tm for now.

Jun 8 2022, 3:07 PM
ardumont added a comment to D7971: common/archive: Ensure backward compatibility with swh-indexer 1.x.

updating prod should be good enough

Jun 8 2022, 3:00 PM
ardumont added a comment to D7972: Support empty response from swh-graph.

this looks like a bug in swh-graph; it shouldn't return empty lines

Jun 8 2022, 2:52 PM
ardumont updated the diff for D7968: Add missing coverage on `swh db version` cli.

Fix typo

Jun 8 2022, 2:40 PM
ardumont added inline comments to D7968: Add missing coverage on `swh db version` cli.
Jun 8 2022, 2:39 PM
ardumont added a comment to D7949: db.BaseDb: Propose default get_current_version method implementation.

notably the tests on the missing coverage cli (i'll do it in another diff).

Jun 8 2022, 11:21 AM
ardumont closed D7913: db: Grant read access to guest user on all tables of the schema.
Jun 8 2022, 10:55 AM
ardumont committed rDCORE47d9e8e22fa6: db: Grant read access to guest user on all schema tables (authored by ardumont).
db: Grant read access to guest user on all schema tables
Jun 8 2022, 10:55 AM
ardumont closed D7965: postgres db: Create guest user at db initialization time.
Jun 8 2022, 10:54 AM
ardumont committed rDENVa82fad346f44: postgres db: Create guest user at db initialization time (authored by ardumont).
postgres db: Create guest user at db initialization time
Jun 8 2022, 10:54 AM
ardumont requested review of D7968: Add missing coverage on `swh db version` cli.
Jun 8 2022, 10:49 AM
ardumont added a revision to T4228: scrubber: Investigate the apparent lock (staging): D7968: Add missing coverage on `swh db version` cli.
Jun 8 2022, 10:46 AM · Archive integrity, System administration
ardumont updated the diff for D7913: db: Grant read access to guest user on all tables of the schema.

Warn if failing to grant read-only access to guest user

Jun 8 2022, 10:38 AM
ardumont added a comment to D7913: db: Grant read access to guest user on all tables of the schema.

maybe only show a warning if the grant query fails (rather than crashing)?
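
A minimal sketch of what "warn rather than crash" could look like, under stated assumptions: the `grant_read_access` helper, its `execute` callback and the default schema and user names are all hypothetical, and this is not the code from D7913. Only the `GRANT SELECT ON ALL TABLES IN SCHEMA` statement is standard PostgreSQL.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def grant_read_access(
    execute: Callable[[str], None], schema: str = "public", user: str = "guest"
) -> bool:
    """Try to grant read-only access; log a warning instead of raising on failure."""
    query = f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user}"
    try:
        execute(query)
    except Exception as exc:  # deliberately broad: the grant is best effort
        logger.warning("Could not grant read access to %s: %s", user, exc)
        return False
    return True


def failing_execute(query: str) -> None:
    """Stand-in for a database cursor whose grant query fails."""
    raise RuntimeError('role "guest" does not exist')


executed = []
ok_result = grant_read_access(executed.append)
failed_result = grant_read_access(failing_execute)
print(ok_result, failed_result)
```

The caller gets a boolean back, so db initialization can carry on when the guest role is missing while the failure still shows up in the logs.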

Jun 8 2022, 10:24 AM

Jun 7 2022

ardumont added a comment to T4064: Test GitLab migration scripts.

well, no, computer says no ¯\_(ツ)_/¯

Jun 7 2022, 4:54 PM · System administration, GitLab migration, Roadmap 2020
ardumont updated the task description for T4064: Test GitLab migration scripts.
Jun 7 2022, 4:33 PM · System administration, GitLab migration, Roadmap 2020
ardumont added a comment to T4064: Test GitLab migration scripts.

fwiw, we've received notifications that the upstream repository has been updated with some fixes.
So i've pulled the upstream branch and rebased the swh branch on it.

Jun 7 2022, 4:32 PM · System administration, GitLab migration, Roadmap 2020
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

Which one has that many more commits, the initial one?

Yes

If so, i would expect the fork to be loaded way faster since they should have a shared history at some point in the past.

I would have expected it not to run out of memory (which was the point of the manual load), and it already failed that test

Jun 7 2022, 4:28 PM · System administration, Git loader
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

initial load of a different repository, which has 338k more commits

Jun 7 2022, 4:16 PM · System administration, Git loader
ardumont closed T4305: Align swh backends for migration tools to work as Resolved.
Jun 7 2022, 4:07 PM · Core & foundations