Ah, the migration scripts are still relying on the rails console to actually create merge requests... so we need that setup to work.
So I dug into why my laptop was ok but not my desktop.
Jun 13 2022
And now it fails on snippet creation which needs access to the rails console (in the current state of the migration script) [1]
I've got no proper setup which allows this to work now [2].
The only way I see is to comment that part out for now... [3]
Migration failed again with a problem when creating a snippet [1]
The source of that issue was the ssh connection failing to be established [1]
Fixing with ssh-keygen [2]
Failed [1] (full logs [2])
Jun 10 2022
Workers restarting.
At least one is done and data is being written accordingly.
Migrate schema:
swhstorage@saam:~$ swh db --config-file indexer.yml upgrade indexer --to-version=134 --module-config-key=indexer_storage
INFO:swh.core.db.db_utils:Executing migration script '/usr/lib/python3/dist-packages/swh/indexer/sql/upgrades/134.sql'
Migration to version 134 done
- Upgrade gitlab operator
Backup done:
softwareheritage-indexer=# create table origin_intrinsic_metadata_backup as table origin_intrinsic_metadata;
SELECT 22359694
softwareheritage-indexer=# create table revision_intrinsic_metadata_backup as table revision_intrinsic_metadata;
SELECT 16955557
softwareheritage-indexer=# alter table origin_intrinsic_metadata_backup owner to swhstorage;
ALTER TABLE
softwareheritage-indexer=# alter table revision_intrinsic_metadata_backup owner to swhstorage;
ALTER TABLE
With those applied ^, workers are happier now.
In T4319#86628, @ardumont wrote:
staging
...
- Checks
Although there are still a couple of errors [1] [2]
[1] https://sentry.softwareheritage.org/share/issue/c224d3ab1804452e84d68170ab55590e/
[2] https://sentry.softwareheritage.org/share/issue/6e2d70aa3c4e423fa763a456130a3cc9/
staging
...
- Checks
- Backup tables that will get dropped [1]
- current deployed db version: 133 [2]
- current version to deploy: 134
- Upgrade db version [3]
Let's do this the other way around: closing this as I'm done.
Please reopen if you need something else.
Jun 9 2022
Sorry for the late reply. I thought swh-indexer v2 was already deployed, but @ardumont is working on it now.
@vlorentz I don't have anything left to do, can I close it now?
And the 2nd fork ingestion is done as well:
swhworker@worker17:~$ url=https://github.com/Tomahawkd/chromium; /usr/bin/time -v swh loader run git $url lister_name=github lister_instance_name=github pack_size_bytes=34359738368 | tee chromium-20220607-05-pack-size-limit-32g-fork2.txt
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/Tomahawkd/chromium' with type 'git'
Enumerating objects: 12661350, done.
Counting objects: 100% (191/191), done.
Compressing objects: 100% (56/56), done.
Total 12661350 (delta 140), reused 135 (delta 135), pack-reused 12661159
INFO:swh.loader.git.loader:Listed 15230 refs for repo https://github.com/Tomahawkd/chromium
INFO:swh.loader.git.loader.GitLoader:Fetched 12661351 objects; 2 are new
self.statsd.constant_tags: {'visit_type': 'git', 'incremental_enabled': True, 'has_parent_origins': True, 'has_parent_snapshot': True, 'has_previous_snapshot': False}
self.parent_origins: [Origin(url='https://github.com/chromium/chromium', id=b'\xa9\xf66\xa1/\\\xc3\\\xa4\x18+\r\xe7L\x91\x94\xe9\x00\x96J')]
{'status': 'eventful'} for origin 'https://github.com/Tomahawkd/chromium'
    Command being timed: "swh loader run git https://github.com/Tomahawkd/chromium lister_name=github lister_instance_name=github pack_size_bytes=34359738368"
    User time (seconds): 62323.33
    System time (seconds): 3001.76
    Percent of CPU this job got: 72%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 25:03:29
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 29352136
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 8
    Minor (reclaiming a frame) page faults: 10355329
    Voluntary context switches: 265156
    Involuntary context switches: 265330
    Swaps: 0
    File system inputs: 2048
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
Index created, and now the query plans are the same on both dbs.
It seems @anlambert is right: an index [3] is missing on the replica [2] compared to the main
db [1], hence the query plan divergence ([2] is more costly). The main db's query plan [1]
uses the index that is missing from the replica, so that missing index is currently being
created on somerset [3].
Thanks!
This is better as we will not have to install any new runtime dependencies in workers.
fwiw, this makes sense ;)
Status: the second fork ingestion is done (before the other one, which is still ongoing) [1]
Jun 8 2022
Interesting exchange pasted here because I think it's relevant to this somehow:
That's still a pretty big packfile, ~12.6G [1]... I'm pondering whether I should stop it,
install the new python3-dulwich that olasd packaged, and trigger it again...
fwiw, jenkins is python3-dulwich aware.
I don't see the point of that for packages that can be backported with no changes, which is what I had done before, so I admit I hadn't even looked.
So the first fork ingestion finished and took less time.
Looks like either the loader didn't detect it is a fork, or github sent a large packfile anyway.
In swh/loader/git/loader.py at the end of the prepare function, could you print self.statsd.constant_tags and self.parent_origins, to see which it is?
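For reference, the requested change boils down to something like this (a sketch only; the two printed attributes come from this thread, the surrounding class body is illustrative, not the actual patch):

    # swh/loader/git/loader.py -- temporary debug edit (sketch)
    class GitLoader:
        def prepare(self) -> None:
            ...  # existing incremental / fork-detection setup
            # dump the two attributes so the next run's log shows whether
            # fork detection kicked in:
            print("self.statsd.constant_tags:", self.statsd.constant_tags)
            print("self.parent_origins:", self.parent_origins)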
jsyk, I've edited the file accordingly and triggered another fork ingestion:
swhworker@worker17:~$ url=https://github.com/Tomahawkd/chromium; /usr/bin/time -v swh loader run git $url lister_name=github lister_instance_name=github pack_size_bytes=34359738368 | tee chromium-20220607-05-pack-size-limit-32g-fork2.txt
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/Tomahawkd/chromium' with type 'git'
fwiw, jenkins is python3-dulwich aware.
@vlorentz I also encountered [1] this morning which might explain the large packfile...
So the first fork ingestion finished and took less time.
Note that the first repo run took 134:40:21 (after multiple iterations, so maybe more than that actually), so even if the fork ingestion takes ~10h, that'd be much quicker already ¯\_(ツ)_/¯ (it's been ongoing for ~52min now)
lg "enough a workaround" tm for now.
updating prod should be good enough
this looks like a bug in swh-graph; it shouldn't return empty lines
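The workaround mentioned above presumably just drops the blank lines on the client side until swh-graph is fixed; a minimal sketch (illustrative, not the actual diff):

    def iter_nonempty_lines(raw_lines):
        """Yield only non-blank lines from a swh-graph response (workaround sketch)."""
        for line in raw_lines:
            line = line.strip()
            if line:  # the graph endpoint shouldn't return empty lines, but guard anyway
                yield line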
Fix typo
notably the tests for the missing CLI coverage (I'll do that in another diff).
Warn if failing to grant read-only access to guest user
maybe only show a warning if the grant query fails (rather than crashing)?
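Something along these lines would do it (a sketch of the suggested behavior; the function name and the exact GRANT statements are illustrative, not the actual swh code):

    import logging

    logger = logging.getLogger(__name__)

    def grant_readonly_to_guest(cursor, dbname, guest="guest"):
        """Grant read-only access to the guest user; warn instead of crashing on failure."""
        try:
            cursor.execute(f'GRANT CONNECT ON DATABASE "{dbname}" TO "{guest}"')
            cursor.execute(f'GRANT SELECT ON ALL TABLES IN SCHEMA public TO "{guest}"')
        except Exception as exc:  # e.g. insufficient privileges on a managed instance
            logger.warning("Failed to grant read-only access to %s on %s: %s",
                           guest, dbname, exc)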
Jun 7 2022
well, no, computer says no ¯\_(ツ)_/¯
fwiw, we've received notifications that the upstream repository has received some fixes.
So I've pulled the upstream branch and rebased the swh branch on it.
Which one has that many more commits, the initial one?
Yes
If so, I would expect the fork to be loaded way faster since they should have a shared history at some point in the past.
I would have expected it not to run out of memory (which was the point of the manual load), and it already failed that test
initial load of a different repository, which has 338k more commits