Yes, I think we need to split the archival task: separate the first run, which ensures we have a copy of each content, from the full archiver, which we will have more time to improve.
Jul 19 2016
I'm not yet 100% sure that content_add is the place where we want to update the archive table. Another possibility, for instance, would be relying on the upcoming persistent log (T424) and some watcher for it that will update the archiver table.
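Whichever component ends up responsible, the write itself is presumably an upsert into the archiver table. A minimal SQL sketch; the table and column names follow the content_archive discussion below, while the conflict key and the literal values are assumptions:

```
-- Hypothetical upsert that a content_add hook or a log watcher could issue;
-- assumes a unique key on (id, archive) in content_archive.
-- ON CONFLICT requires PostgreSQL 9.5+.
INSERT INTO content_archive (id, archive, status, mtime)
VALUES ('\x0123456789abcdef0123456789abcdef01234567', 'banco', 'present', now())
ON CONFLICT (id, archive) DO UPDATE
    SET status = excluded.status,
        mtime  = excluded.mtime;
```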
According to the documentation, to defer a constraint, the constraint must first be declared deferrable (which is not the default).
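For illustration, a minimal PostgreSQL sketch of the two steps, reusing the content_archive foreign key discussed further down (the constraint name is hypothetical):

```
-- 1. Declare the constraint deferrable; NOT DEFERRABLE is the default.
ALTER TABLE content_archive
    ADD CONSTRAINT content_archive_id_fkey
    FOREIGN KEY (id) REFERENCES content (sha1)
    DEFERRABLE INITIALLY IMMEDIATE;

-- 2. Only then can its check be postponed to commit time within a transaction.
BEGIN;
SET CONSTRAINTS content_archive_id_fkey DEFERRED;
-- bulk inserts go here
COMMIT;
```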
and the first part is done ^^
0:41:23 [15.8MiB/s]
Also, regarding the db, softwareheritage-archiver has been created with the following schema.
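A rough reconstruction of that schema from the rest of this thread: content_archive.id and the archive_status enum are attested; the remaining columns, the key, and the enum values are assumptions.

```
-- Sketch only: column set, key, and enum values are assumed, not attested.
CREATE TYPE archive_status AS ENUM ('missing', 'ongoing', 'present');

CREATE TABLE content_archive (
    id      bytea NOT NULL,           -- sha1 of the content, mirrors content.sha1
    archive text NOT NULL,            -- which archive holds the copy (e.g. 'banco')
    status  archive_status NOT NULL,
    mtime   timestamptz,
    PRIMARY KEY (id, archive)
);
```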
Jul 18 2016
(forgot to add the Related keyword in the commits... and already pushed, so better luck next time...)
Jul 17 2016
For info, I ran another failed attempt yesterday (Saturday the 16th of July 2016).
This stopped before finishing.
Jul 16 2016
Moving softwareheritage-log from the ssd to the hdd (T487), we reclaimed 1.1T on the ssd (which was the initial blocking point).
So now, we can try to inject back the archiver's bootstrap data to finally... run it ^^
Jul 14 2016
It's partially done (worker01, worker02, worker03, worker08).
Some workers are stopped, so those need to be booted first;
I think they were stopped during the RAM issue we had on louvre on the 4th of July (to restart prado).
Looks like this part is not yet puppetized.
For me, this boils down to changing the ExecStart line in /etc/systemd/system/swh-worker.service on our multiple workers.
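A sketch of what that change could look like as a systemd drop-in instead of editing the unit file in place; the drop-in mechanism and the empty-ExecStart reset are standard systemd, while the path and the actual command line are assumptions:

```
# Hypothetical /etc/systemd/system/swh-worker.service.d/override.conf
[Service]
# An empty ExecStart= clears the inherited value before setting a new one.
ExecStart=
ExecStart=/usr/bin/celery worker --app=swh.scheduler.celery_backend.config.app --loglevel=info
```

A drop-in keeps the local override separate from the unit itself, which should make the eventual puppetization cleaner.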
Jul 13 2016
But if you start looking into this (which I recommend you do), it'd be nice to fix all of it, while the context is clear.
Thanks for opening this task, I agree this is the way to go. Once done, we can easily deploy the archiver as it is supposed to work in the long term, reducing the need for further migrations down the road.
Jul 12 2016
As we said on irc, the foreign key from content_archive.id to content.sha1 makes creating a standalone archiver database quite awkward.
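One reason it's awkward: PostgreSQL foreign keys cannot span databases, so the constraint has to go if content_archive moves out. A sketch of the removal, assuming the default constraint naming (hypothetical):

```
-- Hypothetical: drop the FK so content_archive can live in its own database,
-- keeping id as a plain sha1-bytes column without referential enforcement.
ALTER TABLE content_archive
    DROP CONSTRAINT content_archive_id_fkey;
```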
Currently running on uffizi in a tmux session.
Jul 11 2016
'status' is of type archive_status, which is already an enum. I guess Postgres does the right thing with integers.
This is a failure for now.
After multiple attempts over the weekend, there is not enough space on disk for the process to finish.
Jul 9 2016
Running on banco:
Irrelevant since the file is on banco
/srv/storage/space/lists/content-id-by-ctime.after-T7.txt.gz
Jun 21 2016
Closed by D53.
Jun 20 2016
Well, this implementation seems to be dead code, since it's:
- neither tested
- nor correct (it calls nonexistent code: db.revision_log_by does not exist)
Jun 13 2016
Gah. ._. I had some local changes in swh-scheduler and the tasks wouldn't be registered in celery. Sorry for the hassle...