12:25 <+olasd> ardumont: you should even do this migration the other way around. 1/ create the new table, empty; 2/ deploy new storage that fills both tables in parallel; 3/ run the migration process for origin visits that don't have any state information
12:26 <+olasd> 4/ switch over the read queries to the new tables; 5/ drop fields from the old table
12:26 <+olasd> (#showerthoughts)
12:27 <+olasd> that way, the only downtime needed is during #2 which should be minimal; it should even be doable on the fly as the RPC api is not changing at all
12:28 <+ardumont> for 2. i need to check the code, i think we stop the writing in origin visit to only write to origin visit status
12:28 <+ardumont> check the code first*
12:29 <+olasd> well that's not good; I don't trust that the read queries will behave properly on the new table with all the data, so I think we need to live with both in parallel until we figure out whether the new schema is good enough
12:29 <+olasd> (performance wise, notably for origins with lots of visits like the debian / pypi / npm origins)
12:29 <+ardumont> olasd: ok, will check and update back those then (if it's indeed the case, i do think it is)
12:29 <+ardumont> vlorentz: ^
12:30 <+ardumont> anlambert[m]: tbc, yes, pin the storage if you can to the latest tag (v0.0.188)
12:31 olasd: "I don't trust that the read queries will behave properly on the new table with all the data," why not?
12:31 <+anlambert[m]> ardumont: already done, back on track
12:31 <+ardumont> thx
12:33 <+olasd> vlorentz: because it's triplicating an already large set of data
12:35 so ardumont has to revert two thirds of D2938 ?
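[Editor's note] The five-step plan olasd lists at 12:25 and 12:26 can be sketched as below. This is only an illustration under stated assumptions, not the swh-storage implementation: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, the column list is a hypothetical simplification of the real schema, and the function names are made up for the example.

```python
import sqlite3

# Hypothetical, simplified columns; the real swh-storage schema has more fields.
COLUMNS = "origin, visit, date, status, snapshot"
TABLES = ("origin_visit", "origin_visit_status")


def create_tables(conn):
    # origin_visit stands for the existing table; creating
    # origin_visit_status empty is step 1 of the plan.
    for table in TABLES:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            "origin TEXT, visit INTEGER, date TEXT, status TEXT, snapshot TEXT)"
        )


def add_visit(conn, origin, visit, date, status, snapshot=None):
    # Step 2: the new storage writes the same row to both tables
    # inside a single transaction.
    row = (origin, visit, date, status, snapshot)
    with conn:
        for table in TABLES:
            conn.execute(
                f"INSERT INTO {table} ({COLUMNS}) VALUES (?, ?, ?, ?, ?)", row
            )


def backfill(conn):
    # Step 3: copy only the visits that have no status row yet, so the
    # job can run while step 2 is live and can be re-run safely.
    with conn:
        conn.execute(
            f"INSERT INTO origin_visit_status ({COLUMNS}) "
            f"SELECT {COLUMNS} FROM origin_visit AS ov "
            "WHERE NOT EXISTS (SELECT 1 FROM origin_visit_status AS s "
            "WHERE s.origin = ov.origin AND s.visit = ov.visit)"
        )


def latest_status(conn, origin, visit, read_new=False):
    # Step 4: a flag switches the read path to the new table once it is
    # trusted. Step 5 (dropping the old fields) only happens after that.
    table = "origin_visit_status" if read_new else "origin_visit"
    row = conn.execute(
        f"SELECT status FROM {table} WHERE origin = ? AND visit = ? "
        "ORDER BY date DESC LIMIT 1",
        (origin, visit),
    ).fetchone()
    return row[0] if row else None
```

Because both tables stay populated until step 5, the read flag can be flipped back if the new table performs badly, which is the safety margin olasd asks for at 12:29.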
12:35 -- Notice(swhbot): D2938 (author: ardumont, Closed) on swh-storage: pg-storage: Adapt internal implementations to use origin visit status model representation
12:35 <+ardumont> i was realizing and reflect on that
12:35 <+olasd> well, I'm uneasy halting the world for a week while the data is being moved and we figure out whether the new stuff works at all
12:36 what about staging?
12:36 I mean, use it to test
12:36 <+ardumont> unfortunately the dataset on staging is way smaller
12:36 <+olasd> staging doesn't have 0.0001% of the production data
12:37 hmm
12:37 <+olasd> it's a smoke test
12:37 <+olasd> not much more
12:37 can we duplicate the replica on azure, and use it like a staging with all data?
12:37 <+olasd> no
12:37 too expansive?
12:38 <+olasd> too expensive, and postgresql on azure is useless
12:38 <+olasd> if you're confident doing the full migration in one go, I won't block you too hard
12:38 <+olasd> but I'm dubious
12:39 I'm just trying to find a better way
12:40 <+ardumont> sure, thx
12:40 <+olasd> to minimize the amount of data to move around, it should be possible to turn the current origin_visit table into origin_visit_status, and recreate a new origin_visit table from scratch
12:40 <+olasd> (rather than going the other way around)
12:41 <+olasd> no idea how that will play out with the replication
12:42 sounds risky
12:43 <+olasd> how so?
12:43 <+olasd> it's just adding one new field duplicating the current date field
12:46 <+ardumont> for the dump idea, wondering if we can't make a dump out of origin/origin_visit from prod, restore it to staging
12:47 <+ardumont> trigger further some load after that and check what happens in staging webapp
12:47 <+ardumont> (it'll be missing some snapshots though)
12:47 <+olasd> do you have 200GB free in the staging postgres?
12:48 <+olasd> I mean, that's probably a decent middle ground
12:48 <+ardumont> not immediately but that can be done
12:49 <+olasd> vlorentz: the current proposed migration is risky too, in the sense that it's pretty much a one-way deal once workers start writing /only/ to the new tables
12:49 <+olasd> s/tables/table/
12:50 <+ardumont> indeed
12:50 <+ardumont> tbc it's been quite a while since i grew weary of that migration
12:50 how so?
12:50 I meant renaming the table
12:50 ardumont's proposal sounds good
12:51 <+ardumont> (weary and wary both :)
12:55 <+ardumont> i'll check that then
12:55 <+ardumont> increasing db0 (staging) fs
12:56 <+ardumont> dumping out of somerset some tables (origin, origin-visit)
12:56 <+ardumont> (somerset looks less loaded)
12:56 <+olasd> please use belvedere
12:56 <+ardumont> ah lol, checked the wrong machine, thx
12:56 <+olasd> it's faster
12:57 <+ardumont> even better
12:57 <+olasd> (and it's not being used by the public frontend)
12:58 <+ardumont> righty right
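[Editor's note] The table-renaming alternative olasd raises at 12:40 (turn the current origin_visit into origin_visit_status and recreate origin_visit from scratch) might look roughly like the sketch below. It is one possible reading of that remark, not a tested migration: sqlite3 again stands in for PostgreSQL, the columns are a hypothetical simplification, `rename_and_recreate` is a made-up name, and the effect on replication that olasd mentions at 12:41 is not modelled at all.

```python
import sqlite3


def rename_and_recreate(conn):
    # Keep the bulk of the data in place: the existing table simply
    # becomes the status table, so no large copy is needed.
    conn.execute("ALTER TABLE origin_visit RENAME TO origin_visit_status")
    # Recreate a slim origin_visit from scratch. Its date column is a
    # copy of the date already stored in the renamed table, which is the
    # "one new field duplicating the current date field".
    conn.execute(
        "CREATE TABLE origin_visit AS "
        "SELECT origin, visit, date FROM origin_visit_status"
    )
    conn.commit()
```

Only the small per-visit columns are copied here, while the status and snapshot data never move, which is the point of doing it in this direction.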