Changing provider in title to authority for terminology consistency.
Apr 27 2020
Apr 26 2020
Upon (admittedly quick) review, I don't think that anything more than D3068 is needed to address this.
Double-checking/feedback welcome!
Apr 24 2020
I wrote that little script to check the number of origin IRIs and URIs in the archive
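A check along these lines could be sketched as follows. This is not the script mentioned above, only a minimal illustration that assumes an origin URL counts as an IRI when it contains at least one non-ASCII character and as a plain URI otherwise. The function name `count_iris_and_uris` and the sample URLs are hypothetical.

```python
from typing import Iterable, Tuple


def count_iris_and_uris(origin_urls: Iterable[str]) -> Tuple[int, int]:
    """Return (iri_count, uri_count) for the given origin URLs.

    Assumption: a URL containing any non-ASCII character is treated
    as an IRI; a URL made only of ASCII characters is treated as a URI.
    """
    iris = 0
    uris = 0
    for url in origin_urls:
        if url.isascii():
            uris += 1
        else:
            iris += 1
    return iris, uris


# Hypothetical sample data, not taken from the archive.
sample = [
    "https://example.org/repo",
    "https://example.org/dépôt",
]
iri_count, uri_count = count_iris_and_uris(sample)
```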
Apr 23 2020
Apr 22 2020
Apr 21 2020
Apr 20 2020
Apr 17 2020
and it finished alright \m/
Apr 15 2020
while not finished, run is still happy so far
Apr 14 2020
Deployed.
Remains open because there remain decisions to be made
about the few real ones (3) we have so far [1]
This can be closed now thanks to D2977.
So our high number of false hash collisions is fixed thanks to D2977 now \m/.
Apr 11 2020
Apr 9 2020
We have a somewhat common (but fairly infrequent) pattern of visits crashing hard, and lingering forever in an ongoing state.
In the meantime (pending reviews), a new load was triggered without the proxy storage and all went fine.
Apr 8 2020
An interesting experiment, disabling the proxy buffer storage in the loader nixguix configuration.
And the number of hash collisions dropped to 0 (no new event for that loader since yesterday around 6pm our time).
Apr 7 2020
Heading for 2. for now, as the solution for 3. is still a pending question [2]
I'll start with a general reasoning about origin visit vs. origin visit state objects in our "conceptual" data model, as it was sprinkled throughout my comment initially.
It looks like you misread what I meant: I was talking about a new OriginVisitUpdate with a snapshot "inconsistent" with the previous snapshot reported by the previous OriginVisitUpdate for the same visit.
In T2310#43182, @douardda wrote:
- do we allow an OriginVisitUpdate(status='ongoing', snapshot=yyy) with the snapshot yyy not a superset of a previous update?
It doesn't make sense to have this, but I'm not sure we should care.
I think this is a rather simple check to implement, so I don't see why we shouldn't do it. Intrinsic robustness is always a good thing to add, as long as it is not overly complex.
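To illustrate how simple such a check could be, here is a minimal sketch. It assumes a snapshot can be viewed as a set of branch names, which is a simplification made for this example, and the function name `is_consistent_update` is hypothetical rather than part of any existing storage API.

```python
from typing import AbstractSet, Optional


def is_consistent_update(
    previous: Optional[AbstractSet[str]],
    new: Optional[AbstractSet[str]],
) -> bool:
    """Check that the snapshot of a new update extends the previous one.

    Assumption: a snapshot is modelled as a set of branch names, and
    None means "no snapshot yet".
    """
    if previous is None:
        # Nothing was reported before, so any new value is acceptable.
        return True
    if new is None:
        # Going back from a snapshot to no snapshot is inconsistent.
        return False
    # The new snapshot must contain every branch of the previous one.
    return new >= previous
```

With this sketch, going from `{"master"}` to `{"master", "dev"}` is accepted, while going from `{"master", "dev"}` to `{"master"}` is rejected.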
In T2310#43131, @ardumont wrote:
Thanks for the questions. I'm unsure about some of them, and I replied as best I could.
- do we allow an OriginVisitUpdate(status='ongoing', snapshot=None)? what would be the meaning of this?
Yes. It means "loading started, so no snapshot yet".
That sounds sensible ;)
In T2310#43129, @vlorentz wrote:
We currently don't have "created" (so no "start" either), but it would make sense to create it.
Regarding this model, a few questions come to my mind:
- do we allow an OriginVisitUpdate(status='ongoing', snapshot=None)? what would be the meaning of this? or do we enforce one just after the created step to model the start transition?
This could mean these things:
- on a first update, to mean the visit was created (but we don't need it if we have a "created" state)
Apr 6 2020
(also, I agree with @vlorentz's faster first reply ;)
The rest is only in origin_visit_update.
Thanks for the questions. I'm unsure about some of them, and I replied as best I could.
We currently don't have "created" (so no "start" either), but it would make sense to create it.
As I understand this, an origin visit, consisting of one OriginVisit object plus a list of OriginVisitUpdate objects, represents the process of visiting an origin to load its content in the archive.
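That reading of the model can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual model classes: the field names (`origin`, `visit`, `date`, `updates`), the helper methods, and the "full" status value are hypothetical, while "ongoing" and the optional snapshot come from the discussion.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class OriginVisitUpdate:
    """One state change reported during a visit."""
    status: str                     # e.g. "ongoing"; "full" is assumed here
    date: datetime
    snapshot: Optional[str] = None  # None means "no snapshot yet"


@dataclass
class OriginVisit:
    """A visit of an origin, plus the ordered list of its updates."""
    origin: str
    visit: int
    updates: List[OriginVisitUpdate] = field(default_factory=list)

    def add_update(self, status: str, snapshot: Optional[str] = None) -> None:
        # Append a new update stamped with the current UTC time.
        now = datetime.now(tz=timezone.utc)
        self.updates.append(OriginVisitUpdate(status, now, snapshot))

    def current_status(self) -> Optional[str]:
        # The state of the visit is the status of its latest update.
        return self.updates[-1].status if self.updates else None

    def current_snapshot(self) -> Optional[str]:
        # The latest update that reported a snapshot wins.
        for update in reversed(self.updates):
            if update.snapshot is not None:
                return update.snapshot
        return None


# Hypothetical usage: loading starts with no snapshot, then completes.
visit = OriginVisit(origin="https://example.org/repo", visit=1)
visit.add_update("ongoing")
visit.add_update("full", snapshot="snapshot-id-placeholder")
```

Keeping the visit itself minimal and putting every state change in the list of updates matches the remark that "the rest is only in origin_visit_update".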
Apr 3 2020
All in all, this task serves the purpose of making sure those exist.
Apr 2 2020
@olasd So we agree to go with #1, right?
In T2346#43057, @vlorentz wrote:
I'd say, let's keep the metadata field for now, just to avoid migrating back and forth.
And if we want to pack it with lots of data, we can switch from semantic 1 to semantic 2 later, which shouldn't be too much trouble.
In T2346#43055, @olasd wrote:
The only concern I have about removing the metadata field is that at some point I'd like the "size" of the visit to enter into consideration in the feedback loop of the scheduler (T2345). A metadata field in the visit with the count of objects added (or even just a "visit score") could be a way of recording that info. It would also help the web frontend show the activity for a given repository.
Thanks for recording this.
Apr 1 2020
After discussion on IRC, we're opting for option 2.
Build is green
Rebase on latest master
Mar 31 2020
Mar 30 2020
Mar 25 2020
Current plan: