Should this be closed now? The documentation is at https://docs.softwareheritage.org/devel/swh-dataset/
Mar 8 2021
Mar 7 2021
Mar 4 2021
Notice that the filtering may need to be done at all levels: origins, but also SWHIDs in general.
An example (real) use case is a takedown request for just one specific commit in a repository: we do not want to dereference all the rest.
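The filtering described above (per origin, but also per SWHID) can be pictured with a minimal Python sketch. `TakedownFilter` and `visible_swhids` are hypothetical names, not an existing Software Heritage API. The sketch only hides the exact blocked objects. It does not compute which other objects are reachable only through a blocked commit.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class TakedownFilter:
    """Hypothetical blocklist consulted at every level, not an existing SWH API."""

    blocked_origins: set[str] = field(default_factory=set)  # origin URLs
    blocked_swhids: set[str] = field(default_factory=set)   # core SWHIDs, e.g. "swh:1:rev:..."

    def origin_allowed(self, origin_url: str) -> bool:
        return origin_url not in self.blocked_origins

    def swhid_allowed(self, swhid: str) -> bool:
        return swhid not in self.blocked_swhids


def visible_swhids(
    origin_url: str, swhids: Iterable[str], flt: TakedownFilter
) -> Iterator[str]:
    """Yield the objects of an origin that may still be served.

    A blocked origin hides everything; otherwise only the individually
    blocked SWHIDs are dropped and the rest of the repository stays visible.
    """
    if not flt.origin_allowed(origin_url):
        return
    for swhid in swhids:
        if flt.swhid_allowed(swhid):
            yield swhid
```

For a takedown of a single commit, only that `swh:1:rev:` identifier goes into `blocked_swhids`; the origin itself stays allowed.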
Feb 10 2021
In T376#58337, @ardumont wrote: Note that this does not mean it is or will be ingested anytime soon, though.
We are still missing at least one cog to actually schedule those listed origins. More details in T2345#58247
Feb 4 2021
I asked one of the authors of the original HyperLogLog paper (not Philippe, who unfortunately passed away years ago :-()
The original HyperLogLog has three different behaviours: one for small cardinalities, another for medium cardinalities, and a third for very large cardinalities.
There is indeed a risk of breaking monotonicity at the boundaries between segments, but within each segment it is monotonic.
Our counters are already in the "very large cardinality" zone, so we should be safe with any implementation.
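To make the three regimes concrete, here is a minimal Python sketch of the estimator as described in the original paper (Flajolet et al., 2007), assuming a 32-bit hash. The choice of `blake2b` as hash and `p=10` are arbitrary assumptions for illustration; this is not the implementation used for the archive counters.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog after Flajolet et al. (2007), with a 32-bit hash."""

    def __init__(self, p: int = 10):
        if not 4 <= p <= 16:
            raise ValueError("p must be between 4 and 16")
        self.p = p
        self.m = 1 << p  # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash32(item: bytes) -> int:
        # Any well-mixed 32-bit hash works; blake2b is an arbitrary choice here.
        return int.from_bytes(hashlib.blake2b(item, digest_size=4).digest(), "big")

    def add(self, item: bytes) -> None:
        x = self._hash32(item)
        idx = x >> (32 - self.p)                   # first p bits select a register
        w = x & ((1 << (32 - self.p)) - 1)         # remaining 32 - p bits
        rank = (32 - self.p) - w.bit_length() + 1  # position of the leftmost 1-bit
        # Registers only ever grow, and re-adding an item changes nothing.
        self.registers[idx] = max(self.registers[idx], rank)

    def _alpha(self) -> float:
        # Bias-correction constant alpha_m from the paper.
        if self.m == 16:
            return 0.673
        if self.m == 32:
            return 0.697
        if self.m == 64:
            return 0.709
        return 0.7213 / (1 + 1.079 / self.m)

    def raw_estimate(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        return self._alpha() * self.m * self.m / z

    def estimate(self) -> float:
        e = self.raw_estimate()
        two32 = 2.0 ** 32
        if e <= 2.5 * self.m:
            # Small range: linear counting on the number of empty registers.
            v = self.registers.count(0)
            return self.m * math.log(self.m / v) if v else e
        if e <= two32 / 30:
            # Intermediate range: raw estimate, no correction.
            return e
        if e >= two32:
            return math.inf  # sketch saturated for a 32-bit hash
        # Large range: correct for collisions in the 32-bit hash space.
        return -two32 * math.log(1 - e / two32)
```

Since registers never decrease, each branch is non-decreasing on its own. In this sketch a drop can only occur when the estimate switches from linear counting to the raw estimate: the large-range correction is always at least as large as the raw value, so crossing that upper boundary cannot lower the estimate.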
Feb 3 2021
In T2912#58063, @zack wrote: In T2912#58062, @rdicosmo wrote: Thanks @vsellier, that seems quite OK indeed. The only question left is whether the implemented estimator is monotonic (i.e. we will never have negative bumps in the graph :-))
May I suggest (for reasons discussed in the past) just removing the graphs from the main archive.s.o page?
We decided to keep the counters.
Feb 2 2021
Make the Python 3 dependency explicit
Feb 1 2021
In T376#57824, @rdicosmo wrote: Thanks @ardumont, that's great! If you think this does not need any more support on the Eclipse side, could you let them know?
Jan 31 2021
Jan 30 2021
Jan 29 2021
In T2912#57643, @vlorentz wrote: I don't think this solves the issue of overestimating the number of objects when two threads insert the same objects at the same time.
In T2912#57655, @vsellier wrote:
I'm not sure I understand,
Thanks @ardumont for experimenting with this. The 500 error seems normal: we need to tell Eclipse about us first, I'll put you in touch. So maybe it's still a no-brainer, and we just need to document the "contact the owner to get whitelisted" human step :-)
Jan 28 2021
Bloom filters are still on the table for other use cases, like testing super quickly for contents that we do not have, but if nobody has strong objections, this seems the way to go for the counters (very small footprint, small under/over counting errors, thanks Philippe Flajolet's magic :-))
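For the "testing super quickly for contents that we do not have" use case, the useful property of a Bloom filter is that it has no false negatives: a negative answer means the content is definitely absent, while a positive answer only means "probably present" and still needs a lookup in storage. Below is a minimal Python sketch. The class name, the sizing parameters and the use of SHA-1 digests as keys are illustrative assumptions, and the double-hashing scheme (Kirsch and Mitzenmacher) is one common way of deriving the k positions, not a choice made in this thread.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter over bytes keys (illustrative, not an SWH component)."""

    def __init__(self, capacity: int, error_rate: float = 0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hashes.
        self.size = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: bytes):
        # Double hashing: position_i = h1 + i * h2 (mod size).
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 odd so it is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, item: bytes) -> bool:
        # False means definitely absent; True means "probably present".
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item))
```

In use, a content whose key yields `might_contain(key) == False` can be treated as missing from the archive without touching storage.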