In addition, we will need to modify storage to store hashed origin URLs and allow retrieving them.
Jan 22 2020
Jan 21 2020
Jan 15 2020
Jan 14 2020
Jan 8 2020
I just ran it on Azure. It has a different schema (the "revision" table was split into "revision" and "revision_parent"), so the benchmarks are not exactly comparable.
I still use 16 workers, all running on the same machine, with no compression.
Dec 20 2019
A Cassandra cluster has been deployed on cassandra[01-06].euwest.azure.internal.softwareheritage.org.
Nov 27 2019
Proposed CLI interface:
swh [ -C config.yml ] graph mount PID DIR
This will mount the content of the given PID on the given local DIR.
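To make the proposed command line concrete, here is a minimal sketch of how its arguments could be parsed. It uses the standard library's argparse purely for illustration; the function name build_parser, the attribute names (config, pid, directory) and the placeholder PID are assumptions, not the actual swh implementation, and the sketch does not perform any mount.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring: swh [ -C config.yml ] graph mount PID DIR (illustrative only)."""
    parser = argparse.ArgumentParser(prog="swh")
    # Optional global configuration file, as in the proposal.
    parser.add_argument("-C", dest="config", metavar="config.yml", default=None)

    commands = parser.add_subparsers(dest="command", required=True)
    graph = commands.add_parser("graph")

    graph_commands = graph.add_subparsers(dest="subcommand", required=True)
    mount = graph_commands.add_parser("mount")
    mount.add_argument("pid", metavar="PID", help="persistent identifier to mount")
    mount.add_argument("directory", metavar="DIR", help="local mount point")
    return parser


# Placeholder PID, made up for this example.
example_args = build_parser().parse_args(
    ["-C", "config.yml", "graph", "mount", "swh:1:rev:" + "0" * 40, "/tmp/mnt"]
)
```

Keeping -C on the top-level parser matches the proposal, where the configuration file comes before the graph subcommand.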
Nov 25 2019
Nov 22 2019
Launched on somerset:
We should consider just adding a btree index on sha1(url) and see where that takes us.
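As a rough illustration of the idea, the sketch below stores the SHA-1 of each origin URL next to the URL and indexes that column, so lookups by hash do not scan the table. It uses an in-memory SQLite database as a stand-in (SQLite indexes are B-trees); the table, column and function names are hypothetical, and the real storage schema may differ.

```python
import hashlib
import sqlite3
from typing import Optional


def url_sha1(url: str) -> bytes:
    """Return the raw 20-byte SHA-1 digest of a URL."""
    return hashlib.sha1(url.encode("utf-8")).digest()


conn = sqlite3.connect(":memory:")
# Hypothetical table layout, for illustration only.
conn.execute(
    "CREATE TABLE origin ("
    " id INTEGER PRIMARY KEY,"
    " url TEXT NOT NULL,"
    " url_sha1 BLOB NOT NULL)"
)
# Index on the hash column so lookups by hashed URL use the index.
conn.execute("CREATE INDEX origin_url_sha1_idx ON origin (url_sha1)")


def add_origin(url: str) -> None:
    """Store an origin URL together with its SHA-1."""
    conn.execute(
        "INSERT INTO origin (url, url_sha1) VALUES (?, ?)", (url, url_sha1(url))
    )


def get_origin_by_sha1(digest: bytes) -> Optional[str]:
    """Return the URL whose SHA-1 matches the digest, or None if unknown."""
    row = conn.execute(
        "SELECT url FROM origin WHERE url_sha1 = ?", (digest,)
    ).fetchone()
    return row[0] if row else None


add_origin("https://example.org/repo.git")
```

Calling get_origin_by_sha1(url_sha1("https://example.org/repo.git")) returns the stored URL, and an unknown digest returns None.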
Forgot to mention it here, but it's done now.
Nov 21 2019
Nov 18 2019
Nov 14 2019
+1 on this
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/778/ for more details.
Nov 12 2019
https://github.com/omniti-labs/pg_extractor is a decent way to extract the schema from a running database into something that's easily diffable
Nov 11 2019
Nov 8 2019
Nov 7 2019
Probably not. I'm working on adding support for other objects.
In T2053#38352, @vlorentz wrote: Added parallelism. 450k/s with 16 workers and no compression. I won't try with 32 workers because Python processes would use too much CPU on my machine.
Throughput improved to 34k/s just by not querying unneeded fields.
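The two optimizations mentioned above (parallel workers, and querying only the needed fields) can be sketched as follows. This is an assumption about how such an export could be organized, not a description of the prototype: it splits Cassandra's Murmur3 token range into one contiguous slice per worker and builds a query that selects only the listed columns. The table name revision, the partition key id and the column name parents are hypothetical.

```python
from typing import List, Sequence, Tuple

# Token bounds of Cassandra's Murmur3Partitioner.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_range(n_workers: int) -> List[Tuple[int, int]]:
    """Split the full token range into n_workers contiguous inclusive slices."""
    total = MAX_TOKEN - MIN_TOKEN + 1
    step = total // n_workers
    ranges = []
    start = MIN_TOKEN
    for i in range(n_workers):
        # The last slice absorbs any remainder so the whole range is covered.
        end = MAX_TOKEN if i == n_workers - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def build_query(columns: Sequence[str], start: int, end: int) -> str:
    """Build a query for one slice, selecting only the needed columns."""
    return (
        f"SELECT {', '.join(columns)} FROM revision "
        f"WHERE token(id) >= {start} AND token(id) <= {end}"
    )


# One query per worker; each could then be handed to a separate process.
queries = [build_query(["id", "parents"], s, e) for s, e in split_token_range(16)]
```

Each worker scans a disjoint slice, so no row is exported twice and no coordination between workers is needed.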
Nov 5 2019
Looks good, thanks!
This has now been deployed in all components as well as in production databases. Bye bye origin['type'].
Nov 4 2019
I wrote a prototype for exporting revisions: https://forge.softwareheritage.org/source/snippets/browse/master/vlorentz/cassandra_stream_graph.py
Oct 31 2019
Oct 28 2019
Oct 24 2019
Oct 22 2019
Oct 19 2019
Oct 18 2019
The missing unique constraint has now been added.
Oct 11 2019
Oct 10 2019
Oct 8 2019
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/681/ for more details.
This revision was not accepted when it landed; it landed in state Needs Review.
Adapt according to last missing review points
You missed this comment:
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/680/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/679/ for more details.
Rebase on latest development
Rebase to latest diff
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/676/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/675/ for more details.
Fix commit message
Fix mypy errors reported by CI (the local mypy run passed)
Build has failed
- Fix code-block syntax
- Add some more assertions on storage writing or not
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/673/ for more details.
Adapt according to review and mypy check
In D2084#48380, @vlorentz wrote: Accepting, because @ardumont promised to fix it on IRC ;)
Actually, I still have some nitpicking to do: instead of having sample_data as a large fixture returning a dict of various value types, could you split it into smaller fixtures, each returning a single type of value?
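A minimal sketch of what that split could look like with pytest is shown below. The fixture names, builder functions and sample values are all made up for illustration and do not come from the diff under review; the point is only that each fixture returns one kind of value, so a test requests exactly what it needs.

```python
import pytest


def build_contents():
    """Hypothetical sample contents (one type of value only)."""
    return [{"data": b"foo", "length": 3}, {"data": b"barbaz", "length": 6}]


def build_origins():
    """Hypothetical sample origins (one type of value only)."""
    return [{"url": "https://example.org/repo.git"}]


@pytest.fixture
def sample_contents():
    # Small fixture: contents only, instead of one dict mixing every type.
    return build_contents()


@pytest.fixture
def sample_origins():
    # Small fixture: origins only.
    return build_origins()


def test_content_lengths(sample_contents):
    # This test depends on contents alone, so it requests only that fixture.
    for content in sample_contents:
        assert content["length"] == len(content["data"])
```

Tests that need several kinds of objects can list several fixtures as arguments, which also documents their dependencies.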