Sep 17 2019
Sep 16 2019
- typecheck: do not fiddle with MYPYPATH and rely on PEP 561 instead
- typecheck: rely on "pip list" to generate MYPYPATH
In D1983#46101, @vlorentz wrote: but that will need to wait for Python >= 3.6, as they require PEP 526 variable annotations.
What about type comments?
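For context, PEP 484 type comments carry the same information to a checker such as mypy without using PEP 526 syntax, so the file still parses on Python 3.5. A minimal illustration follows; the names `node_ids`, `pending` and `register` are hypothetical and not taken from the code under review.

```python
from typing import Dict, List

# PEP 526 variable annotation, a syntax error before Python 3.6:
#     node_ids: Dict[str, int] = {}

# PEP 484 type comments: ordinary comments at runtime, read by type checkers.
node_ids = {}  # type: Dict[str, int]
pending = []  # type: List[str]


def register(pid, table):
    # type: (str, Dict[str, int]) -> int
    """Return the integer assigned to pid, allocating the next free one if new."""
    return table.setdefault(pid, len(table))
```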
Sep 15 2019
- add untyped deps to make swh-loader-core pass
Sep 13 2019
Neither end of the spectrum, "fully sort in RAM then write sequentially" or "write randomly", is satisfactory here.
What we want is: in-memory sorting within the limits of available RAM, spilling partially sorted subsets to disk and reading them back as needed, and a sequential write at the end.
We could implement this in Java in the Setup class, but that is exactly what /usr/bin/sort is good at. So I propose to shell out to it from Setup and serialize the sorted result through a writer for the binary format of T1944.
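To make the middle ground concrete, here is a minimal Python sketch of the sort, spill and merge strategy that external sorters such as sort(1) use: sort bounded chunks in RAM, write each sorted run to a temporary file, then merge the runs into a single sequential output stream. The function name `external_sort` and the `max_lines` knob are hypothetical; the proposal itself is to shell out to /usr/bin/sort from Setup rather than reimplement this.

```python
import heapq
import itertools
import tempfile
from typing import IO, Iterable, Iterator, List


def _spill(sorted_chunk: List[str]) -> IO[str]:
    """Write one sorted run to a temporary file and rewind it for reading."""
    run = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
    run.writelines(sorted_chunk)
    run.seek(0)
    return run


def external_sort(lines: Iterable[str], max_lines: int = 100_000) -> Iterator[str]:
    """Yield lines in sorted order while holding at most max_lines in RAM.

    Every input line must end with a newline, since runs are stored as text.
    """
    runs = []
    it = iter(lines)
    while True:
        # Sort one bounded chunk in memory, then spill it to disk.
        chunk = sorted(itertools.islice(it, max_lines))
        if not chunk:
            break
        runs.append(_spill(chunk))
    try:
        # k-way merge of the sorted runs; the output is produced sequentially.
        yield from heapq.merge(*runs)
    finally:
        for run in runs:
            run.close()
```

The merge step reads each run front to back, so the consumer (here, a writer for the binary map format) only ever appends.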
Status update: we now have binary serialization formats for the two maps; see the docstrings of PidToIntMap and IntToPidMap in swh.graph.pid.
That means that Python code can read the compact maps (and also write them, but at a speed that is impractical for generation). Conversion of the textual maps generated for the most recent compressed graph is under way and almost complete.
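The authoritative layouts are the ones described in the PidToIntMap and IntToPidMap docstrings. The sketch below only illustrates why fixed-size binary records are compact and cheap to read from Python. It assumes a hypothetical layout (a 20-byte key followed by a big-endian 64-bit integer, records sorted by key) and hypothetical helper names `dump` and `lookup`; none of this is the swh.graph API.

```python
import mmap
import struct
from typing import Iterable, Tuple

# Assumed record layout, for illustration only: 20-byte key + big-endian int64.
RECORD = struct.Struct(">20sq")


def dump(pairs: Iterable[Tuple[bytes, int]], path: str) -> None:
    """Write (key, int) pairs as fixed-size records sorted by key."""
    with open(path, "wb") as f:
        for key, value in sorted(pairs):
            f.write(RECORD.pack(key, value))


def lookup(path: str, key: bytes) -> int:
    """Binary-search the (non-empty) record file for key; KeyError if absent."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        lo, hi = 0, len(mm) // RECORD.size
        while lo < hi:
            mid = (lo + hi) // 2
            found, value = RECORD.unpack_from(mm, mid * RECORD.size)
            if found == key:
                return value
            if found < key:
                lo = mid + 1
            else:
                hi = mid
    raise KeyError(key)
```

With fixed-size records, record i starts at byte offset i * RECORD.size, so a lookup touches only a logarithmic number of pages of the mapped file instead of parsing text.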
Sep 12 2019
Controversial points:
It adds a new Kafka topic for objects that are not part of the data model.
- pid.py: avoid importing unused mmap constants
- cli.py: avoid importing unused PID_BIN_SIZE constant
- test_pid.py: fix alphabetic ordering of node types
- pid.py: use a dict for more idiomatic file mode check
- pid maps: add limited support for updatable maps
- CLI: make restore of int->pid maps use mmap writing instead of seek
- pid.py: avoid importing unused mmap constants
- cli.py: avoid importing unused PID_BIN_SIZE constant
- test_pid.py: fix alphabetic ordering of node types
- pid.py: use a dict for more idiomatic file mode check
Sep 11 2019
Sep 9 2019
- pid.py: use a dict for more idiomatic file mode check
- test_pid.py: fix alphabetic ordering of node types
Sep 8 2019
- cli.py: avoid importing unused PID_BIN_SIZE constant
- pid.py: avoid importing unused mmap constants
Sep 6 2019
binary (de)serializer for more compact PID<->int maps
- binary (de)serializer for more compact PID<->int maps
- fix typo in function name cross-ref
- requirements.txt: add new dep on swh.model
- change PID order to be alphabetic and match current Java implementation
- fix binary serialization tests after PID ordering change
- pid2int: add type checking on the key
- add restore script from textual to binary maps
- update to fix (most of) @vlorentz review remarks
- serialization: integrate map dump/restore commands into CLI
Sep 5 2019
In D1944#45155, @vlorentz wrote: Please see how other SWH tools implement their CLI. There should be a file named swh/graph/cli.py that extends the CLI from swh.core, and an entrypoint declared in setup.py.
- integrate cli with swh.core.cli
- binary (de)serializer for more compact PID<->int maps
- fix typo in function name cross-ref
- requirements.txt: add new dep on swh.model
- change PID order to be alphabetic and match current Java implementation
- fix binary serialization tests after PID ordering change
- pid2int: add type checking on the key
- add restore script from textual to binary maps
- update to fix (most of) @vlorentz review remarks
- serialization: integrate map dump/restore commands into CLI
update to avoid hijacking swh indexer CLI (and make it actually work)
(Properly redoing the cli.py module is still pending.)
- update to fix (most of) @vlorentz review remarks
Sep 4 2019
- add restore script from textual to binary maps
- fix binary serialization tests after PID ordering change
- pid2int: add type checking on the key
- change PID order to be alphabetic and match current Java implementation
Sep 3 2019
- binary (de)serializer for more compact PID<->int maps
- fix typo in function name cross-ref
- requirements.txt: add new dep on swh.model
requirements.txt: add new dep on swh.model
(it's only closed in a branch, not in master)
- fix typo in function name cross-ref
Sep 1 2019
Aug 30 2019
Aug 29 2019
In T1958#36628, @olasd wrote:
- one HHHL (PCIe card) Intel Optane P4800x (The smallest, 375GB version will be sufficient); should be around 1200 EUR;
- one PCIe card with M.2 slots (Asus Hyper M.2 x16 or equivalent); I've found prices between 100 and 400 EUR;
- two consumer-grade NVMe disks for L2ARC cache (1 or 2TB each; e.g. Samsung 970 Evo); the 1TB drives retail for 200 EUR each.
I'll try to fish for a quote with known suppliers.