
Sep 17 2019

zack committed rDENV91b5b1c94014: mypy.ini: make some other modules pass type checking (authored by zack).
mypy.ini: make some other modules pass type checking
Sep 17 2019, 8:54 AM
zack committed rDENV8959157a7eb4: typecheck: do not fiddle with MYPYPHATH and rely on PEP 561 instead (authored by zack).
typecheck: do not fiddle with MYPYPHATH and rely on PEP 561 instead
Sep 17 2019, 8:54 AM
zack committed rDENV5deaa5f0c244: typecheck: rely on "pip list" to generate MYPYPATH (authored by zack).
typecheck: rely on "pip list" to generate MYPYPATH
Sep 17 2019, 8:54 AM
zack committed rDENV46d746d58083: mypy: bare-bone configuration and make target ("typecheck") (authored by zack).
mypy: bare-bone configuration and make target ("typecheck")
Sep 17 2019, 8:54 AM
zack committed rDENV13f01f35e26e: add untyped deps to make swh-loader-core pass (authored by zack).
add untyped deps to make swh-loader-core pass
Sep 17 2019, 8:54 AM
zack closed D1983: mypy: bare-bone configuration and make target ("typecheck").
Sep 17 2019, 8:54 AM

Sep 16 2019

Herald added a reviewer for D1989: click "required" param wants bool, not int: Reviewers.
Sep 16 2019, 5:20 PM
zack committed rDDEPcae84fcdd581: fix typos in docstrings and docs (authored by zack).
fix typos in docstrings and docs
Sep 16 2019, 5:19 PM
Herald added a reviewer for D1988: click "required" param wants bool, not int: Reviewers.
Sep 16 2019, 5:18 PM
Herald added a reviewer for D1987: click "required" param wants bool, not int: Reviewers.
Sep 16 2019, 5:14 PM
Herald added a reviewer for D1986: click "required" param wants bool, not int: Reviewers.
Sep 16 2019, 4:59 PM
Herald added a reviewer for D1985: click "required" param wants bool, not int: Reviewers.
Sep 16 2019, 4:15 PM
zack updated the diff for D1983: mypy: bare-bone configuration and make target ("typecheck").
  • typecheck: do not fiddle with MYPYPHATH and rely on PEP 561 instead
Sep 16 2019, 1:56 PM
zack updated the diff for D1983: mypy: bare-bone configuration and make target ("typecheck").
  • typecheck: rely on "pip list" to generate MYPYPATH
Sep 16 2019, 12:16 PM
zack added a comment to D1983: mypy: bare-bone configuration and make target ("typecheck").

but that will need to wait for Python >= 3.6, as they require PEP 526 variable annotations.

What about type comments?
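
For context, a minimal sketch of the two spellings being compared. The variable names are made up for illustration and do not come from the reviewed code; the point is only that the type-comment form parses on interpreters older than 3.6, while mypy still reads it.

```python
from typing import Dict, List

# PEP 526 variable annotation: a syntax error on Python < 3.6.
counts: Dict[str, int] = {}

# PEP 484 type comment: an ordinary comment for the interpreter,
# so it parses on older Pythons, and mypy still picks up the type.
names = []  # type: List[str]

counts["rev"] = 1
names.append("rev")
```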

Sep 16 2019, 7:53 AM

Sep 15 2019

zack updated the diff for D1983: mypy: bare-bone configuration and make target ("typecheck").
  • add untyped deps to make swh-loader-core pass
Sep 15 2019, 2:55 PM
zack committed rDCORE3ee89dd35853: swh.core.config.parse_config_file: fix sphinx markup in docstring (authored by zack).
swh.core.config.parse_config_file: fix sphinx markup in docstring
Sep 15 2019, 2:48 PM
zack committed rDGRPH36b71ee91b61: tox.ini: remove undeclared check-manifest environment (authored by zack).
tox.ini: remove undeclared check-manifest environment
Sep 15 2019, 2:42 PM
zack committed rDLDBASEbeeddebb9562: fix typo in docstring (courtesy of codespell) (authored by zack).
fix typo in docstring (courtesy of codespell)
Sep 15 2019, 12:01 PM
zack committed rDLDBASEf58a7711ee82: tox.ini: normalize indent (authored by zack).
tox.ini: normalize indent
Sep 15 2019, 12:01 PM
zack committed rDDATASET5b7d1880fe62: make swh.dataset a proper (yet empty) package (authored by zack).
make swh.dataset a proper (yet empty) package
Sep 15 2019, 11:37 AM
Herald added a reviewer for D1983: mypy: bare-bone configuration and make target ("typecheck"): Reviewers.
Sep 15 2019, 10:58 AM
zack committed rDMODd70b486c0137: fix indentation and spelling: make "make check" happy (authored by zack).
fix indentation and spelling: make "make check" happy
Sep 15 2019, 10:51 AM
zack committed rDOBJSca7733f8c1ed: fix docstring typos (courtesy of codespell) (authored by zack).
fix docstring typos (courtesy of codespell)
Sep 15 2019, 10:50 AM

Sep 13 2019

zack updated subscribers of T1950: Reduce RAM usage for generating mapping files.

Neither end of the spectrum, "fully sort in RAM, then write sequentially" or "write randomly", is satisfactory here.
What we want is: in-memory sorting within the limits of available RAM, spilling partially sorted subsets to disk and reloading them as needed, and a sequential write at the end.
We could implement this in Java in the Setup class but, in fact, it is exactly what /usr/bin/sort is good at. So I propose to shell out to it from Setup and serialize the sorted result through a writer for the binary format of T1944.
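
To make the strategy concrete, here is a rough Python sketch of the same idea: sort bounded chunks in memory, spill each sorted run to a temporary file, then merge the runs into one sequential write. It is illustrative only; it is neither the Setup class nor what /usr/bin/sort does internally, and the function name and the max_lines_in_ram parameter are made up for this example.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(lines, out_path, max_lines_in_ram=100_000):
    """Sort newline-terminated strings into out_path using bounded memory."""
    run_paths = []
    it = iter(lines)
    try:
        # Phase 1: sort chunks that fit in RAM and spill each one to disk.
        while True:
            chunk = sorted(islice(it, max_lines_in_ram))
            if not chunk:
                break
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w") as run:
                run.writelines(chunk)
            run_paths.append(path)
        # Phase 2: k-way merge of the sorted runs, written sequentially.
        runs = [open(path) for path in run_paths]
        try:
            with open(out_path, "w") as out:
                out.writelines(heapq.merge(*runs))
        finally:
            for run in runs:
                run.close()
    finally:
        # Temporary runs are no longer needed once the merge is done.
        for path in run_paths:
            os.remove(path)
```

Shelling out to sort gives the same behaviour without having to maintain this logic, which is the motivation for the proposal above.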

Sep 13 2019, 1:22 PM · Compressed graph service
zack changed the status of T1944: use a compact, binary format for node ids mapping files, a subtask of T1950: Reduce RAM usage for generating mapping files, from Open to Work in Progress.
Sep 13 2019, 1:19 PM · Compressed graph service
zack changed the status of T1944: use a compact, binary format for node ids mapping files from Open to Work in Progress.

Status update: we now have binary serialization formats for the two maps; see the docstrings of PidToIntMap and IntToPidMap in swh.graph.pid.
This means that Python code can read the compact maps (and also write them, but too slowly to be practical for generation). Conversion of the textual maps generated for the most recent compressed graph is ongoing and almost complete.
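
As an illustration of why a compact binary map is cheap to read from Python, here is a minimal sketch of a fixed-size-record map looked up by binary search over an mmap. The record layout (a 22-byte key followed by a big-endian 64-bit integer) and the names write_map and lookup are assumptions made for this example; the actual formats are the ones documented in the PidToIntMap and IntToPidMap docstrings.

```python
import mmap
import struct

KEY_SIZE = 22  # assumed key width, for illustration only
RECORD = struct.Struct(">%dsq" % KEY_SIZE)  # key, then big-endian int64


def write_map(path, pairs):
    """Write (key, value) pairs as fixed-size records, sorted by key."""
    with open(path, "wb") as f:
        for key, value in sorted(pairs):
            f.write(RECORD.pack(key, value))


def lookup(path, key):
    """Binary-search the record file for key; raise KeyError if absent."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            lo, hi = 0, len(mm) // RECORD.size
            while lo < hi:
                mid = (lo + hi) // 2
                rec_key, value = RECORD.unpack_from(mm, mid * RECORD.size)
                if rec_key == key:
                    return value
                if rec_key < key:
                    lo = mid + 1
                else:
                    hi = mid
    raise KeyError(key)
```

Because every record has the same size, the n-th record sits at a known offset, so a lookup touches only a logarithmic number of pages and the file never has to be loaded in full. Note that mmap refuses to map an empty file, so this sketch assumes at least one record.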

Sep 13 2019, 1:19 PM · Compressed graph service

Sep 12 2019

zack committed rDTPL6330ded66a78: sort .gitignore for readability (authored by zack).
sort .gitignore for readability
Sep 12 2019, 10:39 AM
zack committed rDTPL05651241cb09: git ignore mypy cache dir (authored by zack).
git ignore mypy cache dir
Sep 12 2019, 10:39 AM
zack added a comment to D1959: Publish origin_intrinsic_metadata to Kafka..

Controversial points:

It adds a new Kafka topic, of objects that are not part of the data model

Sep 12 2019, 10:07 AM
zack committed rDGRPH636278c6fc43: pid.py: use a dict for more idiomatic file mode check (authored by zack).
pid.py: use a dict for more idiomatic file mode check
Sep 12 2019, 10:04 AM
zack committed rDGRPH94c6e1ed7457: test_pid.py: fix alphabetic ordering of node types (authored by zack).
test_pid.py: fix alphabetic ordering of node types
Sep 12 2019, 10:04 AM
zack committed rDGRPHdaff920072ed: cli.py: avoid importing unused PID_BIN_SIZE constant (authored by zack).
cli.py: avoid importing unused PID_BIN_SIZE constant
Sep 12 2019, 10:04 AM
zack committed rDGRPHf06713d216c7: pid.py: avoid importing unused mmap constants (authored by zack).
pid.py: avoid importing unused mmap constants
Sep 12 2019, 10:04 AM
zack committed rDGRPH66cfb3625025: CLI: make restore of int->pid maps use mmap writing instead of seek (authored by zack).
CLI: make restore of int->pid maps use mmap writing instead of seek
Sep 12 2019, 10:04 AM
zack committed rDGRPH84f223a0eeb3: pid maps: add limited support for updatable maps (authored by zack).
pid maps: add limited support for updatable maps
Sep 12 2019, 10:04 AM
zack closed D1970: pid maps: add limited support for updatable maps.
Sep 12 2019, 10:04 AM
zack updated the diff for D1970: pid maps: add limited support for updatable maps.
  • pid.py: avoid importing unused mmap constants
  • cli.py: avoid importing unused PID_BIN_SIZE constant
  • test_pid.py: fix alphabetic ordering of node types
  • pid.py: use a dict for more idiomatic file mode check
Sep 12 2019, 10:01 AM
zack updated the diff for D1970: pid maps: add limited support for updatable maps.
  • pid maps: add limited support for updatable maps
  • CLI: make restore of int->pid maps use mmap writing instead of seek
  • pid.py: avoid importing unused mmap constants
  • cli.py: avoid importing unused PID_BIN_SIZE constant
  • test_pid.py: fix alphabetic ordering of node types
  • pid.py: use a dict for more idiomatic file mode check
Sep 12 2019, 9:52 AM
D1970: pid maps: add limited support for updatable maps is now accepted and ready to land.
Sep 12 2019, 9:52 AM

Sep 11 2019

zack accepted D1979: Graph.java: implement a flyweight copy() method.
Sep 11 2019, 2:48 PM

Sep 9 2019

zack added inline comments to D1970: pid maps: add limited support for updatable maps.
Sep 9 2019, 11:47 AM
zack updated the diff for D1970: pid maps: add limited support for updatable maps.
  • pid.py: use a dict for more idiomatic file mode check
Sep 9 2019, 11:47 AM
zack added inline comments to D1970: pid maps: add limited support for updatable maps.
Sep 9 2019, 11:44 AM
zack updated the diff for D1970: pid maps: add limited support for updatable maps.
  • test_pid.py: fix alphabetic ordering of node types
Sep 9 2019, 11:43 AM
zack added inline comments to D1970: pid maps: add limited support for updatable maps.
Sep 9 2019, 10:31 AM

Sep 8 2019

zack committed rDGRPH69e0296bf46f: requirements.txt: add missing dep on aiohttp (authored by zack).
requirements.txt: add missing dep on aiohttp
Sep 8 2019, 1:55 PM
zack updated the diff for D1970: pid maps: add limited support for updatable maps.
  • cli.py: avoid importing unused PID_BIN_SIZE constant
Sep 8 2019, 11:24 AM
zack updated the diff for D1970: pid maps: add limited support for updatable maps.
  • pid.py: avoid importing unused mmap constants
Sep 8 2019, 11:15 AM
zack added a reviewer for D1970: pid maps: add limited support for updatable maps: seirl.
Sep 8 2019, 11:13 AM
Herald added a reviewer for D1970: pid maps: add limited support for updatable maps: Reviewers.
Sep 8 2019, 11:13 AM

Sep 6 2019

zack committed rDGRPHad1133cc3cb4: int->pid map restore: support arbitrarily ordered inputs (authored by zack).
int->pid map restore: support arbitrarily ordered inputs
Sep 6 2019, 4:23 PM
zack committed rDGRPH1cda22bd9422: binary (de)serialiazer for more compact PID<->int maps (authored by zack).
binary (de)serialiazer for more compact PID<->int maps
Sep 6 2019, 2:54 PM
zack closed D1944: binary (de)serialiazer for more compact PID<->int maps.
Sep 6 2019, 2:54 PM
zack updated the diff for D1944: binary (de)serialiazer for more compact PID<->int maps.

binary (de)serialiazer for more compact PID<->int maps

Sep 6 2019, 2:54 PM
zack updated the diff for D1944: binary (de)serialiazer for more compact PID<->int maps.
  • binary (de)serialiazer for more compact PID<->int maps
  • fix typo in function name cross-ref
  • requirements.txt: add new dep on swh.model
  • change PID order to be alphabetic and match current Java implementation
  • fix binary serialization tests after PID ordering change
  • pid2int: add type checking on the key
  • add restore script from textual to binary maps
  • update to fix (most of) @vlorentz review remarks
  • serialization: integrate map dump/restore commands into CLI
Sep 6 2019, 2:36 PM
zack closed D1963: integrate cli with swh.core.cli.

merged in c837f455ef37

Sep 6 2019, 2:35 PM
zack committed rDGRPHc837f455ef37: integrate cli with swh.core.cli (authored by zack).
integrate cli with swh.core.cli
Sep 6 2019, 2:34 PM
zack committed rMSLD6657c2d0ee38: merkle DAG: update UML topology picture with sha1 types and missing arrows (authored by zack).
merkle DAG: update UML topology picture with sha1 types and missing arrows
Sep 6 2019, 11:32 AM

Sep 5 2019

zack triaged T1974: Document low-level storage layers as Normal priority.
Sep 5 2019, 9:01 PM · Documentation
zack added a comment to D1944: binary (de)serialiazer for more compact PID<->int maps.

Please see how other SWH tools implement their CLI. There should be a file named swh/graph/cli.py that extends the cli from swh.core, and an entrypoint declared in setup.py

Sep 5 2019, 8:17 PM
zack updated the diff for D1944: binary (de)serialiazer for more compact PID<->int maps.
  • integrate cli with swh.core.cli
  • binary (de)serialiazer for more compact PID<->int maps
  • fix typo in function name cross-ref
  • requirements.txt: add new dep on swh.model
  • change PID order to be alphabetic and match current Java implementation
  • fix binary serialization tests after PID ordering change
  • pid2int: add type checking on the key
  • add restore script from textual to binary maps
  • update to fix (most of) @vlorentz review remarks
  • serialization: integrate map dump/restore commands into CLI
Sep 5 2019, 8:14 PM
zack updated the diff for D1963: integrate cli with swh.core.cli.

update to avoid hijacking swh indexer CLI (and make it actually work)

Sep 5 2019, 6:24 PM
Herald added a reviewer for D1963: integrate cli with swh.core.cli: Reviewers.
Sep 5 2019, 6:19 PM
zack triaged T1986: swh.model.identifiers: move validation from parsing_persistent_identifier to PersistentId constructor as Low priority.
Sep 5 2019, 5:39 PM · Easy hack, Data Model
zack updated subscribers of D1944: binary (de)serialiazer for more compact PID<->int maps.
Sep 5 2019, 5:19 PM
zack planned changes to D1944: binary (de)serialiazer for more compact PID<->int maps.

(Work on redoing the cli.py module right is still pending.)

Sep 5 2019, 4:20 PM
zack added inline comments to D1944: binary (de)serialiazer for more compact PID<->int maps.
Sep 5 2019, 4:19 PM
zack updated the diff for D1944: binary (de)serialiazer for more compact PID<->int maps.
  • update to fix (most of) @vlorentz review remarks
Sep 5 2019, 4:19 PM
zack planned changes to D1944: binary (de)serialiazer for more compact PID<->int maps.
Sep 5 2019, 3:46 PM

Sep 4 2019

zack accepted D1956: docs: fix toc.
Sep 4 2019, 5:47 PM
zack added a project to T1984: Fix the broken/missing tocs in all the modules: Documentation.
Sep 4 2019, 5:46 PM · Documentation
zack updated the diff for D1944: binary (de)serialiazer for more compact PID<->int maps.
  • add restore script from textual to binary maps
Sep 4 2019, 9:26 AM
zack updated the diff for D1944: binary (de)serialiazer for more compact PID<->int maps.
  • fix binary serialization tests after PID ordering change
  • pid2int: add type checking on the key
Sep 4 2019, 8:28 AM
zack added inline comments to D1944: binary (de)serialiazer for more compact PID<->int maps.
Sep 4 2019, 7:59 AM
zack updated the diff for D1944: binary (de)serialiazer for more compact PID<->int maps.
  • change PID order to be alphabetic and match current Java implementation
Sep 4 2019, 7:57 AM

Sep 3 2019

zack updated the diff for D1944: binary (de)serialiazer for more compact PID<->int maps.
  • binary (de)serialiazer for more compact PID<->int maps
  • fix typo in function name cross-ref
  • requirements.txt: add new dep on swh.model
Sep 3 2019, 6:45 PM
zack reopened D1944: binary (de)serialiazer for more compact PID<->int maps.
Sep 3 2019, 6:45 PM
zack committed rDGRPH7c16703ac537: requirements.txt: add new dep on swh.model (authored by zack).
requirements.txt: add new dep on swh.model
Sep 3 2019, 6:44 PM
zack closed D1944: binary (de)serialiazer for more compact PID<->int maps.
Sep 3 2019, 6:44 PM
zack updated the diff for D1944: binary (de)serialiazer for more compact PID<->int maps.

requirements.txt: add new dep on swh.model

Sep 3 2019, 6:43 PM
zack reopened D1944: binary (de)serialiazer for more compact PID<->int maps.

(it's only closed in a branch, not in master)

Sep 3 2019, 6:42 PM
zack committed rDGRPH596dc78573f2: fix typo in function name cross-ref (authored by zack).
fix typo in function name cross-ref
Sep 3 2019, 6:22 PM
zack closed D1944: binary (de)serialiazer for more compact PID<->int maps.
Sep 3 2019, 6:22 PM
zack committed rDGRPH51acefe11543: binary (de)serialiazer for more compact PID<->int maps (authored by zack).
binary (de)serialiazer for more compact PID<->int maps
Sep 3 2019, 6:22 PM
zack updated the diff for D1944: binary (de)serialiazer for more compact PID<->int maps.
  • fix typo in function name cross-ref
Sep 3 2019, 6:19 PM
zack added a reviewer for D1944: binary (de)serialiazer for more compact PID<->int maps: Reviewers.
Sep 3 2019, 6:08 PM
zack created D1944: binary (de)serialiazer for more compact PID<->int maps.
Sep 3 2019, 6:08 PM
zack committed rDGRPHcb5e5ad19ee8: wip: streaming interface python <-> java (authored by zack).
wip: streaming interface python <-> java
Sep 3 2019, 2:04 PM

Sep 1 2019

zack triaged T1979: Remarks on the tutorial "Run a new lister" as Normal priority.
Sep 1 2019, 10:45 AM · Documentation, Lister

Aug 30 2019

zack committed R183:e399873bdd19: add MSR 2019 Software Heritage Graph Dataset paper (authored by zack).
add MSR 2019 Software Heritage Graph Dataset paper
Aug 30 2019, 7:21 PM
zack committed R183:b74417742dfa: add book: Applied Cryptography by Schneier (authored by zack).
add book: Applied Cryptography by Schneier
Aug 30 2019, 7:21 PM

Aug 29 2019

zack added a comment to T1958: Performance tuning of zfs infrastructure.
In T1958#36628, @olasd wrote:
  • one HHHL (PCIe card) Intel Optane P4800x (The smallest, 375GB version will be sufficient); should be around 1200 EUR;
  • one PCIe card with M.2 slots (Asus Hyper M.2 x16 or equivalent); I've found prices between 100 and 400 EUR;
  • two consumer-grade NVMe disks for L2ARC cache (1 or 2TB each; e.g. Samsung 970 Evo); the 1TB drives retail for 200 EUR each.

I'll try to fish for a quote with known suppliers.

Aug 29 2019, 7:32 PM · System administration

Aug 27 2019

zack committed rDGRPH973a949a1248: fix indentation in test code too (authored by zack).
fix indentation in test code too
Aug 27 2019, 9:26 PM
zack committed rDGRPH51e7217b97cd: fix @author in Java files to use team name (authored by zack).
fix @author in Java files to use team name
Aug 27 2019, 2:05 PM
zack committed rDGRPHfa29d218159e: cosmetic: reindent Java code to match coding style (authored by zack).
cosmetic: reindent Java code to match coding style
Aug 27 2019, 2:05 PM

Aug 26 2019

zack added a project to T1971: Integrate swh-graph javadoc in swh-docs: Documentation.
Aug 26 2019, 10:45 PM · Documentation, Compressed graph service
zack triaged T1970: Web API: make /origin/ return the swh:1:ori:... PID as Low priority.
Aug 26 2019, 4:01 PM · Web app
zack created T1970: Web API: make /origin/ return the swh:1:ori:... PID.
Aug 26 2019, 4:01 PM · Web app
zack committed rDSNIP526bd01dd205: SQL graph export: actually use a fifo, rather than a regular file (authored by zack).
SQL graph export: actually use a fifo, rather than a regular file
Aug 26 2019, 11:24 AM