
Remove entry_to_bytes and other *_to_bytes functions
Abandoned, Public

Authored by vlorentz on Jan 31 2019, 5:49 PM.

Details

Summary

entry_to_bytes and its friends were called many times (e.g. entry_to_bytes
alone was called 40k times while indexing 500 origins with the metadata
indexer), and their isinstance checks used a non-negligible amount of
CPU time.
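
For readers who have not seen the helpers, here is a rough sketch of the kind of post-processing being removed. The names `entry_to_bytes_sketch` and `row_to_bytes_sketch` are hypothetical and the bodies are an assumption about the general shape, not a copy of the actual swh-storage code; the point is only that every column of every row goes through one or more isinstance checks.

```python
def entry_to_bytes_sketch(entry):
    """Recursively turn memoryview values into bytes (hypothetical helper)."""
    if isinstance(entry, memoryview):
        return entry.tobytes()
    if isinstance(entry, list):
        return [entry_to_bytes_sketch(item) for item in entry]
    return entry  # ints, strings, None, ... pass through unchanged


def row_to_bytes_sketch(row):
    """Apply the conversion to every column of one fetched row."""
    return tuple(entry_to_bytes_sketch(column) for column in row)
```

With this shape, a row of N columns costs at least N function calls and up to 2N isinstance checks, even when none of the columns is binary.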

Instead of applying the *_to_bytes functions as post-processing to every piece of
data returned by PostgreSQL, this patch tells psycopg2 to use a new
typecast_bytea function where needed (registered in adapt_conn).
This function defers the decoding work to psycopg2, which returns a
memoryview that is then converted to bytes.
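
A minimal sketch of how such a typecaster can be registered, assuming psycopg2's `new_type`, `new_array_type` and `register_type` extension functions. The names typecast_bytea and adapt_conn come from the summary above; the `make_typecast_bytea` factory and the lazy import are assumptions of this sketch, added so the conversion logic can be exercised without a database. 17 and 1001 are PostgreSQL's built-in OIDs for bytea and bytea[].

```python
BYTEA_OID = 17          # PostgreSQL OID of the bytea type
BYTEA_ARRAY_OID = 1001  # PostgreSQL OID of bytea[]


def make_typecast_bytea(decode_binary):
    """Build a psycopg2-style typecaster: (raw value, cursor) -> bytes or None.

    decode_binary is the callable that parses the wire format and returns a
    memoryview; with psycopg2 this would be psycopg2.BINARY (hypothetical
    indirection, introduced for this sketch only).
    """
    def typecast_bytea(value, cur):
        if value is None:
            return None  # SQL NULL stays None
        return decode_binary(value, cur).tobytes()
    return typecast_bytea


def adapt_conn(conn):
    """Register the bytea typecasters on a single connection."""
    # Imported lazily so the module loads even without psycopg2 installed
    # (an assumption of this sketch, not necessarily what the patch does).
    import psycopg2
    import psycopg2.extensions as ext

    caster = make_typecast_bytea(psycopg2.BINARY)
    t_bytes = ext.new_type((BYTEA_OID,), "bytea", caster)
    ext.register_type(t_bytes, conn)
    # Arrays need their own typecaster, built on top of the scalar one.
    t_bytes_array = ext.new_array_type((BYTEA_ARRAY_OID,), "bytea[]", t_bytes)
    ext.register_type(t_bytes_array, conn)
```

Because the conversion happens only for columns whose type is bytea or bytea[], rows made of integers or text never pay for it.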

Diff Detail

Repository
rDSTO Storage manager
Branch
kill-entry_to_bytes
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 3918
Build 5131: tox-on-jenkins (Jenkins)
Build 5130: arc lint + arc unit

Event Timeline

ardumont requested changes to this revision. Feb 1 2019, 8:58 AM
ardumont added a subscriber: ardumont.

Sounds nice.

/me *coughs*

I am under the impression your diff is against an old swh-storage.

I remember baseDb and db_utils moved to swh.core...

I only request changes to check for that.

All in all, I'm quite excited about this and can't wait to put this in production ;)

swh/storage/db.py
843

Can't we just return fetchone()'s result directly now?

return cur.fetchone()
This revision now requires changes to proceed. Feb 1 2019, 8:58 AM

vlorentz abandoned this revision.

Not everything here needs to be thrown away, though...