Details

Reviewers

vlorentz
olasd

Commits

rDSTOC12a2dd634422: Add in_memory.person_get implementation and improve person tests
R65:12a2dd634422: Add in_memory.person_get implementation and improve person tests
rDSTO12a2dd634422: Add in_memory.person_get implementation and improve person tests

Summary

Implement person_get for the in-memory storage.
I have also updated the person counter in the stat_counters method.

This is needed to remove all the storage mocks in swh-web tests.

Related T1271
Related T1377

Test Plan

test_person_get is now executed for the in-memory storage tests.

Diff Detail

Repository

rDSTO Storage manager

Branch

memory-storage-person_get

Lint

No Linters Available

Unit

No Unit Test Coverage

Build Status

Buildable 3295
Build 4246: tox-on-jenkins	Jenkins
Build 4245: arc lint + arc unit

Event Timeline

anlambert created this revision.Jan 8 2019, 2:59 PM

anlambert edited the summary of this revision. (Show Details)Jan 8 2019, 3:00 PM

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/145/ for more details.

Harbormaster completed remote builds in B3284: Diff 2833.Jan 8 2019, 3:03 PM

The unicity checks should really happen on the fullname, as the email and name fields are only advisory (i.e. they only exist if we succeeded in parsing them).

This revision now requires changes to proceed.Jan 8 2019, 3:11 PM

In D897#19157, @olasd wrote:

The unicity checks should really happen on the fullname, as the email and name fields are only advisory (i.e. they only exist if we succeeded in parsing them).

What should work is having both a dict mapping fullnames to full person objects (with added id), and a list of full person objects that will allow id -> person lookups. This will make the deduplication a constant time hash lookup, and object lookups a constant time list access.

In D897#19157, @olasd wrote:

The unicity checks should really happen on the fullname, as the email and name fields are only advisory (i.e. they only exist if we succeeded in parsing them).

And we need tests to check this behavior :)

swh/storage/tests/test_api_client.py
65–67 ↗	(On Diff #2833)	You could use `revision_add` to add persons.

In D897#19161, @vlorentz wrote:

In D897#19157, @olasd wrote:

The unicity checks should really happen on the fullname, as the email and name fields are only advisory (i.e. they only exist if we succeeded in parsing them).

And we need tests to check this behavior :)

*nod*, I guess the ship has sailed and the person model is part of our external interface now

swh/storage/tests/test_api_client.py
65–67 ↗	(On Diff #2833)	Or we could just expose person_add in the remote API, as apparently the person stuff is part of our external interface now.

Ack for the above comments regarding unicity check and optimizations.

Regarding the test_person_get test, I will rather rewrite it adding revisions to storage first, as suggested by vlorentz, before trying to call storage.person_get.

Based on what I read and tested in main storage implementation, the storage._person_add method does not check
if the fullname of a person is already present in the person table, resulting in an exception being raised when
such a case appears (https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/db.py$884).
In the contrary, when a revision is added, a stored procedure adds author and committer to the person table
and checks if they already exist based on their full names (https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/sql/40-swh-func.sql$616).

As the storage._person_add is only used in test implementation and is not secure, I think it should not be exposed
to the main storage api and it could even be removed as persons are only added when revisions and releases are
ingested into the archive. What do you think ?

Update:

Address olasd comments regarding fullname unicity and optimizations
Rewrite test_person_get test to add persons from revisions ingestion instead of calling private _person_add method
Add a new test to check person fullname unicity
Reactivate person tests for RemoteStorage

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/146/ for more details.

Harbormaster completed remote builds in B3290: Diff 2834.Jan 8 2019, 5:37 PM

vlorentz accepted this revision.Jan 8 2019, 5:49 PM

I have a few stylistic comments that may be addressed before this lands. There's also a question about self._objects which confuses me a bit.

In D897#19166, @anlambert wrote:

Rewrite test_person_get test to add persons from revisions ingestion instead of calling private _person_add method

With this change I believe _person_add can be removed from the sql storage

swh/storage/in_memory.py
1034–1035	could you swap the if and yield? it'd avoid needing the backslash
1238	I'm not sure what the semantics of `self._objects` is, but using it for this check doesn't feel right: what if someone's full name ends up being a valid object identifier for another type?
swh/storage/tests/test_storage.py
1252–1253	I'd linebreak after `list(` to avoid needing backslashes
1261–1265	ditto

anlambert marked an inline comment as done.Jan 9 2019, 12:06 PM

anlambert added inline comments.

swh/storage/in_memory.py
1238	Based on my understanding of vlorentz implementation, the `_objects` dict enable to easily lookup objects by sha1s and count object by type for the stat counters. Other keys in that dict corresponds to sha1s or (origin_type, origin_url) tuples so it's safe to use fullname as key here. I did not want to add another dict as class member as I knew I could use the `_objects` one.

Update: code style fixes

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/147/ for more details.

Harbormaster completed remote builds in B3295: Diff 2837.Jan 9 2019, 1:28 PM

olasd accepted this revision.Jan 9 2019, 2:28 PM

olasd added inline comments.

swh/storage/in_memory.py
1238	I mean, technically, a full name could match a sha1 (yeah, I know it's more than a bit far-fetched). I don't think it matters much, it's just a bit surprising.

This revision is now accepted and ready to land.Jan 9 2019, 2:28 PM

anlambert marked an inline comment as done.Jan 9 2019, 4:16 PM

anlambert added inline comments.