Page MenuHomeSoftware Heritage

indexer: ci: broken master build: Fix various test/utils data model issues (following the new model validation delivery)
ClosedPublic

Authored by ardumont on Sep 4 2019, 4:21 PM.

Details

Summary

Following the latest data model validation, we have our test data which is not passing those.

This is to fix those:

  • Fix wrong b'target_type' in revision dict input
  • Add missing snapshot 'target_type' key field
  • Add missing revision {type, committer_date, message}
  • Remove unnecessary fields from origins (lister, project)
  • Fix directory entries data
  • Fix initialization data in various metadata integration tests
Test Plan

tox

Diff Detail

Repository
rDCIDX Metadata indexer
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 7690
Build 11043: tox-on-jenkinsJenkins
Build 11042: arc lint + arc unit

Event Timeline

swh/indexer/tests/utils.py
166–170

nitpick: use the same formatting as the author date

189–190

the var name should be DIRECTORY_ENTRIES, btw

  • tests/utils: Fix directory data
  • test_metadata: Fix directory setup
  • tests/utils: Fix indentation
  • tests/utils: Fix variable name

This tries to fix the test data to pass the new validation. Sadly, this is not
enough to make the tests green again as some side-effects were actually making
the tests work. Trying to untangle what's running wrong is somehow tedious and
not that simple.

ardumont marked 2 inline comments as done.
  • test_metadata: Fix the content id to the one actually used

printing all around... the incongruity came up:

##### dir-ls:  <generator object Storage.directory_ls at 0x7fe96f6716d8>
##### files:  [{'status': 'visible', 'sha1': b'abc', 'sha1_git': b'abc', 'sha256': b'abc', 'length': 1029, 'name': b'index.js', 'type': 'file', 'target': b'abc', 'perms': 33188, 'dir_id': b'10'}, {'status': 'visible', 'sha1': b'aab', 'sha1_git': b'aab', 'sha256': b'aab', 'length': 1029, 'name': b'package.json', 'type': 'file', 'target': b'aab', 'perms': 33188, 'dir_id': b'10'}]
##### detected-files {'NpmMapping': [b'aab']}
############### context:  NpmMapping
############### files [b'aab']
############### metadata  dict_keys([(b'cde', -9039153642638674709)]) # <- indexer storage in memory state
swh/indexer/tests/test_metadata.py
1183

I made the test_metadata.py ok by using this id which was what ended being looked up
...

but i did not follow through why this needs to be changed...

  • fix typo
  • test_metadata: Fix the content id to the one actually used
  • test_origin_metadata: Make tests green

As mentioned in one of the commits, i did not care for the duplication (i aimed at fixing those tests). I'll have to check again in a bit but not right now.

@vlorentz please have a look to check there is nothing strikingly wrong already ;)

swh/indexer/tests/test_metadata.py
1183

Context of what i am saying is P520.

ardumont retitled this revision from tests/utils: Fix various test data model issues to indexer: ci: broken master build: Fix various test/utils data model issues (following the new model validation delivery).Sep 6 2019, 9:45 AM
vlorentz added inline comments.
swh/indexer/tests/test_metadata.py
1183

This is caused by:

swh/indexer/tests/utils.py
206–215

because of this, directory_ls used to return {'target': b'aab', .., 'sha1': b'cde'} (b'aab' being a a sha1_git).

578–581

But, this new code adds {'target': b'aab', .., 'sha1': b'aab'}; so the sha1 changed.

You should not populate storage.content_add from DIRECTORY_ENTRIES. Add a new constant (with the same data that was deleted) instead.

This revision now requires changes to proceed.Sep 6 2019, 10:36 AM
swh/indexer/tests/utils.py
578–581

But, this new code adds {'target': b'aab', .., 'sha1': b'aab'}; so the sha1 changed.

yes, i hear you.

But, I remember i tried to add a constants in the objstorage (b'aab', b'abc') first but it failed the same.
The lookup of the sha1 stopped being b'cde' (within the indexer run) when removing the extra fields in the DIRECTORY_ENTRIES (necessary to pass the validation).
Looks like as you said some misbehavior in the in-memory storage somehow...

578–581

The actual fix is to target b'cde' for that particular entry...
(then i don't have to fix any other failing tests)

swh/indexer/tests/utils.py
578–581

No. The actual fix is to use storage.content_add([{'target': b'aab', .., 'sha1': b'cde'}])

swh/indexer/tests/utils.py
578–581

Well, i mean the real fix would be to untangle this, this is not maintainable as is.
It's not clear where what is coming from where.

As a first approximation though, just changing the less cogs possible (only the requisite from the validation introduction) to have the tests green as before is the way.
Then improving.

And

No. The actual fix is to use storage.content_add([{'target': b'aab', .., 'sha1': b'cde'}])

Even though correct, that does not add up to clarity...

Or let's use actual hashes instead of placeholders like 'aab' or 'cde', so it doesn't break again when we add hash validation in the model.

just changing the less cogs possible

That's what I'm suggesting. Move the extra fields from DIRECTORY_ENTRIES to a new CONTENT var.

tests/utils: Fix various test data model issues failing validation

Because of the test data were wrongly structured.
The validation check introduced made it apparent.

Summary:

  • Fix wrong b'target_type' in revision dict input
  • Add missing snapshot 'target_type' key field
  • Add missing revision {type, committer_date, message}
  • Remove unnecessary fields from origins (lister, project)
  • Remove wrong extra directory entry fields

(+ Rebase to squash commits)

vlorentz added inline comments.
swh/indexer/tests/utils.py
560–561

Still not happy about this, but let's fix it later.

This revision is now accepted and ready to land.Sep 6 2019, 11:35 AM
This revision was automatically updated to reflect the committed changes.
swh/indexer/tests/utils.py
560–561

neither am i but yeah, not right now ;)