Page MenuHomeSoftware Heritage

indexer: ci: broken master build: Fix various test/utils data model issues (following the new model validation delivery)
ClosedPublic

Authored by ardumont on Wed, Sep 4, 4:21 PM.

Details

Summary

Following the latest data model validation, we have our test data which is not passing those.

This is to fix those:

  • Fix wrong b'target_type' in revision dict input
  • Add missing snapshot 'target_type' key field
  • Add missing revision {type, committer_date, message}
  • Remove unnecessary fields from origins (lister, project)
  • Fix directory entries data
  • Fix initialization data in various metadata integration tests
Test Plan

tox

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision.Wed, Sep 4, 4:21 PM
ardumont planned changes to this revision.Wed, Sep 4, 4:24 PM
vlorentz added inline comments.Thu, Sep 5, 11:11 AM
swh/indexer/tests/utils.py
167–171

nitpick: use the same formatting as the author date

189–190

the var name should be DIRECTORY_ENTRIES, btw

ardumont updated this revision to Diff 6601.Thu, Sep 5, 2:44 PM
  • tests/utils: Fix directory data
  • test_metadata: Fix directory setup
  • tests/utils: Fix indentation
  • tests/utils: Fix variable name

This tries to fix the test data to pass the new validation. Sadly, this is not
enough to make the tests green again as some side-effects were actually making
the tests work. Trying to untangle what's running wrong is somehow tedious and
not that simple.

ardumont planned changes to this revision.Thu, Sep 5, 2:44 PM
ardumont marked 2 inline comments as done.
ardumont updated this revision to Diff 6602.Thu, Sep 5, 2:46 PM
  • fix typo
ardumont planned changes to this revision.Thu, Sep 5, 2:49 PM

grrrr, current status: P520

ardumont updated this revision to Diff 6611.Thu, Sep 5, 5:40 PM
  • test_metadata: Fix the content id to the one actually used

printing all around... the incongruity came up:

##### dir-ls:  <generator object Storage.directory_ls at 0x7fe96f6716d8>
##### files:  [{'status': 'visible', 'sha1': b'abc', 'sha1_git': b'abc', 'sha256': b'abc', 'length': 1029, 'name': b'index.js', 'type': 'file', 'target': b'abc', 'perms': 33188, 'dir_id': b'10'}, {'status': 'visible', 'sha1': b'aab', 'sha1_git': b'aab', 'sha256': b'aab', 'length': 1029, 'name': b'package.json', 'type': 'file', 'target': b'aab', 'perms': 33188, 'dir_id': b'10'}]
##### detected-files {'NpmMapping': [b'aab']}
############### context:  NpmMapping
############### files [b'aab']
############### metadata  dict_keys([(b'cde', -9039153642638674709)]) # <- indexer storage in memory state
ardumont planned changes to this revision.Thu, Sep 5, 5:41 PM
ardumont added inline comments.Thu, Sep 5, 5:53 PM
swh/indexer/tests/test_metadata.py
1183

I made the test_metadata.py ok by using this id which was what ended being looked up
...

but i did not follow through why this needs to be changed...

ardumont updated this revision to Diff 6614.EditedThu, Sep 5, 6:29 PM
  • fix typo
  • test_metadata: Fix the content id to the one actually used
  • test_origin_metadata: Make tests green

As mentioned in one of the commits, i did not care for the duplication (i aimed at fixing those tests). I'll have to check again in a bit but not right now.

@vlorentz please have a look to check there is nothing strikingly wrong already ;)

ardumont edited the summary of this revision. (Show Details)Fri, Sep 6, 9:37 AM
ardumont added inline comments.Fri, Sep 6, 9:44 AM
swh/indexer/tests/test_metadata.py
1183

Context of what i am saying is P520.

ardumont retitled this revision from tests/utils: Fix various test data model issues to indexer: ci: broken master build: Fix various test/utils data model issues (following the new model validation delivery).Fri, Sep 6, 9:45 AM
vlorentz requested changes to this revision.Fri, Sep 6, 10:36 AM
vlorentz added inline comments.
swh/indexer/tests/test_metadata.py
1183

This is caused by:

swh/indexer/tests/utils.py
206–215

because of this, directory_ls used to return {'target': b'aab', .., 'sha1': b'cde'} (b'aab' being a a sha1_git).

578–581

But, this new code adds {'target': b'aab', .., 'sha1': b'aab'}; so the sha1 changed.

You should not populate storage.content_add from DIRECTORY_ENTRIES. Add a new constant (with the same data that was deleted) instead.

This revision now requires changes to proceed.Fri, Sep 6, 10:36 AM
ardumont added inline comments.Fri, Sep 6, 11:07 AM
swh/indexer/tests/utils.py
578–581

But, this new code adds {'target': b'aab', .., 'sha1': b'aab'}; so the sha1 changed.

yes, i hear you.

But, I remember i tried to add a constants in the objstorage (b'aab', b'abc') first but it failed the same.
The lookup of the sha1 stopped being b'cde' (within the indexer run) when removing the extra fields in the DIRECTORY_ENTRIES (necessary to pass the validation).
Looks like as you said some misbehavior in the in-memory storage somehow...

578–581

The actual fix is to target b'cde' for that particular entry...
(then i don't have to fix any other failing tests)

vlorentz added inline comments.Fri, Sep 6, 11:09 AM
swh/indexer/tests/utils.py
578–581

No. The actual fix is to use storage.content_add([{'target': b'aab', .., 'sha1': b'cde'}])

ardumont added inline comments.Fri, Sep 6, 11:15 AM
swh/indexer/tests/utils.py
578–581

Well, i mean the real fix would be to untangle this, this is not maintainable as is.
It's not clear where what is coming from where.

As a first approximation though, just changing the less cogs possible (only the requisite from the validation introduction) to have the tests green as before is the way.
Then improving.

And

No. The actual fix is to use storage.content_add([{'target': b'aab', .., 'sha1': b'cde'}])

Even though correct, that does not add up to clarity...

Or let's use actual hashes instead of placeholders like 'aab' or 'cde', so it doesn't break again when we add hash validation in the model.

just changing the less cogs possible

That's what I'm suggesting. Move the extra fields from DIRECTORY_ENTRIES to a new CONTENT var.

ardumont updated this revision to Diff 6622.Fri, Sep 6, 11:29 AM

tests/utils: Fix various test data model issues failing validation

Because of the test data were wrongly structured.
The validation check introduced made it apparent.

Summary:

  • Fix wrong b'target_type' in revision dict input
  • Add missing snapshot 'target_type' key field
  • Add missing revision {type, committer_date, message}
  • Remove unnecessary fields from origins (lister, project)
  • Remove wrong extra directory entry fields

(+ Rebase to squash commits)

vlorentz accepted this revision.Fri, Sep 6, 11:35 AM
vlorentz added inline comments.
swh/indexer/tests/utils.py
560–561

Still not happy about this, but let's fix it later.

This revision is now accepted and ready to land.Fri, Sep 6, 11:35 AM
This revision was automatically updated to reflect the committed changes.
ardumont added inline comments.Fri, Sep 6, 11:42 AM
swh/indexer/tests/utils.py
560–561

neither am i but yeah, not right now ;)