Page MenuHomeSoftware Heritage

Add pre-load metadata checks
ClosedPublic

Authored by moranegg on Dec 1 2017, 4:28 PM.

Diff Detail

Repository
rDDEP Push deposit
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Updating D274: Added checks on metadata of a deposit before load

Reviewers: Ardumont

As i abide by what i say, i have taken a look, but i'll take another look on monday.

swh/deposit/api/private/deposit_check.py
66–80

why a sublist in the list?

This seems like an option between 2, is that it?

79

I must say, i don't quite get what that does...

swh/deposit/tests/api/test_deposit_check.py
97

To be exhaustive, we should also have a test which fails the metadata check... or is the actual deposit_ko test enough?

Thanks for the feedback..

swh/deposit/api/private/deposit_check.py
66–80

yes it is, because with codemeta it's name and with atom it's title, but if we are forcing in codemeta we can keep only name
don't know what's the better approach

79

we have 4 must metadata, if one of those isn't found it should return false
so I can return true only if I have 4 founds
during the search let's say, we don't have a url, I don't want to continue the search so I break and return false

but it can be shorter without the last if, I don't know which is more comprehensible

swh/deposit/tests/api/test_deposit_check.py
97

yes we should, let's say a deposit with metadata but not all 4 must metadata
i'll do that today

swh/deposit/api/private/deposit_check.py
79

Ok, yes, sleeping on it + your explanation, i understand your intent.

The imbricated loops and the breaks within those made my brain snap.

This should be extracted as a function.
That will be simpler to test (without all the deposit setup).
The class then simply calls that function (which is tested on the side and we don't have to deal with all cases from the deposit standpoint, 1 case will be enough).

Here are the different scenario possible i saw:

import unittest

# that's the function exctrated and dubbed check_metadata
from swh.deposit.api.private.deposit_check import check_metadata

class CheckMetadata(unittest.TestCase):
    @istest
    def check_metadata_ok(self):
        actual_check = check_metadata({
            'url': 'something',
            'external_identifier': 'something-else',
            'name': 'foo',
            'author': 'someone',
        })

        self.assertTrue(actual_check)

    @istest
    def check_metadata_ok2(self):
        actual_check = check_metadata({
            'url': 'something',
            'external_identifier': 'something-else',
            'title': 'bar',
            'author': 'someone',
        })

        self.assertTrue(actual_check)

    @istest
    def check_metadata_ko(self):
        actual_check = check_metadata({
            'url': 'something',
            'external_identifier': 'something-else',
            'author': 'someone',
        })

        self.assertFalse(actual_check)

    @istest
    def check_metadata_ko2(self):
        actual_check = check_metadata({
            'url': 'something',
            'external_identifier': 'something-else',
            'title': 'foobar',
        })

        self.assertFalse(actual_check)
swh/deposit/api/private/deposit_check.py
79

s/exctrated/extracted/

107

Beware here, we squash the previous check's result on archive(s).
That means if the metadata are ok and the archives are not, we will pass the checks nonetheless.

swh/deposit/tests/api/test_deposit_check.py
97

Now, this should be good to go as is with the proposed way of defining the check_metadata method as function :)

As a part of my monday morning useless bikeshedding, I'd like to note that the title of this diff isn't at the imperative tense so it violates https://wiki.softwareheritage.org/index.php?title=Git_style_guide and the inconsistency makes my delicate eyes hurt.

ardumont retitled this revision from Added checks on metadata of a deposit before load to Add pre-load metadata checks.Dec 4 2017, 1:35 PM

As a part of my monday morning useless bikeshedding, I'd like to note that the title of this diff isn't at the imperative tense so it violates https://wiki.softwareheritage.org/index.php?title=Git_style_guide and the inconsistency makes my delicate eyes hurt.

diff name updated

  • Added tests for metadata check
This revision was automatically updated to reflect the committed changes.
swh/deposit/api/private/deposit_check.py
79

Why not something like this? :-)

def _check_metadata(self, metadata):
    required_fields = (('url',),
                ('external_identifier',),
                ('name', 'title'),
                ('author',)
    )
    return all(any(field in metadata for field in possible_names)
               for possible_names in required_fields)
swh/deposit/api/private/deposit_check.py
79

More like:

def _check_metadata(self, metadata):
    required_fields = (
                ('url',),
                ('external_identifier',),
                ('name', 'title'),
                ('author',)
    )
    return all(any(any(name in field for field in metadata) for name in possible_names)
               for possible_names in required_fields)
swh/deposit/api/private/deposit_check.py
79
# Even better
return all(any(name in field for field in metadata for name in possible_names)
           for possible_names in required_fields)

# If you just want to mess with people
return all(any(name in field
               for field, name in itertools.product(metadata, possible_names))
           for possible_names in required_fields)