Page MenuHomeSoftware Heritage

map_row will return list of RawExtrensicMetadata
ClosedPublic

Authored by TG1999 on Wed, Feb 10, 7:23 PM.

Details

Summary

Currently we are not returning metadata in the list while mapping a row, this update will also return metadata in this list and will also provide a feature to map row data with RawExtrensicMetadat

Added dictionary in every tuple of the list and also made a function for mapping of row data with RawExtrensicMetadata

Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>

Diff Detail

Repository
rDMFCD ClearlyDefined metadata fetcher
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

What is the goal of this change?

You're telling us how you did it, but not what it is or why.

swh/clearlydefined/mapping_utils.py
95–98

Ok, I think I get it; you added a function to convert these Tuple[str, MetadataTargetType, Optional[Origin], Dict] you're passing everywhere into RawExtrinsicMetadata.

Instead, you should create RawExtrinsicMetadata directly instead of Tuple[str, MetadataTargetType, Optional[Origin], Dict].

Change return type of map_row

Currently we are not returning metadata in the list while mapping a row, this update will also return metadata in this list so we can map that metadata with metadata field of RawExtrensicMetadata and will also provide a feature to map row data with RawExtrensicMetadata.

Added dictionary in every tuple of the list and also made a function for mapping of row data with RawExtrensicMetadata

Build is green

Patch application report for D5061 (id=18059)

Rebasing onto 9940287285...

Current branch diff-target is up to date.
Changes applied before test
commit 9ea53d7d9997cf91156a918b1096cbd87962c245
Author: Tushar Goel <tushar.goel.dav@gmail.com>
Date:   Thu Feb 11 17:30:26 2021 +0530

    Change return type of map_row
    
    Currently we are not returning metadata in the list while mapping a row, this update will also return metadata in this list so we can map that metadata with metadata field of RawExtrensicMetadata and will also provide a feature to map row data with RawExtrensicMetadata.
    
    Added dictionary in every tuple of the list and also made a function for mapping of row data with RawExtrensicMetadata
    
    Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>

See https://jenkins.softwareheritage.org/job/DMFCD/job/tests-on-diff/84/ for more details.

swh/clearlydefined/mapping_utils.py
332–335

again... Union[Optional[XXX], XXX] is the same as Optional[XXX]

Change return type of map_row

Currently map_row was returning a Tuple of status and list of tuples, this update will change the return type to Tuple of status and a Generator[RawExtrensicMetadata], this change was made to change the architecture of the current code and also to map metadata field of RawextrensicMetadata.

Changed return type of all map_* functions and fed them extra parameter date (that was required to map row data with RawExtrensicMetadata)

Build is green

Patch application report for D5061 (id=18066)

Rebasing onto 9940287285...

Current branch diff-target is up to date.
Changes applied before test
commit 51119f564954f35340743a7ed2d7fe8733ecb6a0
Author: Tushar Goel <tushar.goel.dav@gmail.com>
Date:   Thu Feb 11 19:11:03 2021 +0530

    Change return type of map_row
    
    Currently map_row was returning a Tuple of status and list of tuples, this update will change the return type to Tuple of status and a Generator[RawExtrensicMetadata], this change
    was made to change the architecture of the current code and also to map metadata field of RawextrensicMetadata.
    
    Changed return type of all map_* functions and fed them extra parameter date (that was required to map row data with RawExtrensicMetadata)
    
    Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>

See https://jenkins.softwareheritage.org/job/DMFCD/job/tests-on-diff/85/ for more details.

swh/clearlydefined/mapping_utils.py
94–115

Instead of taking a list and returning an iterator, just take one element and return an other element.

Callers can take care of iterating.

160–161

What is the complete type of data?

Change return type of mapping row with metadata

Currently we were returning list of RawExtrensicMetadata, this update will return single object of RawExtrensicMetadata. This change was made since caller function will take care of making the list

Build has FAILED

Patch application report for D5061 (id=18082)

Could not rebase; Attempt merge onto 9940287285...

Updating 9940287..0255af0
Fast-forward
 swh/clearlydefined/mapping_utils.py                | 180 ++++++++++++------
 swh/clearlydefined/tests/data/clearlydefined.json  |   2 +-
 .../tests/data/clearlydefined_metadata.json        |   7 +
 .../tests/data/clearlydefined_true.json            |   7 -
 .../tests/data/licensee_metadata.json              |  11 ++
 .../tests/data/scancode_metadata.json              |  87 +++++++++
 swh/clearlydefined/tests/test_mapping_utils.py     | 201 +++++++++++++++------
 7 files changed, 382 insertions(+), 113 deletions(-)
 create mode 100644 swh/clearlydefined/tests/data/clearlydefined_metadata.json
 create mode 100644 swh/clearlydefined/tests/data/licensee_metadata.json
 create mode 100644 swh/clearlydefined/tests/data/scancode_metadata.json
Changes applied before test
commit 0255af0b874f382e4abe5411eab956ffd562b583
Author: Tushar Goel <tushar.goel.dav@gmail.com>
Date:   Fri Feb 12 12:53:04 2021 +0530

    Change return type of mapping row with metadata
    
    Currently we were returning list of RawExtrensicMetadata, this update will return single object of RawExtrensicMetadata. This change was made since caller function will take care of making the list
    
    Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>

commit 51119f564954f35340743a7ed2d7fe8733ecb6a0
Author: Tushar Goel <tushar.goel.dav@gmail.com>
Date:   Thu Feb 11 19:11:03 2021 +0530

    Change return type of map_row
    
    Currently map_row was returning a Tuple of status and list of tuples, this update will change the return type to Tuple of status and a Generator[RawExtrensicMetadata], this change
    was made to change the architecture of the current code and also to map metadata field of RawextrensicMetadata.
    
    Changed return type of all map_* functions and fed them extra parameter date (that was required to map row data with RawExtrensicMetadata)
    
    Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>

Link to build: https://jenkins.softwareheritage.org/job/DMFCD/job/tests-on-diff/86/
See console output for more information: https://jenkins.softwareheritage.org/job/DMFCD/job/tests-on-diff/86/console

Change return type of mapping row with metadata

Currently we were returning list of RawExtrensicMetadata, this update will return single object of RawExtrensicMetadata. This change was made since caller function will take care of making

Build has FAILED

Patch application report for D5061 (id=18083)

Rebasing onto 9940287285...

Current branch diff-target is up to date.
Changes applied before test
commit ec614209806b209df492bc9395c8f380b781a4bf
Author: Tushar Goel <tushar.goel.dav@gmail.com>
Date:   Fri Feb 12 13:01:17 2021 +0530

    Change return type of mapping row with metadata
    
    Currently we were returning list of RawExtrensicMetadata, this update will return single object of RawExtrensicMetadata. This change was made since caller function will take care of making
    
    Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>

Link to build: https://jenkins.softwareheritage.org/job/DMFCD/job/tests-on-diff/87/
See console output for more information: https://jenkins.softwareheritage.org/job/DMFCD/job/tests-on-diff/87/console

Change return type of mapping row with metadata

Currently we were returning list of RawExtrensicMetadata, this update will return single object of RawExtrensicMetadata. This change was made since caller function will take care of making

Build is green

Patch application report for D5061 (id=18084)

Rebasing onto 9940287285...

Current branch diff-target is up to date.
Changes applied before test
commit cbf07cf743aa48d5a1c4600710b9ad826df2d658
Author: Tushar Goel <tushar.goel.dav@gmail.com>
Date:   Fri Feb 12 13:05:23 2021 +0530

    Change return type of mapping row with metadata
    
    Currently we were returning list of RawExtrensicMetadata, this update will return single object of RawExtrensicMetadata. This change was made since caller function will take care of making
    
    Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>

See https://jenkins.softwareheritage.org/job/DMFCD/job/tests-on-diff/88/ for more details.

Thanks, that's much better.

Working on structured RawExtrinsicMetadata objects (with named attributed) is much more readable than passing tuples (with anonymous members) around, don't you agree?


Your commit message is good in the sense that it explains what you are doing; but it can still be improved.

You could express the same things in a much more compact way.

Currently, your commit message is:

Change return type of mapping row with metadata

Currently we were returning list of RawExtrensicMetadata, this update will return single object of RawExtrensicMetadata. This change was made since caller function will take care of making

(looks like it's missing the end, by the way)

The first line doesn't tell much, so you can remove it.

The second sentence (Currently we were returning list of RawExtrensicMetadata, this update will return single object of RawExtrensicMetadata.) however, is pretty good; it almost explains exactly what you are doing.

First, let's fix it to match what you actually did: Currently we were returning list of tuples, this update will return single object of RawExtrinsicMetadata.

Now, you can make it shorter, by removing "superficial" words. Currently we were returning becomes We returned, and this update will return single object of becomes We will return a single. So now, this sentence is: We returned a list of tuples, we will return a single RawExtrinsicMetadata.

That's not very natural English, so we can use some prepositions instead of repeating "we return": "We return a single RawExtrinsicMetadata instead of a list of tuples`

Finally, as it's a good style to write in imperative form, let's drop the "We": Return a single RawExtrinsicMetadata instead of a list of tuples

And that's it. That's a very good sentence for the first line of a commit message, because it explains exactly what you did; and experienced readers can guess why you did it (ie. simplify the code by using a single object + make it more readable by using a structured objects instead of tuples).

Now, you can work on the rest of the commit to explain this.

swh/clearlydefined/mapping_utils.py
5–11

Please import in the order recommended by PEP 8

isort should do it automatically for you before you commit; did you skip it?

95

use four arguments instead of a 4-tuple.

110

data[2]'s type is Optional[Origin], so check for None instead of checking its type.

112–126

there is again an argument named data of type list, with no docstring.

it's impossible to know what it is when reading this. Please:

  1. rename it
  2. write the full type
  3. write a docstring to also explain what the function does.
116

clearlydefined isn't a deposit client; check out the definition of authority types: https://docs.softwareheritage.org/devel/swh-storage/extrinsic-metadata-specification.html#authorities

331

full type of row?

what about passing the elements of the row as arguments directly?

swh/clearlydefined/tests/data/clearlydefined.json
61 ↗(On Diff #18084)

why this change?

swh/clearlydefined/tests/data/clearlydefined_true.json
57–64 ↗(On Diff #18084)

why this change?

swh/clearlydefined/tests/data/licensee_metadata.json
1–12

make the tests read from licensee.json instead of duplicating this

swh/clearlydefined/tests/data/scancode_metadata.json
1–88

Make the tests read from scancode.json instead

This revision now requires changes to proceed.Fri, Feb 12, 10:00 AM
TG1999 marked 5 inline comments as done.

Change input type of map_row_data_with_metadata

Instead of passing a tuple pass 4 parameteres

Build has FAILED

Patch application report for D5061 (id=18104)

Rebasing onto 9940287285...

Current branch diff-target is up to date.
Changes applied before test
commit 264d475a74be774573bfdc126e4c4a91dafd82ab
Author: Tushar Goel <tushar.goel.dav@gmail.com>
Date:   Sat Feb 13 18:06:12 2021 +0530

    Chage return type of map_row
    
    Return a list of RawExtrensicMetadata instead a list of tuples, to give a better structure for code
    
    Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>

Link to build: https://jenkins.softwareheritage.org/job/DMFCD/job/tests-on-diff/89/
See console output for more information: https://jenkins.softwareheritage.org/job/DMFCD/job/tests-on-diff/89/console

swh/clearlydefined/mapping_utils.py
116

Can you suggest me, what should be done here ?

Resolve Flake 8 error

Remove extra whitespace

Build is green

Patch application report for D5061 (id=18105)

Rebasing onto 9940287285...

Current branch diff-target is up to date.
Changes applied before test
commit 52ff2250c1746d57ee6299971105988d1b22cd63
Author: Tushar Goel <tushar.goel.dav@gmail.com>
Date:   Sat Feb 13 18:10:40 2021 +0530

    Chage return type of map_row
    
    Return a list of RawExtrensicMetadata instead a list of tuples, to give a better structure for code
    
    Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>

See https://jenkins.softwareheritage.org/job/DMFCD/job/tests-on-diff/90/ for more details.

Change type of MetadataAuthority

Build is green

Patch application report for D5061 (id=18106)

Rebasing onto 9940287285...

Current branch diff-target is up to date.
Changes applied before test
commit 3fb094d949e64d88c77f7de88975e9968372fb14
Author: Tushar Goel <tushar.goel.dav@gmail.com>
Date:   Sun Feb 14 15:47:24 2021 +0530

    Chage return type of map_row
    
    Return a list of RawExtrensicMetadata instead a list of tuples, to give a better structure for code
    
    Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>

See https://jenkins.softwareheritage.org/job/DMFCD/job/tests-on-diff/91/ for more details.

Why the new test files? Can't you reuse the existing ones?

And please reword your commit message; the second line already explains everything, while the first one is very vague (ie. delete the first line)

swh/clearlydefined/mapping_utils.py
62

That's not detailed enough; JSON is a notation/language, not a format; there are thousands of formats based on JSON. Look at other parts of the codebase for examples.

112–143

This is unnecessarily complex; this function takes a mapping_status only to return it and sometimes set it to False`. Instead, you should let the caller do the boolean operation (which they already do anyway...)

187–188

type

211–212

type

This revision now requires changes to proceed.Mon, Feb 15, 10:49 AM
TG1999 added inline comments.
swh/clearlydefined/mapping_utils.py
112–143

I did not write that function initially, but since these lines of code were redundant in map_clearlydefined, map_scancode, map_licensee, so we decided to put this code in a function, so suggestions on this ?

swh/clearlydefined/mapping_utils.py
112–143

We discussed this already. But if you don't see how, leave it like this and I'll do it.

Add type of List[RawExtrensicMetadata] and change format of RawExtrensicMetadata

Build is green

Patch application report for D5061 (id=18107)

Rebasing onto 9940287285...

Current branch diff-target is up to date.
Changes applied before test
commit 0a95835f96f6de94c9461c6765d4db9fd1aa3c96
Author: Tushar Goel <tushar.goel.dav@gmail.com>
Date:   Mon Feb 15 18:29:27 2021 +0530

    Change return type of map_row
    
    Return a list of RawExtrensicMetadata instead a list of tuples, to give a better structure for code
    
    Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>

See https://jenkins.softwareheritage.org/job/DMFCD/job/tests-on-diff/92/ for more details.

You forgot to reword the commit message (and this diff's title)

This revision now requires changes to proceed.Mon, Feb 15, 3:14 PM
TG1999 retitled this revision from Change return type of map_row to map_row will return list of RawExtrensicMetadata.Mon, Feb 15, 3:17 PM

Build is green

Patch application report for D5061 (id=18123)

Rebasing onto 9940287285...

Current branch diff-target is up to date.
Changes applied before test
commit a7c188f07ac9d6cc54dd03c197b06e50aa6a65a1
Author: Tushar Goel <tushar.goel.dav@gmail.com>
Date:   Tue Feb 16 12:27:38 2021 +0530

    map_row will return list of RawExtrensicMetadata
    
    Currently map_row returns a complex tuple, that damages code readability, so instead of returning complex tuple it should return a named class RawExtrensicMetadata.
    
    Add different fields like file, format in map_row, map_scancode, map_clearlydefined, map_licensee that are required to map previous tuple data with RawExtrensicMetadata. Add docstrings and tests.
    
    Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>

See https://jenkins.softwareheritage.org/job/DMFCD/job/tests-on-diff/93/ for more details.

"map_row will return list of RawExtrensicMetadata" is not in the imperative mood

https://chris.beams.io/posts/git-commit/#imperative

This revision now requires changes to proceed.Tue, Feb 16, 9:18 AM

Build is green

Patch application report for D5061 (id=18124)

Rebasing onto 9940287285...

Current branch diff-target is up to date.
Changes applied before test
commit 32246c7026cc50f6df2b448a5122fd91b00fbd5e
Author: Tushar Goel <tushar.goel.dav@gmail.com>
Date:   Tue Feb 16 13:51:52 2021 +0530

    map_row return a list of RawExtrensicMetadata
    
    Currently map_row returns a complex tuple, that damages code readability, so instead of returning complex tuple it should return a named class RawExtrensicMetadata.
    
    Add different fields like file, format in map_row, map_scancode, map_clearlydefined, map_licensee that are required to map previous tuple data with RawExtrensicMetadata. Add docstrings and tests.
    
    Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>

See https://jenkins.softwareheritage.org/job/DMFCD/job/tests-on-diff/94/ for more details.

Build is green

Patch application report for D5061 (id=18125)

Rebasing onto 9940287285...

Current branch diff-target is up to date.
Changes applied before test
commit f2c1b0bea04a2c4a9848893a27218b8d87ebe2f5
Author: Tushar Goel <tushar.goel.dav@gmail.com>
Date:   Tue Feb 16 14:08:09 2021 +0530

    Return list of RawExtrensicMetadata instead of tuple
    
    Currently map_row returns a complex tuple, that damages code readability, so instead of returning complex tuple it should return a named class RawExtrensicMetadata.
    
    Add different fields like file, format in map_row, map_scancode, map_clearlydefined, map_licensee that are required to map previous tuple data with RawExtrensicMetadata. Add docstrings and tests.
    
    Signed-off-by: Tushar Goel <tushar.goel.dav@gmail.com>

See https://jenkins.softwareheritage.org/job/DMFCD/job/tests-on-diff/95/ for more details.

This revision is now accepted and ready to land.Tue, Feb 16, 9:39 AM