Page MenuHomeSoftware Heritage

CffMapping: Ignores invalid yaml files
ClosedPublic

Authored by VickyMerzOwn on Jun 17 2022, 10:46 AM.

Diff Detail

Repository
rDCIDX Metadata indexer
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 29934
Build 46797: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 46796: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8002 (id=28834)

Rebasing onto 62db0cb208...

First, rewinding head to replay your work on top of it...
Applying: Ignores invalid yaml files
Using index info to reconstruct a base tree...
M	swh/indexer/metadata_dictionary/cff.py
Falling back to patching base and 3-way merge...
Auto-merging swh/indexer/metadata_dictionary/cff.py
CONFLICT (content): Merge conflict in swh/indexer/metadata_dictionary/cff.py
Patch failed at 0001 Ignores invalid yaml files

Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".

Rebase failed (ret=1)!

Could not rebase; Attempt merge onto 62db0cb208...

Already up to date.
Changes applied before test
commit 2064b713c5d10533d11f98105f568baf037838cf
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 14:14:32 2022 +0530

    Ignores invalid yaml files

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/260/ for more details.

ardumont added inline comments.
swh/indexer/metadata_dictionary/cff.py
29

Add a warning log to explicit that the yaml is invalid and that we skip it.

Logs warning of invalid yaml file

Build has FAILED

Patch application report for D8002 (id=28836)

Could not rebase; Attempt merge onto 62db0cb208...

Updating 62db0cb..b49dd61
Fast-forward
 swh/indexer/metadata_dictionary/cff.py | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)
Changes applied before test
commit b49dd61f0a1d5ee43c2441f1e8a01c516be85a0f
Merge: e91959e 62db0cb
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 14:31:19 2022 +0530

    Fix merge conflicts

commit e91959efd7739177601e9ec599b6dd701bfd71e6
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 14:14:32 2022 +0530

    Ignores invalid yaml files
    
    Adds warning log that yaml is invalid

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/261/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/261/console

swh/indexer/metadata_dictionary/cff.py
29

I disagree, there is nothing we can do about them, so no point in logging. We also do not warn about syntax errors in JSON files.

Logs the invalid-yaml-file-warning

Build has FAILED

Patch application report for D8002 (id=28837)

Rebasing onto 62db0cb208...

Current branch diff-target is up to date.
Changes applied before test
commit 93f8b4c9a78f6afdce4ea0345ae8cae9eeb945d3
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 14:14:32 2022 +0530

    Ignores invalid yaml files
    
    Adds warning log that yaml is invalid

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/262/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/262/console

swh/indexer/metadata_dictionary/cff.py
29

to have a tendency or a rough stats of how often that happens, isn't it worth it?
Either way, fine.

39

well, with some more info though ;)
But like i said, i won't push too much.

Adds test for skipping invalid yaml files

Build is green

Patch application report for D8002 (id=28845)

Could not rebase; Attempt merge onto 62db0cb208...

Updating 62db0cb..5cb2edf
Fast-forward
 swh/indexer/metadata_dictionary/cff.py | 15 +++++---
 swh/indexer/metadata_dictionary/npm.py | 67 ----------------------------------
 swh/indexer/tests/test_metadata.py     | 27 +++++++++++++-
 3 files changed, 36 insertions(+), 73 deletions(-)
Changes applied before test
commit 5cb2edf23d7b6aa078230d83ea79214fe69ee04d
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 22:22:40 2022 +0530

    Test for invalid yaml file

commit 93f8b4c9a78f6afdce4ea0345ae8cae9eeb945d3
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 14:14:32 2022 +0530

    Ignores invalid yaml files
    
    Adds warning log that yaml is invalid

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/263/ for more details.

Undo deletion of normalize_description

Build is green

Patch application report for D8002 (id=28846)

Could not rebase; Attempt merge onto 62db0cb208...

Updating 62db0cb..22787ea
Fast-forward
 swh/indexer/metadata_dictionary/cff.py | 15 ++++++++++-----
 swh/indexer/metadata_dictionary/npm.py |  3 +--
 swh/indexer/tests/test_metadata.py     | 25 +++++++++++++++++++++++++
 3 files changed, 36 insertions(+), 7 deletions(-)
Changes applied before test
commit 22787ea2cf4efc49716c622f9ad19e21b5803de0
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 22:22:40 2022 +0530

    Test for invalid yaml file
    
    to find the diff
    
    Hypothesis falsification overcome

commit 93f8b4c9a78f6afdce4ea0345ae8cae9eeb945d3
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 14:14:32 2022 +0530

    Ignores invalid yaml files
    
    Adds warning log that yaml is invalid

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/264/ for more details.

  • Ignores invalid yaml files
  • Solves hypothesis error in T4333

Build has FAILED

Patch application report for D8002 (id=28847)

Rebasing onto 62db0cb208...

Current branch diff-target is up to date.
Changes applied before test
commit f817b1a6ae5266a23188c1f8e7ea80f5daf6a0c5
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 22:22:40 2022 +0530

    Solves hypothesis error in T4333

commit 93f8b4c9a78f6afdce4ea0345ae8cae9eeb945d3
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 14:14:32 2022 +0530

    Ignores invalid yaml files
    
    Adds warning log that yaml is invalid

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/265/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/265/console

Build is green

Patch application report for D8002 (id=28848)

Rebasing onto 14f8a8574c...

First, rewinding head to replay your work on top of it...
Applying: Ignores invalid yaml files
Applying: Solves hypothesis error in T4333
Changes applied before test
commit 75a038fee74e7eb454bc06c2a0f80d2dd379b5cc
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 22:22:40 2022 +0530

    Solves hypothesis error in T4333

commit b6254f9e0496e6a314e6e249a8cee4a11c1f0f42
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 14:14:32 2022 +0530

    Ignores invalid yaml files
    
    Adds warning log that yaml is invalid

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/266/ for more details.

The code looks good, though I have two inline comments below.

About the diff itself:

  1. Please squash the two commits
  2. Please mention the task you are solving in the diff description (see https://docs.softwareheritage.org/devel/contributing/git-style-guide.html )
  3. also reword the commit message to follow the Git style conventions (see the linked doc, too)
  4. when updating diffs, please describe your changes (eg. "rebase" when rebasing, "fix XXX" when fixing a bug that you introduced, "add missing commit", etc., instead of repeating
Ignores invalid yaml files
Solves hypothesis error in T4333

because this is meant for reviewers to understand what you changed without digging in the diff history.

swh/indexer/metadata_dictionary/npm.py
138–162

Please undo these changes for now, they are out of scope of this diff. Then, please submit another diff with only this change.

swh/indexer/tests/test_metadata.py
167–176

you can simplify this example, to only keep what matters to the test. It makes the test more readable. (Right now, I don't see what part of this file is the syntax error)

swh/indexer/tests/test_metadata.py
167–176

ok, I'll take the unnecessary part out. Would you like me to add a comment describing what is wrong in content?

Cleans commit history

  1. Updating D8002: Ignores invalid yaml files
  2. Solves T4276.
  3. Adds a try except statement to catch yaml parser exceptions.
  4. Tests using an invalid example which must return None.

Build is green

Patch application report for D8002 (id=28849)

Rebasing onto 14f8a8574c...

First, rewinding head to replay your work on top of it...
Applying: Ignores invalid yaml files
Changes applied before test
commit 84a0d6c6ac8776d46818e01d47d25e11c513ef99
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 14:14:32 2022 +0530

    Ignores invalid yaml files
    Tests that invalid yaml files are skipped

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/267/ for more details.

  1. Updating D8002: Ignores invalid yaml files
  2. Solves T4276.

you should write these in the commit message, not in the update

swh/indexer/metadata_dictionary/npm.py
138–139

you forgot to undo this change too

swh/indexer/tests/test_metadata.py
167–176

yes, please

Build is green

Patch application report for D8002 (id=28850)

Rebasing onto 14f8a8574c...

First, rewinding head to replay your work on top of it...
Applying: Ignores invalid yaml files
Changes applied before test
commit 57fded1b9d812aa61d601dac5ffb3c9886da5b85
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 14:14:32 2022 +0530

    Ignores invalid yaml files
    Tests that invalid yaml files are skipped

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/268/ for more details.

  1. Please mention the task you are solving in the diff description (see https://docs.softwareheritage.org/devel/contributing/git-style-guide.html )
  2. also reword the commit message to follow the Git style conventions (see the linked doc, too)

You forgot about these

  1. Please mention the task you are solving in the diff description (see https://docs.softwareheritage.org/devel/contributing/git-style-guide.html )

Do I repeat which task I am solving in every diff description?
If so, I suppose I am to put as the subject, what I am changing in the diff, eg. "rebase", "fix XXX" etc in the title of the diff. Is this right?

  1. also reword the commit message to follow the Git style conventions (see the linked doc, too)

So my commit message will be "Fixes T4276" only?
Wouldn't I need to mention that a test to verify if the fix works is also added in this commit?

  1. Please mention the task you are solving in the diff description (see https://docs.softwareheritage.org/devel/contributing/git-style-guide.html )

Do I repeat which task I am solving in every diff description?

Yes

If so, I suppose I am to put as the subject, what I am changing in the diff, eg. "rebase", "fix XXX" etc in the title of the diff. Is this right?

No, the title should describe the change, so it is readable without opening the diff or the task. "fix XXX" should only be a line within the description. "rebase" would never be in the title of a diff; I think you confuse diffs (eg. D8002 is a diff, created with arc diff --create or simply arc diff with no --update argument) with diff updates (which are a change to an existing diff, created with arc diff --update)

  1. also reword the commit message to follow the Git style conventions (see the linked doc, too)

So my commit message will be "Fixes T4276" only?

It should be "cff: Ignore invalid yaml files" because:

  1. imperative mood
  2. the "cff:" prefix makes it clearer what part of the code is changed
  3. "Ignore invalid yaml" instead of "Fix T4276", because it is meant to be read without jumping between web pages. Additionally, git histories usually outlive forges, so we should not rely on the forge to keep this type of information.

Wouldn't I need to mention that a test to verify if the fix works is also added in this commit?

It is implicit that every fix comes with a new test.

VickyMerzOwn retitled this revision from Ignores invalid yaml files to CffMapping: Ignores invalid yaml files.Jun 21 2022, 8:54 AM
VickyMerzOwn edited the summary of this revision. (Show Details)

Okay, got it. I have made these changes. Updated the diff title and is the summary of D8002 sufficient & correct now?

Also, I'll keep in mind the commit message imperative style along with diff update description styles.

Build is green

Patch application report for D8002 (id=28855)

Rebasing onto 14f8a8574c...

First, rewinding head to replay your work on top of it...
Applying: cff: Ignore invalid yaml files
Changes applied before test
commit 359f1a8d9a0e3e70436465f3c3aa1cd2f91f735b
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 14:14:32 2022 +0530

    cff: Ignore invalid yaml files

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/269/ for more details.

Tests cff mappings of empty yaml files
Tests cff mappings of lists in yaml files

Build is green

Patch application report for D8002 (id=28856)

Rebasing onto 14f8a8574c...

First, rewinding head to replay your work on top of it...
Applying: cff: Ignore invalid yaml files
Changes applied before test
commit 2a43d997b97f08a617723066e879752a8d80a2d7
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 14:14:32 2022 +0530

    cff: Ignore invalid yaml files

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/270/ for more details.

Build is green

Patch application report for D8002 (id=28857)

Rebasing onto 14f8a8574c...

Current branch diff-target is up to date.
Changes applied before test
commit 153bda12aa0d5cbcb55ac9b32539a79d2021ab60
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Fri Jun 17 14:14:32 2022 +0530

    cff: Ignore invalid yaml files

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/271/ for more details.

This revision is now accepted and ready to land.Jun 21 2022, 11:25 AM
This revision was automatically updated to reflect the committed changes.