
make codespell pass, fixing false-positives due to single-quoted strings
AbandonedPublic

Authored by zack on Sep 18 2019, 3:52 PM.

Details

Reviewers
ardumont
vlorentz
Group Reviewers
Reviewers
Test Plan

make check

Diff Detail

Repository
rDCIDX Metadata indexer
Branch
bug/codespell
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 7856
Build 11304: tox-on-jenkins (Jenkins)
Build 11303: arc lint + arc unit

Event Timeline

ardumont added a subscriber: ardumont.

Same as D2000.
Looks good, but the CI needs fixing.

This revision is now accepted and ready to land. Sep 19 2019, 9:45 AM
vlorentz added a subscriber: vlorentz.

Why these three? We use single quotes everywhere.

This revision now requires changes to proceed. Sep 19 2019, 10:21 AM

> Why these three? We use single quotes everywhere.

from the diff title:

fixing [codespell] false-positives due to single-quoted strings

The "problem", I believe, is that the closing quote in 'files' is interpreted as a trailing apostrophe. It's clearly a bug in codespell, but since it makes no difference to us whether we use single or double quotes, I think it's worth an exception to make a clean codespell run pass.

Indeed, I thought codespell was a linter instead of a spell-checker.

You can fix the false-positive by overriding codespell's regexp to ignore single quotes at the end of words:

find swh docs -name '*.py' -o -name '*.rst' | xargs -r codespell -r "[\\w\\-'’\`]*[\\w\\-’\`]"

(the default regexp is: https://github.com/codespell-project/codespell/blob/d7fa1e4/codespell_lib/_codespell.py#L29 )
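To see what the override changes, here is a minimal sketch using Python's re module. It assumes the default pattern is roughly a single character class followed by `+` (check the linked line for the exact value) and that words are extracted with a findall-style scan; the helper `tokens` and the sample line are made up for illustration.

```python
import re

# Assumption: stand-in for codespell's default word regexp at the linked commit.
DEFAULT_LIKE = re.compile(r"[\w\-'’`]+")
# The override from the command above, after shell unescaping.
NO_TRAILING_QUOTE = re.compile(r"[\w\-'’`]*[\w\-’`]")


def tokens(regex, line):
    """Return the words a findall-based tokenizer would extract from line."""
    return regex.findall(line)


# Hypothetical source line containing a single-quoted string.
sample = "paths = data['files']"

default_tokens = tokens(DEFAULT_LIKE, sample)
override_tokens = tokens(NO_TRAILING_QUOTE, sample)

print(default_tokens)   # the closing quote stays attached to the word
print(override_tokens)  # the closing quote is no longer part of the word
```

With the override, a word can no longer end in a straight single quote, so the closing quote of a string literal is left out of the token. The opening quote is still part of it.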

> find swh docs -name '*.py' -o -name '*.rst' | xargs -r codespell -r "[\\w\\-'’\`]*[\\w\\-’\`]"

> (the default regexp is: https://github.com/codespell-project/codespell/blob/d7fa1e4/codespell_lib/_codespell.py#L29 )

Nice, thanks for checking, this is a much better solution indeed. (As a nitpick, I'll go for "[\\w\\-'’\`]+(?=[^'])" instead, but the idea is the same.)
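Before pushing either pattern, it may be worth running both on a couple of sample lines, since a greedy `+` followed by a lookahead does not necessarily behave like the two-class version. The snippet below is only a sketch with Python's re module; the sample lines are invented, and treating the pattern as being applied with findall is an assumption about how codespell uses it.

```python
import re

# Pattern from the earlier comment (trailing straight quote excluded by the last class).
TWO_CLASS = re.compile(r"[\w\-'’`]*[\w\-’`]")
# Lookahead variant from this comment.
LOOKAHEAD = re.compile(r"[\w\-'’`]+(?=[^'])")

# Hypothetical lines: quoted string mid-line, and at the very end of the input.
mid_line = "data['files']"
end_of_input = "name = 'files'"

for label, regex in (("two-class", TWO_CLASS), ("lookahead", LOOKAHEAD)):
    print(label, regex.findall(mid_line), regex.findall(end_of_input))
```

In this check the lookahead variant keeps the closing quote when another character follows it (the greedy `+` consumes the quote before the lookahead is tested), and at the very end of the input it backtracks far enough to drop the last letter as well. If lines reach the regexp with their trailing newline, the second case would presumably not arise, but the first one still would.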

I'll push it to swh-environment and also upstream to codespell.

In D1999#46521, @zack wrote:

> (As a nitpick, I'll go for "[\\w\\-'’\`]+(?=[^'])" instead, but the idea is the same.)

Nice indeed. You may also want to patch it so it doesn't detect the quote in r'foo', but I don't see a good way to do it (there may be a lot of characters in front of strings...)
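The prefix case is easy to reproduce. This sketch applies the override pattern suggested earlier in the thread to a raw-string literal, assuming findall-style word extraction; the sample line is hypothetical.

```python
import re

# Override pattern suggested earlier in this thread, after shell unescaping.
NO_TRAILING_QUOTE = re.compile(r"[\w\-'’`]*[\w\-’`]")

# Hypothetical line with a raw-string prefix in front of the opening quote.
sample = "x = r'foo'"

prefixed_tokens = NO_TRAILING_QUOTE.findall(sample)
print(prefixed_tokens)  # the prefix and the opening quote end up inside the word
```

The closing quote is dropped as intended, but the `r` prefix and the opening quote remain glued to `foo` as one token. The same would happen with any other word character placed directly before the quote.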