
Fix tox execution and move dataset folder

Authored by anlambert on Oct 24 2019, 3:37 PM.



CI builds for swh-graph have been failing for a couple of weeks, likely due to some recent changes in upstream pytest.

This diff fixes the pytest invocation in tox.ini.

The tests/dataset folder is also moved into swh/graph/tests, as it is required to run the tests.
Previously it was installed in <python-site-packages>/tests, which is not a proper install location.
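
As a minimal sketch of why the location matters: once the dataset lives inside the package, a test can resolve it relative to its own module with pathlib, so the lookup works both from a source checkout and from an installed package. The `dataset_dir` helper and its `folder_name` default are illustrative assumptions, not code taken from this diff.

```python
from pathlib import Path


def dataset_dir(test_module_file: str, folder_name: str = "dataset") -> Path:
    """Return the dataset folder that sits next to the given test module.

    Hypothetical helper: a test module would typically call it as
    dataset_dir(__file__).
    """
    # Resolve to an absolute path first, so the result does not depend on
    # the current working directory of the test runner.
    return Path(test_module_file).resolve().parent / folder_name
```

A test module would call `dataset_dir(__file__)` and build file paths from the result with the `/` operator instead of string concatenation.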

Related: D2100 (not fixed either)

Diff Detail

rDGRPH Graph service
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

anlambert created this revision. Oct 24 2019, 3:37 PM
anlambert added a comment. Edited Oct 24 2019, 3:42 PM

Hum, arc patch failed ...

Surely related to this warning message when I created the diff:

arc diff origin/master
No lint engine configured for this project.
Running unit tests...
No unit test engine is configured for this project.
Invalid Content Encoding (Non-UTF8)
This diff includes files which are not valid UTF-8 (they contain invalid byte
sequences). You can either stop this workflow and fix these files, or
continue. If you continue, these files will be marked as binary.

You can learn more about how Phabricator handles character encodings (and how
to configure encoding settings and detect and correct encoding problems) by
reading 'User Guide: UTF-8 and Character Encoding' in the Phabricator


    Do you want to mark these files as binary and continue? [Y/n] Y

Looks like the issue has already been reported to Phabricator but has not been fixed yet.

anlambert abandoned this revision. Oct 24 2019, 3:50 PM

Let's try to submit the diff again to avoid the arc patch failing.

anlambert reclaimed this revision. Oct 24 2019, 4:42 PM

Reopening it as I cannot figure out how to work around the arcanist issue. It looks like some files are not detected as binaries, so arcanist expects them to be UTF-8 encoded.

Anyway, even if the Jenkins job fails because arc patch fails, this diff does fix test execution.
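
To find which files arcanist is tripping over, one option is to scan the working tree for files whose bytes do not decode as UTF-8. The following is a rough sketch that is independent of arcanist: the function names are hypothetical, and it only approximates the check arcanist performs rather than reproducing it.

```python
from pathlib import Path
from typing import Iterator


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_files(root: Path) -> Iterator[Path]:
    """Yield every regular file under root whose content is not valid UTF-8."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and not is_valid_utf8(path.read_bytes()):
            yield path
```

Pointing `non_utf8_files` at the dataset folder would list the candidates. In a real checkout one would probably also want to skip the `.git` directory, which this sketch does not do.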

anlambert edited the summary of this revision. Oct 24 2019, 4:50 PM
ardumont edited the summary of this revision. Oct 25 2019, 9:05 AM
ardumont edited the summary of this revision.

Beware that the dataset is also used by some Java code.
And as far as I understand it, it's generated by a Docker container.
So if you move this dataset around, you might either break it or stop some form of synchronization ;)

I mentioned D2100 because I tried to unblock it and explained my train of thought there.
To unblock it, I think a discussion with @seirl (or @zack) is needed.


@ardumont, whoops, I did not see that you had already created a diff on the subject.

You are right about moving the dataset folder: I missed some paths to fix, notably here.

I will update that diff accordingly.

anlambert updated this revision to Diff 7428. Oct 25 2019, 11:10 AM

Update: Fix dataset folder path in

anlambert updated this revision to Diff 7615. Nov 4 2019, 11:58 AM

Rebase onto master, fix example path in and make sure pathlib is used to manipulate paths

This revision was not accepted when it landed; it landed in state Needs Review. Nov 4 2019, 12:06 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.