Page MenuHomeSoftware Heritage

'db serve' option to start the API service
ClosedPublic

Authored by DanSeraf on Nov 22 2020, 4:19 PM.

Details

Summary

Serve the local sqlite database generated with the "db import" option.

Related to T2760

Diff Detail

Repository
rDTSCN Code scanner
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D4552 (id=16157)

Rebasing onto 521420e7d5...

Current branch diff-target is up to date.
Changes applied before test
commit 3f79698ed9ea772e96d59d2d6b9b0f6d0de6e097
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Sun Nov 22 16:05:19 2020 +0100

    'db serve' option to start the API service
    
    Serve the local sqlite database generated with the "db import" option.
    
    The API endpoint is compatible with the one used in the Web API [1].
    
    [1] https://archive.softwareheritage.org/api/1/known/doc/

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/80/ for more details.

zack requested changes to this revision.Nov 23 2020, 10:29 AM
zack added a subscriber: zack.

Thanks, looks generally good to me !
I've pointed out only a few minor things here and there.

swh/scanner/api.py
1–4 ↗(On Diff #16157)

I'm not a fan of api.py as a module name. What about backend.py or server.py?
It seems to me that "scanner backend" or "scanner server" would all be better names for this thing than "scanner API".

13–14 ↗(On Diff #16157)

better: "Backend for swh-scanner, implementing the /known endpoint of the Software Heritage Web API"

21 ↗(On Diff #16157)

please make 1000 a constant, so that it's easier to change (and possible make it configurable at runtime later)

26–30 ↗(On Diff #16157)

You're writing the dictionary twice here, which is not very efficient.
You can create it only once, already with the right values, e.g., using a dictionary comprehension expression.

28–29 ↗(On Diff #16157)

I'm worried it might be a performance issue to check the SWHIDs one by one.

Would it be possible to at least reuse the same cursor here, rather than creating one for each SWHID? (Please check that is OK in terms of DB-API 2.0 usage, as I'm not 100% sure.)

We can worry later about whether we should extend known() API so that it can lookup multiple SWHIDs at once.

swh/scanner/tests/test_api.py
11 ↗(On Diff #16157)

if you make this a constant in the backend module as I've suggested about, you can just use it here without having to redefine/duplicate its value

This revision now requires changes to proceed.Nov 23 2020, 10:29 AM
swh/scanner/api.py
1–4 ↗(On Diff #16157)

The "backend" name is fine for me!

28–29 ↗(On Diff #16157)

Yeah i think is possible to create one cursor for all the queries.

swh/scanner/tests/test_api.py
11 ↗(On Diff #16157)

This constant regards the chunk size while inserting known SWHIDs inside the local database in the db module and it is defined in the cli module since it could be passed as argument. Should i define it somewhere else?

  • changed module name
  • query the database with only one cursor
  • get SWHID known status directly when generating the response

Build is green

Patch application report for D4552 (id=16228)

Rebasing onto 502595f6f0...

First, rewinding head to replay your work on top of it...
Applying: 'db serve' option to start the API service
Using index info to reconstruct a base tree...
M	swh/scanner/cli.py
Falling back to patching base and 3-way merge...
Auto-merging swh/scanner/cli.py
CONFLICT (content): Merge conflict in swh/scanner/cli.py
Patch failed at 0001 'db serve' option to start the API service

Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".

Rebase failed (ret=1)!

Could not rebase; Attempt merge onto 502595f6f0...

Already up to date.
Changes applied before test
commit 5b66b3ab99ed7ebe66b0118f586430d9f2f0b2e5
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Sun Nov 22 16:05:19 2020 +0100

    'db serve' option to start the API service
    
    Serve the local sqlite database generated with the "db import" option.
    
    The API endpoint is compatible with the one used in the Web API [1].
    
    [1] https://archive.softwareheritage.org/api/1/known/doc/

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/81/ for more details.

Build is green

Patch application report for D4552 (id=16232)

Rebasing onto 502595f6f0...

Current branch diff-target is up to date.
Changes applied before test
commit beaef0a19d3f1e3f5b65aa096762f399cdacf2b0
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Tue Nov 24 11:46:08 2020 +0100

    'db serve' option to start the API service
    
    Serve the local sqlite database generated with the "db import" option.
    
    The API endpoint is compatible with the one used in the Web API [1].
    
    [1] https://archive.softwareheritage.org/api/1/known/doc/

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/82/ for more details.

I'm accepting this, but I've also noted down a bunch of minor issues in the code. Please fix them before landing this.

swh/scanner/backend.py
25

oops: 1000 is still in the error message :) use an f-string with {LIMIT}

swh/scanner/cli.py
200

unneeded line, "host" will be the default due to "--host"

209

ditto, unneeded

211

please make this a constant too, like BACKEND_DEFAULT_PORT

swh/scanner/tests/test_backend.py
11

this should be removed in favor of from swh.scanner.backend import LIMIT above

This revision is now accepted and ready to land.Nov 24 2020, 1:38 PM

Build is green

Patch application report for D4552 (id=16254)

Rebasing onto 502595f6f0...

Current branch diff-target is up to date.
Changes applied before test
commit 09c28d60f1adab988e314a4ce873211aaa59f605
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Tue Nov 24 11:46:08 2020 +0100

    'db serve' option to start the API service
    
    Serve the local sqlite database generated with the "db import" option.
    
    The API endpoint is compatible with the one used in the Web API [1].
    
    [1] https://archive.softwareheritage.org/api/1/known/doc/

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/83/ for more details.