Details
- Reviewers: douardda
- Group Reviewers: Reviewers
- Maniphest Tasks: T1529: Efficient reindex when updating a metadata mapping
- Commits:
  - rDCIDXaf5a0141ec52: Add subcommand to list mappings.
  - rDCIDX1e1463895545: Add a CLI tool to reindex origins based on mapping used.
Diff Detail
- Repository: rDCIDX Metadata indexer
- Branch: scheduling-cli
- Lint: No Linters Available
- Unit: No Unit Test Coverage
- Build Status: Buildable 4364
  - Build 5770: tox-on-jenkins Jenkins
  - Build 5769: arc lint + arc unit
Event Timeline
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/442/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/442/console
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/446/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/446/console
Please add an entry_point in the setup.py for this cli.
swh/indexer/cli.py | ||
---|---|---|
24–46 | It's weird to be able to load a config file whose content does not seem to be used anywhere. I would expect every config-like option of the schedule group to be a config entry read from the config file, with a possible CLI override via these options. Also, the --no-dry-run flag value is useless IMHO. Just keep --dry-run with is_flag=True. | |
61 | What's the point of this limit value? It looks like a hard maximum on the number of origins to reindex at once, so it should at least be mentioned in the command's docstring. However, I'm wondering: won't this value prevent some origins from ever being reindexed? I mean, if I run this command once and hit this limit, how do I know that I did? How can I resume the reindexing from there? | |
79–80 | What are the expected values for this option? | |
81 | Which other task types can be used here? Can I create, say, an inconsistent 'swh-origin-git-update' task using this CLI tool? | |
87 | Why is this command limited to already indexed origins only? |
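The suggestion on lines 24–46 (values come from the config file, and command-line options only override them when given) could look roughly like the sketch below. The function name `merge_options` and the keys `scheduler_url` and `batch_size` are hypothetical and not taken from swh.indexer; the sketch assumes an option left unset on the command line arrives as `None`.

```python
from typing import Any, Dict, Optional


def merge_options(
    config: Dict[str, Any], cli_options: Dict[str, Optional[Any]]
) -> Dict[str, Any]:
    """Return the config file values, overridden by any CLI option that was set.

    Hypothetical helper: a CLI option equal to None is treated as "not given",
    so the value read from the config file is kept.
    """
    merged = dict(config)  # copy, so the loaded config is left untouched
    for key, value in cli_options.items():
        if value is not None:
            merged[key] = value
    return merged


# Hypothetical keys, for illustration only.
file_config = {"scheduler_url": "http://localhost:5008/", "batch_size": 10}
cli_overrides = {"scheduler_url": None, "batch_size": 50}
effective = merge_options(file_config, cli_overrides)
```

With this shape, every option has a single source of truth in the config file, and the command line stays a thin override layer.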
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/448/ for more details.
swh/indexer/cli.py | ||
---|---|---|
24–46 | Will do. I agree; I did that for uniformity with swh.scheduler. | |
61 | That's a per-request limit, but this function keeps making requests for more origins until it exhausts them all (hence the while loop, especially start = origins[-1]+1) | |
81 | You can. As indexer_origin_metadata is a config in swh.scheduler's db, it makes sense to have it configurable here too. | |
87 | Origins not already indexed are out of scope of this diff, and will be handled as part of T1528. |
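The per-request limit discussed for line 61 can be illustrated with a minimal sketch of the loop described in the reply: fetch one page, then restart just after the last origin seen, until a page comes back empty. The names `iter_origin_ids`, `fetch_page` and `fake_fetch_page` are hypothetical and stand in for whatever the diff actually calls; the sketch assumes origin ids are integers returned in ascending order.

```python
from typing import Callable, Iterator, List

# Stand-in data for illustration; real ids would come from the storage.
ORIGIN_IDS = [1, 2, 5, 8, 13, 21, 34]


def fake_fetch_page(start: int, limit: int) -> List[int]:
    """Hypothetical backend: up to `limit` ids that are >= start, ascending."""
    return [i for i in ORIGIN_IDS if i >= start][:limit]


def iter_origin_ids(
    fetch_page: Callable[[int, int], List[int]], limit: int = 1000
) -> Iterator[int]:
    """Yield every origin id, requesting at most `limit` ids per call."""
    start = 0
    while True:
        origins = fetch_page(start, limit)
        if not origins:
            break  # an empty page means all origins have been seen
        yield from origins
        start = origins[-1] + 1  # resume just after the last id returned


all_ids = list(iter_origin_ids(fake_fetch_page, limit=3))
```

Under these assumptions the limit only bounds the size of each request, not the total number of origins processed, so no origin is skipped.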
- Bump required swh.scheduler version.
- Honor --dry-run.
- Use a config file for scheduler/idx_storage/storage.
- Better doc for --mappings.
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/449/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/450/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/451/ for more details.
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/455/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/455/console
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/457/ for more details.