Page MenuHomeSoftware Heritage

Add a search feature to snapshot branches view
Closed, ResolvedPublic

Description

It is now possible to filter snapshot branches according to a substring contained in their names at swh-storage level.

swh-web added support for that filtering feature by adding a branch_name_include_substring parameter to
the swh.web.common.archive.lookup_snapshot function.

This calls for adding a search form in the branches view of swh-web in order to search for branches of interest.

This could be implemented by adding a new branch_name_include query parameter to the branches view and by
adapting the swh.web.browse.snapshot_context.browse_snapshot_branches implementation.

Event Timeline

anlambert triaged this task as Normal priority.Mar 18 2021, 7:45 PM
anlambert created this task.

Hey, can I take this task up next?

@anlambert I have a few questions regarding this task.
1: Should we add a filter UI component in the page or just a query param is enough?
2: If we have to add a UI component, do we have a design or example for that? (maybe a generic one for both branches and Releases, the "search branches" widget in github looks nice)
3: What about the API /snapshot/<id>, should we add the possibility of this filter there as well?
4: The branches seem to be randomly sorted over Date. Is that for a reason or something we should address?

2: If we have to add a UI component, do we have a design or example for that? (maybe a generic one for both branches and Releases, the "search branches" widget in github looks nice)

I found an example for the search widget here https://archive.softwareheritage.org/save/#requests
I will use this

  1. Should we add a filter UI component in the page or just a query param is enough?

I was thinking of adding a filter UI component above branches and releases list view.
When its input changes, we should redirect to the same URL by adding a query parameter indicating
that branches filtering must be performed.

2: If we have to add a UI component, do we have a design or example for that?

You can inspire from the origin search form located at the top right of the Web UI

3: What about the API /snapshot/<id>, should we add the possibility of this filter there as well?

Yes, we could also add a new query parameter to that endpoint to perform branches filtering.

4: The branches seem to be randomly sorted over Date. Is that for a reason or something we should address?

Currently, branches are sorted by their names. If we want to add other ordering (commit date for instance),
this should be handled at the swh-storage level but that's out of scope for that task.

@anlambert The server code seems to have an issue when branches are filtered by name.
"swh.web.common.archive.lookup_snapshot" function is raising an error if the number of searched branches is 0.
raise
This will eventually cause a 404 error when the user search for a non existing branch name in the UI. I don't think this is the intended behavior.

It makes more sense if we can show either a "no branch with similar name exits" message or a page with just an empty list.
I can make this happen by adding some ugly code in "swh.web.snapshot_context.browse_snapshot_branches" function. (By adding a try catch around the lookup_snapshot call and mocking an empty iter as response)

The ideal fix would be to refactor the code to move the entire presentation logic to the template engine. (We already have a check in the template to show "The list of branches is empty" message in case the branch list is empty. But, i think this case will never occur with the current server code)
message

Otherwise we may have to go with the JS data tables idea.

This will eventually cause a 404 error when the user search for a non existing branch name in the UI. I don't think this is the intended behavior.

@jayeshv You should catch the NotFoundExc when the swh.web.common.archive.lookup_snapshot function is called in the browse_snapshot_branches one. In the exception handler code, you can then set the displayed_branches variable to an empty list and discard the processing when branches have been found.

The django template for branches list view already handles empty branches list. Nevertheless, error message should be improved to indicate no branches with given keyword exist.

More answers after further reading and analysis.

"swh.web.common.archive.lookup_snapshot" function is raising an error if the number of searched branches is 0.

Maybe there is room for improvements here, we could check if the snapshot exists first using storage.snapshot_missing method
(need to be wrapped in swh.web.common.archive module) and only raises an exception if the snapshot does not exist.
Empty branches list after a search should be allowed to be returned.

It makes more sense if we can show either a "no branch with similar name exits" message or a page with just an empty list.
I can make this happen by adding some ugly code in "swh.web.snapshot_context.browse_snapshot_branches" function. (By adding a try catch around the lookup_snapshot call and mocking an empty iter as response)

I would have done it that way also, see previous comment.

Otherwise we may have to go with the JS data tables idea.

There is no need to use JavaScript here from my point of view.

The ideal fix would be to refactor the code to move the entire presentation logic to the template engine. (We already have a check in the template to show "The list of branches is empty" message in case the branch list is empty. But, i think this case will never occur with the current server code)

Unfortunately, we have examples of origins with empty snapshot in the archive so that code can get executed.

There is no need to use JavaScript here from my point of view.

I mean using Datatables would be nice to allow sorting the branches list by columns but currently no such sorting is implemented on the backend side
and some snapshots can be huge (see v8 js engine for instance) so we will hit performance issue.

We must also allow to browse the archive content without JavaScript enabled so using Datatables here would be a problem.

@anlambert Thank you. I will try to do some refactoring here.

@anlambert Thank you. I will try to do some refactoring here.

@jayeshv I think we should go for that part first

Maybe there is room for improvements here, we could check if the snapshot exists first using storage.snapshot_missing method
(need to be wrapped in swh.web.common.archive module) and only raises an exception if the snapshot does not exist.
Empty branches list after a search should be allowed to be returned.

This will simplify branches view implementation. You should create a diff to handle it.

This will simplify branches view implementation. You should create a diff to handle it.

Yes. Let me check it's side effects as well. The 'archive.lookup_snapshot' is called from multiple places.

This will simplify branches view implementation. You should create a diff to handle it.

Yes. Let me check it's side effects as well. The 'archive.lookup_snapshot' is called from multiple places.

Web application Python source code is highly covered by tests so you should quickly see if there is side effects with such a change.