Rebase on master
- Queries
- All Stories
- Search
- Advanced Search
- Transactions
- Transaction Logs
Advanced Search
Jul 19 2019
Rebase and update
- Change index storage mechanism
The best solution is: tox -- -k test_name
Jul 16 2019
- Add break to prevent multiple yields
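The fix above can be illustrated with a small generator sketch (function and field names here are hypothetical, not the actual swh code): without the break, a content matching on several hashes would be yielded once per matching hash.

```python
def find_matches(contents, target_hashes):
    """Yield each content at most once, even if several of its
    hashes appear in target_hashes."""
    for content in contents:
        for h in content["hashes"]:
            if h in target_hashes:
                yield content
                break  # without this, a content matching two hashes is yielded twice

contents = [{"id": 1, "hashes": ["a", "b"]}, {"id": 2, "hashes": ["c"]}]
print([c["id"] for c in find_matches(contents, {"a", "b"})])  # → [1]
```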
- Use all hashes in a content
Jul 13 2019
- Remove dependency
- Modify in_memory content_add to add skipped_content
I did not find any mechanism in db.py that actually stores skipped_content. db.py line 51 is just a pass statement, with no implementation.
In db.py line 128, the query does not compare blake2s256 despite content_hash_keys = ['sha1', 'sha1_git', 'sha256', 'blake2s256']. Does this mean that skipped content will never be hashed with blake2s256?
I have a concern here, at storage.py line 120: self.content_missing can throw an exception in the case of a hash collision. Shouldn't line 120 be wrapped in a try/except block to catch that error and ignore that particular content?
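A sketch of the defensive handling suggested above (the exception class and the surrounding loop are assumptions for illustration, not the actual swh code):

```python
# Hypothetical sketch: skip contents whose hash lookup collides,
# instead of letting the exception abort the whole batch.
class HashCollision(Exception):
    pass

def missing_filtered(storage, contents):
    missing = []
    for content in contents:
        try:
            missing.extend(storage.content_missing([content]))
        except HashCollision:
            continue  # ignore this particular content, as suggested above
    return missing
```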
Jul 11 2019
Use db_transaction annotation
Jul 10 2019
small edit and rebase
Jul 9 2019
I went through all the tests in test_storage.py. It appears that only content_fossology_license_get needs to be refactored. All other storage methods return a dictionary or a list of dictionaries, where each dictionary has multiple keys.
Jul 7 2019
I am familiar with the web APIs, and I went through the discussion in T782. When you say output a single dictionary, I believe you mean something like this:
{ sha1: [ {tool: TOOL, licenses: [licenses]},
          {tool: TOOL, licenses: [licenses]} ],
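If that interpretation is right, collapsing per-row results into that single-dictionary shape could look like the following sketch (the row field names are assumptions, not the actual swh schema):

```python
from collections import defaultdict

# Hypothetical sketch: group per-row license results under their sha1.
def group_by_sha1(rows):
    result = defaultdict(list)
    for row in rows:
        result[row["id"]].append(
            {"tool": row["tool"], "licenses": row["licenses"]}
        )
    return dict(result)

rows = [
    {"id": "sha1_a", "tool": "nomos", "licenses": ["GPL-2.0"]},
    {"id": "sha1_a", "tool": "monk", "licenses": ["MIT"]},
]
```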
Jul 6 2019
This logic is similar to the one used in db.skipped_content_missing. However, the current implementation of in_memory._content_add does not populate _skipped_contents and _skipped_content_indexes.
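A minimal sketch of what populating those structures might look like; only the attribute names come from the comment above, and everything else (hash algorithms, key layout) is assumed:

```python
# Hash algorithms as listed elsewhere in this discussion.
HASH_ALGOS = ('sha1', 'sha1_git', 'sha256', 'blake2s256')

class InMemoryStorageSketch:
    def __init__(self):
        self._skipped_contents = {}
        # algo -> hash value -> set of content keys
        self._skipped_content_indexes = {algo: {} for algo in HASH_ALGOS}

    def skipped_content_add(self, content):
        # Key the content by the tuple of all its hashes (None if absent).
        key = tuple(content.get(algo) for algo in HASH_ALGOS)
        self._skipped_contents[key] = content
        for algo, value in zip(HASH_ALGOS, key):
            if value is not None:
                self._skipped_content_indexes[algo].setdefault(value, set()).add(key)
```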
Jul 2 2019
After re-reading the documentation, I realized that the configuration files given in swh-indexer are actually used by swh-scheduler and swh-storage, indicating that these two modules are at the root of the dependency tree.
I looked at how the docs look after being built; for example, take swh-indexer at https://docs.softwareheritage.org/devel/swh-indexer/dev-info.html. It seems like the configuration information, along with instructions to run and test the module, is best suited to this page. Have you considered adding comments about configuration parameters on this page itself, rather than making a top-level file, since only someone hacking on swh-indexer would be interested in the configuration?
I can begin working on it, once I understand what is required. Is my interpretation of the task correct?
Jul 1 2019
Jun 29 2019
This is related to T1388.
Jun 28 2019
Rebase from master
To expand on this further: I think the pre-push hook scripts you have configured only work when using git in a terminal. I am using VS Code and, curiously, when I used the GUI to push changes, the scripts did not run, so I could push changes without review. This is probably a security flaw that should be taken seriously. @zack
It seems like my revision is already in origin/master. There are no changes to push. Should I close this revision?
Jun 27 2019
Aren't all docstrings rendered somewhere in the documentation, for packages, submodules, and the functions they contain?
While formatting the documents, I realized that whoever wrote them was trying to keep line lengths short, but they followed an arbitrary line length longer than 80 characters. Since # noqa was applied, these formatting errors did not pop up. I think a lot of this could be avoided with a guideline to add a vertical ruler to the editor at 80 characters.
Fix more docstrings
Reformat docstring wherever possible
Reformat docstring for max line length
I'll close this revision.
I get it now; that's an ingenious way of keeping documentation up to date. However, if this is evaluated, why does adding whitespace change the output? The dictionary and list should still have valid items.
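If the mechanism in question is doctest (an assumption on my part), the whitespace sensitivity follows from how doctest compares results: it matches the printed text character for character rather than comparing the evaluated values. A minimal illustration:

```python
import doctest

# Expected output matches exactly what the interpreter prints.
good = """
>>> {'a': 1}
{'a': 1}
"""

# Extra space after the colon: same dictionary value, different text.
bad = """
>>> {'a': 1}
{'a':  1}
"""

def run(docstring):
    # Run the examples in a docstring and count textual mismatches.
    parser = doctest.DocTestParser()
    test = parser.get_doctest(docstring, {}, "example", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test, out=lambda s: None)
    return runner.failures

print(run(good), run(bad))  # → 0 1
```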
Then swh-indexer does not require any changes; I will close this diff. Wow, I never knew docstrings could be evaluated. Why is this required, though?
I will close this diff since no changes are required for swh-deposit. With complex cases like this, there is no way this process can be automated.
I followed the swh-docker-dev documentation to host the setup locally, but it only serves the web portal; I couldn't access the locally hosted documentation. How can I see the effect my changes are making?
It is odd that the py3 test cases are failing, although I have only changed the formatting of the docstrings.
Jun 26 2019
What is the expected formatting for snippets like these?
the 80 character mark is -----------------------------------------------------------------|

@browse_route(r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/',  # noqa
              r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/(?P<path>.+)/',  # noqa
              r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/directory/',  # noqa
              r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/directory/(?P<path>.+)/',  # noqa
              r'origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/',  # noqa
              r'origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/(?P<path>.+)/',  # noqa
              r'origin/(?P<origin_url>.+)/directory/',  # noqa
              r'origin/(?P<origin_url>.+)/directory/(?P<path>.+)/',  # noqa
              view_name='browse-origin-directory')
def origin_directory_browse(request, origin_url, origin_type=None,
                            timestamp=None, path=None):
    """Django view for browsing the content of a directory associated
    to an origin for a given visit.
"""Django view that produces an HTML display of a content identified by its hash value.
Jun 25 2019
https://docs.softwareheritage.org/devel/apidoc/swh.model.html#module-swh.model.identifiers
This explains some of the fields associated with dates, timestamps and offset.
This task is completed; it can be closed.
Merge with master
Ready to land; made all the required changes.
Remove tox dependency
Remove + symbol from end of line
Add assert_called_once_with to test case
Fix test case. Silly mistakes are costly.
Jun 24 2019
While I was adding comments to all the tables in the db, I experimented a bit with pg_dump.
Some databases could benefit from some backups without the overhead of having point in time recovery set up for them
If I understand correctly, that means recovery to arbitrary previous points in time is not a concern here. In that case, is a cron job running pg_dump at regular intervals feasible?
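To make the suggestion concrete, a small wrapper script that cron could invoke at regular intervals might look like this (database names, output paths, and the helper itself are hypothetical; only the pg_dump flags -Fc and -f are real):

```python
import datetime
import subprocess

def backup(dbname, out_dir):
    """Dump one database to a timestamped file; meant to be run from cron."""
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    outfile = f"{out_dir}/{dbname}-{stamp}.dump"
    # -Fc writes a custom-format archive, restorable with pg_restore.
    subprocess.run(["pg_dump", "-Fc", "-f", outfile, dbname], check=True)
    return outfile
```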
Jun 23 2019
- Change key reference from origin_id to id
Jun 22 2019
with patch('swh.web.common.service.idx_storage') as mock_idx_storage, \
        patch('swh.web.common.service.storage') as mock_storage:
Adding a backslash solves it.
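For reference, nesting the with statements avoids the line-continuation character entirely and works on any Python version. A self-contained sketch (stdlib targets are used here instead of the swh ones so it runs standalone):

```python
from unittest.mock import patch
import os

# Nested with statements sidestep the backslash continuation:
# each patch gets its own short, easily wrapped line.
with patch('os.path.exists') as mock_exists:
    with patch('os.getcwd') as mock_getcwd:
        mock_exists.return_value = True
        mock_getcwd.return_value = '/mocked'
        print(os.path.exists('anything'), os.getcwd())  # → True /mocked
```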
with patch('swh.web.common.service.idx_storage') as mock_idx_storage, patch('swh.web.common.service.storage') as mock_storage:
giving a syntax error on mock_idx_storage