Jul 19 2019

twitu updated the diff for D1693: Get skipped content that are missing data.

Rebase on master

Jul 19 2019, 5:37 PM
twitu updated the diff for D1693: Get skipped content that are missing data.

Rebase and update

Jul 19 2019, 5:30 PM
twitu updated the diff for D1693: Get skipped content that are missing data.
  • Change index storage mechanism
Jul 19 2019, 5:20 PM
twitu added a comment to P447 Run a single test file.

Best solution is tox -- -k test_name

Jul 19 2019, 5:07 PM

Jul 16 2019

twitu updated the diff for D1693: Get skipped content that are missing data.
  • Add break to prevent multiple yields
Jul 16 2019, 8:22 PM
twitu updated the diff for D1693: Get skipped content that are missing data.
  • Use all hashes in a content
Jul 16 2019, 6:51 PM

Jul 13 2019

twitu updated the diff for D1693: Get skipped content that are missing data.
  • Remove dependency
Jul 13 2019, 9:21 AM
twitu updated the diff for D1693: Get skipped content that are missing data.
  • Modify in_memory content_add to add skipped_content
Jul 13 2019, 9:21 AM
twitu added a comment to D1693: Get skipped content that are missing data.

I did not find any mechanism in db.py that actually stores skipped_content. At db.py line 51, the body is just pass, with no implementation.

Jul 13 2019, 9:15 AM
twitu added a comment to D1693: Get skipped content that are missing data.

In db.py line 128, the query does not compare blake2s256 despite content_hash_keys = ['sha1', 'sha1_git', 'sha256', 'blake2s256']. Does this mean that skipped content will never be hashed with blake2s256?

Jul 13 2019, 8:43 AM
twitu added a comment to D1693: Get skipped content that are missing data.

I have a concern about storage.py line 120. The function self.content_missing can raise an exception in case of a hash collision. Shouldn't line 120 be wrapped in a try/except block to catch that error and skip that particular content?
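
A minimal sketch of the guard suggested above. Everything in it is hypothetical: HashCollision, content_missing and filter_missing are stand-ins invented for illustration, not the actual swh-storage API.

```python
class HashCollision(Exception):
    """Hypothetical error raised when two contents share a hash."""


def content_missing(content):
    """Stand-in for the storage lookup; raises on a simulated collision."""
    if content.get("collides"):
        raise HashCollision(content["sha1"])
    return not content.get("present", False)


def filter_missing(contents):
    """Yield the contents that are missing, skipping any that collide."""
    for content in contents:
        try:
            missing = content_missing(content)
        except HashCollision:
            # Ignore this particular content and keep processing the rest.
            continue
        if missing:
            yield content
```

Whether silently skipping a collision is acceptable, or whether it should at least be logged, is a design decision this sketch leaves open.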

Jul 13 2019, 6:25 AM

Jul 11 2019

twitu updated the diff for D1720: Modify API output and test.
  • Modify output and test
  • In reference to T1433
  • Use db_transaction annotation
Jul 11 2019, 8:24 PM
twitu updated the diff for D1720: Modify API output and test.

Use db_transaction annotation

Jul 11 2019, 8:12 PM

Jul 10 2019

twitu created P463 tox test stack trace for swh-indexer in the S1 Public space.
Jul 10 2019, 6:11 PM
twitu retitled D1720: Modify API output and test from Modify output and test to Modify API output and test.
Jul 10 2019, 5:39 PM
twitu updated the diff for D1720: Modify API output and test.

small edit and rebase

Jul 10 2019, 5:36 PM
Herald added a reviewer for D1720: Modify API output and test: Reviewers.
Jul 10 2019, 5:31 PM

Jul 9 2019

twitu committed rDSTOC2ead4ce360ba: Added comments for all tables and columns (authored by twitu).
Added comments for all tables and columns
Jul 9 2019, 3:07 PM
twitu added a comment to T1433: Refactor output of indexer storage's `get` methods..

I went through all the tests in test_storage.py. It appears that only content_fossology_license_get needs to be refactored. All other storage methods return a dictionary or a list of dictionaries, where each dictionary has multiple keys.
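
As a rough illustration of the kind of refactoring discussed in this task, the sketch below groups a flat list of per-tool rows into one dictionary keyed by content id. The row layout (id, tool, licenses) and the name group_by_id are assumptions made for the example, not the actual indexer storage schema.

```python
from collections import defaultdict


def group_by_id(rows):
    """Group per-tool rows under the id of the content they describe."""
    grouped = defaultdict(list)
    for row in rows:
        # Keep every field except the id, which becomes the dictionary key.
        entry = {key: value for key, value in row.items() if key != "id"}
        grouped[row["id"]].append(entry)
    return dict(grouped)
```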

Jul 9 2019, 5:28 AM · Easy hack, Indexer

Jul 7 2019

twitu added a comment to T1433: Refactor output of indexer storage's `get` methods..

I am familiar with the web APIs and I went through the discussion in T782. When you say output a single dictionary, I believe you mean something like this

{
  sha1: [
    {tool: TOOL, licenses: [licences]},
    {tool: TOOL, licenses: [licences]}
  ],
Jul 7 2019, 4:23 PM · Easy hack, Indexer

Jul 6 2019

twitu added a comment to D1693: Get skipped content that are missing data.

This logic is similar to the one used in db.skipped_content_missing. However, the current implementation of in_memory._content_add does not populate _skipped_contents and _skipped_content_indexes.

Jul 6 2019, 9:30 AM
Herald added a reviewer for D1693: Get skipped content that are missing data: Reviewers.
Jul 6 2019, 9:25 AM

Jul 2 2019

twitu added a comment to T1758: consistently document the configuration option of each module.

After re-reading the documentation, I realized that the configuration files given in swh-indexer are actually used by swh-scheduler and swh-storage, indicating that these two modules are at the root of the dependency tree.

Jul 2 2019, 7:36 PM · Easy hack, Documentation
twitu added a comment to T1758: consistently document the configuration option of each module.

I looked at how the docs look after being built; for example, take swh-indexer at https://docs.softwareheritage.org/devel/swh-indexer/dev-info.html. It seems that the configuration information, along with instructions to run and test the module, is best suited to this page. Have you considered documenting the configuration parameters on this page itself rather than in a top-level file? Only someone hacking on swh-indexer would be interested in its configuration.

Jul 2 2019, 4:45 PM · Easy hack, Documentation
twitu added a comment to T1758: consistently document the configuration option of each module.

I can begin working on it, once I understand what is required. Is my interpretation of the task correct?

Jul 2 2019, 12:55 PM · Easy hack, Documentation
twitu closed T1864: Inkscape is not mentioned as dependency for building swh-docs as Resolved.
Jul 2 2019, 4:51 AM · Documentation
twitu committed rDDOC20790ca8a5b5: Add inkscape to required tools in README (authored by twitu).
Add inkscape to required tools in README
Jul 2 2019, 4:50 AM
twitu closed D1673: Add inkscape to required tools in README.
Jul 2 2019, 4:50 AM

Jul 1 2019

Herald added a reviewer for D1673: Add inkscape to required tools in README: Reviewers.
Jul 1 2019, 6:39 PM

Jun 29 2019

twitu added a comment to T1758: consistently document the configuration option of each module.

This is related to T1388.

Jun 29 2019, 2:41 PM · Easy hack, Documentation
twitu created T1864: Inkscape is not mentioned as dependency for building swh-docs in the S1 Public space.
Jun 29 2019, 2:36 PM · Documentation

Jun 28 2019

twitu added a reverting change for rDDEP216d0f74d8c3: Reformat docstrings for max line length: rDDEP4c4324788ae0: Revert "Reformat docstrings for max line length".
Jun 28 2019, 3:04 PM
twitu committed rDDEP4c4324788ae0: Revert "Reformat docstrings for max line length" (authored by twitu).
Revert "Reformat docstrings for max line length"
Jun 28 2019, 3:04 PM
twitu closed D1658: Revert "Reformat docstrings for max line length".
Jun 28 2019, 3:04 PM
twitu updated the diff for D1658: Revert "Reformat docstrings for max line length".

Rebase from master

Jun 28 2019, 3:03 PM
twitu updated subscribers of D1658: Revert "Reformat docstrings for max line length".

To expand on this further, I think the pre-push hook scripts you have configured only work when using git in a terminal. I am using VS Code and, curiously, I used the GUI to push changes. I believe the scripts are not able to catch such a situation, so I can commit changes without review. This is probably a security flaw that should be taken seriously. @zack

Jun 28 2019, 5:47 AM
Herald added a reviewer for D1658: Revert "Reformat docstrings for max line length": Reviewers.
Jun 28 2019, 5:35 AM
twitu added a reverting change for rDDEP216d0f74d8c3: Reformat docstrings for max line length: D1658: Revert "Reformat docstrings for max line length".
Jun 28 2019, 5:35 AM
twitu closed T1836: Reformat docstrings that exceed 80 columns as Resolved.
Jun 28 2019, 5:25 AM · Documentation, Easy hack
twitu committed rDWAPPSb2555fabb0fc: Reformatted docstrings wherever possible (authored by twitu).
Reformatted docstrings wherever possible
Jun 28 2019, 5:24 AM
twitu closed D1650: Reformat docstring in utils.py.
Jun 28 2019, 5:24 AM
twitu committed rDDEP216d0f74d8c3: Reformat docstrings for max line length (authored by twitu).
Reformat docstrings for max line length
Jun 28 2019, 5:23 AM
twitu added a comment to D1650: Reformat docstring in utils.py.

It seems like my revision is already in origin master. There are no changes to push. Should I close this revision?

Jun 28 2019, 4:54 AM

Jun 27 2019

twitu updated the diff for D1650: Reformat docstring in utils.py.

Rebased master

Jun 27 2019, 8:16 PM
twitu added a comment to D1650: Reformat docstring in utils.py.

Aren't all docstrings rendered somewhere in the documentation for the packages, submodules and functions they belong to?

Jun 27 2019, 7:59 PM
twitu added a comment to D1650: Reformat docstring in utils.py.

While formatting the documents, I realized that whoever wrote them was trying to keep line lengths short but followed an arbitrary line length that was longer than 80 characters. Since # noqa was applied, these formatting errors did not show up. I think a lot of this could be avoided with a guideline to add a vertical ruler to the editor at 80 characters.
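
A small sketch of how such overlong lines could be listed without relying on the linter. The function name long_lines and the default limit of 80 are choices made for this example; the project's actual linter configuration may use a different threshold.

```python
def long_lines(text, limit=80):
    """Return (line_number, length) for each line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]
```

Running it over a module's source text gives the 1-based numbers of the lines that would need reflowing.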

Jun 27 2019, 7:51 PM
twitu updated the diff for D1650: Reformat docstring in utils.py.

Fix more docstrings

Jun 27 2019, 7:50 PM
twitu updated the task description for T1836: Reformat docstrings that exceed 80 columns.
Jun 27 2019, 7:40 PM · Documentation, Easy hack
twitu committed rDLDSVN3667f7165353: Remove unnecessary noqa (authored by twitu).
Remove unnecessary noqa
Jun 27 2019, 7:24 PM
twitu closed D1655: Remove unnecessary noqa.
Jun 27 2019, 7:24 PM
twitu updated the diff for D1650: Reformat docstring in utils.py.

Reformat docstring wherever possible

Jun 27 2019, 7:24 PM
twitu committed rDMODdde39f51c0fa: Reformat docstring for max line length (authored by twitu).
Reformat docstring for max line length
Jun 27 2019, 6:48 PM
twitu closed D1649: Reformat docstring for max line length.
Jun 27 2019, 6:48 PM
Herald added a reviewer for D1655: Remove unnecessary noqa: Reviewers.
Jun 27 2019, 6:48 PM
twitu updated the diff for D1649: Reformat docstring for max line length.

Reformat docstring for max line length

Jun 27 2019, 6:44 PM
twitu added inline comments to D1649: Reformat docstring for max line length.
Jun 27 2019, 4:13 PM
twitu updated the task description for T1836: Reformat docstrings that exceed 80 columns.
Jun 27 2019, 4:11 PM · Documentation, Easy hack
twitu abandoned D1648: Reformat docstring for max line length.
Jun 27 2019, 4:11 PM
twitu added a comment to D1648: Reformat docstring for max line length.

I'll close this revision.

Jun 27 2019, 4:11 PM
twitu added a comment to D1648: Reformat docstring for max line length.

I get it now, that's an ingenious way of keeping documentation up to date. However, if this is evaluated, why does adding whitespace change the output? The dictionary and list should still contain valid items.
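
One possible explanation, assuming the evaluation in question is the standard library's doctest (the thread does not say so explicitly): doctest compares the expected output as text, not as Python values, so reflowing a dictionary over two lines changes the comparison even though the value is the same. The helper count_failures below is written only for this illustration.

```python
import doctest

# Expected output written on one line, exactly as repr() prints it.
GOOD = ">>> {'a': 1, 'b': 2}\n{'a': 1, 'b': 2}\n"
# Same value, but the expected output was reflowed over two lines.
REFORMATTED = ">>> {'a': 1, 'b': 2}\n{'a': 1,\n 'b': 2}\n"


def count_failures(source, optionflags=0):
    """Run the doctest examples found in `source`; return the failure count."""
    test = doctest.DocTestParser().get_doctest(source, {}, "example", None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    # Discard the failure report; only the count matters here.
    return runner.run(test, out=lambda message: None).failed
```

With the default options the reflowed version fails; passing doctest.NORMALIZE_WHITESPACE makes runs of whitespace compare equal, so it passes.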

Jun 27 2019, 3:55 PM
twitu added a comment to D1648: Reformat docstring for max line length.

Then swh-indexer does not require any changes, so I will close this diff. Wow, I never knew docstrings could be evaluated. Why is this required though?

Jun 27 2019, 3:48 PM
twitu abandoned D1647: Reformat docstrings for max line length.
Jun 27 2019, 3:46 PM
twitu added a comment to D1647: Reformat docstrings for max line length.

I will close this diff since no changes are required for swh-deposit. With complex cases like this, there is no way this process can be automated.

Jun 27 2019, 3:45 PM
twitu added a comment to T1836: Reformat docstrings that exceed 80 columns.

I followed the swh-docker-dev documentation to host the setup locally. However, it only hosts the web portal; I couldn't access locally hosted documentation. How can I see the effect of my changes?

Jun 27 2019, 6:51 AM · Documentation, Easy hack
Herald added a reviewer for D1650: Reformat docstring in utils.py: Reviewers.
Jun 27 2019, 6:48 AM
twitu updated the summary of D1649: Reformat docstring for max line length.
Jun 27 2019, 6:30 AM
twitu updated the summary of D1648: Reformat docstring for max line length.
Jun 27 2019, 6:30 AM
Herald added a reviewer for D1649: Reformat docstring for max line length: Reviewers.
Jun 27 2019, 6:23 AM
twitu added a comment to D1647: Reformat docstrings for max line length.

It is odd that the py3 test cases are failing although I have only changed the formatting of the docstrings.

Jun 27 2019, 6:23 AM
Herald added a reviewer for D1648: Reformat docstring for max line length: Reviewers.
Jun 27 2019, 6:19 AM
twitu updated the summary of D1647: Reformat docstrings for max line length.
Jun 27 2019, 6:08 AM
Herald added a reviewer for D1647: Reformat docstrings for max line length: Reviewers.
Jun 27 2019, 6:08 AM

Jun 26 2019

twitu updated the task description for T1836: Reformat docstrings that exceed 80 columns.
Jun 26 2019, 7:11 PM · Documentation, Easy hack
twitu added a comment to T1836: Reformat docstrings that exceed 80 columns.

What is the expected formatting for snippets like these?

the 80 character mark is -----------------------------------------------------------------|
@browse_route(r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/', # noqa
              r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/(?P<path>.+)/', # noqa
              r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/directory/', # noqa
              r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/directory/(?P<path>.+)/', # noqa
              r'origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/', # noqa
              r'origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/(?P<path>.+)/', # noqa
              r'origin/(?P<origin_url>.+)/directory/', # noqa
              r'origin/(?P<origin_url>.+)/directory/(?P<path>.+)/', # noqa
              view_name='browse-origin-directory')
def origin_directory_browse(request, origin_url, origin_type=None,
                            timestamp=None, path=None):
    """Django view for browsing the content of a directory associated
    to an origin for a given visit.
Jun 26 2019, 6:51 PM · Documentation, Easy hack
twitu added a comment to T1836: Reformat docstrings that exceed 80 columns.
"""Django view that produces an HTML display of a content identified
    by its hash value.
Jun 26 2019, 6:37 PM · Documentation, Easy hack

Jun 25 2019

twitu added a comment to T1839: Write glossary/taxonomy for push archival process and mechanism.

https://docs.softwareheritage.org/devel/apidoc/swh.model.html#module-swh.model.identifiers
This explains some of the fields associated with dates, timestamps and offset.

Jun 25 2019, 6:31 PM · Community Building, Documentation
twitu closed T1527: Have comments on all columns of all databases as Resolved.
Jun 25 2019, 6:25 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
twitu added a comment to T1613: Add a public API endpoint to get the metadata of an origin.

This task is completed; it can be closed.

Jun 25 2019, 6:24 PM · Easy hack, Metadata workflow, Web app
twitu committed rDWAPPS9a8e1f98c9fd: Add origin_metadata_get API endpoint (authored by twitu).
Add origin_metadata_get API endpoint
Jun 25 2019, 6:10 PM
twitu closed D1623: Add origin_metadata_get API endpoint.
Jun 25 2019, 6:10 PM
twitu updated the diff for D1623: Add origin_metadata_get API endpoint.

Merge with master

Jun 25 2019, 5:57 PM
twitu added a comment to D1623: Add origin_metadata_get API endpoint.

Ready to land; I made all the required changes.

Jun 25 2019, 5:47 PM
twitu updated the diff for D1623: Add origin_metadata_get API endpoint.

Remove tox dependency

Jun 25 2019, 5:41 PM
twitu updated the diff for D1623: Add origin_metadata_get API endpoint.

Remove + symbol from end of line

Jun 25 2019, 5:40 PM
twitu updated the diff for D1623: Add origin_metadata_get API endpoint.

Fix typo

Jun 25 2019, 5:31 PM
twitu updated the diff for D1623: Add origin_metadata_get API endpoint.

Add assert_called_once_with to test case

Jun 25 2019, 5:18 PM
twitu updated the diff for D1623: Add origin_metadata_get API endpoint.

Fix test case. Silly mistakes are costly.

Jun 25 2019, 5:09 AM

Jun 24 2019

twitu added inline comments to D1623: Add origin_metadata_get API endpoint.
Jun 24 2019, 7:26 PM
twitu added a comment to T881: PostgreSQL backups based on pg_dump.

While I was adding comments to all the tables in the db, I experimented a bit with pg_dump.

Some databases could benefit from some backups without the overhead of having point in time recovery set up for them

If I understand correctly, this means that recovery to arbitrary earlier points in time is not a concern here. In that case, is a cron job running pg_dump at regular intervals feasible?
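
A sketch of what such a periodic job might run, under stated assumptions: the database name, the backup directory and the helper build_pg_dump_command are made up for this example, and the function only builds the command line rather than executing it.

```python
import posixpath
from datetime import datetime


def build_pg_dump_command(dbname, backup_dir, now=None):
    """Build a pg_dump invocation writing a timestamped custom-format dump."""
    now = now or datetime.now()
    # One file per run, so older dumps are kept until rotated separately.
    target = posixpath.join(backup_dir, f"{dbname}-{now:%Y%m%d-%H%M%S}.dump")
    return ["pg_dump", "--format=custom", f"--file={target}", dbname]
```

A script scheduled by cron could pass the resulting list to subprocess.run(command, check=True); connection settings and dump rotation are left out of this sketch.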

Jun 24 2019, 6:54 PM · System administration
twitu updated the task description for T1527: Have comments on all columns of all databases.
Jun 24 2019, 6:28 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
twitu updated the summary of D1623: Add origin_metadata_get API endpoint.
Jun 24 2019, 6:14 PM
twitu updated the summary of D1623: Add origin_metadata_get API endpoint.
Jun 24 2019, 6:13 PM
twitu added inline comments to D1623: Add origin_metadata_get API endpoint.
Jun 24 2019, 5:55 PM

Jun 23 2019

twitu added a comment to P448 Pytest error stack trace.

reference to D1623

Jun 23 2019, 2:57 PM
twitu created P448 Pytest error stack trace in the S1 Public space.
Jun 23 2019, 2:55 PM
twitu updated the diff for D1623: Add origin_metadata_get API endpoint.
  • Change key reference from origin_id to id
Jun 23 2019, 2:48 PM

Jun 22 2019

twitu created P447 Run a single test file in the S1 Public space.
Jun 22 2019, 12:26 PM
twitu added a comment to P446 Multiple context managers.
with patch('swh.web.common.service.idx_storage') as mock_idx_storage, \
             patch('swh.web.common.service.storage') as mock_storage:

Adding a backslash at the end of the first line solves it.
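
For reference, a self-contained version of the same pattern. It patches two standard library functions instead of the swh.web modules so that it runs anywhere; describe_cwd is a throwaway helper written for this example.

```python
import os
from unittest.mock import patch


def describe_cwd():
    """Summarise the current directory using two os calls."""
    return f"{os.getcwd()} has {len(os.listdir('.'))} entries"


# The trailing backslash continues the with statement onto the next line.
with patch("os.getcwd") as mock_getcwd, \
        patch("os.listdir") as mock_listdir:
    mock_getcwd.return_value = "/fake"
    mock_listdir.return_value = ["a", "b"]
    result = describe_cwd()
    mock_listdir.assert_called_once_with(".")
```

From Python 3.10 onwards the context managers can instead be wrapped in parentheses, which avoids the backslash.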

Jun 22 2019, 10:06 AM
twitu added a comment to P446 Multiple context managers.
with patch('swh.web.common.service.idx_storage') as mock_idx_storage,
             patch('swh.web.common.service.storage') as mock_storage:

This gives a syntax error on mock_idx_storage.

Jun 22 2019, 9:56 AM
twitu added a comment to P446 Multiple context managers.
Jun 22 2019, 9:56 AM