
Today

twitu updated the diff for D1693: Get skipped content that are missing data.

Rebase on master

Fri, Jul 19, 5:37 PM
twitu updated the diff for D1693: Get skipped content that are missing data.

Rebase and update

Fri, Jul 19, 5:30 PM
twitu updated the diff for D1693: Get skipped content that are missing data.
  • Change index storage mechanism
Fri, Jul 19, 5:20 PM
twitu added a comment to P447 Run a single test file.

Best solution is tox -- -k test_name

Fri, Jul 19, 5:07 PM

Tue, Jul 16

twitu updated the diff for D1693: Get skipped content that are missing data.
  • Add break to prevent multiple yields
Tue, Jul 16, 8:22 PM
twitu updated the diff for D1693: Get skipped content that are missing data.
  • Use all hashes in a content
Tue, Jul 16, 6:51 PM

Sat, Jul 13

twitu updated the diff for D1693: Get skipped content that are missing data.
  • Remove dependency
Sat, Jul 13, 9:21 AM
twitu updated the diff for D1693: Get skipped content that are missing data.
  • Modify in_memory content_add to add skipped_content
Sat, Jul 13, 9:21 AM
twitu added a comment to D1693: Get skipped content that are missing data.

I did not find any mechanism in db.py that actually stores skipped_content. db.py line 51 just passes, without any implementation.

Sat, Jul 13, 9:15 AM
twitu added a comment to D1693: Get skipped content that are missing data.

In db.py line 128, the query does not compare blake2s256 despite content_hash_keys = ['sha1', 'sha1_git', 'sha256', 'blake2s256']. Does this mean that skipped content will never be hashed with blake2s256?
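For context, a minimal sketch of how the four keys in content_hash_keys could be computed with the standard library. It assumes that sha1_git is the Git blob identifier and that blake2s256 is BLAKE2s with a 32-byte digest; the function name compute_hashes is hypothetical and is not taken from db.py.

```python
import hashlib

# Same key names as content_hash_keys in the comment above.
CONTENT_HASH_KEYS = ['sha1', 'sha1_git', 'sha256', 'blake2s256']


def compute_hashes(data: bytes) -> dict:
    """Return a hex digest for each key in CONTENT_HASH_KEYS."""
    # Assumption: sha1_git is the Git blob id, i.e. SHA-1 over a
    # "blob <length>\0" header followed by the raw bytes.
    git_header = b'blob %d\x00' % len(data)
    return {
        'sha1': hashlib.sha1(data).hexdigest(),
        'sha1_git': hashlib.sha1(git_header + data).hexdigest(),
        'sha256': hashlib.sha256(data).hexdigest(),
        # Assumption: blake2s256 is BLAKE2s with a 32-byte (256-bit) digest.
        'blake2s256': hashlib.blake2s(data, digest_size=32).hexdigest(),
    }
```

A query that compares only the first three keys would still identify a row, but it would not detect two contents that differ only in their blake2s256 value.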

Sat, Jul 13, 8:43 AM
twitu added a comment to D1693: Get skipped content that are missing data.

I have a concern about storage.py line 120. The function self.content_missing can throw an exception in case of a hash collision. Shouldn't line 120 be in a try/except block to catch that error and ignore that particular content?
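The try/except idea could look roughly like the sketch below. HashCollision is a hypothetical stand-in for whatever exception the storage raises, and content_missing is passed in as a plain callable, so nothing here depends on the real storage.py.

```python
class HashCollision(Exception):
    """Hypothetical stand-in for the error raised on a hash collision."""


def missing_ignoring_collisions(contents, content_missing):
    """Yield the missing contents, skipping any that trigger a collision.

    `content_missing` is any callable that takes a list of content dicts and
    returns an iterable of the missing ones; it may raise HashCollision.
    """
    for content in contents:
        try:
            # Check one content at a time so a collision only discards
            # that content instead of aborting the whole batch.
            missing = list(content_missing([content]))
        except HashCollision:
            continue
        yield from missing
```

Checking contents one by one trades some efficiency for isolation; whether silently ignoring a collision is acceptable is a design question for the reviewers.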

Sat, Jul 13, 6:25 AM

Thu, Jul 11

twitu updated the diff for D1720: Modify API output and test.
  • Modify output and test
  • In reference to T1433
  • Use db_transaction annotation
Thu, Jul 11, 8:24 PM
twitu updated the diff for D1720: Modify API output and test.

Use db_transaction annotation

Thu, Jul 11, 8:12 PM

Wed, Jul 10

twitu created P463 tox test stack trace for swh-indexer in the S1 Public space.
Wed, Jul 10, 6:11 PM
twitu retitled D1720: Modify API output and test from Modify output and test to Modify API output and test.
Wed, Jul 10, 5:39 PM
twitu updated the diff for D1720: Modify API output and test.

small edit and rebase

Wed, Jul 10, 5:36 PM
Herald added a reviewer for D1720: Modify API output and test: Reviewers.
Wed, Jul 10, 5:31 PM

Tue, Jul 9

twitu committed rDSTOC2ead4ce360ba: Added comments for all tables and columns (authored by twitu).
Added comments for all tables and columns
Tue, Jul 9, 3:07 PM
twitu added a comment to T1433: Refactor output of indexer storage's `get` methods..

I went through all the tests in test_storage.py. It appears that only content_fossology_license_get needs to be refactored. All other storage methods return a dictionary or a list of dictionaries, where each dictionary has multiple keys.

Tue, Jul 9, 5:28 AM · Easy hack, Indexer

Sun, Jul 7

twitu added a comment to T1433: Refactor output of indexer storage's `get` methods..

I am familiar with the web APIs and I went through the discussion in T782. When you say output a single dictionary, I believe you mean something like this

{
  sha1: [
    {tool: TOOL, licenses: [licences]},
    {tool: TOOL, licenses: [licences]}
  ],
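The shape proposed above (the snippet is truncated in the feed) could be produced by grouping per-tool rows under their sha1, as in this sketch. The flat input format, with the keys id, tool and licenses, is an assumption made for illustration and is not the confirmed output of content_fossology_license_get.

```python
from collections import defaultdict


def group_by_sha1(rows):
    """Group flat per-tool rows into {sha1: [{tool, licenses}, ...]}."""
    grouped = defaultdict(list)
    for row in rows:
        # Assumption: each row carries the content's sha1 under 'id'.
        grouped[row['id']].append(
            {'tool': row['tool'], 'licenses': row['licenses']}
        )
    return dict(grouped)
```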
Sun, Jul 7, 4:23 PM · Easy hack, Indexer

Sat, Jul 6

twitu added a comment to D1693: Get skipped content that are missing data.

This logic is similar to the one used in db.skipped_content_missing. However, the current implementation of in_memory._content_add does not populate _skipped_contents and _skipped_content_indexes.
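To illustrate what populating those two structures might involve, here is a toy in-memory store. Only the attribute names _skipped_contents and _skipped_content_indexes come from the comment above; the class, its methods, and the rule "missing if any provided hash is unknown" are assumptions for the sketch.

```python
class InMemorySkippedContent:
    """Toy in-memory store; names only loosely mirror the comment above."""

    HASH_KEYS = ('sha1', 'sha1_git', 'sha256', 'blake2s256')

    def __init__(self):
        self._skipped_contents = {}        # key tuple -> content dict
        self._skipped_content_indexes = {  # algo -> {hash -> set of keys}
            algo: {} for algo in self.HASH_KEYS
        }

    def _key(self, content):
        return tuple(content.get(algo) for algo in self.HASH_KEYS)

    def skipped_content_add(self, contents):
        for content in contents:
            key = self._key(content)
            self._skipped_contents[key] = content
            for algo in self.HASH_KEYS:
                if content.get(algo) is not None:
                    index = self._skipped_content_indexes[algo]
                    index.setdefault(content[algo], set()).add(key)

    def skipped_content_missing(self, contents):
        for content in contents:
            for algo in self.HASH_KEYS:
                value = content.get(algo)
                if value is None:
                    continue
                if value not in self._skipped_content_indexes[algo]:
                    yield content
                    break  # report each content at most once
```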

Sat, Jul 6, 9:30 AM
Herald added a reviewer for D1693: Get skipped content that are missing data: Reviewers.
Sat, Jul 6, 9:25 AM

Tue, Jul 2

twitu added a comment to T1758: consistently document the configuration option of each module.

After re-reading the documentation, I realized that the configuration files given in swh-indexer are actually used by swh-scheduler and swh-storage, indicating that these two modules are at the root of the dependency tree.

Tue, Jul 2, 7:36 PM · Development documentation
twitu added a comment to T1758: consistently document the configuration option of each module.

I looked at how the docs look after being built; take, for example, swh-indexer at https://docs.softwareheritage.org/devel/swh-indexer/dev-info.html. It seems that the configuration information, along with instructions to run and test the module, is best suited to this page. Have you considered adding comments about configuration parameters to this page itself, rather than making a top-level file? Only someone hacking on swh-indexer would be interested in its configuration.

Tue, Jul 2, 4:45 PM · Development documentation
twitu added a comment to T1758: consistently document the configuration option of each module.

I can begin working on it, once I understand what is required. Is my interpretation of the task correct?

Tue, Jul 2, 12:55 PM · Development documentation
twitu closed T1864: Inkscape is not mentioned as dependency for building swh-docs as Resolved.
Tue, Jul 2, 4:51 AM · Development documentation
twitu committed rDDOC20790ca8a5b5: Add inkscape to required tools in README (authored by twitu).
Add inkscape to required tools in README
Tue, Jul 2, 4:50 AM
twitu closed D1673: Add inkscape to required tools in README.
Tue, Jul 2, 4:50 AM

Mon, Jul 1

Herald added a reviewer for D1673: Add inkscape to required tools in README: Reviewers.
Mon, Jul 1, 6:39 PM

Sat, Jun 29

twitu added a comment to T1758: consistently document the configuration option of each module.

This is related to 1388.

Sat, Jun 29, 2:41 PM · Development documentation
twitu created T1864: Inkscape is not mentioned as dependency for building swh-docs in the S1 Public space.
Sat, Jun 29, 2:36 PM · Development documentation

Fri, Jun 28

twitu added a reverting change for rDDEP216d0f74d8c3: Reformat docstrings for max line length: rDDEP4c4324788ae0: Revert "Reformat docstrings for max line length".
Fri, Jun 28, 3:04 PM
twitu committed rDDEP4c4324788ae0: Revert "Reformat docstrings for max line length" (authored by twitu).
Revert "Reformat docstrings for max line length"
Fri, Jun 28, 3:04 PM
twitu closed D1658: Revert "Reformat docstrings for max line length".
Fri, Jun 28, 3:04 PM
twitu updated the diff for D1658: Revert "Reformat docstrings for max line length".

Rebase from master

Fri, Jun 28, 3:03 PM
twitu updated subscribers of D1658: Revert "Reformat docstrings for max line length".

To expand on this further, I think the pre-push hook scripts you have configured only work when using git in a terminal. I am using VS Code and, curiously, I used the GUI to push changes. I believe the scripts cannot check for such a situation, so I can commit changes without review. This is probably a security flaw that should be considered seriously. @zack

Fri, Jun 28, 5:47 AM
Herald added a reviewer for D1658: Revert "Reformat docstrings for max line length": Reviewers.
Fri, Jun 28, 5:35 AM
twitu added a reverting change for rDDEP216d0f74d8c3: Reformat docstrings for max line length: D1658: Revert "Reformat docstrings for max line length".
Fri, Jun 28, 5:35 AM
twitu closed T1836: Reformat docstrings that exceed 80 columns as Resolved.
Fri, Jun 28, 5:25 AM · Development documentation, Easy hack
twitu committed rDWAPPSb2555fabb0fc: Reformatted docstrings wherever possible (authored by twitu).
Reformatted docstrings wherever possible
Fri, Jun 28, 5:24 AM
twitu closed D1650: Reformat docstring in utils.py.
Fri, Jun 28, 5:24 AM
twitu committed rDDEP216d0f74d8c3: Reformat docstrings for max line length (authored by twitu).
Reformat docstrings for max line length
Fri, Jun 28, 5:23 AM
twitu added a comment to D1650: Reformat docstring in utils.py.

It seems like my revision is already in origin master. There are no changes to push. Should I close this revision?

Fri, Jun 28, 4:54 AM

Thu, Jun 27

twitu updated the diff for D1650: Reformat docstring in utils.py.

Rebased master

Thu, Jun 27, 8:16 PM
twitu added a comment to D1650: Reformat docstring in utils.py.

Aren't all docstrings rendered somewhere, along with the documentation for packages, submodules, and the functions they contain?

Thu, Jun 27, 7:59 PM
twitu added a comment to D1650: Reformat docstring in utils.py.

While formatting the documents, I realized that whoever wrote them was trying to keep line lengths short but followed an arbitrary line length that was longer than 80 characters. Since # noqa was applied, these formatting errors did not pop up. I think a lot of this could be resolved with a guideline to add a vertical ruler to the editor at 80 characters.
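Independently of the editor ruler, overlong lines hidden by # noqa can be listed with a few lines of Python. This is a sketch with a hypothetical helper name; the default limit of 80 follows the comment above and may differ from the linter's configured value.

```python
def overlong_lines(text, limit=80):
    """Return (line_number, length) for each line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]
```

Calling it on the contents of a source file, read with open(path).read(), gives the line numbers to revisit.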

Thu, Jun 27, 7:51 PM
twitu updated the diff for D1650: Reformat docstring in utils.py.

Fix more docstrings

Thu, Jun 27, 7:50 PM
twitu updated the task description for T1836: Reformat docstrings that exceed 80 columns.
Thu, Jun 27, 7:40 PM · Development documentation, Easy hack
twitu committed rDLDSVN3667f7165353: Remove unnecessary noqa (authored by twitu).
Remove unnecessary noqa
Thu, Jun 27, 7:24 PM
twitu closed D1655: Remove unnecessary noqa.
Thu, Jun 27, 7:24 PM
twitu updated the diff for D1650: Reformat docstring in utils.py.

Reformat docstring wherever possible

Thu, Jun 27, 7:24 PM
twitu committed rDMODdde39f51c0fa: Reformat docstring for max line length (authored by twitu).
Reformat docstring for max line length
Thu, Jun 27, 6:48 PM
twitu closed D1649: Reformat docstring for max line length.
Thu, Jun 27, 6:48 PM
Herald added a reviewer for D1655: Remove unnecessary noqa: Reviewers.
Thu, Jun 27, 6:48 PM
twitu updated the diff for D1649: Reformat docstring for max line length.

Reformat docstring for max line length

Thu, Jun 27, 6:44 PM
twitu added inline comments to D1649: Reformat docstring for max line length.
Thu, Jun 27, 4:13 PM
twitu updated the task description for T1836: Reformat docstrings that exceed 80 columns.
Thu, Jun 27, 4:11 PM · Development documentation, Easy hack
twitu abandoned D1648: Reformat docstring for max line length.
Thu, Jun 27, 4:11 PM
twitu added a comment to D1648: Reformat docstring for max line length.

I'll close this revision.

Thu, Jun 27, 4:11 PM
twitu added a comment to D1648: Reformat docstring for max line length.

I get it now, that's an ingenious way of keeping documentation up to date. I will close this revision then.

Thu, Jun 27, 3:55 PM
twitu added a comment to D1648: Reformat docstring for max line length.

Then swh-indexer does not require any changes; I will close this diff. Wow, I never knew docstrings could be evaluated. Why is this required though?
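One common way docstrings get evaluated in Python is the standard doctest module, which runs the interactive examples embedded in a docstring and compares their output. Whether doctest is the mechanism used in swh-indexer is an assumption; the sketch below only shows the general idea, with hypothetical function names.

```python
import doctest


def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    """
    return a + b


def run_examples(func):
    """Run the doctest examples in func's docstring.

    Returns a (failed, attempted) pair.
    """
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    globs = {func.__name__: func}
    for test in finder.find(func, name=func.__name__, globs=globs):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted
```

Because the expected output is part of the docstring, rewrapping such a docstring can change what the example compares against, which is one reason a formatting-only change can make tests fail.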

Thu, Jun 27, 3:48 PM
twitu abandoned D1647: Reformat docstrings for max line length.
Thu, Jun 27, 3:46 PM
twitu added a comment to D1647: Reformat docstrings for max line length.

I will close this diff since no changes are required for swh-deposit. With complex cases like this, there is no way this process can be automated.

Thu, Jun 27, 3:45 PM
twitu added a comment to T1836: Reformat docstrings that exceed 80 columns.

I followed the swh-docker-dev documentation to host the setup locally, but it only hosts the web portal; I couldn't access locally hosted documentation. How can I see the effect my changes are making?

Thu, Jun 27, 6:51 AM · Development documentation, Easy hack
Herald added a reviewer for D1650: Reformat docstring in utils.py: Reviewers.
Thu, Jun 27, 6:48 AM
twitu updated the summary of D1649: Reformat docstring for max line length.
Thu, Jun 27, 6:30 AM
twitu updated the summary of D1648: Reformat docstring for max line length.
Thu, Jun 27, 6:30 AM
Herald added a reviewer for D1649: Reformat docstring for max line length: Reviewers.
Thu, Jun 27, 6:23 AM
twitu added a comment to D1647: Reformat docstrings for max line length.

It is odd that py3 test cases are failing although I have only changed the formatting of the docstrings.

Thu, Jun 27, 6:23 AM
Herald added a reviewer for D1648: Reformat docstring for max line length: Reviewers.
Thu, Jun 27, 6:19 AM
twitu updated the summary of D1647: Reformat docstrings for max line length.
Thu, Jun 27, 6:08 AM
Herald added a reviewer for D1647: Reformat docstrings for max line length: Reviewers.
Thu, Jun 27, 6:08 AM

Wed, Jun 26

twitu updated the task description for T1836: Reformat docstrings that exceed 80 columns.
Wed, Jun 26, 7:11 PM · Development documentation, Easy hack
twitu added a comment to T1836: Reformat docstrings that exceed 80 columns.

What is the expected formatting for snippets like these?

the 80 character mark is -----------------------------------------------------------------|
@browse_route(r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/', # noqa
              r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/(?P<path>.+)/', # noqa
              r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/directory/', # noqa
              r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/directory/(?P<path>.+)/', # noqa
              r'origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/', # noqa
              r'origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/(?P<path>.+)/', # noqa
              r'origin/(?P<origin_url>.+)/directory/', # noqa
              r'origin/(?P<origin_url>.+)/directory/(?P<path>.+)/', # noqa
              view_name='browse-origin-directory')
def origin_directory_browse(request, origin_url, origin_type=None,
                            timestamp=None, path=None):
    """Django view for browsing the content of a directory associated
    to an origin for a given visit.
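One possible way to wrap such patterns, offered as a sketch and not as the project's chosen convention: Python joins adjacent string literals at compile time, so a long raw-string route can be split across lines without changing the resulting regex. The constant name below is hypothetical.

```python
import re

# Adjacent raw-string literals inside parentheses are concatenated by the
# compiler, so this is the same pattern as the one-line form.
ORIGIN_VISIT_DIRECTORY = (
    r'origin/(?P<origin_type>[a-z]+)'
    r'/url/(?P<origin_url>.+)'
    r'/visit/(?P<timestamp>.+)'
    r'/directory/'
)


def parse_route(path):
    """Return the named groups if `path` matches the route, else None."""
    match = re.match(ORIGIN_VISIT_DIRECTORY, path)
    return match.groupdict() if match else None
```

Each piece stays well under 80 columns, so no # noqa marker is needed for the wrapped form.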
Wed, Jun 26, 6:51 PM · Development documentation, Easy hack
twitu added a comment to T1836: Reformat docstrings that exceed 80 columns.
"""Django view that produces an HTML display of a content identified
    by its hash value.
Wed, Jun 26, 6:37 PM · Development documentation, Easy hack

Tue, Jun 25

twitu added a comment to T1839: Write glossary/taxonomy for push archival process and mechanism.

https://docs.softwareheritage.org/devel/apidoc/swh.model.html#module-swh.model.identifiers
This explains some of the fields associated with dates, timestamps and offset.

Tue, Jun 25, 6:31 PM · Scientific Community Building, SWORD deposit
twitu closed T1527: Have comments on all columns of all databases as Resolved.
Tue, Jun 25, 6:25 PM · Easy hack, Development documentation, Storage manager, Scheduling utilities, Indexer
twitu added a comment to T1613: Add a public API endpoint to get the metadata of an origin.

This task is completed; it can be closed.

Tue, Jun 25, 6:24 PM · Easy hack, Metadata workflow, Web app
twitu committed rDWAPPS9a8e1f98c9fd: Add origin_metadata_get API endpoint (authored by twitu).
Add origin_metadata_get API endpoint
Tue, Jun 25, 6:10 PM
twitu closed D1623: Add origin_metadata_get API endpoint.
Tue, Jun 25, 6:10 PM
twitu updated the diff for D1623: Add origin_metadata_get API endpoint.

Merge with master

Tue, Jun 25, 5:57 PM
twitu added a comment to D1623: Add origin_metadata_get API endpoint.

Ready to land; I made all the required changes.

Tue, Jun 25, 5:47 PM
twitu updated the diff for D1623: Add origin_metadata_get API endpoint.

Remove tox dependency

Tue, Jun 25, 5:41 PM
twitu updated the diff for D1623: Add origin_metadata_get API endpoint.

Remove + symbol from end of line

Tue, Jun 25, 5:40 PM
twitu updated the diff for D1623: Add origin_metadata_get API endpoint.

Fix typo

Tue, Jun 25, 5:31 PM
twitu updated the diff for D1623: Add origin_metadata_get API endpoint.

Add assert_called_once_with to test case

Tue, Jun 25, 5:18 PM
twitu updated the diff for D1623: Add origin_metadata_get API endpoint.

Fix test case

Tue, Jun 25, 5:09 AM

Mon, Jun 24

twitu added inline comments to D1623: Add origin_metadata_get API endpoint.
Mon, Jun 24, 7:26 PM
twitu added a comment to T881: PostgreSQL backups based on pg_dump.

While I was adding comments to all the tables in the db, I experimented a bit with pg_dump.

Some databases could benefit from some backups without the overhead of having point in time recovery set up for them

If I understand correctly, it means that recovery from all previous timestamps is not a concern here. In such a case, is a cron job running pg_dump at regular intervals feasible?
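A minimal sketch of what such a periodic job could run, assuming pg_dump is on the PATH and connection settings come from the environment. The function names, the database name and the backup directory are hypothetical; only the pg_dump options --format and --file are real.

```python
import datetime
import subprocess


def pg_dump_command(dbname, backup_dir, now=None):
    """Build a pg_dump command writing a timestamped custom-format dump."""
    now = now or datetime.datetime.now()
    stamp = now.strftime('%Y%m%dT%H%M%S')
    target = '%s/%s-%s.dump' % (backup_dir, dbname, stamp)
    return ['pg_dump', '--format=custom', '--file=' + target, dbname]


def run_backup(dbname, backup_dir):
    """Run pg_dump; raises CalledProcessError if it exits non-zero."""
    subprocess.run(pg_dump_command(dbname, backup_dir), check=True)
```

A scheduler such as cron would then call run_backup at the chosen interval. Old dumps would need separate rotation, which this sketch does not handle.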

Mon, Jun 24, 6:54 PM · System administration
twitu updated the task description for T1527: Have comments on all columns of all databases.
Mon, Jun 24, 6:28 PM · Easy hack, Development documentation, Storage manager, Scheduling utilities, Indexer
twitu updated the summary of D1623: Add origin_metadata_get API endpoint.
Mon, Jun 24, 6:14 PM
twitu updated the summary of D1623: Add origin_metadata_get API endpoint.
Mon, Jun 24, 6:13 PM
twitu added inline comments to D1623: Add origin_metadata_get API endpoint.
Mon, Jun 24, 5:55 PM

Sun, Jun 23

twitu added a comment to P448 Pytest error stack trace.

reference to D1623

Sun, Jun 23, 2:57 PM
twitu created P448 Pytest error stack trace in the S1 Public space.
Sun, Jun 23, 2:55 PM
twitu updated the diff for D1623: Add origin_metadata_get API endpoint.
  • Change key reference from origin_id to id
Sun, Jun 23, 2:48 PM

Sat, Jun 22

twitu created P447 Run a single test file in the S1 Public space.
Sat, Jun 22, 12:26 PM
twitu added a comment to P446 Multiple context managers.
with patch('swh.web.common.service.idx_storage') as mock_idx_storage, \
             patch('swh.web.common.service.storage') as mock_storage:

Adding a backslash solves it.
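A self-contained sketch of the backslash form, next to contextlib.ExitStack as an alternative that avoids line continuations. The patched targets platform.node and platform.machine are arbitrary standard-library stand-ins for the swh.web names in the paste.

```python
import platform
from contextlib import ExitStack
from unittest.mock import patch


def with_backslash():
    # A backslash continues the `with` header onto the next line.
    with patch('platform.node') as mock_node, \
            patch('platform.machine') as mock_machine:
        mock_node.return_value = 'fake-host'
        mock_machine.return_value = 'fake-arch'
        return platform.node(), platform.machine()


def with_exit_stack():
    # ExitStack enters each patch in turn and undoes them all on exit.
    with ExitStack() as stack:
        mock_node = stack.enter_context(patch('platform.node'))
        mock_machine = stack.enter_context(patch('platform.machine'))
        mock_node.return_value = 'fake-host'
        mock_machine.return_value = 'fake-arch'
        return platform.node(), platform.machine()
```

Python 3.10 and later also accept parentheses around the list of context managers, which removes the need for the backslash.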

Sat, Jun 22, 10:06 AM
twitu added a comment to P446 Multiple context managers.
with patch('swh.web.common.service.idx_storage') as mock_idx_storage,
             patch('swh.web.common.service.storage') as mock_storage:

This gives a syntax error on mock_idx_storage.

Sat, Jun 22, 9:56 AM
twitu added a comment to P446 Multiple context managers.
Sat, Jun 22, 9:56 AM