
KShivendu (Kumar Shivendu)
User

Projects

User does not belong to any projects.

User Details

User Since
Feb 23 2021, 11:48 AM (7 w, 6 d)

Recent Activity

Yesterday

KShivendu added a comment to T2823: Write tests for swh/journal/writer/inmemory.py.

Do you want some more tests, or can this task be declared resolved?

Mon, Apr 19, 7:09 PM · Easy hack, Journal

Sun, Apr 18

KShivendu closed D5529: Add test to ensure that an exception is raised if unique_key isn't implemented.
Sun, Apr 18, 3:17 PM
KShivendu committed rDJNL236a00262e4f: test: Catch errors in write_addition if unique_key isn't implemented (authored by KShivendu).
test: Catch errors in write_addition if unique_key isn't implemented
Sun, Apr 18, 3:17 PM

Sat, Apr 17

KShivendu added a comment to D5529: Add test to ensure that an exception is raised if unique_key isn't implemented.

Should I add the names of other contributors as well?

Sat, Apr 17, 7:29 AM

Fri, Apr 16

KShivendu added a comment to D5529: Add test to ensure that an exception is raised if unique_key isn't implemented.

Should I add the names of other contributors as well?

Fri, Apr 16, 11:15 AM
KShivendu updated the diff for D5529: Add test to ensure that an exception is raised if unique_key isn't implemented.

Rebase before pushing

Fri, Apr 16, 11:14 AM
KShivendu updated the diff for D5529: Add test to ensure that an exception is raised if unique_key isn't implemented.

Add same test for regular kafka writer

Fri, Apr 16, 6:50 AM
KShivendu updated the summary of D5529: Add test to ensure that an exception is raised if unique_key isn't implemented.
Fri, Apr 16, 6:48 AM
KShivendu retitled D5529: Add test to ensure that an exception is raised if unique_key isn't implemented from inmemory: Add test to detect exception in object unique_key to Add test to ensure that an exception is raised if unique_key isn't implemented.
Fri, Apr 16, 6:47 AM

Wed, Apr 14

KShivendu requested review of D5529: Add test to ensure that an exception is raised if unique_key isn't implemented.
Wed, Apr 14, 8:36 PM
KShivendu closed T2316: Align row deduplication of all _add endpoints on release_add as Resolved.
Wed, Apr 14, 5:59 PM · Easy hack, Storage manager
KShivendu updated the task description for T3225: Update the metadata indexer documentation.
Wed, Apr 14, 5:54 PM · Indexer, Documentation
KShivendu updated subscribers of T1946: Improve run_a_new_lister.rst file.

Hey @hm, I don't see any typo in https://forge.softwareheritage.org/source/swh-lister/browse/master/docs/run_a_new_lister.rst$50-51. If you do, please make a revision to fix the same.

Wed, Apr 14, 5:50 PM · Easy hack, Documentation, Lister
KShivendu closed T1946: Improve run_a_new_lister.rst file as Resolved.
Wed, Apr 14, 5:45 PM · Easy hack, Documentation, Lister
KShivendu closed T3132: loader-git: Bad formatting of the "Pack file too big" error message as Resolved.
Wed, Apr 14, 5:44 PM · Easy hack, Git loader

Mon, Apr 12

KShivendu closed D5419: Cassandra: Deduplicate lists passed to *_add endpoints.
Mon, Apr 12, 1:30 PM
KShivendu committed rDSTOc96942b40648: Cassandra: Deduplicate lists passed to *_add endpoints (authored by KShivendu).
Cassandra: Deduplicate lists passed to *_add endpoints
Mon, Apr 12, 1:30 PM
KShivendu updated the diff for D5419: Cassandra: Deduplicate lists passed to *_add endpoints.

Updating D5419: Cassandra: Deduplicate lists passed to *_add endpoints

Mon, Apr 12, 1:29 PM

Fri, Apr 9

KShivendu added a comment to T3227: DB Schema link broken in docs under swh-storage..

Hey @faux it's the same as T3145

Fri, Apr 9, 1:02 PM · Easy hack, Documentation
KShivendu updated the diff for D5419: Cassandra: Deduplicate lists passed to *_add endpoints.

Updating D5419: Cassandra: Deduplicate lists passed to *_add endpoints

Fri, Apr 9, 12:22 PM
KShivendu updated the diff for D5419: Cassandra: Deduplicate lists passed to *_add endpoints.

Updating D5419: Cassandra: Deduplicate lists passed to *_add endpoints

Fri, Apr 9, 10:54 AM

Thu, Apr 8

KShivendu added a comment to D5420: cli/identify: Add support for --recursive.

we should build a single model object for the top-level dir, and either output its SWHID, or traverse it (without recomputing SWHIDs) to output all of it

Thu, Apr 8, 7:31 PM
KShivendu added inline comments to D5420: cli/identify: Add support for --recursive.
Thu, Apr 8, 7:28 PM
KShivendu updated the diff for D5420: cli/identify: Add support for --recursive.

Updating D5420: cli/identify: Use TerminalColor Enum and change recursive flag's description

Thu, Apr 8, 7:21 PM
KShivendu added inline comments to D5419: Cassandra: Deduplicate lists passed to *_add endpoints.
Thu, Apr 8, 6:23 PM
KShivendu requested review of D5419: Cassandra: Deduplicate lists passed to *_add endpoints.
Thu, Apr 8, 6:19 PM
KShivendu planned changes to D5419: Cassandra: Deduplicate lists passed to *_add endpoints.

What do you think should be done for releases?

Thu, Apr 8, 6:13 PM
KShivendu updated the diff for D5419: Cassandra: Deduplicate lists passed to *_add endpoints.

Updating D5419: Cassandra: Fixed failing tests

Thu, Apr 8, 6:00 PM
KShivendu created T3225: Update the metadata indexer documentation.
Thu, Apr 8, 5:20 PM · Indexer, Documentation
KShivendu added a comment to D5419: Cassandra: Deduplicate lists passed to *_add endpoints.

These objects have an id attribute. You can use it for deduplication (e.g. via a dict)

Not all objects had id so I used swhid. But some of the tests are failing.

Thu, Apr 8, 11:48 AM
KShivendu added a comment to D5416: Fix lister.yml location.

Hi @hm, Your changes have been accepted but they aren't merged yet. Please do a git push to get your commit(s) merged :)

Thu, Apr 8, 4:53 AM

Wed, Apr 7

KShivendu updated the diff for D5419: Cassandra: Deduplicate lists passed to *_add endpoints.

Updating D5419: Cassandra: Deduplicate lists passed to *_add endpoints

Wed, Apr 7, 4:49 PM
KShivendu updated the diff for D5419: Cassandra: Deduplicate lists passed to *_add endpoints.

Updating D5419: Cassandra: Deduplicate lists passed to *_add endpoints

Wed, Apr 7, 4:12 PM
KShivendu added a comment to D5419: Cassandra: Deduplicate lists passed to *_add endpoints.

I just discovered that the tests failed because the hash function used internally by set throws an error if a dictionary is passed.
Do you know any other trick which can do the de-duplication in one line? Or should I just create a common function to loop over the list and find the unique ones?
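
The failure mentioned above, and one common workaround, can be sketched as follows. This is a minimal illustration that assumes each object can be reduced to a hashable key; the `dedup_by_key` helper and the sample `releases` data are hypothetical and are not taken from swh-storage.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def dedup_by_key(items: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Return items with duplicates removed, keeping the first occurrence of each key."""
    seen = {}
    for item in items:
        # setdefault stores only the first item seen for a given key
        seen.setdefault(key(item), item)
    # dicts preserve insertion order (Python 3.7+), so the input order is kept
    return list(seen.values())


# Hypothetical sample data: dicts are unhashable, so set(releases) raises TypeError
releases = [
    {"id": b"\x01", "name": "v1.0"},
    {"id": b"\x02", "name": "v2.0"},
    {"id": b"\x01", "name": "v1.0"},
]

try:
    set(releases)
except TypeError as exc:
    print(f"set() failed: {exc}")

unique = dedup_by_key(releases, key=lambda r: r["id"])
print([r["name"] for r in unique])
```

Keying on an explicit attribute avoids hashing the whole object. It is not a one-liner, but it keeps the original order, which `list(set(...))` does not guarantee.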

Wed, Apr 7, 2:11 PM
KShivendu requested review of D5419: Cassandra: Deduplicate lists passed to *_add endpoints.

I don't think you need to convert the sets back to lists

I did that and got type errors from mypy.
Imo, it's okay to leave it as list(set(..)) because anyhow it gets transformed back into a list within the next 2-3 lines.
What do you think?

Wed, Apr 7, 1:42 PM

Tue, Apr 6

KShivendu added inline comments to D5420: cli/identify: Add support for --recursive.
Tue, Apr 6, 11:16 AM

Mon, Apr 5

KShivendu added inline comments to D5420: cli/identify: Add support for --recursive.
Mon, Apr 5, 8:51 PM
KShivendu requested review of D5420: cli/identify: Add support for --recursive.
Mon, Apr 5, 8:40 PM
KShivendu added a comment to T1487: Add a public API endpoint to retrieve a set of files with a given name.

Hi guys. Any pointers on where to start?

Mon, Apr 5, 1:57 PM · Easy hack, Storage manager, Object storage
KShivendu added a comment to T1377: in-memory storage: compute all counters.

I might be wrong, but I think it has been completed. Check out these:

Mon, Apr 5, 12:24 PM · Easy hack, Storage manager

Sun, Apr 4

KShivendu added a comment to T3078: Index CITATION.cff files.

Hey @moranegg
I suggest the following modifications in adding-support-for-additional-metadata page:

  • python3 -m swh.indexer.metadata_dictionary MyMapping path/to/input/file doesn't work. Replace with swh indexer mapping translate cff path/to/input/file
  • Whenever a new mapping is added, it has to be listed in the MAPPING_NAME variable of the swh/indexer/storage/__init__.py file and in expected_output of test_cli_mapping_list. The documentation doesn't mention this (though omitting it didn't throw an error while testing or parsing).
  • Mentioning a few examples about fields that are not string_fields
  • Elaborating how the _translate_dict function works. For example: it executes functions starting with 'normalize_'
  • Mention command : swh indexer mapping list-terms to display Supported CodeMeta terms
  • Add Youtube videos about JSONLD : JSON-LD Basics, JSON-LD: Core markup, Compaction and Expansion
Sun, Apr 4, 11:16 AM · Intrinsic metadata, Easy hack
KShivendu added a comment to T3132: loader-git: Bad formatting of the "Pack file too big" error message.

I am here to just say: swh-loader-git doesn't have a CONTRIBUTORS file. You may ask the contributor to add it as well :)

Sun, Apr 4, 10:56 AM · Easy hack, Git loader
KShivendu added a comment to T2823: Write tests for swh/journal/writer/inmemory.py.

Hey @vlorentz
How do I check https://forge.softwareheritage.org/source/swh-journal/browse/master/swh/journal/writer/inmemory.py$31? Do I have to pass dummy content, raw_extrinsic_metadata, origin_visit, et cetera as the object_ argument of the write_addition function and, before passing them, verify that they implement unique_key?
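
One way such a check could look is sketched below. Every class here (`InMemoryWriter`, `BaseModel`, `Incomplete`, `Complete`) is a hypothetical stand-in written for illustration; none of them is the real swh-journal or swh-model code, and the real writer may behave differently.

```python
import unittest


class InMemoryWriter:
    """Hypothetical stand-in for a journal writer; not the swh-journal class."""

    def __init__(self):
        self.objects = []

    def write_addition(self, object_type, object_):
        # Calling unique_key() up front surfaces a missing implementation
        # before anything is stored.
        object_.unique_key()
        self.objects.append((object_type, object_))


class BaseModel:
    """Hypothetical base class whose subclasses must override unique_key()."""

    def unique_key(self):
        raise NotImplementedError(f"unique_key for {type(self).__name__}")


class Incomplete(BaseModel):
    """Model that does not implement unique_key()."""


class Complete(BaseModel):
    def unique_key(self):
        return b"some-key"


class WriteAdditionTest(unittest.TestCase):
    def test_missing_unique_key_raises(self):
        writer = InMemoryWriter()
        with self.assertRaises(NotImplementedError):
            writer.write_addition("content", Incomplete())
        # Nothing should have been stored for the failing object
        self.assertEqual(writer.objects, [])

    def test_implemented_unique_key_is_accepted(self):
        writer = InMemoryWriter()
        writer.write_addition("content", Complete())
        self.assertEqual(len(writer.objects), 1)
```

Saved as a module, the tests can be run with `python -m unittest`. Under these assumptions a single dummy object without `unique_key` is enough; there is no need to build one dummy per object type.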

Sun, Apr 4, 10:13 AM · Easy hack, Journal
KShivendu added a comment to T2687: nixguix: Add support for downloads over FTP.

Hey @vlorentz, can you please give me some hints for this and an example URL for testing the code?

Sun, Apr 4, 9:26 AM · Nixguix loader, Easy hack
KShivendu added a comment to T2634: swh-core: missing test dependency on requests.

Hi @zack, is this task still valid? If so, how do I reproduce the error? I tried running pytest in swh-fuse and swh-core, but that doesn't throw any error.

Sun, Apr 4, 9:15 AM · Easy hack, Core & foundations
KShivendu added a comment to T2265: Building the documentation should not show any warning..

You can check whether any warnings are currently issued during the documentation build by following this link. Every reported warning should be fixed.

Sun, Apr 4, 8:49 AM · Easy hack, Documentation

Sat, Apr 3

KShivendu closed D5273: swh-indexer : Add mapping for CITATION.cff files.
Sat, Apr 3, 6:26 PM
KShivendu committed rDCIDX8f1fb0f9310d: metadata_dictionary: Add mapping for CITATION.cff (authored by KShivendu).
metadata_dictionary: Add mapping for CITATION.cff
Sat, Apr 3, 6:26 PM
KShivendu added a comment to D5273: swh-indexer : Add mapping for CITATION.cff files.

If you want to add .vscode to .gitignore, you should do it in its own diff, and for *all* repositories.

Sat, Apr 3, 1:55 PM

Fri, Apr 2

KShivendu updated the diff for D5273: swh-indexer : Add mapping for CITATION.cff files.

Updating D5273: swh-indexer : Fix issues related to datePublished and codeRepository fields.
Migrate code related to authors into normalize_authors function
Add .vscode/ in .gitignore to avoid tracking launch.json and other VScode config files

Fri, Apr 2, 10:49 PM

Thu, Apr 1

KShivendu added inline comments to D5273: swh-indexer : Add mapping for CITATION.cff files.
Thu, Apr 1, 2:53 PM

Tue, Mar 30

KShivendu added a comment to T3078: Index CITATION.cff files.

Hey @sdruskat, I have a few questions: (1) Any idea when the new version will be available? (2) When will the crosswalk file be updated to at least 1.1.0? (3) The newer version will be backwards compatible, right?

Tue, Mar 30, 4:31 PM · Intrinsic metadata, Easy hack
KShivendu updated the diff for D5273: swh-indexer : Add mapping for CITATION.cff files.

Updating D5273: Remove 'schema:' and limit test to two authors

Tue, Mar 30, 4:27 PM

Sat, Mar 27

KShivendu added a comment to D5273: swh-indexer : Add mapping for CITATION.cff files.

And the expected value in your test is not valid, because it uses the schema: prefix, without defining it.

Sat, Mar 27, 8:02 PM
KShivendu added a comment to T3078: Index CITATION.cff files.

Hey @sdruskat, I noticed that the current crosswalk.csv has CFF version 1.0.2. And I agree on keeping the crosswalk.csv file updated as much as possible. But to the best of my knowledge, updating the crosswalk.csv file later (when you are done with creating the new version) won't break anything here. Plus, I am new here and I am learning about metadata in order to create a better GSoC proposal for improving swh archive search. I am getting to learn a lot by working on this task.
Also, I can assure you that I will update the crosswalk.csv file myself as soon as you are done with it :)

Sat, Mar 27, 2:55 PM · Intrinsic metadata, Easy hack

Fri, Mar 26

KShivendu added a comment to T3078: Index CITATION.cff files.
Fri, Mar 26, 11:59 AM · Intrinsic metadata, Easy hack

Wed, Mar 24

KShivendu added a comment to D5273: swh-indexer : Add mapping for CITATION.cff files.

Any valid Codemeta (or even schema.org) is fine

Wed, Mar 24, 10:28 AM
KShivendu updated the diff for D5273: swh-indexer : Add mapping for CITATION.cff files.

Updating D5273: swh-indexer : Fix failing tests

Wed, Mar 24, 10:24 AM
KShivendu added a comment to D5273: swh-indexer : Add mapping for CITATION.cff files.

Any valid Codemeta (or even schema.org) is fine

So how does a parser decide which one to use for interpretation?

Wed, Mar 24, 10:15 AM
KShivendu updated the diff for D5273: swh-indexer : Add mapping for CITATION.cff files.

Updating D5273: swh-indexer : Add mapping for CITATION.cff files

Wed, Mar 24, 10:14 AM

Sun, Mar 21

KShivendu added a comment to D5273: swh-indexer : Add mapping for CITATION.cff files.

I am writing tests for CITATION.cff metadata mapping and I have 2 doubts :
(1) Are CITATION and CITATION.cff the same? (If so, cff-converter isn't able to parse our swh/indexer/data/codemeta/CITATION file maybe because it doesn't look like a YAML file)
(2) cff-converter's output is :

{
    "@context": [
        "https://doi.org/10.5063/schema/codemeta-2.0",
        "http://schema.org"
    ],
    "@type": "SoftwareSourceCode",
    "author": [
        {
            "@id": "https://orcid.org/0000-0002-7064-4069",
            "@type": "Person",
            "affiliation": {
                "@type": "Organization",
                "legalName": "Netherlands eScience Center"
            },
            "familyName": "Spaaks",
            "givenName": "Jurriaan H."
        },
        {
            "@type": "Person",
            "affiliation": {
                "@type": "Organization",
                "legalName": "Netherlands eScience Center"
            },
            "familyName": "Klaver",
            "givenName": "Tom"
        }
    ],
    "codeRepository": "https://github.com/citation-file-format/cff-converter-python",
    "datePublished": "2019-11-12",
    "identifier": "https://doi.org/10.5281/zenodo.1162057",
    "keywords": [
        "citation",
        "bibliography",
        "cff",
        "CITATION.cff"
    ],
    "license": "http://www.apache.org/licenses/LICENSE-2.0",
    "name": "cffconvert",
    "version": "1.4.0-alpha0"
}
Sun, Mar 21, 11:58 AM

Mar 20 2021

KShivendu updated the test plan for D5273: swh-indexer : Add mapping for CITATION.cff files.
Mar 20 2021, 10:14 AM
KShivendu retitled D5273: swh-indexer : Add mapping for CITATION.cff files from Add mapping for CITATION.cff to swh-indexer : Add mapping for CITATION.cff files.
Mar 20 2021, 10:05 AM
KShivendu updated the diff for D5273: swh-indexer : Add mapping for CITATION.cff files.

Updating D5273: Add mapping for CITATION.cff

Mar 20 2021, 9:52 AM
KShivendu closed D5282: swh-indexer: sync data/codemeta with official codemeta repo.
Mar 20 2021, 9:39 AM
KShivendu committed rDCIDX8e046c1900df: data/codemeta: sync with official codemeta repo (authored by KShivendu).
data/codemeta: sync with official codemeta repo
Mar 20 2021, 9:39 AM

Mar 19 2021

KShivendu updated the diff for D5282: swh-indexer: sync data/codemeta with official codemeta repo.

Updating D5282: swh-indexer: sync data/codemeta with official codemeta repo

Mar 19 2021, 9:29 PM
KShivendu updated the diff for D5282: swh-indexer: sync data/codemeta with official codemeta repo.

Updating D5282: swh-indexer: sync data/codemeta with official codemeta repo

Mar 19 2021, 9:26 PM
KShivendu retitled D5282: swh-indexer: sync data/codemeta with official codemeta repo from data/codemeta: sync with official codemeta repo to swh-indexer: sync data/codemeta with official codemeta repo.
Mar 19 2021, 9:17 PM
KShivendu retitled D5282: swh-indexer: sync data/codemeta with official codemeta repo from Update crosswalk.csv file to data/codemeta: sync with official codemeta repo.
Mar 19 2021, 9:16 PM
KShivendu updated the diff for D5282: swh-indexer: sync data/codemeta with official codemeta repo.

Updating D5282: Fix SWHID in CITATION

Mar 19 2021, 9:15 PM
KShivendu added a comment to D5282: swh-indexer: sync data/codemeta with official codemeta repo.

Hey @vlorentz I noticed that creator has been removed from codemeta.jsonld which is also part of our repository (Source : Github Commit)

Mar 19 2021, 6:42 PM
KShivendu updated the diff for D5282: swh-indexer: sync data/codemeta with official codemeta repo.

Updating D5282: Update data/codemeta

Mar 19 2021, 6:32 PM
KShivendu updated the diff for D5282: swh-indexer: sync data/codemeta with official codemeta repo.

Updating D5282: Update SWHID in CITATION and remove creator from codemeta.jsonld (in sync with the changes in the official codemeta repo)

Mar 19 2021, 6:26 PM

Mar 18 2021

KShivendu requested review of D5282: swh-indexer: sync data/codemeta with official codemeta repo.
Mar 18 2021, 7:44 PM
KShivendu added a comment to D5273: swh-indexer : Add mapping for CITATION.cff files.

Please give some more details on what problems this diff solves. While adding the
reference task is good practice (thanks), it's not enough.

Mar 18 2021, 10:40 AM
KShivendu added a comment to D5273: swh-indexer : Add mapping for CITATION.cff files.

Please don't update crosswalk.csv directly here, it should only be imported from Codemeta. Make sure you read their CONTRIBUTING.md first.

Mar 18 2021, 10:39 AM
KShivendu requested review of D5273: swh-indexer : Add mapping for CITATION.cff files.
Mar 18 2021, 7:18 AM
KShivendu added a revision to T3078: Index CITATION.cff files: D5273: swh-indexer : Add mapping for CITATION.cff files.
Mar 18 2021, 6:16 AM · Intrinsic metadata, Easy hack

Mar 17 2021

KShivendu updated the task description for T3145: Docs : Postgres DB schema missing .
Mar 17 2021, 8:56 AM · Storage manager, Documentation
KShivendu triaged T3145: Docs : Postgres DB schema missing as Normal priority.
Mar 17 2021, 8:46 AM · Storage manager, Documentation

Mar 13 2021

KShivendu triaged T3123: double slash (/) in origin url leading to unexpected behaviours as Low priority.
Mar 13 2021, 12:49 PM · Web app
KShivendu added a comment to T2254: textual search language for the Web UI.

Hi @zack, we can consider using Elasticsearch string query format to achieve this feature without having to design the syntax from scratch. It is really powerful and can cover most of the use cases.
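
As a rough sketch of what this suggestion could look like, the snippet below wraps free-form user input in an Elasticsearch `query_string` query body. The helper name, the field names (`url`, `language`) and the query text are hypothetical and are not taken from swh-web or swh-search.

```python
import json


def build_query_string_search(user_query: str, default_field: str) -> dict:
    """Wrap free-form user input in an Elasticsearch `query_string` query body."""
    return {
        "query": {
            "query_string": {
                "query": user_query,
                "default_field": default_field,
            }
        }
    }


# Hypothetical field names and query text, for illustration only
body = build_query_string_search(
    "url:github.com AND (language:python OR language:rust)", default_field="url"
)
print(json.dumps(body, indent=2))
```

The resulting dictionary is what would be sent as the body of a search request. The boolean operators and field prefixes are parsed by Elasticsearch itself, so no custom syntax has to be designed.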

Mar 13 2021, 11:51 AM · Archive search, Web app

Mar 12 2021

KShivendu claimed T2028: swh-web should use a proper elasticsearch library to do its requests.
Mar 12 2021, 9:07 PM · Easy hack, Web app
KShivendu added a comment to T1475: Test more edge cases of metadata indexer mappings.

" AttributeError on @id with a colon but less than two slashes" https://github.com/digitalbazaar/pyld/issues/91

Mar 12 2021, 8:03 PM · Easy hack, Indexer
KShivendu updated subscribers of T3078: Index CITATION.cff files.

Hey @moranegg I had a discussion yesterday with @vlorentz on this. We came to the conclusion that the cff-converter should be used as it might cover some edge cases as well. So what should we do now? Please guide.

Mar 12 2021, 7:53 PM · Intrinsic metadata, Easy hack

Mar 11 2021

KShivendu moved T3078: Index CITATION.cff files from Backlog to In progress on the Easy hack board.
Mar 11 2021, 12:56 PM · Intrinsic metadata, Easy hack
KShivendu added a watcher for Easy hack: KShivendu.
Mar 11 2021, 12:31 PM
KShivendu triaged T3114: swh-docs : List out public URLs of web applications served by SWH instances as Low priority.
Mar 11 2021, 11:32 AM · Documentation
KShivendu created T3114: swh-docs : List out public URLs of web applications served by SWH instances.
Mar 11 2021, 11:31 AM · Documentation

Mar 10 2021

KShivendu closed T2731: scanner: strip the path passed as argument from output as Resolved.
Mar 10 2021, 6:59 PM · Easy hack, Code scanner
KShivendu closed D5213: swh/scanner : Strip root path from json output.
Mar 10 2021, 6:55 PM
KShivendu committed rDTSCNeb0442443402: CLI : Clean path in scan output (authored by KShivendu).
CLI : Clean path in scan output
Mar 10 2021, 6:55 PM
KShivendu moved T2731: scanner: strip the path passed as argument from output from Backlog to In progress on the Easy hack board.
Mar 10 2021, 5:34 PM · Easy hack, Code scanner
KShivendu updated the diff for D5213: swh/scanner : Strip root path from json output.

Updating D5213: swh/scanner : Add newline in CONTRIBUTORS file

Mar 10 2021, 5:32 PM
KShivendu updated the diff for D5213: swh/scanner : Strip root path from json output.

Updating D5213: swh/scanner : Add Kumar Shivendu to CONTRIBUTORS

Mar 10 2021, 5:29 PM
KShivendu added a watcher for Metadata workflow: KShivendu.
Mar 10 2021, 3:54 PM
KShivendu added a watcher for Web app: KShivendu.
Mar 10 2021, 2:43 PM
KShivendu added a watcher for Archive search: KShivendu.
Mar 10 2021, 2:25 PM