diff --git a/swh/indexer/data/codemeta/LICENSE b/swh/indexer/data/codemeta/LICENSE new file mode 100644 --- /dev/null +++ b/swh/indexer/data/codemeta/LICENSE @@ -0,0 +1,178 @@ +Copyright 2014-2018, The CodeMeta contributors https://github.com/codemeta/codemeta/blob/master/CONTRIBUTORS.MD + +Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS diff --git a/swh/indexer/data/codemeta/crosswalk.csv b/swh/indexer/data/codemeta/crosswalk.csv new file mode 100644 --- /dev/null +++ b/swh/indexer/data/codemeta/crosswalk.csv @@ -0,0 +1,77 @@ +Parent Type,Property,Type,Description,codemeta-V1,DataCite,OntoSoft,Zenodo,GitHub,Figshare,Software Ontology,Software Discovery Index,Dublin Core,R Package Description,Debian Package,Python Distutils (PyPI),Trove Software Map,Perl Module Description (CPAN::Meta),NodeJS,Java (Maven),Octave,Ruby Gem,ASCL,DOAP,Wikidata,Citation File Format Core (CFF-Core) 1.0.2 +schema:SoftwareSourceCode,codeRepository,URL,"Link to the repository where the un-compiled, human readable code and related code is located (SVN, github, CodePlex).",codeRepository,,,relatedLink,html_url,relatedLink,,,,URL,HomePage,url,,resouces.repository,repository,repositories,,homepage,site_list,repository,source code repository,repository-code +schema:SoftwareSourceCode,programmingLanguage,ComputerLanguage or Text,The computer programming language.,programmingLanguage,Format,hasProgrammingLanguage,,languages_url,,programming language,,,,,classifiers['Programming Language'],Programming Language,,,,,,,programming-language,programming language, +schema:SoftwareSourceCode,runtimePlatform,Text,"Runtime platform or script interpreter dependencies (Example - Java v1, Python2.3, .Net Framework 3.0). Supersedes runtime.",,,,,,,,,,,,,,,,,,platform,,platform,, +schema:SoftwareSourceCode,targetProduct,SoftwareApplication,"Target Operating System / Product to which the code applies. If applies to several versions, just the product name can be used.",,,,,,,,,,,,,,,,,,,,,, +schema:SoftwareApplication,applicationCategory,Text or URL,"Type of software application, e.g. 'Game, Multimedia'.",,,hasSoftwareCategory,communities,,categories,,,,,,classifiers['Topic'],Topic,Categories,,,Categories,,,,, +schema:SoftwareApplication,applicationSubCategory,Text or URL,"Subcategory of the application, e.g. 'Arcade Game'.",,,,,,,,,,,,,,,,,,,,,, +schema:SoftwareApplication,downloadUrl,URL,"If the file can be downloaded, URL to download the binary.",downloadLink,,,,archive_url,,,,,,,,,,,,,,,download-page,,repository-artifact +schema:SoftwareApplication,fileSize,Text,"Size of the application / package (e.g. 18MB). In the absence of a unit (MB, KB etc.), KB will be assumed.",,,,,,,,,,,,,,,,,,,,,, +schema:SoftwareApplication,installUrl,URL,"URL at which the app may be installed, if different from the URL of the item.",,,,,,,,,,,,,,,,,,,,download-mirror,, +schema:SoftwareApplication,memoryRequirements,Text or URL,Minimum memory requirements.,,,,,,,,,,,,,,,,,,,,,, +schema:SoftwareApplication,operatingSystem,Text,"Operating systems supported (Windows 7, OSX 10.6, Android 1.6).",operatingSystems,,SupportsOperatingSystem,,,,,,,,,classifiers['Operating System'],Operating System,OSNAMES,os,,,,,os,operating system, +schema:SoftwareApplication,permissions,Text,"Permission(s) required to run the app (for example, a mobile app may require full internet access or may run only on wifi).",,,,,,,,,,,,,,,,,,,,,, +schema:SoftwareApplication,processorRequirements,Text,Processor architecture required to run the application (e.g. IA64).,,,,,,,,,,,,,,,cpu / engines,,,,,,, +schema:SoftwareApplication,releaseNotes,Text or URL,Description of what changed in this version.,,,,,,,,,,,,,,,,,,,,,, +schema:SoftwareApplication,softwareHelp,CreativeWork,Software application help.,,,,,,,,,,,,,,,,,,,,,, +schema:SoftwareApplication,softwareRequirements,SoftwareSourceCode,Required software dependencies,depends,,hasDependency->Software,,,,,"""Platform, environment, and dependencies""",,"Depends, SystemRequirements",,install_requires,Database Environment,prereqs,dependencies / bundledDependencies / bundleDependencies / peerDependencies,prerequisites,"Depends, SystemRequirements","requirements, add_runtime_dependency",,,depends on software, +schema:SoftwareApplication,softwareVersion,Text,Version of the software instance.,,,,,,,,,,,,,,,,,,,,release,software version, +schema:SoftwareApplication,storageRequirements,Text or URL,Storage requirements (free space required).,,,,,,,,,,,,,,,,,,,,,, +schema:SoftwareApplication,supportingData,DataFeed,Supporting data for a SoftwareApplication.,,,,,,,,,,,,,,,,,,,,,, +schema:CreativeWork,author,Organization or Person,The author of this content or rating. Please note that author is special in that HTML 5 provides a special mechanism for indicating authorship via the rel tag. That is equivalent to this and may be used interchangeably.,agents,creators,,creators,login,,,,,[aut] in Author,,,,,author,,,author,,developer,,authors +schema:CreativeWork,citation,CreativeWork or URL,"A citation or reference to another creative work, such as another publication, web page, scholarly article, etc.",relatedLink,,,,,,,,,,,,,,,,,,,,, +schema:CreativeWork,contributor,Organization or Person,A secondary contributor to the CreativeWork or Event.,,,,,,,,,,[ctb] in Author,,,,,contributor,,,,,developer,, +schema:CreativeWork,copyrightHolder,Organization or Person,The party holding the legal copyright to the CreativeWork.,agents [role=copyrightHolder],,,,,,,,,,,,,,,,,,,,, +schema:CreativeWork,copyrightYear,Number,The year during which the claimed copyright for the CreativeWork was first asserted.,,,,,,,,,,,,,,,,,,,,,, +schema:CreativeWork,creator,Organization or Person,The creator/author of this CreativeWork. This is the same as the Author property for CreativeWork.,agent,,,,,,,,creator,[cre] in Author,,,,,author,,,,,,, +schema:CreativeWork,dateCreated,Date or DateTime,The date on which the CreativeWork was created or the item was added to a DataFeed.,dateCreated,date,,,created_at,,,,created,,Date,,,,,,,,,,, +schema:CreativeWork,dateModified,Date or DateTime,The date on which the CreativeWork was most recently modified or when the item's entry was modified within a DataFeed.,dateModified,date,,,updated_at,,,,,,,,last-updated,,,,,,,,, +schema:CreativeWork,datePublished,Date,Date of first broadcast/publication.,datePublished,publicationYear,,date_published,,date_retrieved,,,date,Date,,,,,,,Date,,,,publication date,date-released +schema:CreativeWork,editor,Person,Specifies the Person who edited the CreativeWork.,,,,,,,,,,,,,,,,,,,,,editor, +schema:CreativeWork,encoding,MediaObject,A media object that encodes this CreativeWork. This property is a synonym for associatedMedia. Supersedes encodings.,,,,,,,,,,,,,,,,,,,,,, +schema:CreativeWork,fileFormat,Text or URL,"Media type, typically MIME format (see IANA site) of the content e.g. application/zip of a SoftwareApplication binary. In cases where a CreativeWork has several media type representations, 'encoding' can be used to indicate each MediaObject alongside particular fileFormat information. Unregistered or niche file formats can be indicated instead via the most appropriate URL, e.g. defining Web page or a Wikipedia entry.",,Format,,,,,,,,,,,,,,,,,,,, +schema:CreativeWork,funder,Organization or Person,A person or organization that supports (sponsors) something through some kind of financial contribution.,fundingReference.funderName,,,contributors.Funder,,,,,,,,,,,,,,,,,, +schema:CreativeWork,keywords,Text,Keywords or tags used to describe this content. Multiple entries in a keywords list are typically delimited by commas.,controlledTerms,subject,hasDomainKeywords,keywords,,tags,,,,,,keywords,,keywords,keywords,,,,,category,,keywords +schema:CreativeWork,license,CreativeWork or URL,"A license document that applies to this content, typically indicated by URL.",licenseId,rights,License,license,license,License,software license,Software license,license,License,,license,license,license,license,licesnse,License,license/licenses,,license,license,license/license-url +schema:CreativeWork,producer,Organization or Person,"The person or organization who produced the work (e.g. music album, movie, tv/radio series etc.).",,,,,,,,,,,,,,,,,,,,,, +schema:CreativeWork,provider,Organization or Person,"The service provider, service operator, or service performer; the goods producer. Another party (a seller) may offer those services or goods on behalf of the provider. A provider may also serve as the seller. Supersedes carrier.",,,,,,,,,,,,,,,,,,,,,, +schema:CreativeWork,publisher,Organization or Person,The publisher of the creative work.,publisher,publisher,os:hasPublisher,,,,software publisher organization,,publisher,,,,,,,,,,,vendor,, +schema:CreativeWork,sponsor,Organization or Person,"A person or organization that supports a thing through a pledge, promise, or financial contribution. e.g. a sponsor of a Medical Study or a corporate sponsor of an event.",,,,,,,,,,,,,,,,,,,,,, +schema:CreativeWork,version,Number or Text,The version of the CreativeWork embodied by a specified resource.,version,version,hasSoftwareVersion,,,,Version,Software version,dcterms:hasVersion,,numeric_version,Version,version,,version,version,version,version,,,,version +schema:CreativeWork,isAccessibleForFree,Boolean,A flag to signal that the publication is accessible for free.,,,,,,,,,,,,,,,,,,,,,, +schema:CreativeWork,isPartOf,CreativeWork,Indicates a CreativeWork that this CreativeWork is (in some sense) part of. Reverse property hasPart,,,,,,,,,,,,,,,,,,,,,,references +schema:CreativeWork,hasPart,CreativeWork,Indicates a CreativeWork that is (in some sense) a part of this CreativeWork. Reverse property isPartOf,,,,,,,,,,,,,,,,,,,,,, +schema:CreativeWork,position,Integer or Text,"The position of an item in a series or sequence of items. (While schema.org considers this a property of CreativeWork, it is also the way to indicate ordering in any list (e.g. the Authors list). By default arrays are unordered in JSON-LD",,,,,,,,,,,,,,,,,,,,,, +schema:Thing,description,Text,A description of the item.,description,description,hasShortDescription,description/notes,description,Description,software,,description,Description,Description,"description, long_description",description,"abstract, description",description,description,Description,"summary, description",abstract,,,abstract +schema:Thing,identifier,PropertyValue or URL,"The identifier property represents any kind of identifier for any kind of Thing, such as ISBNs, GTIN codes, UUIDs etc. Schema.org provides dedicated properties for representing many of these, either as textual strings or as URL (URI) links. See background notes for more details.",identifier,identifier,hasUniqueId,id,id,,,Persistent Identifier,identifier,Package,Package,,,,name,groupId,,,ascl_id,,,doi +schema:Thing,name,Text,"The name of the item (software, Organization)",name,,hasName,title,full_name,Title,SoftwareTitle,Software title,title,Title,,name,Title,name,name,name,name,name,title,,,title +schema:Thing,sameAs,URL,"URL of a reference Web page that unambiguously indicates the item's identity. E.g. the URL of the item's Wikipedia page, Wikidata entry, or official website.",,,,,,,,,,,,,,,,,,,,,, +schema:Thing,url,URL,URL of the item.,URL,,,,,,,,,URL,,,,,homepage,,URL,,,homepage,official website,url +schema:Thing,relatedLink,URL,"A link related to this object, e.g. related web pages",,RelateIdentifier,,,,,,,,,,,,,,,,,,,, +schema:Person,givenName,Text,"Given name. In the U.S., the first name of a Person. This can be used along with familyName instead of the name property",,givenName,,,,,,,,givenName,,,,,,,,,,,,person.given-names +schema:Person,familyName,Text,"Family name. In the U.S., the last name of an Person. This can be used along with givenName instead of the name property.",,familyName,,,,,,,,familyName,,,,,,,,,,,,person.name-particle + person.family-names + person.name-suffix +schema:Person,email,Text,Email address,email,,,,,,,,,email,,author_email,,email-address,author.email,,,email,email,,,person.email/entity.email +schema:Person,affiliation,Text,"An organization that this person is affiliated with. For example, a school/university",affiliation,affiliation,,affiliation,,,,,,,,,,,,,,,,,,person.affiliation +schema:Person,identifier,URL,"URL identifer, ideally an ORCID ID for individuals, a FundRef ID for funders",identifier,nameIdentifier,,ORCID,,ORCID,,,,,,,,,,,,,,,,person.orcid / entity.orcid +schema:Person,name,Text,"The name of an Organization, or if separate given and family names cannot be resolved for a Person",,,,name,,name,,,,,,,,author:contact-name,author.name,,,,,,,entity.name +schema:Person,address,PostalAddress or Text,Physical address of the item.,,,,,,,,,,,,,,,,,,,,,,person.address + person.city + person.region + person.post-code + person.country / entity.address + entity.city + entity.region + entity.post-code + entity.country +schema,type,Object Type (from context or URI),"The object type (e.g. ""Person"", ""Organization"", ""ScientificArticle"", ""SoftwareApplication"", etc).",,,,,,,,,,,,,,,,,,,,,,reference.type +schema,id,URL,Primary identifier for an object. Must be a resolvable URL or a string used to refer to this node elsewhere in the same document,,,,,,,,,,,,,,,,,,,,,, +codemeta:SoftwareSourceCode,softwareSuggestions,SoftwareSourceCode,"Optional dependencies , e.g. for optional features, code development, etc",suggests,,,,,,,,,Suggests,,,,,devDependencies / optionalDependencies,,BuildDepends,add_development_dependency,,,, +codemeta:SoftwareSourceCode,maintainer,Person,Individual responsible for maintaining the software (usually includes an email contact address),uploadedBy,,,,,,,,,Maintainer,,,,,,,,,,maintainer,, +codemeta:SoftwareSourceCode,contIntegration,URL,link to continuous integration service,contIntegration,,,,,,,,,,,,,,,ciManagement,,,,,, +codemeta:SoftwareSourceCode,buildInstructions,URL,link to installation instructions/documentation,buildInstructions,,,,,,,,,,,,,,,,,,,,, +codemeta:SoftwareSourceCode,developmentStatus,Text,"Description of development status, e.g. Active, inactive, supsended. See repostatus.org",developmentStatus,,activeDevelopment,,,,,,,,,classifiers['Development Status'],Development Status,release_status,,,,,,,, +codemeta:SoftwareSourceCode,embargoDate,Date,"Software may be embargoed from public access until a specified date (e.g. pending publication, 1 year from publication)",embargoDate,,,,,embargo_date,,,,,,,,,,,,,,,, +codemeta:SoftwareSourceCode,funding,Text,Funding source (e.g. specific grant),funding,,fundingReference.awardTitle or fundingReference.awardNumber,,,,,,,,,,,,,,,,,,, +codemeta:SoftwareSourceCode,issueTracker,URL,link to software bug reporting or issue tracking system,issueTracker,,,,issues_url,,,,,BugReports,,,,resources.bugtracker,bugs,issuesManagement,Problems,,,bug-database,bug tracking system,repository +codemeta:SoftwareSourceCode,referencePublication,ScholarlyArticle,An academic publication related to the software.,relatedPublications,,,,,,,,,,,,,,,,,,,blog,,references +codemeta:SoftwareSourceCode,readme,URL,link to software Readme file,readme,,,,,,,,,,,,,,,,,,,,, +,,,,relatedIdentifer,,,,,,,,,,,,,,,,,,,,, +,,,,relatedIdentiferType,,,,,,,,,,,,,,,,,,,,, +,,,,relationshipType,,,,,,,,,,,,,,,,,,,,, +,,,,title,,,,,,,,,,,,,,,,,,,,, +,,,,namespace,,,,,,,,,,,,,,,,,,,,, +,,,,role,,,,,,,,,,,,,,,,,,,,, +,,,,roleCode,,,,,,,,,,,,,,,,,,,,, +,,,,softwarePaperCitationIdenifiers,,,,,,,,,,,,,,,,,,,,, diff --git a/swh/indexer/metadata.py b/swh/indexer/metadata.py --- a/swh/indexer/metadata.py +++ b/swh/indexer/metadata.py @@ -118,7 +118,7 @@ ADDITIONAL_CONFIG = { 'tools': ('dict', { 'name': 'swh-metadata-detector', - 'version': '0.0.1', + 'version': '0.0.2', 'configuration': { 'type': 'local', 'context': ['npm', 'codemeta'] @@ -214,7 +214,7 @@ translated_metadata = [] tool = { 'name': 'swh-metadata-translator', - 'version': '0.0.1', + 'version': '0.0.2', 'configuration': { 'type': 'local', 'context': None diff --git a/swh/indexer/metadata_dictionary.py b/swh/indexer/metadata_dictionary.py --- a/swh/indexer/metadata_dictionary.py +++ b/swh/indexer/metadata_dictionary.py @@ -2,7 +2,47 @@ # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information + +import csv import json +import os.path + +import swh.indexer + +CROSSWALK_TABLE_PATH = os.path.join(os.path.dirname(swh.indexer.__file__), + 'data', 'codemeta', 'crosswalk.csv') + + +def read_crosstable(fd): + reader = csv.reader(fd) + try: + header = next(reader) + except StopIteration: + raise ValueError('empty file') + + data_sources = set(header) - {'Parent Type', 'Property', + 'Type', 'Description'} + assert 'codemeta-V1' in data_sources + + codemeta_translation = {data_source: {} for data_source in data_sources} + + for line in reader: # For each canonical name + canonical_name = dict(zip(header, line))['Property'] + for (col, value) in zip(header, line): # For each cell in the row + if col in data_sources: + # If that's not the parentType/property/type/description + for local_name in value.split('/'): + # For each of the data source's properties that maps + # to this canonical name + if local_name.strip(): + codemeta_translation[col][local_name.strip()] = \ + canonical_name + + return codemeta_translation + + +with open(CROSSWALK_TABLE_PATH) as fd: + CROSSWALK_TABLE = read_crosstable(fd) def convert(raw_content): @@ -89,26 +129,7 @@ """ dedicated class for NPM (package.json) mapping and translation """ - mapping = { - 'repository': 'codeRepository', - 'os': 'operatingSystem', - 'cpu': 'processorRequirements', - 'engines': 'processorRequirements', - 'dependencies': 'softwareRequirements', - 'bundleDependencies': 'softwareRequirements', - 'peerDependencies': 'softwareRequirements', - 'author': 'author', - 'contributor': 'contributor', - 'keywords': 'keywords', - 'license': 'license', - 'version': 'version', - 'description': 'description', - 'name': 'name', - 'devDependencies': 'softwareSuggestions', - 'optionalDependencies': 'softwareSuggestions', - 'bugs': 'issueTracker', - 'homepage': 'url' - } + mapping = CROSSWALK_TABLE['NodeJS'] def translate(self, raw_content): content_dict = convert(raw_content) @@ -119,17 +140,7 @@ """ dedicated class for Maven (pom.xml) mapping and translation """ - mapping = { - 'license': 'license', - 'version': 'version', - 'description': 'description', - 'name': 'name', - 'prerequisites': 'softwareRequirements', - 'repositories': 'codeRepository', - 'groupId': 'identifier', - 'ciManagement': 'contIntegration', - 'issuesManagement': 'issueTracker', - } + mapping = CROSSWALK_TABLE['Java (Maven)'] def translate(self, raw_content): content = convert(raw_content) diff --git a/swh/indexer/tests/test_metadata.py b/swh/indexer/tests/test_metadata.py --- a/swh/indexer/tests/test_metadata.py +++ b/swh/indexer/tests/test_metadata.py @@ -6,7 +6,7 @@ import unittest import logging -from swh.indexer.metadata_dictionary import compute_metadata +from swh.indexer.metadata_dictionary import compute_metadata, CROSSWALK_TABLE from swh.indexer.metadata_detector import detect_metadata from swh.indexer.metadata_detector import extract_minimal_metadata_dict from swh.indexer.metadata import ContentMetadataIndexer @@ -51,7 +51,7 @@ }, 'tools': { 'name': 'swh-metadata-detector', - 'version': '0.0.1', + 'version': '0.0.2', 'configuration': { 'type': 'local', 'context': 'npm' @@ -80,7 +80,7 @@ self.maxDiff = None self.content_tool = { 'name': 'swh-metadata-translator', - 'version': '0.0.1', + 'version': '0.0.2', 'configuration': { 'type': 'local', 'context': 'npm' @@ -88,6 +88,31 @@ } MockIndexerStorage.added_data = [] + def test_crosstable(self): + self.assertEqual(CROSSWALK_TABLE['NodeJS'], { + 'repository': 'codeRepository', + 'os': 'operatingSystem', + 'cpu': 'processorRequirements', + 'engines': 'processorRequirements', + 'dependencies': 'softwareRequirements', + 'bundleDependencies': 'softwareRequirements', + 'bundledDependencies': 'softwareRequirements', + 'peerDependencies': 'softwareRequirements', + 'author': 'creator', + 'author.email': 'email', + 'author.name': 'name', + 'contributor': 'contributor', + 'keywords': 'keywords', + 'license': 'license', + 'version': 'version', + 'description': 'description', + 'name': 'name', + 'devDependencies': 'softwareSuggestions', + 'optionalDependencies': 'softwareSuggestions', + 'bugs': 'issueTracker', + 'homepage': 'url' + }) + def test_compute_metadata_none(self): """ testing content empty content is empty @@ -112,7 +137,7 @@ content = b""" { "name": "test_metadata", - "version": "0.0.1", + "version": "0.0.2", "description": "Simple package.json test for indexer", "repository": { "type": "git", @@ -122,7 +147,7 @@ """ declared_metadata = { 'name': 'test_metadata', - 'version': '0.0.1', + 'version': '0.0.2', 'description': 'Simple package.json test for indexer', 'codeRepository': { 'type': 'git', @@ -143,7 +168,7 @@ # given metadata_list = [{ 'name': 'test_1', - 'version': '0.0.1', + 'version': '0.0.2', 'description': 'Simple package.json test for indexer', 'codeRepository': { 'type': 'git', @@ -152,7 +177,7 @@ 'other': {} }, { 'name': 'test_0_1', - 'version': '0.0.1', + 'version': '0.0.2', 'description': 'Simple package.json test for indexer', 'codeRepository': { 'type': 'git', @@ -161,7 +186,7 @@ 'other': {} }, { 'name': 'test_metadata', - 'version': '0.0.1', + 'version': '0.0.2', 'author': 'moranegg', 'other': {} }] @@ -172,7 +197,7 @@ # then expected_results = { "developmentStatus": None, - "version": ['0.0.1'], + "version": ['0.0.2'], "operatingSystem": None, "description": ['Simple package.json test for indexer'], "keywords": None, @@ -238,7 +263,7 @@ 'issueTracker': { 'url': 'https://github.com/npm/npm/issues' }, - 'author': + 'creator': 'Isaac Z. Schlueter (http://blog.izs.me)', 'codeRepository': { 'type': 'git',