
Add various markdown variants to list of intrinsic metadata files to be indexed
Open, Normal, Public

Description

There is a steady trend of new projects using a markdown/rst/html version of the traditional text files that hold information about the project: one finds README.md, AUTHORS.rst, INSTALL.txt, even LICENSE.html, with every combination of uppercase and lowercase in the name. Sometimes these variants sit side by side with the plain text ones (README, AUTHORS, etc.), sometimes they just replace them.

The metadata indexing pipeline needs to look for all these variants to make sure we do not miss relevant metadata information.
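As a rough illustration of what "looking for all these variants" could mean, here is a minimal sketch of a case-insensitive filename matcher. It is not the indexer's actual code: the names `BASE_NAMES`, `EXTENSIONS`, `is_metadata_file` and `find_metadata_files` are hypothetical, and the base names and extensions are only the ones used as examples in the description above.

```python
import re

# Hypothetical base names, taken from the examples in the task description.
BASE_NAMES = ("README", "AUTHORS", "INSTALL", "LICENSE")
# Hypothetical extensions, also taken from the examples in the description.
EXTENSIONS = ("md", "rst", "txt", "html")

# One of the base names, optionally followed by ".<extension>",
# compared without regard to case.
METADATA_FILE_RE = re.compile(
    r"(?:%s)(?:\.(?:%s))?" % ("|".join(BASE_NAMES), "|".join(EXTENSIONS)),
    re.IGNORECASE,
)


def is_metadata_file(filename: str) -> bool:
    """Return True if the whole filename is a known base name,
    with or without one of the known extensions."""
    return METADATA_FILE_RE.fullmatch(filename) is not None


def find_metadata_files(filenames):
    """Keep only the names that look like metadata files, preserving order."""
    return [name for name in filenames if is_metadata_file(name)]
```

With these assumptions, `find_metadata_files(["README", "src", "install.TXT"])` keeps `README` and `install.TXT` and drops `src`. Matching the whole name (rather than a prefix) avoids picking up files such as `README.md.bak`; whether that is the desired behaviour is a design choice left open here.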

This is related to T2064, but not limited to deposits.

Event Timeline

rdicosmo raised the priority of this task from Low to Normal. (Apr 9 2021, 4:45 PM)

Hey, I would like to work on this issue.

Sometimes these variants site side by side with the plain text ones

I didn't understand what "site" means here.
Could anyone help me with more info about the issue?

Thanks for reporting the typo, this is fixed now (s/site/sit/)

@VickyMerzOwn Sorry, this task shouldn't be tagged as "easy hack", because T2270 needs to be resolved first.