Launchpadlib
Pros
The library is available on Debian stretch.
Fetching all the branches of a project is easier and faster, because the library returns them in one go, whereas the bare API returns them in indexed batches.
May 28 2019
- Add test cases for find_tarball and remove_unnecessary_directories function
- Change rcran name to cran
- Convert one-liner R command to an R script.
May 27 2019
To check that the code works, I wrote a separate script that reuses the code from this lister, including the functions specific to the GNU lister such as find_all_tarball and list_packages. I ran the script to verify the algorithm I am using, and it worked fine.
Here is the complete output -
https://forge.softwareheritage.org/P408
- Change download method of tree.json file
- Add functionality to list all the tarball for a package.
May 26 2019
@vlorentz I only added vital information which is very particular to the project and is not present anywhere else.
I have not added more details, such as the api_good_response file or the whole process of writing tests, because that would make the tutorial extremely lengthy, and a new contributor can figure them out by reading the code. By contrast, adding the lister to the main conftest.py is not mentioned anywhere else and cannot be quickly figured out by a new contributor.
Add testing section in tutorial docs
Added testing section in tutorial doc
Let's just agree to disagree here. Some of us know how to do this with packages, and have done for years if not decades.
@eddelbuettel As you recommended, I did some preprocessing only to get the useful data.
Also, I changed from a separate file to a single-line R script, because it is difficult to determine the location of the R script in order to execute it once the module is shipped as a package or run in production.
- Optimize R Script to make it scalable
- Replace R script with a single line command.
May 24 2019
Extending on what I wrote in the previous comment, I did a bit more research about this.
May 23 2019
As recommended by @olasd, I checked out the Maven Central index (https://repo.maven.apache.org/maven2/.index/) this is a
May 22 2019
Comment by @olasd
- the repository format is "Maven"; "Maven Central" is only one instance of a Maven repository. There's many more public Maven repositories that would be useful to index, for instance Clojars or the Google Android maven repo : https://www.deps.co/guides/public-maven-repositories/. You'll need to rename the lister to "maven", and to modify the code to avoid hard-coding the maven repository root, making it an argument to the task instead (as we will want to list projects for several instances).
- you went for a scraping approach, which is fine as a last resort. However, a quick search for "maven central index" brought up https://maven.apache.org/repository/central-index.html. Looks like these indexes are available to allow importing the full
It looks like these indexes are available at least for the following maven repositories :
- Maven Central : https://repo.maven.apache.org/maven2/.index/
- Clojars : https://repo.clojars.org/.index/
- JBoss : https://repository.jboss.org/nexus/content/repositories/releases/.index/nexus-maven-repository-index.gz (there's no file index in the .index directory, but the expected files are there)
The index also provides an incremental version (referenced in a properties file) which would allow for incremental updates without having to re-download the full index.
The Google repo also has an index https://developer.android.com/studio/build/dependencies.html#google-maven but it looks very different from the other maven repos I've found. However, it's fairly small compared to the others, so it shouldn't be too hard to sort it out as well.
Please investigate the format of these repository indexes, and the data they provide, and see whether that would be suitable for use as the data source for the lister.
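The first point of the comment above (passing the repository root as an argument instead of hard-coding it) could look roughly like the sketch below. The function name `maven_index_url` is hypothetical, and the sketch assumes the index lives in a `.index/` directory directly under the root, as in the Maven Central and Clojars URLs listed above. It does not cover layouts such as the JBoss one.

```python
def maven_index_url(repository_root: str) -> str:
    """Return the assumed index directory URL for a Maven repository root.

    Hypothetical helper: the root is a parameter, so the same code can be
    reused for several Maven repository instances.
    """
    # Normalise the root so it ends with exactly one slash.
    root = repository_root.rstrip("/") + "/"
    # Assumption: the index sits in ".index/" directly under the root.
    return root + ".index/"


if __name__ == "__main__":
    for root in (
        "https://repo.maven.apache.org/maven2/",
        "https://repo.clojars.org",
    ):
        print(maven_index_url(root))
```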
Thanks for the heads-up, I didn't know about this. I will go through the repository indexes and the data they provide, and let you know what I find as soon as possible.
- Change variable names according to convention.
- Add GNU lister in README and cli.py
- Add functions necessary in abstract attribute
@douardda Thanks for helping me out to improve commit messages.
Although, I was wondering: before landing the diff we usually squash all the commits into a single one, so why do we need to follow strict guidelines for commit messages while the diff is still being improved? In the end they are all going to be squashed into one commit. :)
- Improve commit messages by using imperative form
May 21 2019
- Added functions that must be present because of @abc.abstractmethod
- Added Maven Central lister in README.md and cli.py
- Added rcran lister in readme and cli.py
- Changed print to stdout in R script and improved commit messages
Removed useless commit messages
May 20 2019
Browser - Chromium
OS - Ubuntu 18.04
Zoom level in chromium - 100%
Screen Resolution - 1920 * 1080
May 19 2019
May 17 2019
And these will be passed to loaders
Not loaders, only 1 loader, the gnu one.
I sense a communication gap; I need to state more clearly what I am planning to do so that you can help me more effectively. Please correct me if I am wrong somewhere or if there is a better method.
I think it would be great if loader-tar could be moved to the core.
May 16 2019
As suggested by @olasd, what was done in 2015 to ingest packages -
- Create origins for all the folders indiscriminately
- Only import things that look like tarballs (i.e. that end with .tar.something)
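The second rule above (keep only names ending in `.tar.something`) can be sketched as follows. This is a minimal illustration, not the code used in 2015; the helper names and the exact regular expression are assumptions.

```python
import re
from typing import Iterable, List

# Assumed pattern: the name must end with ".tar." followed by one
# alphanumeric extension, e.g. ".tar.gz", ".tar.xz" or ".tar.bz2".
TARBALL_RE = re.compile(r"\.tar\.[A-Za-z0-9]+$")


def looks_like_tarball(name: str) -> bool:
    """Return True if the file name ends with '.tar.<something>'."""
    return TARBALL_RE.search(name) is not None


def filter_tarballs(names: Iterable[str]) -> List[str]:
    """Keep only the names that look like tarballs, preserving order."""
    return [name for name in names if looks_like_tarball(name)]


if __name__ == "__main__":
    listing = ["a-1.0.tar.gz", "README", "b.tar.xz.sig", "c.zip", "d.tar"]
    print(filter_tarballs(listing))
```

Note that signature files such as `b.tar.xz.sig` and bare `.tar` files are rejected by this pattern; whether that matches the intended behaviour would need to be checked against the real data.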
@olasd recommended trying the listing approach we discussed for the NuGet lister (fetching the repository key from the API response). As recommended, I tried the approach on a small dataset of 1412 repositories, all of them quite recent. I found 0 repository URLs in them, and in 900 of them the repository key was empty (i.e. a blank string). I think we need to change our approach.
As discussed on IRC, the source code link is present for very few of the repositories, and the version control system they use is not mentioned in the API response.
One option is to use the .nuspec file of each project, which contains the repository URL and repository type fields: we would have to download that file for each project to get the source URL. The problem is that downloading all binary packages for a small chance of finding a link to a source repository sounds like a lot of work, bandwidth and computing power for not much gain, and it would only cover one of the ways package maintainers can set the source code information; the aforementioned blog post listed at least four
Thanks, @anlambert, for your help and guidance. As it was my first lister, I would never have been able to complete it without your help. Your review helped me make this lister more robust and also helped me understand the basics of listers.
Once again, thanks for your patience and guidance.
May 15 2019
@anlambert As you recommended in your previous comment, I have added the function filter_before_inject() to the lister to remove None from the list.
I have also rebased the branch on origin/master.
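The None filtering mentioned above can be sketched as a standalone function. Only the name filter_before_inject() comes from the comment; the signature, the standalone form (rather than a lister method) and the dict-based models are assumptions made for illustration.

```python
from typing import Iterable, List, Optional


def filter_before_inject(models_list: Iterable[Optional[dict]]) -> List[dict]:
    """Drop None entries from the list of models before injecting them.

    Assumed signature: in the real lister this would be a method, and the
    models may be richer objects than plain dicts.
    """
    # Only None is removed; empty dicts and other falsy values are kept.
    return [model for model in models_list if model is not None]


if __name__ == "__main__":
    models = [{"name": "a"}, None, {"name": "b"}, None]
    print(filter_before_inject(models))
```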
made all the changes recommended
Removed None from the final list
API Documentation -
https://docs.microsoft.com/en-us/nuget/api/catalog-resource#base-url
@olasd I am not familiar with the R language. Learning the basics and writing this script would take me around a week. I was wondering whether someone at Software Heritage who has experience with R could write this script, as it would be a matter of minutes for someone who knows R.
Is it possible to do so?
May 14 2019
squash commits
- Updated README according to new standard
May 13 2019
Here is an implementation plan for the R-CRAN lister.
I have taken inspiration from the pypi lister.
To write lister.py for R-CRAN, we need to inherit from the SimpleLister class and override the ingest_data() function, changing its first line (where safely_issue_request() is called) to call a function that runs an R script and returns a JSON response.
After that, it is handled like any normal response; we just need to implement the following functions: list_packages, compute url, get_model_from_repo, task_dict and transport_response_simplified.
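The step of the plan that replaces the HTTP request with a call to an external script can be sketched as below. This is only an illustration under assumptions: `run_json_command` is a hypothetical helper, not part of the lister code, and it assumes the R script prints a JSON document to stdout. The demo uses a Python one-liner as a stand-in, so that it runs without R installed; in the lister the command would be something like `["Rscript", "-e", "<R expression that prints JSON>"]`.

```python
import json
import subprocess
import sys
from typing import Any, Sequence


def run_json_command(command: Sequence[str]) -> Any:
    """Run an external command and parse its standard output as JSON.

    Hypothetical helper: raises CalledProcessError if the command fails
    and json.JSONDecodeError if the output is not valid JSON.
    """
    completed = subprocess.run(
        command,
        check=True,           # fail loudly on a non-zero exit status
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode the output as text
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Stand-in for the R script: prints a small JSON list of packages.
    stand_in = [
        sys.executable,
        "-c",
        "import json; print(json.dumps([{'Package': 'demo', 'Version': '1.0'}]))",
    ]
    print(run_json_command(stand_in))
```

An overridden ingest_data() could then pass the parsed list on to the remaining steps in place of the response it previously obtained from safely_issue_request().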
- Updated testcase in phabricator lister
@faux on IRC mentioned that there is a public DB dump (https://cran.r-project.org/web/dbs) which might be helpful for the purpose.
This DB dump contains files with the .rds extension, which is used by the R language. Here are a couple of rows from that DB dump: https://forge.softwareheritage.org/P396
- Fixed a typo in phabricator lister