Fedora publishes its package metadata (as HTML) at https://packages.fedoraproject.org, while the packages' source code is hosted on a custom forge called [Pagure](https://src.fedoraproject.org/).
https://packages.fedoraproject.org hosts 67603 packages, but the corresponding Pagure instance hosts only 35709 (git) repos. The gap exists because several packages can share a single repo.
For example: [4ti2/4ti2](https://packages.fedoraproject.org/pkgs/4ti2/4ti2/index.html), [4ti2/4ti2-devel](https://packages.fedoraproject.org/pkgs/4ti2/4ti2-devel/index.html), and [4ti2/4ti2-libs](https://packages.fedoraproject.org/pkgs/4ti2/4ti2-libs/index.html) have different names and metadata but they point to the [same git repo](https://src.fedoraproject.org/rpms/4ti2) on Pagure.
Here's the approach I think we should take to ingest Fedora packages:
- Fetch the git repositories using the [/projects](https://src.fedoraproject.org/api/0/#projects-tab) endpoint. The default page size is 50, but it can be increased to 100. Each response body also includes a "next" URL that can be followed to get the following page.
- Extract the package name from each repository and visit https://packages.fedoraproject.org/pkgs/<pkg>/, where <pkg> is the extracted package name.
- The page at that URL lists the names and URLs of all the packages associated with the repository.
- Visit each of the package URLs and extract the metadata using BeautifulSoup.
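
To make the first three steps more concrete, here is a rough sketch rather than a finished lister. The function names (`fetch_json`, `iter_repo_names`, `package_urls`) are made up for illustration. The `namespace` and `per_page` query parameters, the `pagination.next` field, and the layout of the links on the listing page are assumptions that should be checked against the real API and HTML. It also uses the standard library's `html.parser` instead of BeautifulSoup, purely to keep the sketch free of dependencies.

```python
import json
from html.parser import HTMLParser
from typing import Callable, Dict, Iterator, List
from urllib.parse import urlencode, urljoin, urlparse
from urllib.request import urlopen

PAGURE_API = "https://src.fedoraproject.org/api/0/projects"
PACKAGES_BASE = "https://packages.fedoraproject.org/pkgs/"


def fetch_json(url: str) -> dict:
    """Download one API page and decode it as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def iter_repo_names(
    fetch: Callable[[str], dict] = fetch_json, per_page: int = 100
) -> Iterator[str]:
    """Yield repository names, following each page's "next" URL."""
    # Assumption: "namespace" and "per_page" are accepted query parameters.
    query = urlencode({"namespace": "rpms", "per_page": per_page})
    url = f"{PAGURE_API}?{query}"
    while url:
        page = fetch(url)
        for project in page.get("projects", []):
            yield project["name"]
        # Assumption: the next-page URL is stored under pagination.next.
        url = (page.get("pagination") or {}).get("next")


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def package_urls(repo_name: str, listing_html: str) -> Dict[str, str]:
    """Map package name to package page URL for one repository listing."""
    base = urljoin(PACKAGES_BASE, f"{repo_name}/")
    collector = _LinkCollector()
    collector.feed(listing_html)
    found: Dict[str, str] = {}
    for href in collector.hrefs:
        parts = urlparse(urljoin(base, href)).path.strip("/").split("/")
        # Assumption: package pages live at /pkgs/<repo>/<package>/...
        if len(parts) < 3 or parts[:2] != ["pkgs", repo_name]:
            continue
        if parts[2].endswith(".html"):
            continue  # e.g. the listing's own index.html, not a package
        found[parts[2]] = urljoin(base, f"{parts[2]}/")
    return found
```

Passing the fetch function as a parameter means the pagination loop can be tested against canned pages without touching the network. The last step, extracting metadata from each package page, is left out because it depends on the exact markup of those pages.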
With this approach, each Fedora package is listed as a separate origin, while multiple origins can point to the same repo.
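
As a small illustration of that many-to-one relationship, the sketch below models each package as its own record that carries the shared repo URL. The `Origin` class and the helper names are hypothetical and not an existing data model. The URL patterns are inferred from the 4ti2 example above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Origin:
    """One Fedora package, listed as its own origin (hypothetical type)."""

    package: str
    package_url: str  # metadata page on packages.fedoraproject.org
    repo_url: str  # git repository on Pagure, possibly shared


def make_origins(repo_name: str, packages: List[str]) -> List[Origin]:
    """Build one origin per package; all of them share the same repo URL."""
    repo_url = f"https://src.fedoraproject.org/rpms/{repo_name}"
    return [
        Origin(
            package=name,
            package_url=f"https://packages.fedoraproject.org/pkgs/{repo_name}/{name}/",
            repo_url=repo_url,
        )
        for name in packages
    ]


def group_by_repo(origins: Iterable[Origin]) -> Dict[str, List[str]]:
    """Group package names by the repository they point to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for origin in origins:
        grouped[origin.repo_url].append(origin.package)
    return dict(grouped)


origins = make_origins("4ti2", ["4ti2", "4ti2-devel", "4ti2-libs"])
```

Here `origins` holds three separate entries, and `group_by_repo(origins)` collapses them back to the single `rpms/4ti2` repo, which is the relationship the package and repo counts reflect.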