Just to be clear, you're looking to keep these URLs working, but turn them into redirects to SWHID-centric URLs with context parameters (and drop the original view code from these URLs), correct?
All Stories
Oct 7 2021
This should stay pending until we resolve the archiving policy discussion in T3627, so I'm marking it as such.
Thanks for your feedback @olasd. I see three main arguments raised there: (1) the raciness of archiving those data via other means (= related forks), (2) the completeness of our canvassing of synthetic refs, (3) annotating rather than not archiving "synthetic" refs.
Awesome, thanks for confirming this!
In T3608#71803, @olasd wrote: I'm asking this because using predictable origin-centric URLs is generally much more user friendly than having to use multiple APIs to look up the SWHID of a given object before being able to construct the URL, and one would always have to make dynamic API calls to generate the URL for browsing the "latest archival" of a given origin.
For instance, the "archived origin" SWH badge https://www.softwareheritage.org/2020/01/13/the-swh-badges-are-here/ uses an origin-centric URL.
In T3608#71802, @olasd wrote: Just to be clear, you're looking to keep these URLs working, but turn them into redirects to SWHID-centric URLs with context parameters (and drop the original view code from these URLs), correct?
I'm asking this because using predictable origin-centric URLs is generally much more user friendly than having to use multiple APIs to look up the SWHID of a given object before being able to construct the URL, and one would always have to make dynamic API calls to generate the URL for browsing the "latest archival" of a given origin.
@anlambert do you think we can deprecate the following routes as well? I think they can be redirected to the corresponding swh/web/browse/views/<object_type>.py routes.
While we're at it, we should probably be adding some thresholds in the buffer proxy for:
- cumulative length of messages for revisions and releases
- cumulative number of parents for revisions
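The two thresholds listed above could be sketched roughly as follows. This is a minimal illustration, not the actual buffer proxy code: the class name `RevisionBuffer`, the `flush` callback, the dict-based revision layout, and all default limits are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RevisionBuffer:
    """Hypothetical buffer that flushes when any cumulative threshold is hit."""

    flush: Callable[[List[dict]], None]  # assumed callback writing a batch
    max_count: int = 1000                # assumed limit on buffered objects
    max_message_bytes: int = 1_000_000   # assumed cumulative message length
    max_parents: int = 10_000            # assumed cumulative parent count
    _items: List[dict] = field(default_factory=list)
    _message_bytes: int = 0
    _parents: int = 0

    def add(self, revision: dict) -> None:
        # Track the cumulative sizes alongside the plain object count.
        self._items.append(revision)
        self._message_bytes += len(revision.get("message") or b"")
        self._parents += len(revision.get("parents") or ())
        if (
            len(self._items) >= self.max_count
            or self._message_bytes >= self.max_message_bytes
            or self._parents >= self.max_parents
        ):
            self.flush_now()

    def flush_now(self) -> None:
        # Hand the batch to the callback, then reset all counters.
        if self._items:
            self.flush(self._items)
        self._items = []
        self._message_bytes = 0
        self._parents = 0
```

With this shape, a handful of revisions carrying very long messages or very many parents triggers an early flush even when the object count is far below its own limit.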
(this also matches the fact that we've seen, on our main ingestion database, directory_add operations that would take multiple hours, and have knock-on effects on backups and replications because of the long-running insertion transactions)
So, after doing some more analysis of memory usage patterns on these edge case repositories, my suspicion is that the high memory usage is generally being caused by the loader processing batches of large directories, closely packed together, at the same time.
In T3627#71790, @rdicosmo wrote: Yes, we must filter this stuff out (we discussed this issue with @zack some time ago)
In D6405#166673, @ardumont wrote: This looks like an okay thing to do, but instead of only ignoring results (which would only cut down a third of the messages), we should probably be deactivating events completely on these workers.
Yes, I started with that config because I did not initially find a way to set send_events to False (or something similar).
A first run of bitbucket origins has been scheduled and is mostly ingested now [1]
(only 13 large ones are still ongoing).
Oct 6 2021
Yes, we must filter this stuff out (we discussed this issue with @zack some time ago, and you may see Torvalds' opinion too https://www.zdnet.com/article/linux-boosts-microsoft-ntfs-support-as-linus-torvalds-complains-about-github-merges/ )
Looks fine (i.e. the identifiers DeprecationWarnings are gone in tox, except for one that gets triggered by some pytest internal assertion rewrite).
update commit message
jsyk, that's the kind of slight adaptation we will want to add in the webapp so we can let
yannick access the deposit moderation view at some point. Create a new role, assign that
role to specific users (in keycloak) and slightly adapt the webapp code to check that the
logged-in user has the proper role to let them access it. (I don't recall the task id if
there is one ;)
lgtm
- factorize the exported configuration
- use the right exporter port on met
Rather than doing this, we should probably disable worker task events altogether (that is, run celery worker without the --events/--task-events flag)
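For reference, a configuration-level equivalent of this suggestion might look like the fragment below. The module name `celeryconfig.py` and the choice to keep `task_ignore_result` alongside are assumptions; the setting names are the standard lowercase Celery ones, and task events are also off when the worker is simply started without the events flag.

```python
# celeryconfig.py (hypothetical module name, loaded by the Celery app)

# Do not emit task-related monitoring events from the workers.
worker_send_task_events = False

# Do not emit a "task-sent" event when a task message is published.
task_send_sent_event = False

# Assumed to stay in place: do not store task return values either.
task_ignore_result = True
```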
This looks like an okay thing to do, but instead of only ignoring results (which would only cut down a third of the messages), we should probably be deactivating events completely on these workers.
I absolutely do not remember what those are.
@olasd these are the failed dependencies you told me to expect, right? The missing package is ... libcmph-dev.
rebase
ahah, nice!
thanks!
I'd like to create a new package (swh-objstorage-hash) and https://docs.softwareheritage.org/devel/tutorials/add-new-package.html is presumably the guide to do that. However, I do not have the required permissions: would someone be so kind as to work with me on this?
FTR without D6401, the packfile received from GH for the CocoaPods/Specs repo contains 21162 references, 21146 of which start with /refs/pull/ and 7126 of which end with /merge (even though those were explicitly not requested, thanks to the filtering in RepoRepresentation.determine_wanted()).
When D6401 is applied, we only get the 20-ish references that are not pull request related.
Build is green
make it more readable as suggested
I think the issue can be closed.
The pros are:
- it simplifies cluster management (creation, configuration and, most of all, kubernetes upgrades)
- it centralizes the global view of the cluster and of what is running on it
- OSS and transparent policy
So I'm actually proposing that we filter out all branches whose names start with refs/pulls (with no other conditions attached).
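A prefix-only filter of this kind could be sketched as below. The function name `filter_refs` and the mapping of ref names to targets are hypothetical, and the default prefix follows the pull request ref names quoted elsewhere in this feed; it is an assumption, not the loader's actual code.

```python
from typing import Dict, Iterable


def filter_refs(
    refs: Dict[bytes, bytes],
    ignored_prefixes: Iterable[bytes] = (b"refs/pull/",),  # assumed prefix
) -> Dict[bytes, bytes]:
    """Drop every ref whose name starts with one of the ignored prefixes."""
    prefixes = tuple(ignored_prefixes)
    return {
        name: target
        for name, target in refs.items()
        if not name.startswith(prefixes)
    }


refs = {
    b"refs/heads/main": b"a" * 40,
    b"refs/pull/1/head": b"b" * 40,
    b"refs/pull/1/merge": b"c" * 40,
    b"refs/tags/v1.0": b"d" * 40,
}
kept = filter_refs(refs)
```

Here `kept` retains only the branch and tag entries, with no further condition on what the pull request refs point to.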