Change Details

When we save an unknown origin because of a "Save code now" request, we schedule a one-shot task for the ingestion, but we don't register the origin for future crawling. It might make sense to do both. This is possibly also the only reasonable place to apply heuristics that de-duplicate URLs pointing to the same repository, e.g., non-canonical GitHub repository URLs. (Thanks @singpolyma for the heads-up.)
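
As a rough illustration of the kind of de-duplication heuristic meant here, the sketch below maps several spellings of a GitHub repository URL to one comparison key. The function name `github_dedup_key` and the example URLs are hypothetical and not taken from the existing codebase; the rules it applies (ignore scheme, `www.` prefix, letter case, trailing `.git`, and extra path segments) are assumptions about what "non-canonical" would cover.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# scp-like syntax used by git over SSH, e.g. git@github.com:owner/repo.git
_SCP_LIKE = re.compile(r"^(?:[\w.-]+@)?(?P<host>[\w.-]+):(?P<path>.+)$")


def github_dedup_key(url: str) -> Optional[str]:
    """Return a comparison key for a GitHub repository URL.

    Returns None when the URL does not look like a GitHub repository URL.
    Hypothetical helper: the key is only meant for comparing URLs, it is
    not necessarily the URL that should be stored or crawled.
    """
    url = url.strip()
    if "://" in url:
        parts = urlsplit(url)
        host, path = parts.hostname or "", parts.path
    else:
        match = _SCP_LIKE.match(url)
        if match is None:
            return None
        host, path = match.group("host"), match.group("path")

    host = host.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host != "github.com":
        return None

    # Keep only owner and repository name; drop e.g. /tree/<branch>.
    segments = [segment for segment in path.split("/") if segment]
    if len(segments) < 2:
        return None
    owner, repo = segments[0], segments[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]

    # Assumption: owner and repository names are compared case-insensitively.
    return f"https://github.com/{owner.lower()}/{repo.lower()}"
```

With such a key, a "Save code now" request could be compared against already known origins before a new one is registered: two submitted URLs would be treated as the same repository when their keys are equal, and URLs for which the function returns None would fall back to exact comparison.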

Related to T1110
Related to T2187