Change Details

This introduces the means to configure the packfile fetching policy. The default, as before, is to fetch a single packfile and ingest everything unknown out of it. When fetch_multiple_packfiles is True (and the ingestion goes through the 'smart' protocol), the ingestion uses multiple packfiles (each with a given number_of_heads_per_packfile). After each packfile is loaded, a 'partial' (because incomplete) and 'incremental' (as in gathering the refs seen so far) snapshot is created. The end goal is to decrease the current memory pressure when loading a large repository.

Even if the new fetching policy were activated, this should not impact how small to medium repositories are ingested.

To improve the current loading, we retrieve unknown remote references in batches of 200 references (by default; this is also configurable at instantiation time). The end goal is to decrease the potential for failure when loading large repositories (with large packfiles) and to allow the next loading to pick up where the last loading failure occurred. It's not perfect yet, because it also depends on the connectivity of the repository's git graph (for example, if the first 200 references happen to be fully connected, then we will retrieve everything in one round anyway).

Implementation-wise, this adapts the current graph walker (the one resolving the missing local references from the remote references) so it won't walk over already fetched references when multiple iterations are needed. This also makes the git loader explicitly create partial visits when fetching packfiles. That is, the loader now creates a partial visit with a snapshot after each packfile consumed. The end goal is to decrease the work the loader would have to redo if the initial visit did not complete for some reason.

Related to T3625
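The batching and partial-snapshot flow described above can be sketched as follows. This is a simplified, self-contained model under stated assumptions, not the loader's actual code: `Snapshot`, `unknown_ref_batches`, `fake_fetch` and `load` are hypothetical names, the remote repository is a plain dict mapping object ids to parent ids, and "fetching a packfile" is simulated by collecting every object reachable from the wanted refs while stopping at objects already known locally (roughly what the graph walker's "haves" achieve).

```python
from dataclasses import dataclass
from functools import partial
from typing import Callable, Dict, Iterator, List, Set

# Hypothetical default mirroring the "batches of 200 references" policy.
DEFAULT_NUMBER_OF_HEADS_PER_PACKFILE = 200

Refs = Dict[bytes, bytes]  # ref name -> target object id


@dataclass
class Snapshot:
    branches: Refs
    partial: bool  # True while the visit is still incomplete


def unknown_ref_batches(remote_refs: Refs, known: Set[bytes], batch_size: int) -> Iterator[Refs]:
    """Yield batches of refs whose target is not known locally yet.

    ``known`` is consulted lazily, so refs whose targets were brought in by
    an earlier packfile are left out of later batches.
    """
    batch: Refs = {}
    for name, target in sorted(remote_refs.items()):
        if target in known:
            continue
        batch[name] = target
        if len(batch) == batch_size:
            yield batch
            batch = {}
    if batch:
        yield batch


def fake_fetch(graph: Dict[bytes, List[bytes]], known: Set[bytes], wants: Refs) -> Set[bytes]:
    """Simulated packfile: every object reachable from the wanted targets,
    stopping at objects the local side already has."""
    fetched: Set[bytes] = set()
    stack = list(wants.values())
    while stack:
        obj = stack.pop()
        if obj in known or obj in fetched:
            continue
        fetched.add(obj)
        stack.extend(graph.get(obj, []))
    return fetched


def load(
    remote_refs: Refs,
    fetch: Callable[[Set[bytes], Refs], Set[bytes]],
    fetch_multiple_packfiles: bool = False,
    number_of_heads_per_packfile: int = DEFAULT_NUMBER_OF_HEADS_PER_PACKFILE,
) -> List[Snapshot]:
    known: Set[bytes] = set()
    snapshots: List[Snapshot] = []
    if fetch_multiple_packfiles:
        for batch in unknown_ref_batches(remote_refs, known, number_of_heads_per_packfile):
            known |= fetch(known, batch)  # in-place update, seen by the generator
            # Partial, incremental snapshot: every ref whose target is loaded so far.
            seen = {n: t for n, t in remote_refs.items() if t in known}
            snapshots.append(Snapshot(seen, partial=True))
    else:
        # Default policy: one packfile with everything unknown.
        known |= fetch(known, remote_refs)
    snapshots.append(Snapshot(dict(remote_refs), partial=False))
    return snapshots


if __name__ == "__main__":
    # Linear history c1 <- c2 <- c3, plus an unrelated root r1.
    graph = {b"c3": [b"c2"], b"c2": [b"c1"], b"c1": [], b"r1": []}
    refs = {b"refs/heads/main": b"c3", b"refs/heads/old": b"c1", b"refs/tags/r": b"r1"}
    fetch = partial(fake_fetch, graph)
    for snap in load(refs, fetch, fetch_multiple_packfiles=True, number_of_heads_per_packfile=1):
        print(snap.partial, sorted(snap.branches))
    # True [b'refs/heads/main', b'refs/heads/old']
    # True [b'refs/heads/main', b'refs/heads/old', b'refs/tags/r']
    # False [b'refs/heads/main', b'refs/heads/old', b'refs/tags/r']
```

Even with one head per packfile, the three refs are fetched in only two packfiles: the first one already brings in `c1`, so `refs/heads/old` is skipped, which illustrates the connectivity caveat. The final non-partial snapshot is appended unconditionally here for simplicity, even when the last partial snapshot already covers every ref.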