(3) should ideally be implemented in a way that guarantees that extids that were resolvable in previous versions of the mapping will always be resolvable in future versions

I don't understand. Option 3 is to remove relations between extids and SWHIDs, so they won't be resolvable anymore.

Jul 1 2021, 9:01 AM · Storage manager, Mercurial loader

Jun 30 2021

vlorentz added a comment to T3418: Decide a consistent policy on having multiple archived objects for the same extid.

So if that mapping changes, but always gives back an object in the archive (pointed to by a SWHID)

Jun 30 2021, 7:58 PM · Storage manager, Mercurial loader

zack added a comment to T3418: Decide a consistent policy on having multiple archived objects for the same extid.

I have the feeling that option (1) will lead, in the long run, to an explosion in the size of the mapping, which will eventually make us converge (slowly) toward option (3).

Jun 30 2021, 7:33 PM · Storage manager, Mercurial loader

vlorentz added a comment to T3418: Decide a consistent policy on having multiple archived objects for the same extid.

and it would probably be kind of a mess from a Kafka perspective

Jun 30 2021, 6:59 PM · Storage manager, Mercurial loader

olasd added a comment to T3418: Decide a consistent policy on having multiple archived objects for the same extid.

The "mapping version field" is the most fleshed-out proposal, as it is my preference. My rationale for preferring it over changing extid_type for backwards-incompatible changes is that the extid_type is a property of the external artifact, while the mapping version is a property of our archiving infrastructure.
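
To illustrate the distinction drawn here, the following is a minimal sketch under assumptions, not the actual swh.model or swh.storage API. The class `ExtIDRecord`, the field `mapping_version`, the helper `resolve`, and the sample values are all hypothetical. The idea is that extid_type describes the external artifact, while the version describes the archive's own mapping, so a lookup can prefer the newest mapping without changing the type.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ExtIDRecord:
    """Hypothetical extid row; not the real Software Heritage data model."""

    extid_type: str       # property of the external artifact (example value below)
    extid: bytes          # the external identifier itself
    mapping_version: int  # property of the archiving infrastructure (assumed field)
    target: str           # SWHID of the archived object


def resolve(
    records: Iterable[ExtIDRecord], extid_type: str, extid: bytes
) -> List[ExtIDRecord]:
    """Return every record matching the extid, newest mapping version first."""
    matches = [
        r for r in records if r.extid_type == extid_type and r.extid == extid
    ]
    return sorted(matches, key=lambda r: r.mapping_version, reverse=True)


# Two archived objects for the same extid, produced by two mapping versions.
records = [
    ExtIDRecord("hg-nodeid", b"\x01" * 20, 0, "swh:1:rev:" + "a" * 40),
    ExtIDRecord("hg-nodeid", b"\x01" * 20, 1, "swh:1:rev:" + "b" * 40),
]
latest = resolve(records, "hg-nodeid", b"\x01" * 20)[0]
```

With this layout, older mapping versions stay resolvable while callers that only want the current mapping take the first result.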

Jun 30 2021, 6:55 PM · Storage manager, Mercurial loader

vlorentz updated the task description for T3418: Decide a consistent policy on having multiple archived objects for the same extid.

Jun 30 2021, 6:55 PM · Storage manager, Mercurial loader

olasd triaged T3418: Decide a consistent policy on having multiple archived objects for the same extid as Unbreak Now! priority.

Jun 30 2021, 6:49 PM · Storage manager, Mercurial loader

Jun 24 2021

ardumont closed T2750: mercurial loader fails on save code now as Resolved.

It no longer does...
We have an end-to-end checkpoint for the mercurial loading and it's green now.

Jun 24 2021, 3:48 PM · Mercurial loader

Jun 21 2021

olasd closed T3336: Deploy swh.loader.mercurial 2.1 in staging, a subtask of T3337: Smoke test ingestion of bitbucket repositories with latest loader mercurial, as Resolved.

Jun 21 2021, 12:06 PM · System administration, Mercurial loader

olasd closed T3336: Deploy swh.loader.mercurial 2.1 in staging as Resolved.

Now that the branch structure has landed, I've deployed this latest version. After some cleanup of the duplicate extids left over from an earlier deployment, everything seems to be fine and the loader is ready for production.

Jun 21 2021, 12:06 PM · System administration, Mercurial loader

olasd renamed T3336: Deploy swh.loader.mercurial 2.1 in staging from Deploy swh.loader.mercurial 1.1 in staging to Deploy swh.loader.mercurial 2.1 in staging.

Jun 21 2021, 12:05 PM · System administration, Mercurial loader

Jun 3 2021

olasd added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

In T3352#65755, @Alphare wrote:

Here's what I gather to be the most up-to-date version:

HEAD [required] either the node pointed by the @ bookmark or the tip of default branch

branch-tip/<branch-name> [required] the tipmost head of each open branch

bookmarks/<bookmark_name> [optional] hold the bookmark mapping if any

branch-heads/<branch_name>/0..n [optional] for any branch with multiple open heads, list all open heads

branch-closed-heads/<branch_name>/0..n [optional] for any branch with at least one closed head, list all closed heads

tags/<tag-name> [optional] record tags

Note that the current patch sent in D5816 still has the refs/hg/ prefix, does not use the @ bookmark by default nor does it have a namespace for tags. I'll be integrating these things in the next update to the patch.
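
As a rough illustration of the naming scheme listed above, here is a minimal sketch. It is not the code from D5816: the function `snapshot_branch_names`, its parameters, and the node ids are hypothetical, and it assumes that the open heads of each branch are ordered so that the last entry is the tipmost one. It also stores HEAD as a plain node id for simplicity.

```python
from typing import Dict, List


def snapshot_branch_names(
    branch_heads: Dict[str, List[str]],
    closed_heads: Dict[str, List[str]],
    bookmarks: Dict[str, str],
    tags: Dict[str, str],
) -> Dict[str, str]:
    """Map ref names to node ids following the scheme listed above.

    Assumption: each list of open heads is ordered with the tipmost head last.
    """
    refs: Dict[str, str] = {}
    for branch, heads in branch_heads.items():
        if not heads:
            continue
        # branch-tip/<branch-name>: the tipmost head of each open branch
        refs[f"branch-tip/{branch}"] = heads[-1]
        # branch-heads/<branch_name>/0..n: only for branches with several open heads
        if len(heads) > 1:
            for index, node in enumerate(heads):
                refs[f"branch-heads/{branch}/{index}"] = node
    # branch-closed-heads/<branch_name>/0..n: all closed heads, if any
    for branch, heads in closed_heads.items():
        for index, node in enumerate(heads):
            refs[f"branch-closed-heads/{branch}/{index}"] = node
    for name, node in bookmarks.items():
        refs[f"bookmarks/{name}"] = node
    for name, node in tags.items():
        refs[f"tags/{name}"] = node
    # HEAD: the @ bookmark if present, otherwise the tip of the default branch
    head = bookmarks.get("@") or refs.get("branch-tip/default")
    if head is not None:
        refs["HEAD"] = head
    return refs


refs = snapshot_branch_names(
    branch_heads={"default": ["n1", "n2"], "stable": ["n3"]},
    closed_heads={"old": ["n4"]},
    bookmarks={"@": "n1"},
    tags={"v1.0": "n3"},
)
```

In this example the default branch has two open heads, so it gets both a branch-tip/ entry and numbered branch-heads/ entries, while the stable branch only gets a branch-tip/ entry.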

Jun 3 2021, 6:17 PM · Mercurial loader

marmoute added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

In T3352#65758, @Alphare wrote:

I think @marmoute's intention was to more closely convey the semantics of Mercurial's branching system. A branch tip or head is not a branch itself, so it would be "wrong" to put it under branches/. Thus, since the plural of "a branch head" is "branch heads", I don't feel like the second change would be appropriate either.

But then again, you have the final say, and I won't die on this hill. :)

Jun 3 2021, 2:35 PM · Mercurial loader

zack added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

That explains it, and it's good enough for me, thanks :)

Jun 3 2021, 2:30 PM · Mercurial loader

zack updated the task description for T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

Jun 3 2021, 2:30 PM · Mercurial loader

Alphare added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

I think @marmoute's intention was to more closely convey the semantics of Mercurial's branching system. A branch tip or head is not a branch itself, so it would be "wrong" to put it under branches/. Thus, since the plural of "a branch head" is "branch heads", I don't feel like the second change would be appropriate either.

Jun 3 2021, 2:26 PM · Mercurial loader

zack added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

My remaining question then is: how about, instead of branch-{tip,heads,closed-heads}/name we use branches/{heads,closed,tip}/name ?

Jun 3 2021, 2:20 PM · Mercurial loader

Alphare updated the task description for T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

Jun 3 2021, 2:03 PM · Mercurial loader

Alphare added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

Here's what I gather to be the most up-to-date version:

Jun 3 2021, 2:01 PM · Mercurial loader

zack added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

In T3352#65753, @marmoute wrote:

They are all branch heads (git "branches" are about heads too, and so are bookmarks), so a heads/ prefix does not bring much.

Jun 3 2021, 1:37 PM · Mercurial loader

marmoute added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

In T3352#65752, @zack wrote:

My point here is for users looking at the structure to easily distinguish between the different mapping formats. Something based on the "visit data" and associated documentation seems quite fragile.

But a version number for the mapping format will be completely meaningless for a user. It's too Software Heritage-specific. The more I think of it the more I'm convinced we should just offer names that are as natural and self-explanatory as possible. Our notion of what is self-explanatory might change in the future (also based on user feedback), but so be it. It is not going to be a new problem in the archive, as @olasd pointed out, so we will not be making it any worse.

In practical terms here I concur that that means dropping the "refs/hg" prefix.

Jun 3 2021, 1:18 PM · Mercurial loader

zack added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

My point here is for users looking at the structure to easily distinguish between the different mapping formats. Something based on the "visit data" and associated documentation seems quite fragile.

Jun 3 2021, 1:11 PM · Mercurial loader

marmoute added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

In T3352#65749, @olasd wrote:

In T3352#65655, @zack wrote:

I'm happy with the HEAD branch being an alias to a name that's more representative of the corresponding mercurial concept (which would be tip, I guess).

Jun 3 2021, 12:42 PM · Mercurial loader

olasd added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

In T3352#65655, @zack wrote:

HEAD is also a name that is Git-specific, but I understand that we want that notion and probably that name is as good as any. (Unless there is a Mercurial generic name for something similar.)

Jun 3 2021, 11:42 AM · Mercurial loader

Jun 2 2021

Alphare added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

I have implemented @marmoute's version of the mapping at D5816. Since the exact naming scheme is just one search-and-replace away, we can still change it easily. Implementing this has highlighted a flaw in the handling of multiple open heads, which is now fixed.

Jun 2 2021, 7:20 PM · Mercurial loader

Jun 1 2021

marmoute added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

In T3352#65657, @zack wrote:

So, to pivot the question around, what is the minimal (also in the sense that it is shorter / has less cruft) naming scheme that would allow us to represent without ambiguity all the Mercurial naming aspects that you want to capture?

Jun 1 2021, 12:29 AM · Mercurial loader

May 31 2021

zack added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

Understood. To explain my thinking here, the refs/... structure is something we picked to represent git branch names as faithfully as possible, adding as little as possible on top of it. In trying to represent branch names from another VCS, as a first approximation I'd rather reuse the same *approach* than a *result* that is similar, if that makes sense. So, to pivot the question around, what is the minimal (also in the sense that it is shorter / has less cruft) naming scheme that would allow us to represent without ambiguity all the Mercurial naming aspects that you want to capture?

May 31 2021, 10:24 PM · Mercurial loader

marmoute added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

In T3352#65655, @zack wrote:

Is the ability to recognize that a snapshot comes from Mercurial an actual goal here? I don't think we care about "clashes" between snapshots created from different VCSs, but maybe I'm missing something.

May 31 2021, 9:26 PM · Mercurial loader

zack added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

Is the ability to recognize that a snapshot comes from Mercurial an actual goal here? I don't think we care about "clashes" between snapshots created from different VCSs, but maybe I'm missing something.

May 31 2021, 9:03 PM · Mercurial loader

olasd added a comment to T3338: Load the archived bitbucket mercurial repositories.

As mentioned in T3336, we've now passed 3000 repos loaded successfully in staging. We've had two failures due to attempting to add two identical objects concurrently, which is something my simple test script wouldn't catch, but would be handled properly by an actual worker process.

May 31 2021, 8:00 PM · System administration, Mercurial loader

marmoute added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

@Alphare's proposal is missing the branch-name part for branch-tip. I would adjust it as such:

May 31 2021, 5:13 PM · Mercurial loader

olasd updated subscribers of T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

I'm pinging @zack as I think his feedback on this naming scheme would be valuable.

May 31 2021, 4:44 PM · Mercurial loader

Alphare updated subscribers of T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

Agreed! That would look like:

May 31 2021, 4:38 PM · Mercurial loader

marmoute added a comment to T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

I would go with refs/hg/branch-tip/ instead of just tip

May 31 2021, 4:34 PM · Mercurial loader

Alphare updated the task description for T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip).

May 31 2021, 3:53 PM · Mercurial loader

Alphare triaged T3352: Define a ref mapping naming scheme for all Mercurial "pointers" (heads, closed heads, bookmarks, tip) as High priority.

May 31 2021, 3:40 PM · Mercurial loader

Alphare added a comment to T3338: Load the archived bitbucket mercurial repositories.

The run from this weekend, detailed in T3336, appears to have worked fine. (Just making sure it's obvious from this task as well.)

May 31 2021, 10:37 AM · System administration, Mercurial loader

Alphare added a comment to T3336: Deploy swh.loader.mercurial 2.1 in staging.

Great news! Let me know if I can help in any way.

May 31 2021, 10:16 AM · System administration, Mercurial loader

olasd added a comment to T3336: Deploy swh.loader.mercurial 2.1 in staging.

After the weekend, the loader ran a few thousand loading tasks (out of 235k total). Out of those, only 2 failed for already known concurrency reasons. We should be good to go to production on this loader.

May 31 2021, 10:15 AM · System administration, Mercurial loader

May 28 2021

olasd added a comment to T3336: Deploy swh.loader.mercurial 2.1 in staging.

base_dir=/srv/storage/space/mirrors/boatbucket
# Load the entries on lines 10000 to 19999 of the mapping file ("<dir> <url>" per line)
tail -n +10000 "$base_dir/mapping-to-repos.txt" | head -10000 | while read -r dir url; do
    repo_dir="$base_dir/$dir"
    # Use the change time of the blackbox log as the visit date, dropping the space before the UTC offset
    visit_date=$(stat -c %z "$repo_dir/.hg/blackbox.log" | sed -E 's/ \+0000/+0000/')
    SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_mercurial.yml swh --log-level=DEBUG loader run mercurial_from_disk "$url" directory="$repo_dir" visit_date="\"$visit_date\""
done 2>&1 | tee -a bitbucket-archive.2.log

May 28 2021, 5:02 PM · System administration, Mercurial loader

olasd changed the status of T3336: Deploy swh.loader.mercurial 2.1 in staging, a subtask of T3337: Smoke test ingestion of bitbucket repositories with latest loader mercurial, from Open to Work in Progress.

May 28 2021, 4:44 PM · System administration, Mercurial loader

olasd changed the status of T3336: Deploy swh.loader.mercurial 2.1 in staging from Open to Work in Progress.

After packaging swh.loader.mercurial 1.1 with @Alphare's changes, all seems well on the staging environment (at least the inconsistencies I had noticed are not there anymore).

May 28 2021, 4:44 PM · System administration, Mercurial loader

olasd renamed T3336: Deploy swh.loader.mercurial 2.1 in staging from Deploy swh.loader.mercurial 1.0 in staging to Deploy swh.loader.mercurial 1.1 in staging.

May 28 2021, 4:43 PM · System administration, Mercurial loader

ardumont added a comment to T3338: Load the archived bitbucket mercurial repositories.

That's awesome news, thanks for the heads up \o/.

May 28 2021, 11:55 AM · System administration, Mercurial loader

Alphare added a comment to T3338: Load the archived bitbucket mercurial repositories.

For posterity, I have tested that all corrupted and "verify failed" repositories in the archive load correctly, as well as the humongous Mozilla-unified, PyPy, and a few thousand other random ones from the archive. Aside from the incremental loading issues detailed in T3336 (that should be fixed in today's run), everything seems fine.

May 28 2021, 11:53 AM · System administration, Mercurial loader

May 27 2021

Alphare updated subscribers of T3336: Deploy swh.loader.mercurial 2.1 in staging.

D5793 should fix the remaining issue. We had a discussion with @marmoute about whether considering closed branches (done in D5790) is *actually* a good idea in terms of presentation, but D5793 fixes the underlying issue, so we'll see about this issue next week.

May 27 2021, 10:58 PM · System administration, Mercurial loader

May 24 2021

olasd added a comment to T3336: Deploy swh.loader.mercurial 2.1 in staging.

Reproduction for the duplicate nodeids in the extid table:

May 24 2021, 4:52 PM · System administration, Mercurial loader

olasd added a comment to T3336: Deploy swh.loader.mercurial 2.1 in staging.

So that was done on Friday, and things seem to work in general, but there are a bunch of issues:

May 24 2021, 1:01 PM · System administration, Mercurial loader

May 21 2021

Alphare added a comment to T3338: Load the archived bitbucket mercurial repositories.

The mapping file is located (on the boatbucket machine) at /srv/boatbucket/mapping-to-repos.txt. It does *not* contain the (very few) outright corrupted repositories; I might have to do some digging and even bother the BB team again to get the URLs for those.

May 21 2021, 11:34 AM · System administration, Mercurial loader

olasd added a parent task for T3337: Smoke test ingestion of bitbucket repositories with latest loader mercurial: T3338: Load the archived bitbucket mercurial repositories.

May 21 2021, 10:20 AM · System administration, Mercurial loader

olasd added a subtask for T3338: Load the archived bitbucket mercurial repositories: T3337: Smoke test ingestion of bitbucket repositories with latest loader mercurial.

May 21 2021, 10:20 AM · System administration, Mercurial loader

olasd triaged T3338: Load the archived bitbucket mercurial repositories as High priority.

May 21 2021, 10:18 AM · System administration, Mercurial loader

olasd added a subtask for T3337: Smoke test ingestion of bitbucket repositories with latest loader mercurial: T3336: Deploy swh.loader.mercurial 2.1 in staging.

May 21 2021, 10:16 AM · System administration, Mercurial loader

olasd added a parent task for T3336: Deploy swh.loader.mercurial 2.1 in staging: T3337: Smoke test ingestion of bitbucket repositories with latest loader mercurial.

May 21 2021, 10:16 AM · System administration, Mercurial loader

olasd triaged T3337: Smoke test ingestion of bitbucket repositories with latest loader mercurial as High priority.