
Investigate revision reconstruction discrepancy with subversion export
Closed, Migrated. Edits Locked

Description

When trying to reload some subversion repositories into the archive, the loader detects that the history got altered and
aborts the loading process, see T3694#73221 and the related sentry issue.

In some rare cases, it can come from the subversion history being rewritten server side using the svnadmin command,
but most of the time it means that the filesystem associated with a revision and reconstructed using the svn_ra API
diverges from the one obtained by a svn export operation on the same revision.
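
To spot where the two filesystems diverge, one option is to hash every file in both trees and list the paths that differ. This is only a minimal sketch, not the loader's code: tree_digests and diff_trees are hypothetical helper names, and both directories are assumed to already exist on disk (one rebuilt through svn_ra, one produced by svn export).

```python
import hashlib
import os


def tree_digests(root):
    """Map each relative file path under root to the SHA-1 of its content."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Subversion metadata directories, which an export does not contain.
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.islink(path):
                # A symlink listed among files is compared by its target path.
                data = os.readlink(path).encode()
            else:
                with open(path, "rb") as f:
                    data = f.read()
            digests[rel] = hashlib.sha1(data).hexdigest()
    return digests


def diff_trees(reconstructed_dir, exported_dir):
    """Return the sorted relative paths that are missing or differ between both trees."""
    left = tree_digests(reconstructed_dir)
    right = tree_digests(exported_dir)
    return sorted(p for p in left.keys() | right.keys() if left.get(p) != right.get(p))
```

An empty result means both trees hold the same files with the same content; any listed path is a candidate for closer inspection (line endings, properties, links).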

Let's try to analyze and fix those issues in this task.

Event Timeline

ardumont triaged this task as Normal priority. Oct 28 2021, 10:12 AM
ardumont created this task.
ardumont renamed this task from "loading svn origins from scratch raise" to "loading svn origin while ignoring history raises". Oct 28 2021, 10:49 AM
ardumont updated the task description.

Ah, it's not for all origins though...
I tried with other origins which demonstrate the same issue [1] and they did not fail...

swhworker@worker17:~$ swh loader run svn https://svn.code.sf.net/p/unimacro/code start_from_scratch=True
INFO:swh.loader.svn.SvnLoader:Load origin 'https://svn.code.sf.net/p/unimacro/code' with type 'svn'
INFO:swh.loader.svn.SvnLoader:Processing revisions [1-614] for {'swh-origin': 'https://svn.code.sf.net/p/unimacro/code', 'remote_url': 'https://svn.code.sf.net/p/unimacro/code', 'local_url': b'/tmp/swh.loader.svn.a4fjpaqs-105156/code', 'uuid': b'df0dbeab-7b48-0410-a972-c90e96de496b'}
{'status': 'eventful'}
swhworker@worker17:~$ swh loader run svn https://svn.code.sf.net/p/open-chord/code start_from_scratch=True
INFO:swh.loader.svn.SvnLoader:Load origin 'https://svn.code.sf.net/p/open-chord/code' with type 'svn'
INFO:swh.loader.svn.SvnLoader:Processing revisions [1-424] for {'swh-origin': 'https://svn.code.sf.net/p/open-chord/code', 'remote_url': 'https://svn.code.sf.net/p/open-chord/code', 'local_url': b'/tmp/swh.loader.svn.0r3xuoo0-107295/code', 'uuid': b'5a381aae-974a-0410-8b17-85268456560c'}
{'status': 'eventful'}

[1] https://sentry.softwareheritage.org/share/issue/84433e1cd9974eb293f0ba3a9ee44fd1/

ardumont renamed this task from "loading svn origin while ignoring history raises" to "loading some svn origin while ignoring history sometimes raises". Oct 28 2021, 12:49 PM

The loading issue for the repository in the T3694 description has been identified: a commit message was modified
server side between two Save Code Now requests, leading to a discrepancy in the hashes computed for revisions.
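
To illustrate why a single edited commit message is enough to trigger the error, here is a small sketch. It assumes a simplified, hypothetical revision manifest hashed with SHA-1; this is not the actual swh.model identifier computation, and revision_id and chain_ids are names made up for the example. The only point is that the message and the parent id are both part of what gets hashed.

```python
import hashlib


def revision_id(directory_id, parent_id, author, date, message):
    """Hash a simplified, hypothetical revision manifest (not the real swh.model format)."""
    manifest = "\n".join(
        [
            f"directory {directory_id}",
            f"parent {parent_id}",
            f"author {author}",
            f"date {date}",
            "",
            message,
        ]
    ).encode()
    return hashlib.sha1(manifest).hexdigest()


def chain_ids(messages):
    """Compute ids for a linear history; each id depends on its parent's id."""
    ids, parent = [], "none"
    for rev, message in enumerate(messages, start=1):
        parent = revision_id(f"dir-{rev}", parent, "someone", f"r{rev}", message)
        ids.append(parent)
    return ids
```

With this model, chain_ids(["init", "fix"]) and chain_ids(["init (edited)", "fix"]) share no id at all: changing the message of the first revision changes its id and, through the parent field, the id of every later revision.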

Nevertheless, the subversion loader also detects repositories with altered history when trying to load them again
(see the related sentry issue) even though no modification has been performed server side since the last loading.

Below is a little Python script I wrote to extract the URLs of those repositories using the Sentry REST API.

import os

import requests

sentry_api_base_url = "https://sentry.softwareheritage.org/api/0"
# Events endpoint of the Sentry issue to inspect
sentry_issue_events_url = f"{sentry_api_base_url}/issues/8064/events/"

# The API token is read from the environment
sentry_api_token = os.environ["SENTRY_TOKEN"]
auth_header = {"Authorization": f"Bearer {sentry_api_token}"}
origin_urls = set()

# Follow pagination links until there is no next page or a page is empty
while sentry_issue_events_url:
    response = requests.get(sentry_issue_events_url, headers=auth_header)
    response.raise_for_status()
    events = response.json()
    if not events:
        break
    for event in events:
        # Fetch the full event to get the parameters of the failed loading task
        sentry_event_data_url = f"{sentry_api_base_url}/projects/swh/swh-loader-svn/events/{event['eventID']}/"
        sentry_event_data = requests.get(
            sentry_event_data_url, headers=auth_header
        ).json()
        origin_urls.add(sentry_event_data["context"]["celery-job"]["kwargs"]["url"])

    # requests parses the Link response header into response.links
    sentry_issue_events_url = response.links.get("next", {}).get("url")

for origin_url in origin_urls:
    print(origin_url)

We obtain a list of 135 repositories that could not be properly loaded into the archive.

https://svn.code.sf.net/p/tu-testbed/code
https://svn.code.sf.net/p/qpegps/code
https://svn.code.sf.net/p/phpesp/code
https://svn.code.sf.net/p/purebiblesearch/code
https://svn.code.sf.net/p/syntax-desktop/svn
https://svn.code.sf.net/p/frame2/code
https://svn.code.sf.net/p/emarket/code
https://svn.code.sf.net/p/vstats/code
https://svn.code.sf.net/p/runasadmin/code
https://svn.code.sf.net/p/osads/code
https://svn.code.sf.net/p/launchy/code
https://svn.code.sf.net/p/wikindx/svn
https://svn.code.sf.net/p/pyfltk/code
https://svn.code.sf.net/p/phpwebapp/code
https://svn.code.sf.net/p/txtfl/code
https://svn.code.sf.net/p/tpabbrevia/code
https://svn.code.sf.net/p/mp3roaster/code
https://svn.code.sf.net/p/battle4mandicor/code
https://svn.code.sf.net/p/blinkensisters/code
https://svn.code.sf.net/p/simspec/code
https://svn.code.sf.net/p/securecollect/code
https://svn.code.sf.net/p/pymerase/code
https://svn.code.sf.net/p/zambia/code
https://svn.code.sf.net/p/sneek-modmii/code
https://svn.code.sf.net/p/unbbayes/code
https://svn.code.sf.net/p/vassalengine/svn
https://svn.code.sf.net/p/jump-pilot/code
https://svn.code.sf.net/p/sakura-editor/code
https://svn.code.sf.net/p/proxytunnel/code
https://svn.code.sf.net/p/beenuts/code
https://svn.code.sf.net/p/personalbackup/code
https://svn.code.sf.net/p/sivp/code
https://svn.code.sf.net/p/gpsdrive/code
https://svn.code.sf.net/p/scorched/code
https://svn.code.sf.net/p/ultrastardx/svn
https://svn.code.sf.net/p/dccss/code
https://svn.code.sf.net/p/fable/code
https://svn.code.sf.net/p/mailmanager/code
https://svn.code.sf.net/p/dnssec-tools/code
https://svn.code.sf.net/p/axiomengine/svn
https://svn.code.sf.net/p/brim/code
https://svn.code.sf.net/p/opensimwiredux/code
https://svn.code.sf.net/p/agilereview/code
https://svn.code.sf.net/p/as2lib/code
https://svn.code.sf.net/p/wxjs/code
https://svn.code.sf.net/p/riverock/code
https://svn.code.sf.net/p/capa/code
https://svn.code.sf.net/p/rosegarden/code
https://svn.code.sf.net/p/tcnopen/trdp
https://svn.code.sf.net/p/jason/svn
https://svn.code.sf.net/p/charon-suite/code
https://svn.code.sf.net/p/arm2rus-dict/code
https://svn.code.sf.net/p/jet/code
https://svn.code.sf.net/p/equtemper/code
https://svn.code.sf.net/p/fellow/subversion
https://svn.code.sf.net/p/opentk/code
https://svn.code.sf.net/p/ujac/svn
https://svn.code.sf.net/p/shellqueue/code
https://svn.code.sf.net/p/extjs-orm/code
https://svn.code.sf.net/p/openvpn-admin/code
https://svn.code.sf.net/p/sqlanyware/code
https://svn.code.sf.net/p/pgsqlformac/code
https://svn.code.sf.net/p/freedos/svn
https://svn.code.sf.net/p/ede/code
https://svn.code.sf.net/p/onlinesongbook/code
https://svn.code.sf.net/p/otlkcon/code
https://svn.code.sf.net/p/tvbrowser/code
https://svn.code.sf.net/p/unimacro/code
https://svn.code.sf.net/p/htppu/svn
https://svn.code.sf.net/p/colorer/svn
https://svn.code.sf.net/p/sharp3d/code
https://svn.code.sf.net/p/linux-karma/code
https://svn.code.sf.net/p/toastpp/code
https://svn.code.sf.net/p/tracclient/code
https://svn.code.sf.net/p/pdfcat/code
https://svn.code.sf.net/p/abbot/svn
https://svn.code.sf.net/p/planeshift/code
https://svn.code.sf.net/p/zencart-german/svn
https://svn.code.sf.net/p/liegkat-archiv/code
https://svn.code.sf.net/p/drakecms/code
https://svn.code.sf.net/p/domainobjects/code
https://svn.code.sf.net/p/fancypants/svn-test
https://svn.code.sf.net/p/vscweb/code
https://svn.code.sf.net/p/jupload/code
https://svn.code.sf.net/p/jedit/svn
https://svn.code.sf.net/p/unnoc/code
https://svn.code.sf.net/p/alchemi/code
https://svn.code.sf.net/p/mars/code
https://svn.code.sf.net/p/ninan/code
https://svn.code.sf.net/p/wxcode/code
https://svn.code.sf.net/p/pio/code
https://svn.code.sf.net/p/e107dutch/code
https://svn.code.sf.net/p/sneek/code
https://svn.code.sf.net/p/open-chord/code
https://svn.code.sf.net/p/codeblocks/code
https://svn.code.sf.net/p/manufacture/cygwin
https://svn.code.sf.net/p/ace/code
https://svn.code.sf.net/p/luxrender/code
https://svn.code.sf.net/p/qingy/code
https://svn.code.sf.net/p/rtse/code
https://svn.code.sf.net/p/xajax/code
https://svn.code.sf.net/p/oorexx/code-0
https://svn.code.sf.net/p/reduce-algebra/code
https://svn.code.sf.net/p/doc-book/code
https://svn.code.sf.net/p/wsjt/wsjt
https://svn.code.sf.net/p/wsmx/code
https://svn.code.sf.net/p/pysces/code
https://svn.code.sf.net/p/joda-time/svn
https://svn.code.sf.net/p/xforceffd/code
https://svn.code.sf.net/p/mgis/code
https://svn.code.sf.net/p/ceres-os/code
https://svn.code.sf.net/p/empact/code
https://svn.code.sf.net/p/rptools/svn
https://svn.code.sf.net/p/montypybot/code
https://svn.code.sf.net/p/mp3splt/code
https://svn.code.sf.net/p/open-fvs/code
https://svn.code.sf.net/p/padict/code
https://svn.code.sf.net/p/stategraph/code
https://svn.code.sf.net/p/nltk/svn
https://svn.code.sf.net/p/xmds/code
https://svn.code.sf.net/p/yanim/code
https://svn.code.sf.net/p/tmda/code
https://svn.code.sf.net/p/synce/code
https://svn.code.sf.net/p/desmume/code
https://svn.code.sf.net/p/drjava/code
https://svn.code.sf.net/p/dftanalysissiestacastep/code
https://svn.code.sf.net/p/telaen/code
https://svn.code.sf.net/p/processdash/code
https://svn.code.sf.net/p/wired/code
https://svn.code.sf.net/p/tvscript/code
https://svn.code.sf.net/p/liquidpcb/code
https://svn.code.sf.net/p/jtimeseries/code
https://svn.code.sf.net/p/yodl/svn
https://svn.code.sf.net/p/nethelp/code
https://svn.code.sf.net/p/wxmathplot/svn

Almost all of them were impacted by a bug in the subversion loader related to svn:eol-style property handling (D6589).
Once the fix is applied, computed hashes for objects to archive are no longer inconsistent.
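
For context, here is a minimal sketch of why svn:eol-style matters for hashing. It assumes the translation rules as I understand them from the Subversion documentation (values native, LF, CRLF and CR); apply_eol_style and its native_eol parameter are hypothetical names and this is not the code changed in D6589.

```python
import hashlib

# End-of-line bytes for the explicit svn:eol-style values.
EOL_BYTES = {"LF": b"\n", "CRLF": b"\r\n", "CR": b"\r"}


def apply_eol_style(content, eol_style, native_eol=b"\n"):
    """Translate line endings of content according to an svn:eol-style value."""
    if eol_style is None:
        # No property set: the content is left untouched.
        return content
    eol = native_eol if eol_style == "native" else EOL_BYTES[eol_style]
    # Normalize every line ending to LF first, then expand to the target style.
    normalized = content.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", eol)


stored = b"first line\nsecond line\n"
exported = apply_eol_style(stored, "CRLF")
# The two byte strings differ, so their SHA-1 digests differ as well.
print(hashlib.sha1(stored).hexdigest() == hashlib.sha1(exported).hexdigest())
```

If one side of the comparison applies this translation and the other does not, the same revision produces two different sets of content hashes, which is the kind of discrepancy the loader reports.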

There are also a couple of repositories impacted by another issue that seems related to svn links handling
in the loader: see P1213 and the sentry issue. I need to dig further into this.

anlambert renamed this task from "loading some svn origin while ignoring history sometimes raises" to "Investigate revision reconstruction discrepancy with subversion export". Nov 3 2021, 4:31 PM
anlambert updated the task description.

I've triggered runs on those origins with the patched loader btw.

Since D6608 landed and got deployed to production, revision reconstruction discrepancies are detected earlier, and a couple have already been reported on sentry,
so keeping this open.

If I'm not mistaken, there has been some more work on this.
It got deployed with the v0.10 package btw.