Page MenuHomeSoftware Heritage

Investigate svn loading failure
Closed, MigratedEdits Locked

Description

From an email:

Dear support,

today I saved my open-source project CSTNU
(https://profs.scienze.univr.it/posenato/svn/sw/CSTNU) in SoftwareHeritage.
The process was successful and ended at 10/20/2021, 12:08:10 PM.

After few minutes, I need to save again the repository and, therefore, I made
another 'Save' request.
The request was scheduled at 10/20/2021, 12:12:28 PM but, then. failed.
Log:
2021-10-20 11:16:23,942: INFO/ForkPoolWorker-1] Task
swh.loader.svn.tasks.DumpMountAndLoadSvnRepository[709297c6-4de6-4a16-b7bb-a242c9ca8234]
succeeded in 235.8857956090942s: {'status': 'failed'}


Then, I verified that my SVN server was running correctly.
I found no problem.

Then, I tried again a save action, but I always received a 'failed' status.

The last try was schedule at 10/20/2021, 12:12:28 PM and there is no log, just
failed status.

Is it possible to know which is the error?

Event Timeline

ardumont triaged this task as Normal priority.Oct 28 2021, 9:33 AM
ardumont created this task.

From the top of my head, 2 possible issues:

  1. svn external properties on the origin. That will get detected by the loader and

raise.

  1. history tampering happened on the origin after the first loading. The loader detects

it and raise such issue and stops.

  1. is still T611, should be taken care of at some point by T3692
  1. For this one, i had a thought about it recently and it should be enough to trigger a

run from scratch (loader has the flag for it which is tested but got never used).

Cheers,

So i've had a look in the end [1], it's indeed the history altered issue (2.).

Well, that and one other issue regarding no long disk space. Independently from the out of disk issue, subsequent visits fails on history alrered detection issue.

Logs extracted as csv [2].

[1] http://kibana0.internal.softwareheritage.org:5601/goto/62e3ce35d4492400dbbd633c101a91d7

[2]

"@timestamp",message,priority,hostname
"[""2021-10-20T11:16:23.942Z"",""2021-10-20T11:16:23.942Z""]","[2021-10-20 11:16:23,942: INFO/ForkPoolWorker-1] Task swh.loader.svn.tasks.DumpMountAndLoadSvnRepository[709297c6-4de6-4a16-b7bb-a242c9ca8234] succeeded in 235.8857956090942s: {'status': 'failed'}",6,worker15
"[""2021-10-20T11:16:22.823Z"",""2021-10-20T11:16:22.823Z""]","[2021-10-20 11:16:22,818: ERROR/ForkPoolWorker-1] Loading failure, updating to `failed` status
Traceback (most recent call last):
  File ""/usr/lib/python3/dist-packages/swh/loader/core/loader.py"", line 339, in load
    self.store_data()
  File ""/usr/lib/python3/dist-packages/swh/loader/svn/loader.py"", line 489, in store_data
    revision=self._last_revision, snapshot=self._snapshot
  File ""/usr/lib/python3/dist-packages/swh/loader/svn/loader.py"", line 525, in generate_and_load_snapshot
    ""generate_and_load_snapshot called with null revision and snapshot!""
ValueError: generate_and_load_snapshot called with null revision and snapshot!",3,worker15
"[""2021-10-20T11:16:22.818Z"",""2021-10-20T11:16:22.818Z""]","[2021-10-20 11:16:22,817: ERROR/ForkPoolWorker-1] History of svn file:///tmp/tmpz9rde58m/swh.loader.svn.k4hkq8ka-1239536/tmpt4qtuquh@607 altered. Skipping...",3,worker15
"[""2021-10-20T11:12:28.213Z"",""2021-10-20T11:12:28.213Z""]","[2021-10-20 11:12:28,213: INFO/ForkPoolWorker-1] Load origin 'https://profs.scienze.univr.it/posenato/svn/sw/CSTNU' with type 'svn'",6,worker15
"[""2021-10-20T11:08:10.088Z"",""2021-10-20T11:08:10.088Z""]","[2021-10-20 11:08:10,087: INFO/ForkPoolWorker-1] Task swh.loader.svn.tasks.DumpMountAndLoadSvnRepository[39bd536d-fc0e-4d06-a229-46bba2a033d2] succeeded in 268.00870383530855s: {'status': 'eventful'}",6,worker06
"[""2021-10-20T11:07:01.963Z"",""2021-10-20T11:07:01.963Z""]","[2021-10-20 11:07:01,962: INFO/ForkPoolWorker-1] Processing revisions [566-607] for {'swh-origin': 'https://profs.scienze.univr.it/posenato/svn/sw/CSTNU', 'remote_url': 'file:///tmp/tmpbdl0vo4y/swh.loader.svn.9l3j76ga-347873/tmpb3_qh849', 'local_url': b'/tmp/swh.loader.svn.53habc1i-347873/tmpb3_qh849', 'uuid': b'782453a1-1937-45d1-8845-2a6fcc2839b7'}",6,worker06
"[""2021-10-20T11:03:43.090Z"",""2021-10-20T11:03:43.090Z""]","[2021-10-20 11:03:43,090: INFO/ForkPoolWorker-1] Load origin 'https://profs.scienze.univr.it/posenato/svn/sw/CSTNU' with type 'svn'",6,worker06

For this one, i had a thought about it recently and it should be enough to trigger a
run from scratch (loader has the flag for it which is tested but got never used).

I tried and it failed.
So a fix is probably in order first.
That'd be T3695.

Hi,
if you need to have an SVN log from the SVN server, I can ask our sysadmins to have an extract of the log.

@posenato, I did some deeper analysis on why your svn repository could not be loaded anymore into the archive
and it turned out @ardumont is right, the commit log of your repository got altered between two save code now requests.

The image below show the commit log of your repository as currently archived by SWH.

The image below show the commit log of your repository loaded in our SWH docker environment used to develop and test new features.

I have highlighted the commit that got altered between the two loads, as you can see the commit message was initially Revision 4.3
but it was later modified to Preparing Revision 4.3.

As a consequence, the revision hashes differ between the two loads as commit message is used to generate revision identifier.

Because the subversion loader works in an incremental way to avoid reprocessing the full commit log in each loading, it first checks
if svn repository state is consistent as it was during previous loadings. If a discrepancy is detected, the loading is aborted.

So the only way to make the loading works again in production archive is to do a full reload of the repository.
I guess we could trigger that from our side manually until we decided a policy about such edge cases (but we
need to wait for @ardumont being back from vacation first).

@anlambert, thank you for excellent analysis!
Yes, I made an adjustment of commit logs because there was an error during the release and I discover it very soon, so I preferred to make the release again adjusting the comments instead of to make a new release!

I never thought that such a log modification can have a such important impact.
Lesson learned.

I hope you can full reload the repository ASAP.

Thanks again for your great work.

Roberto

All fixes have currently landed, got packaged in v0.7.3.
I've deployed it and triggered a run [1] for the origin (ongoing).

[1] P1214

It has just finished saving the repository.
It worked perfectly.
Thank you a lot for your support.

Roberto

@posenato Hello, i was gonna check this morning and ping you to let you know.

I never thought that such a log modification can have a such important impact. Lesson
learned.

On both side! :) The awesome outcome is now we have improved the loader svn on this
current limitation. So thank you as well for asking the question in the first place.

I'm closing the issue now.

Cheers,

ardumont claimed this task.