Make the replayer not crash on kafka messages that fail to be converted as model objects
Closed, Public

Authored by douardda on Oct 21 2022, 2:20 PM.

Details

Summary

For example, there are a few kafka directory messages whose entries
contain the same entry name several times, preventing the Directory
model object from being created at all, which makes the replayer crash.

This change makes the replayer able to handle such cases. When the model
object creation fails with a ValueError, the error is reported in the
(redis) error reporter, but the replaying process continues.
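
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from swh/storage/replay.py: `convert_all`, `toy_directory` and the `report_error` callback are hypothetical names, and the toy converter only mimics the duplicate-entry-name check that makes the real Directory constructor fail.

```python
from typing import Any, Callable, Dict, Iterable, List


def convert_all(
    dicts: Iterable[Dict[str, Any]],
    converter: Callable[[Dict[str, Any]], Any],
    report_error: Callable[[Dict[str, Any], Exception], None],
) -> List[Any]:
    """Convert each message dict; report a ValueError instead of crashing."""
    converted = []
    for d in dicts:
        try:
            converted.append(converter(d))
        except ValueError as exc:
            # Hypothetical reporter hook: record the failure, then keep going.
            report_error(d, exc)
    return converted


def toy_directory(d: Dict[str, Any]) -> Any:
    """Stand-in for a model constructor (assumption, not the swh.model API)."""
    names = [entry["name"] for entry in d["entries"]]
    if len(names) != len(set(names)):
        raise ValueError("duplicate entry name")
    return ("directory", d["id"], tuple(names))
```

With this shape, one invalid message costs an error report rather than the whole replay batch; only ValueError is swallowed, so other exceptions still propagate.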

Since there is no model object, the error is reported with a crafted
error key of the form "{object_type}:{object_id}" if an object id is
present in the data structure, or "{object_type}:uuid:{uuid4}" if such
an id is not even present. For the record, the standard error key in
redis for a model object is its swhid (if any).
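
A minimal sketch of how such a key could be built, assuming the raw message is a dict whose identifier (when present) lives under an "id" key. The function name `craft_error_key` and the hex rendering of bytes ids are assumptions made for illustration; the actual code in swh/storage/replay.py may differ.

```python
import uuid
from typing import Any, Dict


def craft_error_key(object_type: str, data: Dict[str, Any]) -> str:
    """Build a fallback error key for data that has no model object."""
    object_id = data.get("id")  # assumption: the id is stored under "id"
    if object_id is not None:
        if isinstance(object_id, bytes):
            # Assumption: render binary ids as hex so the key is printable.
            object_id = object_id.hex()
        return f"{object_type}:{object_id}"
    # No id at all: fall back to a random UUID so the key stays unique.
    return f"{object_type}:uuid:{uuid.uuid4()}"
```

Note that the uuid4 fallback is not reproducible: replaying the same id-less message twice would yield two distinct keys under this sketch.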

Diff Detail

Event Timeline

Build is green

Patch application report for D8751 (id=31535)

Rebasing onto 784f730e3a...

First, rewinding head to replay your work on top of it...
Applying: Add a comment that should have been "kept" from 850a7553b
Applying: Make the replayer not crash on kafka messages that fail to be converted as model objects
Changes applied before test
commit 07a3708e3c38d5fdb1c8d13b9f84ace3d5de9a5d
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Oct 21 14:09:07 2022 +0200

    Make the replayer not crash on kafka messages that fail to be converted as model objects
    
    for example, there are a few kafka directory messages which entries
    contain the same entry name several times, preventing the Directory
    model object from being created at all, which make the replayer crash.
    
    This makes the replayer able to handle such cases. When the model object
    creation fails with a ValueError, the error is reported in the (redis)
    error reporter, but the replaying process continue.
    
    Since there is no model object, the error is reported with a crafted
    error key of the form "{object_type}:{object_id}" if an object id is
    present in the data sctructure, or "{object_type}:uuid:{uuid4}" if such
    an id is not even present. For the record, the standard error key in
    redis for a model object is it's swhid (if any).

commit 30d45f0a7383f0a5f8cf9cbebf12ba747835d6c5
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Oct 21 14:05:23 2022 +0200

    Add a comment that should have been "kept" from 850a7553b

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1682/ for more details.

Build is green

Patch application report for D8751 (id=31536)

Rebasing onto 784f730e3a...

First, rewinding head to replay your work on top of it...
Applying: Add a comment that should have been "kept" from 850a7553b
Applying: Make the replayer not crash on kafka messages that fail to be converted as model objects
Changes applied before test
commit 8f03fd26b3ebb84fb1a43af68fdda274d30888f3
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Oct 21 14:09:07 2022 +0200

    Make the replayer not crash on kafka messages that fail to be converted as model objects
    
    for example, there are a few kafka directory messages in the current
    production kafka cluster which entries contain the same name twice,
    preventing the Directory model object from being created at all,
    which makes the replayer crash.
    
    This change makes the replayer able to handle such cases. When the model
    object creation fails with a ValueError, the error is reported in the
    (redis) error reporter, but the replaying process continue.
    
    Since there is no model object, the error is reported with a crafted
    error key of the form "{object_type}:{object_id}" if an object id is
    present in the data sctructure, or "{object_type}:uuid:{uuid4}" if such
    an id is not even present. For the record, the standard error key in
    redis for a model object is it's swhid (if any).

commit 1c0f4c9621cbbeff68eff7383e34a35f810f5311
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Oct 21 14:05:23 2022 +0200

    Add a comment that should have been "kept" from 850a7553b

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1683/ for more details.

anlambert added a subscriber: anlambert.

Looks good to me, just noticed some remaining typos: one in the code and another in the commit message (s/sctructure/structure/).

swh/storage/replay.py, line 69

s/is/if/

This revision is now accepted and ready to land. Oct 21 2022, 3:42 PM

Looks good to me, just noticed some remaining typos: one in the code and another in the commit message (s/sctructure/structure/).

thx

Fix typos as reported by anlambert.

Build is green

Patch application report for D8751 (id=31538)

Rebasing onto 784f730e3a...

First, rewinding head to replay your work on top of it...
Applying: Add a comment that should have been "kept" from 850a7553b
Applying: Make the replayer not crash on kafka messages that fail to be converted as model objects
Changes applied before test
commit 0f6590f33448eb106a7dcd4c48d549d2bd14d6c3
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Oct 21 14:09:07 2022 +0200

    Make the replayer not crash on kafka messages that fail to be converted as model objects
    
    for example, there are a few kafka directory messages in the current
    production kafka cluster which entries contain the same name twice,
    preventing the Directory model object from being created at all,
    which makes the replayer crash.
    
    This change makes the replayer able to handle such cases. When the model
    object creation fails with a ValueError, the error is reported in the
    (redis) error reporter, but the replaying process continue.
    
    Since there is no model object, the error is reported with a crafted
    error key of the form "{object_type}:{object_id}" if an object id is
    present in the data structure, or "{object_type}:uuid:{uuid4}" if such
    an id is not even present. For the record, the standard error key in
    redis for a model object is it's swhid (if any).

commit 508a2e4959afddd7bc437805f2f7f1fd0474003f
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Oct 21 14:05:23 2022 +0200

    Add a comment that should have been "kept" from 850a7553b

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1684/ for more details.

Build is green

Patch application report for D8751 (id=31542)

Rebasing onto 784f730e3a...

Current branch diff-target is up to date.
Changes applied before test
commit fe0eaee8718bfd33d8ba383242e3abb09ad59634
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Oct 21 14:09:07 2022 +0200

    Make the replayer not crash on kafka messages that fail to be converted as model objects
    
    for example, there are a few kafka directory messages in the current
    production kafka cluster which entries contain the same name twice,
    preventing the Directory model object from being created at all,
    which makes the replayer crash.
    
    This change makes the replayer able to handle such cases. When the model
    object creation fails with a ValueError, the error is reported in the
    (redis) error reporter, but the replaying process continue.
    
    Since there is no model object, the error is reported with a crafted
    error key of the form "{object_type}:{object_id}" if an object id is
    present in the data structure, or "{object_type}:uuid:{uuid4}" if such
    an id is not even present. For the record, the standard error key in
    redis for a model object is it's swhid (if any).

commit 242e37a7be35b931986724c5adc75dc90284f0fd
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Oct 21 14:05:23 2022 +0200

    Add a comment that should have been "kept" from 850a7553b

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1685/ for more details.