common/converters: Harmonize UTF-8 decoding errors handling
Closed, Public

Authored by anlambert on Sep 22 2020, 12:22 PM.

Details

Summary

Remove the special handling of UTF-8 decoding errors for the "message"
value of a swh revision dictionary and use the global decoding error
handler instead.

As a reminder, the global handler for UTF-8 decoding errors performs
the following actions:

  • It collects the names of all keys in a dictionary whose values failed UTF-8 decoding into a list and stores that list under a new key, "decoding_failures".
  • A string that could not be decoded has the bytes of its invalid UTF-8 sequences escaped.
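
For illustration, the two actions above can be sketched in Python roughly as follows. This is a minimal sketch, not the actual swh-web code: the function name `decode_values` and the sample dictionary are hypothetical, and the use of the standard `backslashreplace` error handler for escaping is an assumption about how the invalid bytes might be escaped.

```python
from typing import Any, Dict, List


def decode_values(record: Dict[str, Any]) -> Dict[str, Any]:
    """Decode bytes values as UTF-8 and record the keys whose decoding failed.

    Hypothetical helper, written only to illustrate the behaviour described above.
    """
    decoded: Dict[str, Any] = {}
    failures: List[str] = []
    for key, value in record.items():
        if isinstance(value, bytes):
            try:
                decoded[key] = value.decode("utf-8")
            except UnicodeDecodeError:
                # Assumption: escape only the invalid bytes, keep the valid text.
                decoded[key] = value.decode("utf-8", errors="backslashreplace")
                failures.append(key)
        else:
            decoded[key] = value
    if failures:
        # Key names whose values could not be decoded cleanly.
        decoded["decoding_failures"] = failures
    return decoded


# Hypothetical revision dictionary whose message contains a Latin-1 byte.
revision = {"id": "abc", "message": b"caf\xe9 fix"}
result = decode_values(revision)
```

With this sketch, a client can check for the presence of "decoding_failures" to learn which values contain escaped byte sequences rather than clean text.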

Also add a section about UTF-8 decoding errors to the top-level API documentation.

Closes T2617

Diff Detail

Repository
rDWAPPS Web applications
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D4003 (id=14123)

Rebasing onto c45a6af2f7...

Current branch diff-target is up to date.
Changes applied before test
commit 1694f6b03f9bd5283918dc7bbcb681700167b8c4
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Tue Sep 22 12:08:14 2020 +0200

    templates/apidoc-header: Add section about UTF-8 decoding errors
    
    Closes T2617

commit c56cbbb64bdbec58e10d43128d7af777f1d02335
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Tue Sep 22 12:07:36 2020 +0200

    common/converters: Harmonize UTF-8 decoding errors handling
    
    Remove special UTF-8 decoding errors handling for the "message" value
    of a swh revision dictionary and use global decoding errors handler
    instead.
    
    As a reminder, the global handler for UTF-8 decoding errors performs
    the following actions:
    
      - It puts all key names of a dictionary where UTF-8 decoding of values failed
        in a list and store it under a new key "decoding_failures".
    
      - A string that could not be decoded will have the bytes of its invalid
        UTF-8 sequences escaped.
    
    Related to T2617

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/379/ for more details.

This revision is now accepted and ready to land. Sep 22 2020, 1:30 PM