Fail gracefully if the revision decoding process fails
Closed, Migrated, Edits Locked

Assigned To

Authored By

	jbertran
	Jun 9 2016, 2:07 PM

Description

Currently, we assume that the revision data (other than its message) is always UTF-8 encoded. We would like the converter to catch any and all decoding failures, and to provide another way of accessing the content we were not able to decode (downloading the raw data?), in the same way the revision message is handled.
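As a rough illustration of the behaviour requested above, here is a minimal sketch that catches the decoding failure and flags the field instead of raising. The function name `decode_or_flag` and the `<field>_decoding_failed` key are hypothetical, chosen for this example only; they are not taken from the converter's actual code.

```python
from typing import Dict, Optional, Union


def decode_or_flag(field: str, raw: bytes) -> Dict[str, Union[Optional[str], bool]]:
    """Try to decode `raw` as UTF-8 without ever raising.

    Hypothetical helper: on failure the text is set to None and a flag is
    set, so a caller can offer the raw bytes through another channel
    (for example a raw download).
    """
    try:
        text: Optional[str] = raw.decode("utf-8")
        failed = False
    except UnicodeDecodeError:
        text = None
        failed = True
    return {field: text, field + "_decoding_failed": failed}
```

A caller would check the flag and, when it is set, link to the raw data rather than display the field.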

Revisions and Commits

rDWAPPS Web applications
	Closed	D45 T435-c
rDSTO Storage manager
	Closed	D44 T435-b
rDCORE Foundations and core functionalities
	Closed	D43 T435-a

Related Objects

Mentioned In: T297: Try to decode the revision's message data and fail gracefully
D44: T435-b
D43: T435-a

Event Timeline

jbertran triaged this task as Normal priority. Jun 9 2016, 2:07 PM

jbertran created this task.

jbertran created this object in space S1 Public.

Some of the fields currently assumed to be UTF-8 are:

author/committer name
author/committer email
author/committer full name

However, for those fields, I believe a "raw download" is a bit too much and we should rather look at somehow escaping the field.

You should also make sure that the same process is applied to releases.

Finally, there are also some occurrences of "branch names" that aren't proper UTF-8.

For inspiration, swh.storage.converters.decode_with_escape converts raw bytes into a backslash-escaped sequence of Unicode code points that is valid for JSON serialization. Its purpose is to allow serializing arbitrary byte sequences into a PostgreSQL jsonb field, but it could probably be moved into swh.core and reused for these fields as well.
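To make the escaping idea concrete, here is a minimal sketch of that kind of decoding. It is not the code of swh.storage.converters.decode_with_escape; the name `decode_escaped` is hypothetical, and the sketch relies on Python's built-in `backslashreplace` error handler (usable for decoding since Python 3.5) rather than on whatever handler the real function uses.

```python
def decode_escaped(raw: bytes) -> str:
    """Decode `raw` as UTF-8, escaping whatever cannot be decoded.

    Hypothetical sketch: undecodable bytes become literal \\xNN sequences,
    so the result is always a valid str that can be serialized to JSON.
    """
    # Double pre-existing backslashes so that escapes added below cannot
    # be confused with backslashes present in the original data.
    raw = raw.replace(b"\\", b"\\\\")
    # Escape NUL bytes as well, on the assumption that the target store
    # (for example a PostgreSQL jsonb column) rejects NUL characters.
    raw = raw.replace(b"\x00", b"\\x00")
    # Each invalid byte is replaced by a backslash escape instead of
    # raising UnicodeDecodeError.
    return raw.decode("utf-8", errors="backslashreplace")
```

For example, a Latin-1 encoded author name such as `b"J\xe9r\xf4me"` comes back as the string `J\xe9r\xf4me` with literal backslashes, while valid UTF-8 input is returned unchanged.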

jbertran mentioned this in D43: T435-a. Jun 10 2016, 2:16 PM

jbertran added a revision: D43: T435-a. Jun 10 2016, 2:19 PM

jbertran added a revision: D44: T435-b. Jun 10 2016, 3:02 PM

jbertran mentioned this in D44: T435-b.

jbertran added a revision: D45: T435-c. Jun 10 2016, 4:06 PM

jbertran closed this task as Resolved. Jun 13 2016, 4:01 PM

jbertran mentioned this in T297: Try to decode the revision's message data and fail gracefully. Jun 15 2016, 3:44 PM

This task has been migrated to GitLab.
