Add support for recursive multipart messages
Closed, Public

Authored by vlorentz on Apr 7 2022, 11:44 AM.

Details

Summary

Before this commit, parsing the test file would just return
`and more plain text` because it is the largest text part.

This uses `message.get_payload()` instead of `message.walk()`, because
`message.walk()` yields the parts as a flat depth-first sequence (each
container before its children), with no easy way of propagating
information between nodes.
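
To illustrate the idea, here is a minimal sketch of recursing over `get_payload()` with the standard library `email` package. It is not the code in `swh/web/inbound_email/utils.py`: the helper name `extract_text`, the rule "first plain-text child wins for multipart/alternative", and the newline join for other multipart types are all assumptions made for the example.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import Optional


def extract_text(message: Message) -> Optional[str]:
    """Hypothetical helper: return the plain text of `message`, or None."""
    if not message.is_multipart():
        # Leaf node: only text/plain is kept in this sketch.
        if message.get_content_type() != "text/plain":
            return None
        charset = message.get_content_charset() or "utf-8"
        return message.get_payload(decode=True).decode(charset, errors="replace")

    # For a multipart message, get_payload() returns the direct children,
    # so each level can combine the results of its own subtree.
    texts = [t for t in map(extract_text, message.get_payload()) if t is not None]
    if not texts:
        return None
    if message.get_content_subtype() == "alternative":
        # Alternatives carry the same content: keep a single one (assumption).
        return texts[0]
    # mixed, related, and other subtypes: keep every part (assumption).
    return "\n".join(texts)


# A multipart/mixed message whose first part is itself multipart/alternative.
inner = MIMEMultipart("alternative")
inner.attach(MIMEText("plain version", "plain"))
inner.attach(MIMEText("<p>html version</p>", "html"))
outer = MIMEMultipart("mixed")
outer.attach(inner)
outer.attach(MIMEText("and more plain text", "plain"))

print(extract_text(outer))
```

In this example both plain-text parts are returned, joined by a newline, whereas picking the largest text/plain part over the flattened message would keep only `and more plain text`.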

Depends on D7517.

Diff Detail

Repository
rDWAPPS Web applications
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D7518 (id=27280)

Could not rebase; Attempt merge onto 17cd03607c...

Updating 17cd0360..9ae798a6
Fast-forward
 swh/web/inbound_email/utils.py                     | 87 ++++++++++++++--------
 .../{multipart.eml => multipart_alternative.eml}   |  0
 ...nly.eml => multipart_alternative_html_only.eml} |  0
 .../resources/multipart_alternative_recursive.eml  | 45 +++++++++++
 ...nly.eml => multipart_alternative_text_only.eml} |  0
 .../inbound_email/resources/multipart_mixed.eml    | 23 ++++++
 .../inbound_email/resources/multipart_mixed2.eml   | 25 +++++++
 .../resources/multipart_mixed_text_only.eml        | 36 +++++++++
 swh/web/tests/inbound_email/test_utils.py          | 40 ++++++++--
 9 files changed, 220 insertions(+), 36 deletions(-)
 rename swh/web/tests/inbound_email/resources/{multipart.eml => multipart_alternative.eml} (100%)
 rename swh/web/tests/inbound_email/resources/{multipart_html_only.eml => multipart_alternative_html_only.eml} (100%)
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_alternative_recursive.eml
 rename swh/web/tests/inbound_email/resources/{multipart_text_only.eml => multipart_alternative_text_only.eml} (100%)
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed2.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed_text_only.eml
Changes applied before test
commit 9ae798a6830b7194b5582a8c249bc0f24d57152a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 7 11:40:38 2022 +0200

    Add support for recursive multipart messages
    
    Before this commit, parsing the test file would just return
    `and more plain text` because it is the largest text part.
    
    This uses `message.get_payload()` instead of `message.walk()`, because
    `message.walk()` implements a bottom-up DFS; but with no easy way of
    propagating information between nodes.

commit 4f5f426891391a1e72f61ce53978503bd219cd0c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 7 10:56:08 2022 +0200

    Add support for multipart/mixed + better fallback for multipart/*

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1691/ for more details.

rebase + anonymize test data + add test for multipart/related

Build is green

Patch application report for D7518 (id=27285)

Could not rebase; Attempt merge onto 17cd03607c...

Updating 17cd0360..ae9a460c
Fast-forward
 swh/web/inbound_email/utils.py                     | 87 ++++++++++++++--------
 .../{multipart.eml => multipart_alternative.eml}   |  0
 ...nly.eml => multipart_alternative_html_only.eml} |  0
 .../resources/multipart_alternative_recursive.eml  | 45 +++++++++++
 ...nly.eml => multipart_alternative_text_only.eml} |  0
 .../inbound_email/resources/multipart_mixed.eml    | 23 ++++++
 .../inbound_email/resources/multipart_mixed2.eml   | 25 +++++++
 .../resources/multipart_mixed_text_only.eml        | 36 +++++++++
 .../inbound_email/resources/multipart_related.eml  | 42 +++++++++++
 swh/web/tests/inbound_email/test_utils.py          | 49 ++++++++++--
 10 files changed, 271 insertions(+), 36 deletions(-)
 rename swh/web/tests/inbound_email/resources/{multipart.eml => multipart_alternative.eml} (100%)
 rename swh/web/tests/inbound_email/resources/{multipart_html_only.eml => multipart_alternative_html_only.eml} (100%)
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_alternative_recursive.eml
 rename swh/web/tests/inbound_email/resources/{multipart_text_only.eml => multipart_alternative_text_only.eml} (100%)
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed2.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed_text_only.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_related.eml
Changes applied before test
commit ae9a460cbaaf94e7e3c990355f2764f8afc89912
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 7 11:40:38 2022 +0200

    Add support for recursive multipart messages
    
    Before this commit, parsing the test file would just return
    `and more plain text` because it is the largest text part.
    
    This uses `message.get_payload()` instead of `message.walk()`, because
    `message.walk()` implements a bottom-up DFS; but with no easy way of
    propagating information between nodes.

commit 5444626178f72c89d08072f41716bf4427772c24
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 7 10:56:08 2022 +0200

    Add support for multipart/mixed + better fallback for multipart/*

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1693/ for more details.

This revision is now accepted and ready to land. Apr 8 2022, 11:34 AM
olasd added a subscriber: olasd.

Thanks!

This revision was landed with ongoing or failed builds. Apr 13 2022, 1:55 PM
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D7518 (id=27410)

Could not rebase; Attempt merge onto e5115fdbdf...

Updating e5115fdb..120e48ba
Fast-forward
 swh/web/inbound_email/utils.py                     | 87 ++++++++++++++--------
 .../{multipart.eml => multipart_alternative.eml}   |  0
 ...nly.eml => multipart_alternative_html_only.eml} |  0
 .../resources/multipart_alternative_recursive.eml  | 45 +++++++++++
 ...nly.eml => multipart_alternative_text_only.eml} |  0
 .../inbound_email/resources/multipart_mixed.eml    | 23 ++++++
 .../inbound_email/resources/multipart_mixed2.eml   | 25 +++++++
 .../resources/multipart_mixed_text_only.eml        | 36 +++++++++
 .../inbound_email/resources/multipart_related.eml  | 42 +++++++++++
 swh/web/tests/inbound_email/test_utils.py          | 49 ++++++++++--
 10 files changed, 271 insertions(+), 36 deletions(-)
 rename swh/web/tests/inbound_email/resources/{multipart.eml => multipart_alternative.eml} (100%)
 rename swh/web/tests/inbound_email/resources/{multipart_html_only.eml => multipart_alternative_html_only.eml} (100%)
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_alternative_recursive.eml
 rename swh/web/tests/inbound_email/resources/{multipart_text_only.eml => multipart_alternative_text_only.eml} (100%)
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed2.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed_text_only.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_related.eml
Changes applied before test
commit 120e48badec244b9820efbc0cc7c2c5af08831c9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 7 11:40:38 2022 +0200

    Add support for recursive multipart messages
    
    Before this commit, parsing the test file would just return
    `and more plain text` because it is the largest text part.
    
    This uses `message.get_payload()` instead of `message.walk()`, because
    `message.walk()` implements a bottom-up DFS; but with no easy way of
    propagating information between nodes.

commit ae8b3148611bb44cd1b16126c49efc40097aca7f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 7 10:56:08 2022 +0200

    Add support for multipart/mixed + better fallback for multipart/*

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1712/ for more details.