Software Heritage

Add support for recursive multipart messages
Closed · Public

Authored by vlorentz on Apr 7 2022, 11:44 AM.

Details

Summary

Before this commit, parsing the test file would return only
`and more plain text`, because it is the largest text part.
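To make the failure mode concrete, here is a minimal sketch. It assumes a hypothetical nested message (not the actual `multipart_alternative_recursive.eml` fixture) and a simplified stand-in for the old heuristic, `largest_plain_part`, which flattens the tree with `walk()` and keeps the largest `text/plain` leaf. Both names are illustrative, not swh-web code.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_sample() -> Message:
    """Hypothetical message: a multipart/alternative whose plain-text
    alternative is itself a multipart/mixed holding two text/plain parts."""
    mixed = MIMEMultipart("mixed")
    mixed.attach(MIMEText("plain text", "plain"))
    mixed.attach(MIMEText("and more plain text", "plain"))
    alternative = MIMEMultipart("alternative")
    alternative.attach(mixed)
    alternative.attach(MIMEText("<p>plain text</p><p>and more plain text</p>", "html"))
    return alternative


def largest_plain_part(msg: Message) -> str:
    """Simplified stand-in for the pre-patch heuristic (an assumption, not the
    actual swh-web code): flatten the tree and keep the largest text/plain leaf."""
    candidates = [
        part.get_payload(decode=True).decode(part.get_content_charset() or "utf-8")
        for part in msg.walk()
        if part.get_content_type() == "text/plain"
    ]
    return max(candidates, key=len)


msg = build_sample()
# walk() yields each container before its children (depth-first):
print([part.get_content_type() for part in msg.walk()])
# ['multipart/alternative', 'multipart/mixed', 'text/plain', 'text/plain', 'text/html']
print(largest_plain_part(msg))
# and more plain text
```

Because `walk()` yields a flat sequence, the loop cannot tell that both `text/plain` leaves belong to the same `multipart/mixed` alternative, so the heuristic keeps only one of them.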

This uses `message.get_payload()` instead of `message.walk()`, because
`message.walk()` implements a depth-first traversal but offers no easy way of
propagating information between nodes.
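A minimal sketch of the recursive approach, assuming a hypothetical message written out as raw MIME (a `multipart/alternative` whose plain-text alternative is a `multipart/mixed` holding two `text/plain` parts). `extract_text`, its `(is_plain, texts)` return value, and the selection rules are illustrative assumptions, not the code in `swh/web/inbound_email/utils.py`. Recursing over `get_payload()` lets each container decide how to combine its direct children, and lets information (here, whether a subtree is entirely plain text) flow back up to the parent.

```python
import email
from email.message import Message
from typing import List, Tuple

# Hypothetical nested message, not a real test fixture.
RAW = """\
Content-Type: multipart/alternative; boundary="outer"

--outer
Content-Type: multipart/mixed; boundary="inner"

--inner
Content-Type: text/plain

plain text
--inner
Content-Type: text/plain

and more plain text
--inner--
--outer
Content-Type: text/html

<p>plain text</p><p>and more plain text</p>
--outer--
"""


def extract_text(part: Message) -> Tuple[bool, List[str]]:
    """Return ``(is_plain, texts)`` for the subtree rooted at ``part``.

    ``is_plain`` is the information propagated upwards: it tells the parent
    whether every text leaf in this subtree is text/plain.
    """
    if part.is_multipart():
        # get_payload() returns only the direct children, so this call decides
        # how to combine them based on its own container type.
        children = [extract_text(sub) for sub in part.get_payload()]
        if part.get_content_type() == "multipart/alternative":
            # RFC 2046 orders alternatives from least to most faithful;
            # prefer the last one that is entirely plain text...
            for is_plain, texts in reversed(children):
                if is_plain and texts:
                    return True, texts
            # ...otherwise fall back to the last non-empty alternative.
            for is_plain, texts in reversed(children):
                if texts:
                    return is_plain, texts
            return True, []
        # multipart/mixed and any other multipart/* (a simplification for this
        # sketch): keep the text of every child, in order.
        texts = [text for _, child_texts in children for text in child_texts]
        return all(plain for plain, child_texts in children if child_texts), texts
    if part.get_content_maintype() == "text":
        charset = part.get_content_charset() or "utf-8"
        text = part.get_payload(decode=True).decode(charset, errors="replace").strip()
        return part.get_content_type() == "text/plain", [text]
    # Non-text leaves (images, attachments, ...) contribute no text.
    return True, []


msg = email.message_from_string(RAW)
print(extract_text(msg))
# (True, ['plain text', 'and more plain text'])
```

With the context carried by the recursion, both parts of the `multipart/mixed` alternative are kept, and the HTML alternative is skipped because a fully plain-text sibling exists.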

Depends on D7517.

Diff Detail

Repository: rDWAPPS Web applications
Branch: email
Lint: No Linters Available
Unit: No Unit Test Coverage
Build Status: Buildable 28210
Build 44167: Phabricator diff pipeline on jenkins
Build 44166: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D7518 (id=27280)

Could not rebase; Attempt merge onto 17cd03607c...

Updating 17cd0360..9ae798a6
Fast-forward
 swh/web/inbound_email/utils.py                     | 87 ++++++++++++++--------
 .../{multipart.eml => multipart_alternative.eml}   |  0
 ...nly.eml => multipart_alternative_html_only.eml} |  0
 .../resources/multipart_alternative_recursive.eml  | 45 +++++++++++
 ...nly.eml => multipart_alternative_text_only.eml} |  0
 .../inbound_email/resources/multipart_mixed.eml    | 23 ++++++
 .../inbound_email/resources/multipart_mixed2.eml   | 25 +++++++
 .../resources/multipart_mixed_text_only.eml        | 36 +++++++++
 swh/web/tests/inbound_email/test_utils.py          | 40 ++++++++--
 9 files changed, 220 insertions(+), 36 deletions(-)
 rename swh/web/tests/inbound_email/resources/{multipart.eml => multipart_alternative.eml} (100%)
 rename swh/web/tests/inbound_email/resources/{multipart_html_only.eml => multipart_alternative_html_only.eml} (100%)
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_alternative_recursive.eml
 rename swh/web/tests/inbound_email/resources/{multipart_text_only.eml => multipart_alternative_text_only.eml} (100%)
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed2.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed_text_only.eml
Changes applied before test
commit 9ae798a6830b7194b5582a8c249bc0f24d57152a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 7 11:40:38 2022 +0200

    Add support for recursive multipart messages
    
    Before this commit, parsing the test file would just return
    `and more plain text` because it is the largest text part.
    
    This uses `message.get_payload()` instead of `message.walk()`, because
    `message.walk()` implements a bottom-up DFS; but with no easy way of
    propagating information between nodes.

commit 4f5f426891391a1e72f61ce53978503bd219cd0c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 7 10:56:08 2022 +0200

    Add support for multipart/mixed + better fallback for multipart/*

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1691/ for more details.

rebase + anonymize test data + add test for multipart/related
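The update above adds a test for `multipart/related`. Unlike `multipart/mixed`, RFC 2387 designates one body part as the root (the part named by the `start` parameter, else the first body part) and treats the others as resources it references. A minimal sketch of selecting that root, assuming a hypothetical message and hypothetical helpers `_cid` and `related_root` (not swh-web code):

```python
import email
from email.message import Message
from typing import Optional

# Hypothetical multipart/related message: an HTML root referencing an image.
RAW = """\
Content-Type: multipart/related; boundary="rel"

--rel
Content-Type: text/html

<p>See <img src="cid:logo@example.org"></p>
--rel
Content-Type: image/png
Content-ID: <logo@example.org>
Content-Transfer-Encoding: base64

iVBORw0KGgo=
--rel--
"""


def _cid(value: str) -> str:
    """Normalize a Content-ID or ``start`` value by dropping spaces and angle brackets."""
    return value.strip().strip("<>")


def related_root(msg: Message) -> Optional[Message]:
    """Return the root part of a multipart/related message (RFC 2387).

    Use the part whose Content-ID matches the ``start`` parameter; if there is
    no plain-string ``start`` or nothing matches, fall back to the first part.
    """
    parts = msg.get_payload()
    start = msg.get_param("start")
    if isinstance(start, str):  # RFC 2231-encoded values come back as tuples
        for part in parts:
            if _cid(part.get("Content-ID", "")) == _cid(start):
                return part
    return parts[0] if parts else None


root = related_root(email.message_from_string(RAW))
print(root.get_content_type())
# text/html
```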

Build is green

Patch application report for D7518 (id=27285)

Could not rebase; Attempt merge onto 17cd03607c...

Updating 17cd0360..ae9a460c
Fast-forward
 swh/web/inbound_email/utils.py                     | 87 ++++++++++++++--------
 .../{multipart.eml => multipart_alternative.eml}   |  0
 ...nly.eml => multipart_alternative_html_only.eml} |  0
 .../resources/multipart_alternative_recursive.eml  | 45 +++++++++++
 ...nly.eml => multipart_alternative_text_only.eml} |  0
 .../inbound_email/resources/multipart_mixed.eml    | 23 ++++++
 .../inbound_email/resources/multipart_mixed2.eml   | 25 +++++++
 .../resources/multipart_mixed_text_only.eml        | 36 +++++++++
 .../inbound_email/resources/multipart_related.eml  | 42 +++++++++++
 swh/web/tests/inbound_email/test_utils.py          | 49 ++++++++++--
 10 files changed, 271 insertions(+), 36 deletions(-)
 rename swh/web/tests/inbound_email/resources/{multipart.eml => multipart_alternative.eml} (100%)
 rename swh/web/tests/inbound_email/resources/{multipart_html_only.eml => multipart_alternative_html_only.eml} (100%)
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_alternative_recursive.eml
 rename swh/web/tests/inbound_email/resources/{multipart_text_only.eml => multipart_alternative_text_only.eml} (100%)
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed2.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed_text_only.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_related.eml
Changes applied before test
commit ae9a460cbaaf94e7e3c990355f2764f8afc89912
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 7 11:40:38 2022 +0200

    Add support for recursive multipart messages
    
    Before this commit, parsing the test file would just return
    `and more plain text` because it is the largest text part.
    
    This uses `message.get_payload()` instead of `message.walk()`, because
    `message.walk()` implements a bottom-up DFS; but with no easy way of
    propagating information between nodes.

commit 5444626178f72c89d08072f41716bf4427772c24
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 7 10:56:08 2022 +0200

    Add support for multipart/mixed + better fallback for multipart/*

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1693/ for more details.

This revision is now accepted and ready to land. (Apr 8 2022, 11:34 AM)
olasd added a subscriber: olasd.

Thanks!

This revision was landed with ongoing or failed builds. (Apr 13 2022, 1:55 PM)
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D7518 (id=27410)

Could not rebase; Attempt merge onto e5115fdbdf...

Updating e5115fdb..120e48ba
Fast-forward
 swh/web/inbound_email/utils.py                     | 87 ++++++++++++++--------
 .../{multipart.eml => multipart_alternative.eml}   |  0
 ...nly.eml => multipart_alternative_html_only.eml} |  0
 .../resources/multipart_alternative_recursive.eml  | 45 +++++++++++
 ...nly.eml => multipart_alternative_text_only.eml} |  0
 .../inbound_email/resources/multipart_mixed.eml    | 23 ++++++
 .../inbound_email/resources/multipart_mixed2.eml   | 25 +++++++
 .../resources/multipart_mixed_text_only.eml        | 36 +++++++++
 .../inbound_email/resources/multipart_related.eml  | 42 +++++++++++
 swh/web/tests/inbound_email/test_utils.py          | 49 ++++++++++--
 10 files changed, 271 insertions(+), 36 deletions(-)
 rename swh/web/tests/inbound_email/resources/{multipart.eml => multipart_alternative.eml} (100%)
 rename swh/web/tests/inbound_email/resources/{multipart_html_only.eml => multipart_alternative_html_only.eml} (100%)
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_alternative_recursive.eml
 rename swh/web/tests/inbound_email/resources/{multipart_text_only.eml => multipart_alternative_text_only.eml} (100%)
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed2.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_mixed_text_only.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_related.eml
Changes applied before test
commit 120e48badec244b9820efbc0cc7c2c5af08831c9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 7 11:40:38 2022 +0200

    Add support for recursive multipart messages
    
    Before this commit, parsing the test file would just return
    `and more plain text` because it is the largest text part.
    
    This uses `message.get_payload()` instead of `message.walk()`, because
    `message.walk()` implements a bottom-up DFS; but with no easy way of
    propagating information between nodes.

commit ae8b3148611bb44cd1b16126c49efc40097aca7f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Apr 7 10:56:08 2022 +0200

    Add support for multipart/mixed + better fallback for multipart/*

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1712/ for more details.