
inbound_email: add function to extract the plaintext from a mail
ClosedPublic

Authored by olasd on Apr 5 2022, 1:17 PM.

Details

Summary

This function uses the html part if no text part is available.

Depends on D7499
Related to T3999
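
To illustrate the behavior described in this summary, here is a minimal sketch using only the Python standard library. It is not the code from this diff: the names `extract_plaintext`, `html_to_text` and `parse_and_extract` are hypothetical, and the HTML-to-text conversion is a deliberately naive stand-in. The "largest part wins" rule follows the updated commit message shown later in this revision.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from html.parser import HTMLParser
from typing import List


class _TextCollector(HTMLParser):
    """Collect the text nodes of an HTML document, dropping the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    # Naive conversion (assumption): keep text nodes, collapse whitespace.
    # Script and style contents are not filtered out in this sketch.
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(" ".join(collector.chunks).split())


def extract_plaintext(message: EmailMessage) -> str:
    """Return the largest text/plain part, else the largest text/html part
    converted to text, else an empty string."""
    text_parts: List[str] = []
    html_parts: List[str] = []
    for part in message.walk():
        # Skip multipart containers and explicit attachments.
        if part.is_multipart() or part.is_attachment():
            continue
        content_type = part.get_content_type()
        if content_type == "text/plain":
            text_parts.append(part.get_content())
        elif content_type == "text/html":
            html_parts.append(part.get_content())
    if text_parts:
        return max(text_parts, key=len)
    if html_parts:
        return html_to_text(max(html_parts, key=len))
    return ""


def parse_and_extract(raw: bytes) -> str:
    # policy.default makes the parser return EmailMessage objects,
    # which provide get_content() and is_attachment().
    return extract_plaintext(message_from_bytes(raw, policy=policy.default))
```

Calling `parse_and_extract` on the raw bytes of an `.eml` file, such as the test resources added in this diff, would return the selected body as a string.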

Test Plan

tests added

Diff Detail

Repository
rDWAPPS Web applications
Branch
inbound-email-extract-plaintext
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 28193
Build 44142: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 44141: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D7503 (id=27218)

Could not rebase; Attempt merge onto 902039b683...

Updating 902039b6..c49d22e2
Fast-forward
 .pre-commit-config.yaml                            |   1 +
 MANIFEST.in                                        |   1 +
 requirements-test.txt                              |   2 +-
 swh/web/inbound_email/utils.py                     | 157 +++++++++++++++++++--
 swh/web/tests/inbound_email/__init__.py            |   0
 swh/web/tests/inbound_email/resources/__init__.py  |   0
 .../tests/inbound_email/resources/multipart.eml    |  24 ++++
 .../tests/inbound_email/resources/plaintext.eml    |  15 ++
 swh/web/tests/inbound_email/test_utils.py          | 152 ++++++++++++++++++++
 9 files changed, 343 insertions(+), 9 deletions(-)
 create mode 100644 swh/web/tests/inbound_email/__init__.py
 create mode 100644 swh/web/tests/inbound_email/resources/__init__.py
 create mode 100644 swh/web/tests/inbound_email/resources/multipart.eml
 create mode 100644 swh/web/tests/inbound_email/resources/plaintext.eml
Changes applied before test
commit c49d22e291cdb95e5cc9cbf784fc3648b56fc3bb
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Apr 1 17:04:30 2022 +0200

    inbound_email: add function to extract the plaintext from a mail
    
    This function uses the html part if no text part is available.

commit 9a5e6bc41eb1255648883085981befaa08cbee90
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Mar 31 16:01:56 2022 +0200

    inbound_email: add support for signed email addresses
    
    These utilities allow us to generate addresses of the form
    `<localpart>+<integer>.<signature>@<domain>`, where the integer is the
    primary key of a given object for which we want to track email
    exchanges. The signature prevents the addresses from being forged, that
    is, the addresses have to be explicitly generated and displayed by the
    web app to be discovered.
    
    The counterpart function retrieves all relevant email addresses from the
    list of recipients of an email message, and validates the signatures to
    recover the integer values that won't have been tampered with.
    
    These new utilities are expected to be used in the views and signal
    handlers pertaining to processing of inbound email messages.

commit a594dd506f581afddf1ca483ba49afe3e6695daf
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Apr 1 15:16:11 2022 +0200

    inbound_email: split recipient matching logic out
    
    This allows calling the function on a single recipient rather than on a
    whole message, when one isn't available.

commit e46b75a9c5afe5ae58b67d1bf4a5c67e8363c1a0
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Apr 1 15:15:05 2022 +0200

    Restrict pytest-postgresql to < 4.0.0
    
    Other modules still need psycopg2 and pytest-postgresql 4 introduced a
    hard dependency on psycopg3.
    
    This restriction has only been needed since a recent dependency
    upgrade (or maybe a pip upgrade); pip has stopped being able to solve it
    itself for some reason.

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1673/ for more details.
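
The signed address scheme described in the commit message above (`<localpart>+<integer>.<signature>@<domain>`) can be sketched as follows. This is an illustration under stated assumptions, not the code from the dependency diff: the function names are hypothetical, and the signature here is a truncated HMAC-SHA256 encoded in base32, whereas the actual implementation may well rely on a different signing mechanism.

```python
import base64
import hashlib
import hmac
from typing import Optional


def _sign(key: bytes, payload: str) -> str:
    # Assumption: truncated HMAC-SHA256. 10 bytes encode to exactly 16
    # base32 characters, so there is no padding; lowercase for readability.
    digest = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    return base64.b32encode(digest[:10]).decode("ascii").lower()


def signed_address(key: bytes, base_address: str, object_id: int) -> str:
    """Build <localpart>+<integer>.<signature>@<domain> from a base address."""
    localpart, domain = base_address.rsplit("@", 1)
    signature = _sign(key, f"{localpart}+{object_id}@{domain}")
    return f"{localpart}+{object_id}.{signature}@{domain}"


def object_id_from_address(
    key: bytes, base_address: str, recipient: str
) -> Optional[int]:
    """Return the integer embedded in a recipient address, or None when the
    address does not match the base address or the signature is invalid."""
    localpart, domain = base_address.rsplit("@", 1)
    rcpt_local, _, rcpt_domain = recipient.rpartition("@")
    prefix = localpart + "+"
    if rcpt_domain.lower() != domain.lower() or not rcpt_local.startswith(prefix):
        return None
    value, sep, signature = rcpt_local[len(prefix):].partition(".")
    if not sep or not (value.isascii() and value.isdigit()):
        return None
    expected = _sign(key, f"{localpart}+{value}@{domain}")
    # Constant-time comparison on bytes, to avoid leaking signature prefixes.
    if not hmac.compare_digest(signature.lower().encode("utf-8"), expected.encode("ascii")):
        return None
    return int(value)
```

Because the signature depends on a secret key, a sender cannot derive a valid address for another integer from one they already know; they can only use addresses the web app has generated and shown to them.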

olasd requested review of this revision. Apr 5 2022, 1:45 PM
ardumont added a subscriber: ardumont.

lgtm, one question inline.

swh/web/inbound_email/utils.py
195

Since you are here already, could you also add coverage for the html fallback?

This revision is now accepted and ready to land. Apr 5 2022, 2:33 PM
  • Rebase
  • Add coverage for the html-only case
  • Add coverage for the "multiple ambiguous parts length" behavior

Build was aborted

Patch application report for D7503 (id=27259)

Could not rebase; Attempt merge onto 8126ea65db...

Updating 8126ea65..60b30653
Fast-forward
 .pre-commit-config.yaml                            |   1 +
 MANIFEST.in                                        |   1 +
 swh/web/inbound_email/utils.py                     | 126 +++++++++++++++-
 swh/web/tests/inbound_email/__init__.py            |   0
 swh/web/tests/inbound_email/resources/__init__.py  |   0
 .../tests/inbound_email/resources/multipart.eml    |  24 +++
 .../resources/multipart_html_only.eml              |  21 +++
 .../resources/multipart_text_only.eml              |  27 ++++
 .../tests/inbound_email/resources/plaintext.eml    |  15 ++
 swh/web/tests/inbound_email/test_utils.py          | 166 +++++++++++++++++++++
 10 files changed, 380 insertions(+), 1 deletion(-)
 create mode 100644 swh/web/tests/inbound_email/__init__.py
 create mode 100644 swh/web/tests/inbound_email/resources/__init__.py
 create mode 100644 swh/web/tests/inbound_email/resources/multipart.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_html_only.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_text_only.eml
 create mode 100644 swh/web/tests/inbound_email/resources/plaintext.eml
Changes applied before test
commit 60b306530a2612a131368677cbad83a342e2e5e0
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Apr 1 17:04:30 2022 +0200

    inbound_email: add function to extract the plaintext from a mail
    
    This function uses the html part if no text part is available. If
    multiple plain text or html parts are available, it uses the largest
    one.

commit 5829b8ba8dc80e2cf60b7c7d95a20daa2a87aacc
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Mar 31 16:01:56 2022 +0200

    inbound_email: add support for signed email addresses
    
    These utilities allow us to generate addresses of the form
    `<localpart>+<integer>.<signature>@<domain>`, where the integer is the
    primary key of a given object for which we want to track email
    exchanges. The signature prevents the addresses from being forged, that
    is, the addresses have to be explicitly generated and displayed by the
    web app to be discovered.
    
    The counterpart function retrieves all relevant email addresses from the
    list of recipients of an email message, and validates the signatures to
    recover the integer values that won't have been tampered with.
    
    These new utilities are expected to be used in the views and signal
    handlers pertaining to processing of inbound email messages.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1686/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1686/console

Build was aborted

Patch application report for D7503 (id=27265)

Could not rebase; Attempt merge onto 120076eea4...

Updating 120076ee..17cd0360
Fast-forward
 .pre-commit-config.yaml                            |   1 +
 MANIFEST.in                                        |   1 +
 swh/web/inbound_email/utils.py                     | 126 +++++++++++++++-
 swh/web/tests/inbound_email/__init__.py            |   0
 swh/web/tests/inbound_email/resources/__init__.py  |   0
 .../tests/inbound_email/resources/multipart.eml    |  24 +++
 .../resources/multipart_html_only.eml              |  21 +++
 .../resources/multipart_text_only.eml              |  27 ++++
 .../tests/inbound_email/resources/plaintext.eml    |  15 ++
 swh/web/tests/inbound_email/test_utils.py          | 166 +++++++++++++++++++++
 10 files changed, 380 insertions(+), 1 deletion(-)
 create mode 100644 swh/web/tests/inbound_email/__init__.py
 create mode 100644 swh/web/tests/inbound_email/resources/__init__.py
 create mode 100644 swh/web/tests/inbound_email/resources/multipart.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_html_only.eml
 create mode 100644 swh/web/tests/inbound_email/resources/multipart_text_only.eml
 create mode 100644 swh/web/tests/inbound_email/resources/plaintext.eml
Changes applied before test
commit 17cd03607c4e0e26feddb1b5ed82c17c69584719
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Apr 1 17:04:30 2022 +0200

    inbound_email: add function to extract the plaintext from a mail
    
    This function uses the html part if no text part is available. If
    multiple plain text or html parts are available, it uses the largest
    one.

commit 841919a3c8a6a74b063d8c2b97f458ba6318e71a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Mar 31 16:01:56 2022 +0200

    inbound_email: add support for signed email addresses
    
    These utilities allow us to generate addresses of the form
    `<localpart>+<integer>.<signature>@<domain>`, where the integer is the
    primary key of a given object for which we want to track email
    exchanges. The signature prevents the addresses from being forged, that
    is, the addresses have to be explicitly generated and displayed by the
    web app to be discovered.
    
    The counterpart function retrieves all relevant email addresses from the
    list of recipients of an email message, and validates the signatures to
    recover the integer values that won't have been tampered with.
    
    These new utilities are expected to be used in the views and signal
    handlers pertaining to processing of inbound email messages.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1689/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1689/console

This revision was landed with ongoing or failed builds. Apr 6 2022, 6:37 PM
This revision was automatically updated to reflect the committed changes.