Email Parsing: Don't Parse text/plain

A few weeks ago I worked on a feature over at Zaarly which allowed our users to reply directly to messaging-related emails. This was made relatively straightforward by using the Griddler gem along with Sendgrid's parse API.

Almost everything worked right out of the box, and we even quietly deployed the feature without realizing there was a rather noticeable bug in the code.

Pesky newlines

Email replies via the Mac Mail App came through as expected but replies via Gmail's web interface came through looking... a bit off.

Email comparison

The newlines are clearly present in the text/plain part, which Griddler prefers to the text/html part.

----==_mimepart_528a5caa75e8a_72e03fc1b3271e746546d
Content-Type: text/plain;  
charset=UTF-8  
Content-Transfer-Encoding: quoted-printable

Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed  
diam nonummy  
nibh euismod tincidunt ut laoreet dolore magna aliquam erat  
volutpat. Ut  
wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper  
suscipit  
lobortis nisl ut aliquip ex ea commodo consequat. Duis autem vel  
eum iriure  
dolor in hendrerit in vulputate velit esse molestie consequat,  
vel illum  
dolore eu feugiat nulla facilisis at vero eros et accumsan et  
iusto odio  
...

Others noticed the same thing. It turns out Section 2.1.1 of RFC 2822 recommends that lines not exceed 78 characters, so Gmail isn't wrong to rewrap the content of the email:

There are two limits that this standard places on the number of characters in a line. Each line of characters MUST be no more than 998 characters, and SHOULD be no more than 78 characters, excluding the CRLF.
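
To make the two limits concrete, here is a small sketch that classifies a single line against them. The constant and method names are my own for illustration and don't come from any library.

```ruby
# Limits from RFC 2822, section 2.1.1. Both exclude the trailing CRLF.
MUST_LIMIT   = 998 # a line MUST NOT be longer than this
SHOULD_LIMIT = 78  # a line SHOULD NOT be longer than this

# Hypothetical helper: report how one line fares against the two limits.
def classify_line(line)
  length = line.chomp.length # chomp drops a trailing "\r\n", "\n", or "\r"

  if length > MUST_LIMIT
    :exceeds_must
  elsif length > SHOULD_LIMIT
    :exceeds_should
  else
    :ok
  end
end

# A 79-character line is still legal, but a client is free to rewrap it.
puts classify_line("a" * 79 + "\r\n")
```

Anything in the `:exceeds_should` bucket is fair game for a client like Gmail to wrap before sending.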

Newlines are newlines

I initially explored two routes:
1. Make sure the width of our emails and conversation view allowed for 78 characters, which it currently did not.
2. Somehow remove the newlines that were added by Gmail.

The first required our native mobile apps, our messaging-related emails, and our web interface to all be updated. Yikes. The second wasn't really a solution either. Newlines are newlines, and those added by users can't be differentiated from those added by Gmail.
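
To see why the second route is a dead end, here is a minimal sketch of a greedy word wrap. The `hard_wrap` helper is hypothetical and is not Gmail's actual algorithm; it only mimics the effect of wrapping at 78 columns.

```ruby
# Hypothetical helper: greedy word wrap at `width` columns.
# Illustration only, not how Gmail really wraps text.
def hard_wrap(text, width = 78)
  text.split("\n", -1).map do |line|
    # Break at the last whitespace that keeps the line within `width`.
    line.gsub(/(.{1,#{width}})(?:\s+|\z)/, "\\1\n").chomp
  end.join("\n")
end

# One newline typed by the user, followed by a long unbroken paragraph.
typed   = "Thanks!\n" + ("word " * 30).strip
wrapped = hard_wrap(typed)

puts "newlines typed by the user: #{typed.count("\n")}"
puts "newlines after wrapping:    #{wrapped.count("\n")}"

# Undoing the wrap by dropping every newline also drops the user's own.
puts wrapped.tr("\n", " ") == typed.tr("\n", " ")
```

Once the text has been wrapped, every break is the same `"\n"` character, so nothing in the string says which ones the user actually typed.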

How had neither I nor anyone else on the team noticed this before? Surely we would have noticed newlines being added to all of our emails all these years. Well, the newlines are only added to the text/plain part, and email clients generally render the text/html part.

Premailer gem

If the newlines were not added to the text/html part then maybe we could strip all HTML from the part and tell Griddler to prefer it to the text/plain part? Yep, that's exactly what we ended up doing using the #to_plain_text method provided by the Premailer gem:

The strip_tags helper could also be used to strip HTML tags in theory, but in practice it would occasionally strip actual content along with the tags.

Solution

Griddler requires you to provide a processing class which responds to .process by default. All that gets passed in as a parameter is an instance of Griddler::Email, which has already been nicely sanitized. Unfortunately, this means that by the time your code runs it's already too late to tell Griddler to prefer the text/html part over the text/plain part.

Although a bit of a hack since #send is used, the following works like a charm:
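
Here is a sketch of the shape of that hack rather than the exact code. So that it runs without the gem, it uses a stand-in for Griddler::Email, and it assumes the real object keeps the raw request params behind a private reader named `params` with the HTML part under `:html`. That detail is an assumption on my part and may differ between Griddler versions, so check the source of the version you're running. `StandInEmail`, `EmailProcessor`, and `reply_source` are illustrative names.

```ruby
# Stand-in for Griddler::Email so the sketch runs without the gem.
# ASSUMPTION: the real class hides the raw params behind a private reader.
class StandInEmail
  attr_reader :body

  def initialize(params)
    @params = params
    @body   = params[:text].to_s
  end

  private

  attr_reader :params
end

class EmailProcessor
  # Griddler calls .process with the already-sanitized email object.
  def self.process(email)
    new(email).reply_source
  end

  def initialize(email)
    @email = email
  end

  # Prefer the raw HTML part when present; otherwise fall back to the text body.
  def reply_source
    html = raw_html
    if html.empty?
      { format: :text, content: @email.body }
    else
      { format: :html, content: html }
    end
  end

  private

  # The hack: #send bypasses `private` to reach the untouched params.
  def raw_html
    @email.send(:params)[:html].to_s
  end
end

email = StandInEmail.new({ text: "wrapped\ntext", html: "<p>wrapped text</p>" })
p EmailProcessor.process(email)
```

When the result comes back tagged as `:html`, its content is what gets handed to Premailer's #to_plain_text. Falling back to the text body keeps replies working for messages that have no HTML part at all.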