Bug 4070 - Outlook HTML e-mails are converted to text by claws upon replying - badly
Status: RESOLVED WONTFIX
Alias: None
Product: Claws Mail (GTK 2)
Classification: Unclassified
Component: UI/Message View
Version: 3.16.0
Hardware: PC Linux
Importance: P3 normal
Assignee: users
Reported: 2018-08-13 10:15 UTC by Arthur HUILLET
Modified: 2021-02-17 15:30 UTC

Attachments
HTML mail from Outlook (2.91 KB, text/html)
2018-08-13 10:15 UTC, Arthur HUILLET
"correct" output by firefox (350 bytes, text/plain)
2018-08-13 10:16 UTC, Arthur HUILLET
"incorrect" output by Claws and many others (408 bytes, text/plain)
2018-08-13 10:17 UTC, Arthur HUILLET
My workaround patch (466 bytes, patch)
2018-08-13 10:18 UTC, Arthur HUILLET
new patch looking for MS-specific syntax (3.13 KB, patch)
2021-02-17 15:30 UTC, Thomas Orgis

Description Arthur HUILLET 2018-08-13 10:15:26 UTC
Created attachment 1900 [details]
HTML mail from Outlook

MS Outlook likes to send HTML-only e-mails. Its HTML is a little strange: each line is wrapped in its own <p> element.
Claws does not handle this well. When it displays HTML as text, and more importantly when it quotes an HTML e-mail in a reply, its conversion adds extra newlines that make the text harder to read: there is an empty line between every two lines of the original e-mail.
 
I'm attaching an example HTML file generated by Outlook. This HTML file is rendered "incorrectly" (as described above) by Claws, Links, Lynx and W3M, but rendered correctly by Firefox. Attaching example outputs of Claws and Firefox to show the difference. This is particularly painful when Claws quotes an e-mail by prepending >.
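
To illustrate the effect described above, here is a minimal Python sketch. It is not Claws' actual conversion code (which is C); it only assumes a converter that ends every </p> with a paragraph break, which is the behaviour the description implies. The names ParagraphToText, html_to_text and quote, and the sample body, are made up for this example.

```python
from html.parser import HTMLParser


class ParagraphToText(HTMLParser):
    """Naive HTML-to-text pass: each </p> ends the line and adds an empty one."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self.current = []

    def handle_data(self, data):
        self.current.append(data)

    def handle_endtag(self, tag):
        if tag == "p":
            self.lines.append("".join(self.current).strip())
            self.lines.append("")  # paragraph break: one empty line
            self.current = []


def html_to_text(html):
    parser = ParagraphToText()
    parser.feed(html)
    parser.close()
    # Drop the empty line left behind by the last paragraph.
    while parser.lines and parser.lines[-1] == "":
        parser.lines.pop()
    return "\n".join(parser.lines)


def quote(text):
    # Simplified reply quoting: prefix every line with ">".
    return "\n".join("> " + line if line else ">" for line in text.split("\n"))


# Hypothetical Outlook-style body: one <p> per line of the original text.
outlook_body = "<p>Hello,</p><p>please see below.</p><p>Regards</p>"

print(quote(html_to_text(outlook_body)))
```

With this input the quoted reply alternates between text lines and lines holding only ">", which is the double-spaced look the report complains about.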

The "fix" that I'm carrying locally is attached in a patch. It works for me but may not be the most correct solution.
Comment 1 Arthur HUILLET 2018-08-13 10:16:13 UTC
Created attachment 1901 [details]
"correct" output by firefox
Comment 2 Arthur HUILLET 2018-08-13 10:17:15 UTC
Created attachment 1902 [details]
"incorrect" output by Claws and many others
Comment 3 Arthur HUILLET 2018-08-13 10:18:20 UTC
Created attachment 1903 [details]
My workaround patch
Comment 4 Arthur HUILLET 2018-11-06 09:04:12 UTC
I have been using my patch for a few months now and observed no problem. Would you please merge it?
Comment 5 wwp 2018-11-06 09:20:09 UTC
I'm still unsatisfied with how CM shows text out of HTML, adding extraneous newlines between paragraphs. Your patch fixes some of them (newlines were added after every single line) but not all, and until I (or somebody else) can afford to spend more time on this topic, I'd prefer to hold off on the patch for a while.
Comment 6 Arthur HUILLET 2018-11-06 09:42:44 UTC
Do you have examples of incorrect output that my patch isn't fixing? I can take a look, but I haven't noticed any problem that my patch isn't fixing.

Either way, my current patch is a net improvement, and you can always revert it later if a more complete fix comes in. As it is upstream, replies to Outlook-sent e-mails look super crappy. This ought to be fixed for more than just me.
Comment 7 Arthur HUILLET 2019-09-30 16:07:16 UTC
The patch has been held for quite "a while" now. Would you please merge it?
Comment 8 Paul 2019-09-30 19:11:06 UTC
It hasn't been merged because it seems like a workaround rather than a fix, working around Microsoft's non-standard HTML tags, whose only purpose is to facilitate MS Office.
Comment 9 Arthur HUILLET 2019-09-30 20:03:01 UTC
I don't think there is a fix in the client that isn't a workaround; but there's a usability problem in Claws if it can't display this properly. It wouldn't be the only one of course.

Yes, the HTML passed as input is crap. It shouldn't be like that. Sadly, you don't usually have a choice in what e-mail client people use to write to you, and it's not practical in most settings to tell them not to use Outlook.

There's a chance to improve Claws here, without hurting anything that I could see in more than a year of testing, and after asking for examples. I think it should be considered. Rest assured the distaste for Outlook is well-shared...
Comment 10 Paul 2019-10-02 07:39:52 UTC
> There's a chance to improve Claws here, without hurting anything that I could see in more than a year of testing

Your patch does break it. For example, in a simple html file, with the following in its body:

  <p>my first paragraph</p>
  <p>my second paragraph</p>
  <p>my third paragraph</p>

There should be a blank line between each paragraph. Currently there is. With your patch there is not.
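
The counterexample above can be written as a small check. This is only a model of the two behaviours in Python, not the code of the attached patch; paragraphs_to_text and its blank_line_between flag are hypothetical names introduced for the illustration.

```python
import re

# Matches the contents of each <p>...</p> element (good enough for this sample).
P_BLOCK = re.compile(r"<p[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)


def paragraphs_to_text(html, blank_line_between=True):
    """Join <p> contents, optionally separated by an empty line."""
    paragraphs = [content.strip() for content in P_BLOCK.findall(html)]
    separator = "\n\n" if blank_line_between else "\n"
    return separator.join(paragraphs)


body = """
  <p>my first paragraph</p>
  <p>my second paragraph</p>
  <p>my third paragraph</p>
"""

# Behaviour described as current: an empty line between paragraphs.
current = paragraphs_to_text(body, blank_line_between=True)
# Behaviour described for the patch: the separation is lost for every <p>.
patched = paragraphs_to_text(body, blank_line_between=False)

print(current)
print("---")
print(patched)
```

Suppressing the break unconditionally helps the Outlook case but runs genuine paragraphs together, which is the regression being pointed out.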
Comment 11 Thomas Orgis 2021-02-17 14:20:42 UTC
After discussion on IRC, I'd like to reopen this.

I am working on a patch that will

1. detect MS-originated HTML
2. apply the workaround of avoiding extra line breaks on
   <p class=MsoPlainText>…</p>

This seems to be the scheme Outlook is following: converting plaintext mail lines
to those paragraphs of the given class, then displaying them in HTML. The class is
configured with margin:0 in CSS so that the pseudo-paragraphs look like lines
again. On converting to plaintext again, Outlook itself forgets about the fake
nature of these paragraphs and creates extra empty lines between the initial
plaintext lines in quotes inside the reply it generates.

Especially with the top-posting style prevalent in corporate environments, and
which is _sometimes_ actually useful to get someone new into the conversation
with some history by CC-ing all the cruft, this leads to discussion threads where
the plaintext mails from Claws stick out by being strangely formatted with extra
line breaks. The reaction is: "What's wrong with that person? Can't they use a
proper mail client and send nicely formatted HTML like the rest of us?"

So my plan is two-fold. I am actually hoping that MS will fix their behaviour,
at least in creating the plaintext version from their HTML. But that may still
leave the HTML part being badly displayed by Claws, because it does not know
the CSS that turns paragraphs into lines again (not using an actual web engine,
just the reduction to text that Claws does without plugin). So I want what the
initial reporter of this bug intended, but with a bit more finesse to only apply
it to the kind of fake paragraphs MS creates.
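
The two-step plan (detect MS-originated HTML, then treat only MsoPlainText paragraphs as plain lines) could look roughly like the Python sketch below. It is not the attached patch. In particular, the detection heuristic (looking for the "urn:schemas-microsoft-com" namespace string) is an assumption made for this example, and all class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: MS-generated HTML declares an Office XML namespace somewhere.
MS_MARKER = "urn:schemas-microsoft-com"


class SelectiveParagraphToText(HTMLParser):
    """Empty line after ordinary paragraphs, plain line break after
    <p class=MsoPlainText> when the part looks MS-generated."""

    def __init__(self, ms_html):
        super().__init__()
        self.ms_html = ms_html
        self.lines = []
        self.current = []
        self.in_paragraph = False
        self.fake_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            css_classes = (dict(attrs).get("class") or "").split()
            self.in_paragraph = True
            self.fake_paragraph = self.ms_html and "MsoPlainText" in css_classes

    def handle_data(self, data):
        if self.in_paragraph:
            self.current.append(data)

    def handle_endtag(self, tag):
        if tag != "p":
            return
        self.lines.append("".join(self.current).strip())
        if not self.fake_paragraph:
            self.lines.append("")  # real paragraph: keep the empty line
        self.current = []
        self.in_paragraph = False
        self.fake_paragraph = False


def html_to_text(html):
    parser = SelectiveParagraphToText(ms_html=MS_MARKER in html)
    parser.feed(html)
    parser.close()
    while parser.lines and parser.lines[-1] == "":
        parser.lines.pop()
    return "\n".join(parser.lines)


ms_body = (
    '<html xmlns:o="urn:schemas-microsoft-com:office:office"><body>'
    "<p class=MsoPlainText>line one</p><p class=MsoPlainText>line two</p>"
    "</body></html>"
)
plain_body = "<p>my first paragraph</p><p>my second paragraph</p>"

print(html_to_text(ms_body))
print("---")
print(html_to_text(plain_body))
```

In this sketch the class is only honoured when the MS marker is present, so ordinary HTML keeps its paragraph spacing even if it happens to reuse the class name.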
Comment 12 Thomas Orgis 2021-02-17 15:30:40 UTC
Created attachment 2177 [details]
new patch looking for MS-specific syntax

OK, done and tested on an Ubuntu 20.04 x86-64 system.

This avoids the extra line break only when the part is MS HTML, and only
for paragraphs with the proper class (MsoPlainText).
