Bug 4224 - Characters showing as underscores
Summary: Characters showing as underscores
Status: NEW
Alias: None
Product: Claws Mail (GTK 2)
Classification: Unclassified
Component: UI/Message View
Version: 3.17.4
Hardware: PC Linux
Importance: P3 normal
Assignee: users
URL:
Duplicates: 4229
Depends on:
Blocks:
 
Reported: 2019-06-23 15:33 UTC by Peter
Modified: 2022-12-25 19:23 UTC (History)

See Also:


Attachments
screenshot (154.10 KB, image/png)
2019-06-23 15:33 UTC, Peter
Patch to use ringbuffer and fix bug (1.97 KB, patch)
2021-11-06 14:56 UTC, Viktor

Description Peter 2019-06-23 15:33:46 UTC
Created attachment 1993
screenshot

See the attached screenshot.

In some messages, some characters show as underscores. I have noticed this a lot in the feed https://bivol.bg/feed. Other RSS readers show the same articles without issues.

I also notice that opening the message source in KWrite shows readable characters. So it is something in the way Claws Mail renders the text.

Note: It is not just about Cyrillic. I have seen it (though rarely) in English articles too.
Comment 1 Andrej Kacian 2019-08-26 19:23:13 UTC
*** Bug 4229 has been marked as a duplicate of this bug. ***
Comment 2 Andrej Kacian 2019-08-26 19:33:21 UTC
This is caused by our HTML-to-text parser not being aware of multi-byte characters. It parses a message piece by piece, and sometimes a multi-byte character ends up split between two consecutive pieces. That causes the rest of the message to be displayed as underscores.

This is easy to observe by increasing or decreasing the value of SC_HTMLBUFSIZE in html.c by just one. Messages that were displayed incorrectly before will then be displayed correctly, or the underscores will start at a different position in their text.

Unfortunately, the parser works in a very convoluted way, and I can't find a way to easily fix this without completely rewriting it. Hopefully someone else can.
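
To illustrate the split described above, here is a minimal sketch of how a chunked reader could detect that a buffer ends in the middle of a UTF-8 sequence. It is not Claws Mail code and not the attached patch; the function name utf8_complete_prefix_len is hypothetical, and it assumes the input is UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Claws Mail): returns how many leading
 * bytes of buf[0..len) can be processed without cutting a UTF-8 sequence
 * in half. The remaining 0 to 3 bytes belong to the next chunk. */
static size_t utf8_complete_prefix_len(const unsigned char *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;

    /* Step back over at most 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return len; /* empty, or only continuation bytes: hold nothing back */

    unsigned char lead = buf[i - 1];
    size_t need;

    if (lead < 0x80)
        need = 1;                     /* ASCII */
    else if ((lead & 0xE0) == 0xC0)
        need = 2;                     /* 110xxxxx */
    else if ((lead & 0xF0) == 0xE0)
        need = 3;                     /* 1110xxxx */
    else if ((lead & 0xF8) == 0xF0)
        need = 4;                     /* 11110xxx */
    else
        return len;                   /* invalid lead byte: leave it to the converter */

    /* Bytes of the last sequence that are actually present in the buffer. */
    size_t have = len - (i - 1);

    return (have < need) ? i - 1 : len;
}
```

A reader built this way would process only the first n bytes returned by the helper and move the remaining len - n bytes to the start of the buffer before reading the next piece, so no character is ever handed to the converter in two halves.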
Comment 3 Peter 2019-08-27 12:23:05 UTC
I also hope that.

Additionally I hope you can raise the priority of this one as it is a serious bug which makes messages unreadable.
Comment 4 Viktor 2019-09-01 22:58:27 UTC
Some details:
- Cyrillic is unreadable both in message headers (sometimes) and in the message body (sometimes, unrelated to the headers); the headers are not HTML but quoted-printable strings;
- Characters are unreadable even with HTML viewing turned off in the config;
I think it may be two different problems.
Comment 5 Paul 2020-01-08 14:19:18 UTC
*** Bug 4292 has been marked as a duplicate of this bug. ***
Comment 6 Peter 2020-01-31 22:49:33 UTC
Some more info. STR:

1. Pick an RSS message which shows underscores
2. Open its source in a text editor
3. Add a single character anywhere in the body, save
4. In Claws - click another message, then go back to the problematic one

Result:

No underscores!

5. Undo the editing, save.
6. Repeat 4.

Result:

No underscores.

The problem with this workaround, though, is that on the next refresh of the feed the message may be replaced with its original.

I have been looking at the source code of RSSyl but couldn't find anything. Too complicated for me. I hope an expert can have a look.
Comment 7 Peter 2020-01-31 22:49:58 UTC
*When I say "no underscores" I mean text becomes readable.
Comment 8 Viktor 2021-04-08 11:58:36 UTC
Look, all messages with length < SC_HTMLBUFSIZE (8192 bytes) are displayed correctly. I think sc_html_read_line() has a bug.
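
The size dependence can be sketched with a small diagnostic. It assumes, as a simplification of whatever sc_html_read_line() really does, that the reader consumes exactly `chunk` bytes at a time and that the message is valid UTF-8. The function name first_split_boundary is hypothetical.

```c
#include <stddef.h>

/* Hypothetical diagnostic (not Claws Mail code): returns the offset of the
 * first chunk boundary that falls inside a UTF-8 sequence, or len if no
 * boundary does. Assumes fixed-size chunks and valid UTF-8 input. */
static size_t first_split_boundary(const unsigned char *msg, size_t len,
                                   size_t chunk)
{
    if (chunk == 0)
        return len;

    for (size_t pos = chunk; pos < len; pos += chunk) {
        /* In valid UTF-8, a boundary splits a character exactly when the
         * byte right after it is a continuation byte (10xxxxxx). */
        if ((msg[pos] & 0xC0) == 0x80)
            return pos;
    }
    return len;
}
```

A message shorter than `chunk` has no internal boundary, so the function returns len, which matches the observation that short messages are unaffected. Changing `chunk` by one, or inserting a single byte into the message, moves every boundary and can make a split appear or disappear.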
Comment 9 Viktor 2021-11-06 14:56:26 UTC
Created attachment 2255
Patch to use ringbuffer and fix bug
Comment 10 Viktor 2022-12-25 19:23:10 UTC
The effect appears on plain-text UTF-8 messages larger than 8K. For me these are often RSS messages.
