Bug 4428 - punctuation stripping from URLs in text emails strips $ signs from end of URL
Status: NEW
Alias: None
Product: Claws Mail (GTK 2)
Classification: Unclassified
Component: UI/Message View
Version: 3.18.0
Hardware: PC All
Importance: P3 enhancement
Assignee: users
URL:
Depends on:
Blocks:
 
Reported: 2020-12-31 15:59 UTC by Roland Haas
Modified: 2020-12-31 15:59 UTC
CC List: 0 users

See Also:


Attachments

Description Roland Haas 2020-12-31 15:59:21 UTC
When Claws Mail encounters a URL like:

https://urldefense.com/v3/__https://eff.org/r.o9g6__;!!DZ3fjg!srHW_CI4QzWk_Et7SAcZwTL_6C2bVOKE-ZLz9eesJB6afpP_kdt4-QeDMY9WyOMX$

in an email, the punctuation-stripping code in get_uri_part removes the trailing "$" sign, because the IS_REAL_PUNCT macro defined in that function (src/common/utils.c) classifies it as real punctuation:

#define IS_REAL_PUNCT(ch)       (g_ascii_ispunct(ch) && !strchr("/?=-_~)", ch))
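
To illustrate the effect, here is a minimal standalone sketch. It assumes the stripping amounts to walking back over trailing characters for which the macro is true. stripped_uri_len is a hypothetical helper, not the real get_uri_part, and ispunct() from <ctype.h> stands in for g_ascii_ispunct() so the snippet builds without GLib.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the macro quoted above: ispunct() replaces GLib's
 * g_ascii_ispunct() so that the sketch builds without GLib. */
#define IS_REAL_PUNCT(ch) \
	(ispunct((unsigned char)(ch)) && !strchr("/?=-_~)", (ch)))

/* Hypothetical helper, NOT the real get_uri_part(): returns the length
 * of uri once trailing "real punctuation" characters are dropped. */
static size_t stripped_uri_len(const char *uri)
{
	size_t len;

	assert(uri != NULL);
	len = strlen(uri);
	/* Walk back from the end while the last character is strippable. */
	while (len > 0 && IS_REAL_PUNCT(uri[len - 1]))
		len--;
	return len;
}
```

With this sketch, a URL ending in "$" loses that character, while a URL ending in "_" or "/" is left intact because those characters are in the macro's exception list.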

Unfortunately urldefense, which my institution uses, has recently started constructing its redirection URLs so that they all end in "$". I therefore have to copy each URL into a browser address bar by hand and add the "$" myself, which renders the URL detection useless.

A simple fix would be to extend the list of characters in the strchr call in the macro to include "$". A better one might be to use an explicit list of punctuation characters instead, and to make that list user configurable for non-English languages where the claws-authors might not know what is likely to be a punctuation character (e.g. « and » in French).
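
A rough sketch of the second idea, stripping only characters from an explicit list, might look like the following. The helper name stripped_uri_len_list and the default list DEFAULT_STRIP_CHARS are assumptions made for illustration and do not exist in Claws Mail. The comparison is byte-wise, so multi-byte UTF-8 characters such as « and » would need extra handling that is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical default; a user preference could replace this string.
 * "$" is deliberately absent, so it survives at the end of a URL. */
#define DEFAULT_STRIP_CHARS ".,;:!\"'"

/* Hypothetical helper: returns the length of uri after dropping trailing
 * characters that appear in strip_chars (byte-wise, ASCII only). */
static size_t stripped_uri_len_list(const char *uri, const char *strip_chars)
{
	size_t len;

	assert(uri != NULL && strip_chars != NULL);
	len = strlen(uri);
	/* uri[len - 1] is never '\0' here, so strchr() only matches
	 * characters that are really in the list. */
	while (len > 0 && strchr(strip_chars, uri[len - 1]) != NULL)
		len--;
	return len;
}
```

With the default list a trailing "$" is kept and a trailing "." is still removed. A user who would rather treat "$" as punctuation could pass a list that includes it.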

Note that this is apparently a known(ish) issue, judging by the comment just above the macro:

/* FIXME: this stripping of trailing punctuations may bite with other URIs.
 * should pass some URI type to this function and decide on that whether
 * to perform punctuation stripping */

Since punctuation stripping appears to rely on a heuristic for what is likely to end a URL in an email, I do not, of course, know how common emails are in which "$" really should not be considered part of the link, e.g.:

$$$Earn money now https://example.com/earn$$$

would likely indicate that one would want to treat "$" as marking the end of the URL here (in particular if bad HTML-to-text conversion was at work on the sender's side).
