3579 – URL detection in messages broken

Status:	RESOLVED PATCHESWELCOME

Alias:	None

Product:	Claws Mail (GTK 2)
Classification:	Unclassified
Component:	UI/Message View
Version:	3.11.0
Hardware:	PC Linux

Importance:	P3 minor
Assignee:	users

Reported:	2015-12-17 09:32 UTC by Jazz Fan
Modified:	2015-12-18 13:38 UTC
CC List:	1 user

Description Jazz Fan 2015-12-17 09:32:20 UTC

Hi,

I just read a newsgroup post in which there is a website cited:

-----
For many years, my bash page (tiswww.case.edu/~chet/bash/bashtop.html) has
sported a bash logo that someone whose name I have lost donated long ago.
...
-----

Note that the URL in parentheses is wrongly detected as 

www.case.edu/~chet/bash/bashtop.html

although the "tis" is really part of it. The message headers include "Content-Type: text/plain; charset=utf-8", so the post is plain text and this is not a case of its author having wrongly specified an <a href...> HTML tag.

Comment 1 Ricardo Mones 2015-12-17 16:16:40 UTC

This is probably because the URL detector is trying to be smart to help you, but in fact that string is not even a valid URL since it lacks the protocol part.
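To illustrate the "lacks the protocol part" point, here is a small sketch using Python's standard urllib.parse (not the code Claws Mail uses). Without a scheme, a generic URL parser finds no scheme and no host and treats the whole string as a path. Prefixing "http://" is an assumption made here only for comparison.

```python
from urllib.parse import urlparse

cited = "tiswww.case.edu/~chet/bash/bashtop.html"

# As written in the post: no "scheme://" prefix, so the parser sees
# neither a scheme nor a network location, only a path.
bare = urlparse(cited)

# Assumption for comparison: the reader meant http.
prefixed = urlparse("http://" + cited)

print(repr(bare.scheme), repr(bare.netloc), repr(bare.path))
print(repr(prefixed.scheme), repr(prefixed.netloc), repr(prefixed.path))
```

With the prefix, the host comes out as tiswww.case.edu, which is what the detector would need to preserve.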

Comment 2 Jazz Fan 2015-12-18 12:18:55 UTC

(In reply to comment #1)
> This is probably because the URL detector is trying to be smart to help you,
> but in fact that string is not even a valid URL since it lacks the protocol
> part.

Sure, I agree with your analysis. But even though the URL is invalid, its interpretation by the detector is invalid, too. I guess the detector uses a regexp containing something like '\bwww\.' which is too restrictive. Protocol-less URLs are far too common and usually refer to http. So I think the detector should either accept the whole URL from the OP or not accept any protocol-less URLs at all.
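The two alternatives can be sketched in Python as follows. This is only an illustration under assumptions, not the actual Claws Mail detector (which is not shown in this report): the pattern names NAIVE, WHOLE_HOST and STRICT and the helper detect() are hypothetical. NAIVE reproduces the reported behaviour by latching onto "www." anywhere; WHOLE_HOST extends the match back to the start of the host label; STRICT matches "www." only when it begins the host name.

```python
import re

# The line from the newsgroup post quoted in the description.
TEXT = "my bash page (tiswww.case.edu/~chet/bash/bashtop.html) has"

# Hypothetical naive detector: matches "www." wherever it occurs,
# even in the middle of a longer host label.
NAIVE = re.compile(r"www\.[^\s()<>]+")

# Alternative 1: let the match begin at the start of the host label,
# so characters in front of "www" stay part of the URL.
WHOLE_HOST = re.compile(r"(?<![\w.-])[\w-]*www\.[^\s()<>]+")

# Alternative 2: accept "www." only at the start of a host name;
# anything else is not treated as a URL at all.
STRICT = re.compile(r"(?<![\w.-])www\.[^\s()<>]+")


def detect(pattern, text):
    """Return every substring of text that the pattern treats as a URL."""
    return [m.group(0) for m in pattern.finditer(text)]


print(detect(NAIVE, TEXT))       # truncated host, as reported
print(detect(WHOLE_HOST, TEXT))  # full host including "tis"
print(detect(STRICT, TEXT))      # nothing detected
```

Either alternative avoids linking to the wrong host. The sketch does not handle trailing punctuation or hosts without "www", which a real detector would have to consider.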
