Bug 4170 - add option(s) for (per-feed) curl options
Summary: add option(s) for (per-feed) curl options
Status: REOPENED
Alias: None
Product: Claws Mail (GTK 2)
Classification: Unclassified
Component: Plugins/RSSyl
Version: 3.17.4
Hardware: PC Linux
Importance: P3 enhancement
Assignee: users
URL:
Depends on:
Blocks:
 
Reported: 2019-03-10 12:56 UTC by George
Modified: 2019-10-02 17:14 UTC
CC List: 0 users

Description George 2019-03-10 12:56:44 UTC
version 3.17.3git134

STR:

1. Use Internet through TOR proxy
2. Run Claws Mail with

export http_proxy=socks5://127.0.0.1:9050 && export https_proxy=socks5://127.0.0.1:9050 && claws-mail

3. Try to refresh an RSS feed for a website which uses Cloudflare (example: https://scotthelme.co.uk/feed/)

Expected:

Just as Tor Browser is able to open these feeds, Claws Mail should be able to open them too.

Actual:

The feeds cannot be updated through Tor and display a 403 error (possibly hitting Cloudflare's recaptcha challenge in the background).

I am unaware of what special thing Tor Browser may be doing, but the fact is that it opens feeds which CM cannot.
Comment 1 Andrej Kacian 2019-03-11 19:52:14 UTC
First of all, Cloudflare's captcha protection is something optional that site owners can enable for their site. So this is not really about Cloudflare, this is about individual website owners preventing users from accessing their site's feed. Furthermore, it is not an all-or-nothing situation - it is possible to only enable it selectively[1].

So, if a site's feed is "protected" by this captcha, from a purely technical point of view it simply means that the site is publishing a feed URL that is broken for some visitors.

If we take a step back, it means that the site owner does not want robots accessing a URL that is primarily and implicitly meant to be accessed by robots. The site owner only wants live people, using full web browsers, to access it. What you should do is respect their wish, and simply not consume their feed (or look at it via your browser only).

Now, I suspect that many individual site owners do not even realize this distinction (a resource intended for live humans vs. a resource intended for robots), and that they could be persuaded to modify settings for their site accordingly.

The Tor Browser probably bypasses this via a not-yet-standardized protocol, using a plugin mentioned here [2], but that is just a hacky workaround, not a real solution to people misconfiguring their websites.

1. https://support.cloudflare.com/hc/en-us/articles/200170096-How-do-I-turn-off-the-Cloudflare-Captcha-challenge-page-
2. https://www.theregister.co.uk/2016/10/05/cloudflare_tor/
Comment 2 George 2019-03-11 22:14:08 UTC
I am familiar with CF's captcha challenge and how it is enabled, as I manage a few websites which also use it. The page you linked [1] explains how to enable it selectively by creating a page rule in CF's control panel. However, there are a few more things to consider here. You may or may not be familiar with some of them, but I will still list them, as this may be helpful info:

- Tor Browser does *not* use Cloudflare's browser plugin which helps bypass CF's own challenge. As Cloudflare is a centralized instance which decrypts traffic, that would go against the whole concept of Tor Browser.

- Most websites using CF are on its free plan, on which the number of page rules is very restricted (only 3), so it is unlikely that a site would waste a rule just to block RSS access through Tor, especially considering that:

- It is in a website's interest to have its RSS feed crawlable (better SEO), so there is no reason to explicitly block the RSS feed. Also, when a particular website presents a challenge, it does so for all pages, not just the feed.

- It is possible to enable the captcha challenge not per URL pattern but by IP address range, which may be a single IP address, a CIDR range, or a built-in preset called "Tor". In other words, it is possible to present the challenge only to Tor network users. This is what many sites do, and the proof is the fact that the same page opens fine when the Tor network is not used.

- CF can automatically present the challenge based on "suspicious traffic", regardless of the settings for a particular website. That may be based on previous history which CF has logged for a particular IP address or range.

- Opening the same CF-managed website in Tor Browser and in a regular web browser (e.g. Chrome) configured to use the Tor proxy doesn't give the same result, even if the latter is configured similarly to Tor Browser, and even if multiple different IP addresses are tested in both browsers. Tor Browser will open the URL; the other browser will hit a captcha. How does CF decide when to present the challenge? I don't know. I think it may be a combination based on the HTTP request headers. Although I have tried to copy the HTTP request headers of Tor Browser literally and use them in curl, curl also hits the challenge. However, I am not sure this is the right thing to do, because of the special __cfduid cookie, which I am not quite sure how to handle in curl. In any case this has nothing to do with JavaScript, as JS is completely disabled in both browsers (Tor and Chrome).
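For anyone who wants to experiment with replicating Tor Browser's headers in curl, here is a rough sketch under explicit assumptions: the header values are hand-picked examples (not what Tor Browser actually sends), and the __cfduid cookie is handled with curl's cookie jar (-b/-c), which stores cookies from one response and sends them back on the next request. The script only prints the command, since actually running it requires a Tor SOCKS proxy listening on 127.0.0.1:9050.

```shell
#!/bin/sh
# Sketch: replicate browser-like request headers in curl. The header
# values below are illustrative; copy the real ones from Tor Browser's
# network inspector for a faithful comparison.
UA='Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0'
JAR='cf-cookies.txt'   # curl stores and replays cookies (incl. __cfduid) here
URL='https://scotthelme.co.uk/feed/'

# -b/-c pointing at the same file makes curl both read and update the
# cookie jar, so a second run replays whatever Cloudflare set on the first.
CMD="curl -sv --socks5-hostname 127.0.0.1:9050 \
  -A '$UA' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
  -H 'Accept-Language: en-US,en;q=0.5' \
  -b '$JAR' -c '$JAR' \
  '$URL'"

# Printed rather than executed, since running it needs a Tor proxy on :9050.
echo "$CMD"
```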

Considering this info, I was hoping that someone more experienced might look into it. As the network (Tor IP range) is the same, perhaps fine-tuning the rest of the parameters will, if not remove completely, at least reduce the number of 403 responses (the captcha challenge hits).
Comment 3 George 2019-03-23 11:49:53 UTC
May be related to bug#4184
Comment 4 Andrej Kacian 2019-05-13 14:50:22 UTC
To anyone interested, feel free to investigate the issue in depth and reopen if a clearly described, reasonable technical solution appears.

Meanwhile, I do not consider this a bug, see my earlier comment.
Comment 5 Santa Claws 2019-10-02 15:54:45 UTC
This is an issue which many Tor users have faced, not just those of Claws Mail.

I will not use the URL provided by the original reporter, because Scott Helme is known to additionally strengthen his headers, so it is perhaps not a good example; here is another one.

The symptom seems to be the captcha:

$ curl -s --socks5-hostname 127.0.0.1:9050 https://www.pine64.org/feed/ | grep -i "<title>.*</title>" | head -n 1

<title>Attention Required! | Cloudflare</title>
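(As an aside, the grep pipeline above can be wrapped in a small helper and exercised against a canned response, so different proxy/header combinations can be compared quickly; the helper name is invented for this sketch.)

```shell
#!/bin/sh
# Print the first <title> element from a feed or page read on stdin,
# mirroring the grep | head pipeline above.
feed_title() {
    grep -i -o "<title>.*</title>" | head -n 1
}

# Exercised against a canned Cloudflare challenge snippet; no network needed.
printf '<html><head><title>Attention Required! | Cloudflare</title></head></html>\n' | feed_title
# Prints: <title>Attention Required! | Cloudflare</title>
```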


An in-depth direct discussion with Cloudflare was shared on GitHub:

https://github.com/Eloston/ungoogled-chromium/issues/783

where someone shared a simple solution:

https://github.com/Eloston/ungoogled-chromium/issues/783#issuecomment-536205751

through the command:

curl --http2 -A "Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0" -v --socks5-hostname 127.0.0.1:9050 --max-redirs 0 --tls-max 1.2 --ciphers ECDHE-ECDSA-AES128-GCM-SHA256 <URI>

Testing this shows that the actual solution is to explicitly specify ciphers (--ciphers) which Cloudflare supports (these can be checked on SSL Labs for the given host) and a popular user agent ("Mozilla..."). So:

$ curl -s --http2 -A "Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0" --socks5-hostname 127.0.0.1:9050 --ciphers ECDHE-ECDSA-AES128-GCM-SHA256 https://www.pine64.org/feed/ | grep -i "<title>.*</title>" | head -n 1

        <title>PINE64</title>

This obviously solves the issue.
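(A side note for anyone tuning the cipher list: assuming the openssl CLI is installed, `openssl ciphers -v` expands a cipher-list string locally, without any network access, which is a quick sanity check on the string before comparing it against the host's SSL Labs report.)

```shell
# Expand the cipher-list string locally; no connection is made.
openssl ciphers -v 'ECDHE-ECDSA-AES128-GCM-SHA256'
# Typical output (OpenSSL 1.1.x):
# ECDHE-ECDSA-AES128-GCM-SHA256 TLSv1.2 Kx=ECDH Au=ECDSA Enc=AESGCM(128) Mac=AEAD
```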

The fix is simple: give users an option to specify per-feed curl options. So this is a valid feature request which may solve both this issue and bug#4184.
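To make the request concrete, here is a sketch of what per-feed curl options could look like from the user's side: a plain-text map from feed URL to extra curl arguments, consulted before each fetch. The file name, format, and function name are all invented for illustration; nothing like this exists in RSSyl today.

```shell
#!/bin/sh
# Hypothetical per-feed options file: "<feed-url> <extra curl arguments>"
cat > feed-curl-opts.conf <<'EOF'
https://www.pine64.org/feed/ --http2 --ciphers ECDHE-ECDSA-AES128-GCM-SHA256
EOF

# Look up the extra curl arguments for a given feed URL (empty if none).
curl_opts_for() {
    awk -v url="$1" '$1 == url { $1 = ""; sub(/^ /, ""); print }' feed-curl-opts.conf
}

url='https://www.pine64.org/feed/'
extra=$(curl_opts_for "$url")
# Compose (but do not run) the fetch command; running it needs network access.
echo "curl -s $extra '$url'"
```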

Andrej,

I see no option to reopen this (at least with JS disabled). Please reopen and comment to confirm.
