Bug 3971 - Deleted rss feed item reappears as unread on feed refreshing
Summary: Deleted rss feed item reappears as unread on feed refreshing
Status: REOPENED
Alias: None
Product: Claws Mail
Classification: Unclassified
Component: Plugins/RSSyl
Version: 3.17.0
Hardware: PC Linux
Importance: P3 normal
Assignee: users
URL:
Depends on:
Blocks:
 
Reported: 2018-02-22 15:41 CET by George
Modified: 2019-07-20 02:46 CEST
CC List: 0 users

See Also:


Attachments
screenshot for feed settings (50.53 KB, image/png)
2018-09-19 00:13 CEST, George

Comment 1 George 2018-02-22 15:43:37 CET
Doesn't happen when refreshing this feed:

http://seclists.org/rss/fulldisclosure.rss
Comment 2 Paul 2018-02-23 14:35:25 CET
Those feed items don't have a GUID, and deleted feed items are identified by the GUID.
Comment 3 George 2018-02-23 14:53:54 CET
I don't know what this means but in newsbeuter there is no problem with those same feeds. (in case you want to reconsider)
Comment 4 users 2018-02-23 22:29:10 CET
Changes related to this bug have been committed.
Please check latest Git and update the bug accordingly.
You can also get the patch from:
http://git.claws-mail.org/

++ ChangeLog	2018-02-23 22:29:10.106450835 +0100
http://git.claws-mail.org/?p=claws.git;a=commitdiff;h=525d5a8a91773df1c5305cc0626c7a82610692f6
Merge: 4ca6903 a0d936f
Author: Colin Leroy <colin@colino.net>
Date:   Fri Feb 23 22:29:08 2018 +0100

    Merge branch 'master' of file:///home/git/claws

http://git.claws-mail.org/?p=claws.git;a=commitdiff;h=a0d936fb78ceaf27132f72b90e02a7967cb8d46a
Author: Andrej Kacian <ticho@claws-mail.org>
Date:   Fri Feb 23 22:27:35 2018 +0100

    RSSyl: fix deleted item checking when modified or published time is missing
    
    Fixes bug #3971.
Comment 5 Andrej Kacian 2018-02-23 22:33:41 CET
Actually, there was a bug in RSSyl that caused deleted items, even those with IDs, to reappear after a second refresh. Fixed now in git.

Deleted items reappearing in the hyperbola.info feed can't be fixed on our side, though. They give their feed items an empty ID string, so there's nothing we can do to reliably identify individual items and determine that they have been deleted before. Not without breaking other feeds, which use ID strings correctly.
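
As a rough illustration of why an empty ID is a problem, here is a hypothetical sketch. It is not RSSyl's actual code; the item dictionaries and the `suppressed_titles` helper are invented for the example. It assumes deleted items are remembered by ID alone, in which case every item with an empty ID shares the same key:

```python
def suppressed_titles(feed_items, deleted_items):
    """Return titles that an ID-only check would treat as already deleted."""
    deleted_ids = {item["id"] for item in deleted_items}
    return [item["title"] for item in feed_items if item["id"] in deleted_ids]


# Hypothetical feed whose items all carry an empty ID string.
feed = [
    {"id": "", "title": "First post"},
    {"id": "", "title": "Second post"},
]

# The user deleted only the first item, yet both items match the empty ID.
hidden = suppressed_titles(feed, deleted_items=[feed[0]])
print(hidden)
```

Skipping empty IDs in the check would avoid hiding unrelated items, but then a deleted item can no longer be recognised and comes back on the next refresh.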
Comment 7 George 2018-07-14 23:51:47 CEST
Also with

http://rss.slashdot.org/Slashdot/slashdotMain
Comment 8 users 2018-07-15 00:33:10 CEST
Changes related to this bug have been committed.
Please check latest Git and update the bug accordingly.
You can also get the patch from:
http://git.claws-mail.org/

++ ChangeLog	2018-07-15 00:33:09.923174007 +0200
http://git.claws-mail.org/?p=claws.git;a=commitdiff;h=6bca3f1417b14748c0c95fa741c3d1ccdbf091cb
Merge: dd72700 d670b55
Author: Colin Leroy <colin@colino.net>
Date:   Sun Jul 15 00:33:06 2018 +0200

    Merge branch 'master' of file:///home/git/claws

http://git.claws-mail.org/?p=claws.git;a=commitdiff;h=d670b55a109ab0285453389d33e7c8ba8f649b58
Author: Andrej Kacian <ticho@claws-mail.org>
Date:   Sun Jul 15 00:31:18 2018 +0200

    RSSyl: rework matching deleted items to better handle feeds without id
    
    Fixes bug #3971 again.
Comment 9 Andrej Kacian 2018-07-15 00:34:17 CEST
Should be fixed now, after the abovementioned commit.
Comment 10 George 2018-07-15 00:57:20 CEST
That was quick. Thanks! :)

Recompiled and now everything seems to work as expected.

One more change which I notice: now I have to switch to online mode to refresh RSS feeds. Does the current commit fix bug#4014 too?
Comment 11 George 2018-07-15 00:58:32 CEST
I am sorry. Forget about the question. Bug#4014 remains.
Comment 12 George 2018-07-16 20:45:54 CEST
Still seems to happen sometimes.

Example:

https://www.schneier.com/blog/index.rdf

Earlier today I received this news from the feed and deleted it:

<link rel="alternate" type="text/html" href="https://www.schneier.com/blog/archives/2018/07/reasonably_clev.html" />
<published>2018-07-16T11:30:23Z</published>
<updated>2018-07-16T12:56:51Z</updated>

Now (some hours later) when I refreshed all feeds recursively it reappeared as unread.

Another example which I am still monitoring for the sake of this bug:

https://www.youtube.com/feeds/videos.xml?channel_id=UCkK9UDm_ZNrq_rIXCz3xCGA

I have repeatedly deleted all news items in it, yet some of them reappear with a new (today's) date upon refresh. One example which I have deleted at least 3 times in the last day:

<title>Linux Thursday "Classic" - June 14th, 2018</title>
...
<published>2018-07-13T21:18:17+00:00</published>
<updated>2018-07-16T16:02:40+00:00</updated>


For both examples CM shows the "updated", not the "published" date. FWIW in settings for the feeds I have "If an item changes - Never mark it as new" (but perhaps that has nothing to do with the issue).

Another thing which I noticed from the very beginning: deleted items don't reappear if I refresh within a few minutes after deletion. Some time needs to pass (hours, or a day).
Comment 13 George 2018-07-17 11:24:54 CEST
Another update from the YouTube feed. I deleted this after my last post here:

<title>Linux Thursday "Classic" - June 14th, 2018</title>

Now it reappeared in CM with a timestamp matching the "updated" value of the entry:

<published>2018-07-13T21:18:17+00:00</published>
<updated>2018-07-17T02:10:38+00:00</updated>

Similarly 5 other entries reappeared in the same feed.

Another case:

https://www.zdnet.com/news/rss.xml

Yesterday I deleted item:

<title>
Microsoft will end support for Skype 'Classic' after September 1
</title>

Today it reappeared. Looking at details I see:

<pubDate>Mon, 16 Jul 2018 16:51:00 +0000</pubDate>

The same timestamp shows in CM.
Comment 14 Andrej Kacian 2018-07-17 18:42:33 CEST
It's because of the changed "last modified" timestamp on some of the items. The logic behind deleted items in RSSyl is that title, ID and modified time all have to match. That way one can still be notified if an item that was deleted is updated with new content, because such an item would reappear in RSSyl.

At the time, I thought it a good idea, but seeing as youtube feeds tend to change the timestamps semi-randomly, perhaps I will change it so that modification time is not taken into consideration.
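
The matching rule described above can be sketched roughly as follows. This is a simplified illustration, not the RSSyl source: the `Item` class, the `is_deleted` function, the `compare_modified` flag and the item ID are all hypothetical, while the title and the two timestamps are taken from the YouTube example in comment 12 and comment 13.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Item:
    title: str
    item_id: str
    modified: str  # raw "updated" timestamp as delivered by the feed


def is_deleted(item: Item, deleted: Iterable[Item], compare_modified: bool = True) -> bool:
    """Return True if `item` matches an entry in the list of deleted items."""
    for old in deleted:
        if item.title != old.title or item.item_id != old.item_id:
            continue
        if compare_modified and item.modified != old.modified:
            continue  # same item, but the feed changed its timestamp
        return True
    return False


TITLE = 'Linux Thursday "Classic" - June 14th, 2018'
ITEM_ID = "yt:video:EXAMPLE"  # hypothetical ID, not taken from the real feed

deleted = [Item(TITLE, ITEM_ID, "2018-07-16T16:02:40+00:00")]
refetched = Item(TITLE, ITEM_ID, "2018-07-17T02:10:38+00:00")

# With the modified time compared, the item no longer matches and reappears.
print(is_deleted(refetched, deleted))
# Ignoring the modified time keeps the deleted item suppressed.
print(is_deleted(refetched, deleted, compare_modified=False))
```

The trade-off is visible in the flag: comparing the modified time lets genuinely updated items come back, but it also resurrects items whose timestamp changed for no visible reason.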
Comment 15 George 2018-07-17 18:53:25 CEST
Yes, it is better if you change it because the generic logic seems to be: "I have deleted this, so I don't need it any more (even if the author of the article decides to fix typos 3 months later)."

If you want to give the user more control, perhaps you can tie it to the setting "Never mark it as new". Or to a new, additional checkbox next to that dropdown, labelled "Apply to deleted items too". Then one can choose to be updated about deleted things if one wants that so much (which is unlikely).
Comment 16 Andrej Kacian 2018-07-17 19:23:47 CEST
Change committed.
Comment 17 George 2018-07-17 19:48:57 CEST
Thanks!
Recompiled.
Comment 18 George 2018-07-28 12:07:58 CEST
It seems certain RSS feeds still behave incorrectly. Example:

https://feeds.feedburner.com/CoinDesk

Example item (deleted yesterday, reappeared as unread upon refresh today):

<title>Left, Right and Center: Crypto Isn't Just for Libertarians Anymore</title>
<pubDate>Fri, 27 Jul 2018 04:00:08 +0000</pubDate>

+ some others in the same feed.

version 3.16.0git248
Comment 19 George 2018-07-28 12:14:20 CEST
Another example:

Feed:
https://developers.google.com/web/updates/atom.xml

<title>Introducing NoState Prefetch</title>
<published>2018-07-20T00:00:00Z</published>
<updated>2018-07-19T00:00:00Z</updated>

This one was not deleted but it was marked as read. This is the third time I see it reappearing as unread upon feed refresh. In feed properties I have If an item changes = "Never mark it as new"
Comment 20 George 2018-08-21 11:49:23 CEST
Something new which I noticed in the recent days:

I moved my ~/.claws-mail and ~/mail dirs from my desktop machine to my laptop which runs the same version of CM (version 3.16.0git257). Then I refreshed the RSS feeds and in some feeds again old (deleted long ago) items reappeared.

I kept using only the laptop for some days and the issue didn't appear. After a week I moved the ~/.claws-mail and ~/mail dirs back to my desktop system. When I refreshed the RSS feeds again some old items (deleted long ago and deleted on the laptop too) reappeared as new and unread.

Of course during those moves I made sure that I don't mix old and new directory contents, i.e. it was a clean *move*, not copying inside existing dirs.


One particular feed which I can give as an example is:

https://news.opensuse.org/feed/

Right now I see it fetched 10 items from 2018-06-14 till 2018-08-16.
Comment 21 George 2018-09-18 00:04:45 CEST
Another feed for which deleted items reappear as unread:

http://rss.frognews.bg/
Comment 22 Paul 2018-09-18 09:28:54 CEST
(In reply to comment #21)
> Another feed for which deleted items reappear as unread:
> 
> http://rss.frognews.bg/

No such problem with this feed for me. Which version of claws-mail are you using now?
Comment 23 George 2018-09-18 10:34:02 CEST
[~]: /opt/claws-mail/bin/claws-mail -V
Claws Mail version 3.17.0-53-g102ceb
runtime GTK+ 2.24.32 / GLib 2.54.3
buildtime GTK+ 2.24.32 / GLib 2.54.3
Compiled-in features:
 compface
 Enchant
 GnuTLS
 IPv6
 iconv
 LDAP
 libetpan 1.8
 libSM
 NetworkManager
 librSVG 2.42.3
Comment 24 George 2018-09-18 19:09:10 CEST
Right after my previous reply I deleted again all messages for that feed and now I refreshed it. The oldest message I see is with time:

Date: Mon, 17 Sep 2018 15:39:31 GMT

+ there are others with timestamps from before my previous reply.
Comment 25 Andrej Kacian 2018-09-18 20:21:06 CEST
I suggest upgrading to latest git version - I just made some changes to how feed item timestamps are handled, perhaps it will help you here. You will probably see some items duplicated, that is a one-time glitch caused by the change.

That said, I couldn't reproduce the described issue with the frognews feed even before this. Deleted items stay deleted, only newly published items appear after a feed refresh.
Comment 26 George 2018-09-18 22:10:05 CEST
Thanks Andrej. Updated and will keep an eye on it.

> I couldn't reproduce the described issue with the frognews feed

I have been thinking: Do you think this may be because we may be in different time zones (and CM not handling that properly)?

Right now I am deleting all in frognews again at:

[~]: date
Tue Sep 18 23:09:28 EEST 2018

Will check how things are tomorrow.
Comment 27 George 2018-09-18 23:23:56 CEST
[~]: date
Wed Sep 19 00:21:19 EEST 2018

Refreshing the feed shows me 2 old messages as unread:

Date: Tue, 18 Sep 2018 16:32:18 GMT
Date: Tue, 18 Sep 2018 18:41:57 GMT
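
A quick way to check the arithmetic across time zones: the sketch below converts the two GMT item dates above to EEST and compares them with the deletion time from comment 26. The helper name `published_before` is made up for the example, and EEST is assumed to be UTC+3.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumption: EEST is UTC+3.
EEST = timezone(timedelta(hours=3), "EEST")

# Time of the last "delete all" in comment 26 (Tue Sep 18 23:09:28 EEST 2018).
deleted_at = datetime(2018, 9, 18, 23, 9, 28, tzinfo=EEST)

item_dates = [
    "Tue, 18 Sep 2018 16:32:18 GMT",
    "Tue, 18 Sep 2018 18:41:57 GMT",
]


def published_before(raw_date: str, moment: datetime) -> bool:
    """Parse an RFC 822 style date and compare it as an absolute instant."""
    return parsedate_to_datetime(raw_date) < moment


for raw in item_dates:
    local = parsedate_to_datetime(raw).astimezone(EEST)
    print(raw, "=", local.strftime("%H:%M:%S %Z"), "| before deletion:", published_before(raw, deleted_at))
```

Both items correspond to 19:32:18 and 21:41:57 EEST, that is, earlier than the deletion at 23:09:28 EEST, so they are old items coming back rather than newly published ones.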
Comment 28 Paul 2018-09-18 23:58:55 CEST
(In reply to comment #26)
> I have been thinking: Do you think this may be because we may be in
> different time zones (and CM not handling that properly)?

Do you have reason to think that CM does not handle timezones properly? I don't. For example, I use a different timezone from Andrej, but neither of us could reproduce your problem.
Comment 29 George 2018-09-19 00:13:46 CEST
Created attachment 1921
screenshot for feed settings

> Do you have reason to think that CM does not handle timezones properly?

I was just speculating around the facts and wondering what may be different between you guys and me. So the time zone difference was the only one I could think about.

Another thing: are we using the same settings for the feed? I am attaching a screenshot of mine.

Also: are you doing the same as me? I.e.

1. Delete all messages
2. Wait a few hours
3. Refresh the feed

If the answer to both questions is yes - it is a real mystery and I am open to suggestions about how to investigate further.
Comment 30 George 2018-09-19 17:38:58 CEST
Here is another feed which shows the issue:

http://www.bacula.org/git/cgit.cgi/bacula/atom/?h=Branch-9.2

5-6 hours ago I deleted all messages in it. Now I refreshed it and it shows me old messages, oldest one is from:

Date: Sat, 18 Aug 2018 05:22:04 GMT
Comment 31 George 2018-09-21 14:37:40 CEST
Update about frognews:

Today I tried something different. I marked all messages as read and after 2-3 hours I refreshed the feed:

Result: For some messages new duplicates appeared as unread (identical timestamps). Checking the content of those messages I see one difference: The URL has changed. Looking at http://rss.frognews.bg/ though shows the particular message only once.

I wonder if "Never mark it as new" may be failing and causing this particular issue.
Comment 32 George 2018-09-29 18:25:32 CEST
Another feed which just showed the issue:

https://security.googleblog.com/feeds/posts/default

Some hours ago I deleted message with timestamp:

Date: Thu, 20 Sep 2018 09:51:05 GMT

I refreshed the feed now and it appeared as unread.
Comment 33 George 2018-09-29 18:27:01 CEST
Another one:

https://feedproxy.google.com/blogspot/amDG

Messages:

Date: Wed, 29 Aug 2018 15:02:47 GMT
Date: Tue,  4 Sep 2018 10:05:20 GMT
Comment 34 George 2018-10-02 21:29:56 CEST
Another feed seen to show similar behavior:

https://blog.chromium.org/feeds/posts/default
Comment 35 George 2018-10-25 11:45:01 CEST
More feeds showing the issue:

https://blog.bacula.org/feed/
https://photoflowblog.blogspot.com/feeds/posts/default
Comment 36 George 2019-03-15 18:00:38 CET
This keeps happening every now and then with different feeds.

Today it happened with https://puri.sm/feed/: CM loaded 230 "new" and "unread" items from the feed (going back to ones from 2014).

I allow myself the liberty to change priority from "minor" to "normal" as this is really annoying and has been happening for a long time. I hope that's OK.
Comment 37 Santa Claws 2019-07-19 13:54:52 CEST
I can confirm this bug too.

Just happened with https://jabber.at/feed/en/rss.xml and happens with other feeds too every now and then. Very annoying.

I see this was reported more than a year ago. Any hope to have it fixed any time soon?
Comment 38 Little Girl 2019-07-20 02:46:52 CEST
I can also confirm it with a.o.linux.ubuntu and also with a private server a friend of mine has up.

Ubuntu MATE 16.04.6 LTS
Claws Mail 3.13.2