I have set up RSS feeds for Slashdot and Freshmeat/Freecode. As they have a high volume of changes and I don't expire old news, their folders have ~10,000 items each. When RSSyl tries to fetch/update a feed, the machine load jumps to ~50 and I can see heavy I/O from Claws. Trying to check what is happening with strace, I get this:

write(1, "feed.c:911:", 11) = 11
write(1, "Appending 'Greg KH Leaves SUSE F"..., 53) = 53
write(1, "feed.c:909:", 11) = 11
write(1, "RSSyl: starting to parse '42143'"..., 33) = 33
write(1, "feed.c:662:", 11) = 11
write(1, "RSSyl: parsing '/home/higuita/.c"..., 64) = 64
open("/home/higuita/.claws-mail/RSSyl/Slashdot/42143", O_RDONLY) = 31
fstat(31, {st_mode=S_IFREG|0600, st_size=3859, ...}) = 0
read(31, "Date: Sun, 19 Feb 2012 20:11:00 "..., 3859) = 3859
close(31) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3439, ...}) = 0
write(1, "feed.c:719:", 11) = 11
write(1, "RSSyl: got date \n", 17) = 17
write(1, "feed.c:712:", 11) = 11
write(1, "RSSyl: got author 'samzenpus'\n", 30) = 30
write(1, "feed.c:725:", 11) = 11
write(1, "RSSyl: got title 'Canada's Onlin"..., 84) = 84
write(1, "feed.c:768:", 11) = 11
write(1, "RSSyl: updated title to 'Canada'"..., 100) = 100
write(1, "feed.c:732:", 11) = 11
write(1, "RSSyl: got link 'http://rss.slas"..., 148) = 148
write(1, "feed.c:741:", 11) = 11
write(1, "RSSyl: got id 'http://rss.slashd"..., 146) = 146
write(1, "feed.c:697:", 11) = 11
write(1, "RSSyl: finished parsing headers\n", 32) = 32
write(1, "feed.c:789:", 11) = 11
write(1, "Leading html tag found at line 1"..., 34) = 34
write(1, "feed.c:796:", 11) = 11
write(1, "Trailing html tag found at line "..., 35) = 35
write(1, "feed.c:911:", 11) = 11
write(1, "Appending 'Canada's Online Surve"..., 86) = 86
write(1, "feed.c:909:", 11) = 11
write(1, "RSSyl: starting to parse '43813'"..., 33) = 33
write(1, "feed.c:662:", 11) = 11
write(1, "RSSyl: parsing '/home/higuita/.c"..., 64) = 64
open("/home/higuita/.claws-mail/RSSyl/Slashdot/43813", O_RDONLY) = 31
fstat(31, {st_mode=S_IFREG|0600, st_size=3829, ...}) = 0
read(31, "Date: Tue, 3 Apr 2012 18:31:00 "..., 3829) = 3829
close(31) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3439, ...}) = 0
write(1, "feed.c:719:", 11) = 11
write(1, "RSSyl: got date \n", 17) = 17
write(1, "feed.c:712:", 11) = 11
write(1, "RSSyl: got author 'samzenpus'\n", 30) = 30

So it looks like RSSyl is trying to read ALL the items in those folders: every stored item file is opened, read and parsed on each update. The more items a folder has, the worse the problem gets.
This is inevitable, since every new item from the feed update being parsed needs to be checked against the existing items, to see if it is one of them (so it can either be ignored, or updated with new content). You could create a cleanup processing rule (e.g. condition "age_greater 7 & ~unread", action "delete") to get rid of old stuff. I understand that might not be desirable, though. The only possible solution I can think of would be an optional age cutoff setting, with updates not being checked against older items.
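To make the age cutoff idea concrete, here is a minimal sketch in Python. It is not RSSyl's code (RSSyl is written in C), and the names Item, items_to_check, classify and max_age_days are all hypothetical, introduced only for illustration. It assumes each stored item has an id, a title and a date, and that an update is matched against stored items by id.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Item:
    """Hypothetical stand-in for a stored feed item (not RSSyl's real struct)."""
    item_id: str
    title: str
    date: datetime


def items_to_check(existing, now, max_age_days=None):
    """Return the stored items that an update is compared against.

    With max_age_days=None every stored item is a candidate, which is the
    behaviour described above. With a cutoff, older items are skipped.
    """
    if max_age_days is None:
        return list(existing)
    cutoff = now - timedelta(days=max_age_days)
    return [it for it in existing if it.date >= cutoff]


def classify(new_item, candidates):
    """Classify an incoming item as 'new', 'unchanged' or 'updated'."""
    by_id = {it.item_id: it for it in candidates}
    old = by_id.get(new_item.item_id)
    if old is None:
        return "new"
    return "unchanged" if old.title == new_item.title else "updated"
```

One consequence of this design: an item older than the cutoff that shows up again in the feed is classified as "new" and would be stored a second time. Also, a cutoff only saves I/O if an item's age can be determined without opening and parsing its file; how that would be done in RSSyl is left open here.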
Is there any workaround for this? I'm using Claws 3.13.2 and RSSyl is still I/O intensive...