Bug 4243

Summary: Switching folders is slow
Product: Claws Mail (GTK 2)
Reporter: Alexander Harkness <me>
Component: Folders/IMAP
Assignee: users
Status: NEW
Severity: normal
Priority: P3
Version: 3.17.5
Hardware: All
OS: All
Attachments:
  Patch to add tests for subject_get_prefix_length (flags: none)
  Patch to improve performance of subject_get_prefix_length (flags: none)
  Patch to remove check for duplicate message IDs in threading process (flags: none)
  Patch to eliminate duplicate insertions to subject hash table (flags: none)

Description Alexander Harkness 2019-08-29 15:50:19 UTC
Currently switching between folders in Claws takes rather longer than I think it _should_ do. On my machine it takes ~5s to open an IMAP folder with ~100000 messages inside.

I have had a look at the code and there seem to be multiple causes of the issue:

 1. Folders are always rescanned over the network before being opened. This contributes several seconds of latency and download time.

 2. Folders (and their constituent summaryviews) are always rebuilt on opening rather than being cached.

 3. Building 100k GTK rows takes time (not much can be done here...).

 4. Extracting the information required for a summary row from the MsgInfo struct is expensive.

 5. Threading and sorting are also expensive.

With regard to 1., this seems like rather a design flaw. Is there a good reason behind it? If not, it would make sense for the scan to be completed in the background after the initial render of the folder.

Caching rendered folders in RAM, as in 2., appears to be a larger architectural change, but it would make folder changes effectively instant no matter how slow everything else is.
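
To make the idea in 2. concrete, here is a minimal sketch of a small least-recently-used cache that keeps the last few rendered folder views alive instead of freeing them on every switch. It is not based on the Claws Mail code: ViewCache, view_cache_get, view_cache_put, the slot count and the opaque "rendered" pointer are all hypothetical names chosen for illustration, and a real version would also have to invalidate a cached view when the folder contents change.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define VIEW_CACHE_SLOTS 4 /* hypothetical: number of rendered folders kept */

/* Hypothetical stand-in for one fully built folder view. */
typedef struct {
    char *folder_id;     /* owned copy of the folder identifier, NULL if empty */
    void *rendered;      /* opaque pointer to the built view */
    unsigned long stamp; /* last-use counter; larger means more recent */
} ViewSlot;

typedef struct {
    ViewSlot slots[VIEW_CACHE_SLOTS];
    unsigned long clock;
    void (*destroy)(void *rendered); /* called on eviction; may be NULL */
} ViewCache;

void view_cache_init(ViewCache *c, void (*destroy)(void *))
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
    c->destroy = destroy;
}

/* Return the cached view for folder_id, or NULL if it is not cached. */
void *view_cache_get(ViewCache *c, const char *folder_id)
{
    assert(c != NULL && folder_id != NULL);
    for (int i = 0; i < VIEW_CACHE_SLOTS; i++) {
        ViewSlot *s = &c->slots[i];
        if (s->folder_id && strcmp(s->folder_id, folder_id) == 0) {
            s->stamp = ++c->clock; /* mark as most recently used */
            return s->rendered;
        }
    }
    return NULL;
}

/* Store a view, replacing the same folder's entry if present, otherwise
 * an empty slot or the least recently used one. Returns 0 on success,
 * -1 if memory for the key could not be allocated. */
int view_cache_put(ViewCache *c, const char *folder_id, void *rendered)
{
    assert(c != NULL && folder_id != NULL);
    ViewSlot *victim = NULL;
    for (int i = 0; i < VIEW_CACHE_SLOTS; i++) {
        ViewSlot *s = &c->slots[i];
        if (s->folder_id && strcmp(s->folder_id, folder_id) == 0) {
            victim = s;
            break;
        }
        /* Empty slots have stamp 0, so they are chosen before used ones. */
        if (!victim || s->stamp < victim->stamp)
            victim = s;
    }
    char *id_copy = malloc(strlen(folder_id) + 1);
    if (!id_copy)
        return -1;
    strcpy(id_copy, folder_id);
    if (victim->folder_id) {
        if (c->destroy && victim->rendered != rendered)
            c->destroy(victim->rendered);
        free(victim->folder_id);
    }
    victim->folder_id = id_copy;
    victim->rendered = rendered;
    victim->stamp = ++c->clock;
    return 0;
}

/* Release every cached view and key. */
void view_cache_clear(ViewCache *c)
{
    assert(c != NULL);
    for (int i = 0; i < VIEW_CACHE_SLOTS; i++) {
        if (c->slots[i].folder_id) {
            if (c->destroy)
                c->destroy(c->slots[i].rendered);
            free(c->slots[i].folder_id);
        }
    }
    memset(c->slots, 0, sizeof c->slots);
}
```

On a folder switch, the caller would try view_cache_get first and only rebuild (then view_cache_put) on a miss. The cost is memory: each retained view holds its rows in RAM, so the slot count would need tuning for folders of this size.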

The code for 4. is relatively efficient; it is just that the volume of messages and the feature set require a lot of CPU time to be spent on it. However, a cache could be added (in the same vein as the imapcache) to avoid repeatedly recalculating data for unchanged messages.

I ran a profiler over the code for 5. and found two easy changes adding up to around a 10% CPU saving (of the total for loading a folder). I can supply patches for these changes. With some more substantial architectural changes, I think at least another 5-10% could be saved.

Any thoughts?
Comment 1 Andrej Kacian 2019-08-29 17:32:53 UTC
Just to correct your "no cache" assertion: before opening a folder, Claws Mail queries the server for the list of UID numbers of the messages it contains ("UID FETCH 1:* (FLAGS UID)"), and compares it to its local cache. After that, additional information is retrieved only for messages with previously unseen UIDs.
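
The comparison step described above can be illustrated with a short sketch. This is not the Claws Mail implementation: the function name uids_missing_from_cache and its signature are hypothetical, and it assumes both UID lists are already sorted in ascending order with no duplicates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: collect the UIDs that the server reports but the
 * local cache does not have yet. Both inputs must be sorted ascending
 * without duplicates. "out" must have room for n_server entries.
 * Returns the number of UIDs written to "out". */
size_t uids_missing_from_cache(const uint32_t *server, size_t n_server,
                               const uint32_t *cached, size_t n_cached,
                               uint32_t *out)
{
    size_t i = 0, j = 0, n_out = 0;

    while (i < n_server) {
        if (j == n_cached || server[i] < cached[j]) {
            /* Not in the cache: details still need to be fetched. */
            out[n_out++] = server[i++];
        } else if (server[i] == cached[j]) {
            /* Already cached: nothing to fetch for this message. */
            i++;
            j++;
        } else {
            /* Cached UID that the server no longer lists: skip it. */
            j++;
        }
    }
    return n_out;
}
```

A single merge pass like this is linear in the combined size of the two lists, so for a folder of ~100000 messages the comparison itself should be cheap relative to the network round trip.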

As for your patches, we'll be happy to see any improvements regarding this.
Comment 2 Alexander Harkness 2019-08-29 21:02:57 UTC
Created attachment 2001
Patch to add tests for subject_get_prefix_length
Comment 3 Alexander Harkness 2019-08-29 21:03:24 UTC
Created attachment 2002
Patch to improve performance of subject_get_prefix_length
Comment 4 Alexander Harkness 2019-08-29 21:03:58 UTC
Created attachment 2003
Patch to remove check for duplicate message IDs in threading process
Comment 5 Alexander Harkness 2019-08-29 21:04:07 UTC
> Just to correct your "no cache" assertion: before opening a folder, Claws Mail queries the server for the list of UID numbers of the messages it contains ("UID FETCH 1:* (FLAGS UID)"), and compares it to its local cache. After that, additional information is retrieved only for messages with previously unseen UIDs.

This is indeed the case. The folder rebuilding I was referring to happens after the imapcache is loaded (and the remote server scanned against it): much CPU time is spent converting the imapcache into summaryview form, only for this work to be discarded by free() as soon as the user switches to a different folder.

I have attached 3 patches (one is simply a test to ensure I haven't changed behaviour unintentionally).
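
For readers unfamiliar with what a subject prefix length function does, the following is a rough sketch of the general technique: a hand-written scanner that measures how many leading characters of a subject are reply or forward markers. It is not the attached patch and does not reproduce the behaviour of subject_get_prefix_length; the name reply_prefix_length and the short prefix list ("re:", "fwd:", "fw:") are assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive test for whether s begins with prefix.
 * Stops safely at the end of s because '\0' never matches. */
static int starts_with_ci(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Hypothetical sketch: return the number of leading bytes of subject
 * taken up by repeated reply/forward markers and the whitespace around
 * them. Returns 0 when the subject has no such marker. */
size_t reply_prefix_length(const char *subject)
{
    /* Assumed prefix list; a real mail client recognises many more. */
    static const char *const prefixes[] = { "re:", "fwd:", "fw:" };
    const size_t n_prefixes = sizeof prefixes / sizeof prefixes[0];
    const char *p = subject;

    for (;;) {
        const char *q = p;
        int matched = 0;

        while (*q == ' ' || *q == '\t')
            q++;
        for (size_t k = 0; k < n_prefixes; k++) {
            if (starts_with_ci(q, prefixes[k])) {
                q += strlen(prefixes[k]);
                matched = 1;
                break;
            }
        }
        if (!matched)
            break;
        while (*q == ' ' || *q == '\t')
            q++;
        p = q; /* commit: everything before q is prefix */
    }
    return (size_t)(p - subject);
}
```

A plain scanner like this makes one pass over the start of the string and allocates nothing, which is the kind of property that matters when the function runs once per message during threading.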
Comment 6 Alexander Harkness 2019-08-30 17:11:23 UTC
Created attachment 2004
Patch to eliminate duplicate insertions to subject hash table
Comment 7 Alexander Harkness 2019-10-16 09:00:28 UTC
Any chance this could get a review?