Bug 1773 - [PATCH] compare strings faster
Summary: [PATCH] compare strings faster
Status: NEW
Alias: None
Product: Claws Mail (GTK 2)
Classification: Unclassified
Component: Other (show other bugs)
Version: 3.6.1
Hardware: PC Linux
: P3 enhancement
Assignee: users
URL:
Depends on:
Blocks:
 
Reported: 2008-11-14 16:15 UTC by Stanislav Brabec
Modified: 2017-07-13 00:51 UTC (History)
0 users

See Also:


Attachments
claws-mail-g_strcmp0.patch (22.80 KB, patch)
2008-11-14 16:16 UTC, Stanislav Brabec
no flags Details | Diff
Replace strcmp2 by g_strcmp0 (30.06 KB, patch)
2017-07-13 00:51 UTC, Ricardo Mones
no flags Details | Diff

Description Stanislav Brabec 2008-11-14 16:15:21 UTC
However using if (g_utf8_collate(foo1, foo2)) works and gives good results (at least if glibc or locale data are not broken), this usage is bad.

If you need to just compare strings to get equal/non-equal return value, than using of four-pass locale wise lexicographic collating is purely superfluous.

Using simpler functions like strcmp() or g_strcmp0() will give the same result 5-50 times faster.

In attached patch, I replaces all occurrences of upper mentioned use case.

Notes to this patch:

- I don't know background of case insensitive locale wise comparison in addressbook_foldersel.c. I am keeping this behavior using longer but faster code.

- Using of g_str_equal() should be faster alternative to procmime_str_equal().

- There are more possible candidates to replace.


In case sorted lists presented to users, the situation is more complicated:

- sorting using g_utf8_collate() directly needs n*log(n) expensive four pass comparisons

- using g_utf8_collate_key() and then sorting using fast strcmp would need n expensive four pass key creation, and then n*lon(n) calls of strcmp. It also will need an additional memory (typically 4*sum of strlens of all keys).

The second will be faster. For larger lists and if there is enough memory it will be up to 4-50 faster. But it will go out of memory for 4 times smaller list.
Comment 1 Stanislav Brabec 2008-11-14 16:16:07 UTC
Created attachment 641 [details]
claws-mail-g_strcmp0.patch
Comment 2 Colin Leroy 2008-11-14 20:21:43 UTC
(All of these are non-critical codepaths, apart maybe the hunk in summaryview.)
Comment 3 Colin Leroy 2008-11-16 11:51:55 UTC
The problem with g_strcmp0 is that it's since glib 2.16.
We have strcmp2 that does the same internally.
Comment 4 Stanislav Brabec 2008-11-16 12:58:00 UTC
You can use any simple function to compare strings. Simply replace g_strcmp0 by strcmp2 before applying my patch.

Purpose of the patch: g_utf8_collate() is a ~1000 lines of code run and several temporary memory allocations. Simple comparison may contain ~4 lines of code. If you don't need lexicographical order, g_utf8_collate() is simply an overkill.

I have used g_strcmp0(a,b) as it should return zero in exactly the same conditions like g_utf8_collate(a,b):
- If strings are equal
- If a or b are NULL
Comment 5 Andrej Kacian 2017-07-12 19:01:02 UTC
(In reply to comment #3)
> The problem with g_strcmp0 is that it's since glib 2.16.
> We have strcmp2 that does the same internally.

We now depend on glib >= 2.20, so this problem is no longer a problem. :)
Comment 6 Ricardo Mones 2017-07-12 23:47:05 UTC
(In reply to comment #5)
> (In reply to comment #3)
> > The problem with g_strcmp0 is that it's since glib 2.16.
> > We have strcmp2 that does the same internally.
> 
> We now depend on glib >= 2.20, so this problem is no longer a problem. :)

That strcmp2 sounds like a candidate for removal then :)
Comment 7 Andrej Kacian 2017-07-12 23:58:16 UTC
For a carefu(In reply to comment #6)
> > We now depend on glib >= 2.20, so this problem is no longer a problem. :)
> 
> That strcmp2 sounds like a candidate for removal then :)

For a careful removal - the two functions return slightly different values for various combinations of null and non-null arguments, so a simple s/strcmp2/g_strcmp0/g might cause some breakage.
Comment 8 Ricardo Mones 2017-07-13 00:46:42 UTC
(In reply to comment #7)
> For a carefu(In reply to comment #6)
> > > We now depend on glib >= 2.20, so this problem is no longer a problem. :)
> > 
> > That strcmp2 sounds like a candidate for removal then :)
> 
> For a careful removal - the two functions return slightly different values
> for various combinations of null and non-null arguments, so a simple
> s/strcmp2/g_strcmp0/g might cause some breakage.

Indeed, just to be sure I've made a little test program which shows what you say:

"(null)" "(null)"       g_strcmp0 = 0   strcmp2 = -1
"(null)" "abcdef"       g_strcmp0 = -1  strcmp2 = -1
"abcdef" "(null)"       g_strcmp0 = 1   strcmp2 = -1
"abcdef" "abcdef"       g_strcmp0 = 0   strcmp2 = 0
"abcdef" "fedcba"       g_strcmp0 = -5  strcmp2 = -5

Since usage is mostly to test != 0, the problematic case is the first one. Not sure if some comparison relies on that inequality, I'll check tomorrow.

Otherwise the patch works fine so far :-)
Comment 9 Ricardo Mones 2017-07-13 00:51:50 UTC
Created attachment 1765 [details]
Replace strcmp2 by g_strcmp0

Note You need to log in before you can comment on or make changes to this bug.