Since I updated to 3.0.2, whenever I attempt to shut down claws, either through an Unmap request in the window manager or by selecting Exit in claws, the process hangs, leaving the window on screen (without refreshes), and SIGKILL has to be sent to the claws process to remove it. I only get the problem when pgp-core is loaded. The hang occurs even when pgp-core is the only plugin loaded, so it doesn't seem to be triggered by the plugins that depend on it (pgp-inline and pgp-mime). For now I have disabled PGP processing and can run fine with trayicon, vcalendar and rssyl.
Can you send a --debug log?
(In reply to comment #1)
> Can you send a --debug log?
Before sending the log I moved my .claws-mail folder and started with a simple configuration (mbox file), then loaded pgp-{core,inline,mime}. No hangs anymore; I will track down what triggers the hang in my configuration. In the meantime this bug can be closed. Thanks.
Created attachment 512 (debug log)
Can you reopen this bug? If I enable pgp-core, I get this hang on close too, and I have to pkill -9 claws. The log is not really significant; the hang seems to occur at plugin close/destroy. I have checked that this didn't happen with 3.0.1. And I confirm this doesn't happen with an empty configuration (simple mbox only).
Can you run it under gdb and, when it hangs at quit, press Ctrl-C and then run "thread apply all bt"? Thanks.
Created attachment 513 (gdb trace for 3.0.2)
Created attachment 514 (gdb trace for 3.1.0)
I confirm this still happens with 3.1.0, and it seems to be related to a thread deadlock issue.
(In reply to comment #6)
> Created an attachment (id=514)
> gdb trace for 3.1.0
>
> I confirm this still happens with 3.1.0, and seems related to a thread deadlock
> issue..
And it still happens with 3.2.0, though not every time. What could have changed in thread handling since 3.0.1, the last version that worked fine? What can we test to see what triggers the hang?
(In reply to comment #7)
> And it still happens with 3.2.0, but not repeatedly... what can have change in
> thread handling since 3.0.1, last time it worked fine ? What can we test to see
> what triggers the hang ?
I have no idea; it has never happened to me despite heavy usage of the pgp/* plugins. As far as I remember, nothing has changed in thread management since 3.0.1, and the pgp plugins only use a temporary thread to do the signature checks.
OK, maybe some news. Can you look at bug #1478 and try the attached patch and/or the snapshots (3.2.0cvs57 or later, with the RSSyl, VCalendar and GtkHtml2Viewer plugins from CVS or snapshots too)? If the hangs first appeared in 3.0.2, it may not be the solution, but if they appeared in 3.0.0 and you upgraded directly from 2.10.0 to 3.0.2, that may be the reason.
Created attachment 540 (gdb trace for 3.2.0cvs58)
IIRC the deadlock has happened since 3.0.2; it wasn't triggered in 3.0.1. I'm sorry to say that it still happens with this snapshot (see the attached thread backtrace). Running it three times in gdb didn't trigger the deadlock, but when launching claws normally it deadlocked on the second try, so I attached gdb to the process to get the trace.
Created attachment 547 (possible fix)
Could you try the attached patch? It just avoids stopping the imap manager threads at exit.
(this is in imap, but it looks like a deadlock in the pthreads library)
With these two returns added, I've been able to launch, use and close claws ten times with the pgp plugins loaded without reproducing the deadlock. It seems to fix the problem, but unconditionally blaming all *BSD pthreads implementations and returning directly is not a good idea IMHO. Maybe calling etpan_thread_manager_join() only if !defined *BSD would be better? If I only comment out those two calls and let the *_main_done() function terminate, the deadlock doesn't happen. But I'm open to a better fix.
I have seen similar differences between the Linux and OpenBSD thread implementations before. A greylist daemon for Postfix always left zillions of zombie children in memory when running under OpenBSD but not under Linux. Maybe something similar is causing problems with claws-mail? The solution was the following:
    signal(SIGCHLD, NoZombies);  /* installed once at startup */

    /************************************************/
    /*                                              */
    /* NoZombies: prevents the creation of zombies  */
    /*            when forking on System V          */
    /*                                              */
    /************************************************/
    /*                                              */
    /* NOTHING                                      */
    /*                                              */
    /************************************************/
    /*                                              */
    /* NOTHING                                      */
    /*                                              */
    /************************************************/
    // Original French documentation by Salim Gasmi
    void NoZombies(int sig)
    {
        (void)sig;  /* unused */
        /* Reap every child that has already exited, without blocking. */
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ;
    }
(In reply to comment #14)
> A greylist daemon for Postfix always left zilions of zombie childs in memory
> when running under OpenBSD but not under Linux.
Child processes and threads are two completely different things.
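To illustrate the distinction made above, here is a minimal sketch in C. The helper names run_child_and_reap() and run_thread_and_join() are hypothetical and exist only for this example; they are not part of claws-mail or libetpan. A child process is collected with waitpid(), which is what a SIGCHLD handler deals with, while a thread is collected with pthread_join() and never shows up as a zombie process.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`, reap it with waitpid(), and
 * return the exit status seen by the parent (or -1 on error).
 * Until waitpid() is called, the exited child stays around as a zombie. */
int run_child_and_reap(int code)
{
    assert(code >= 0 && code <= 255);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: terminate immediately */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Thread entry point: hand the argument back as the thread's result. */
static void *thread_main(void *arg)
{
    return arg;
}

/* Start a thread, collect it with pthread_join(), and return its result
 * (or -1 on error). waitpid() and SIGCHLD play no part here. */
int run_thread_and_join(int value)
{
    pthread_t thread;
    void *result = NULL;

    if (pthread_create(&thread, NULL, thread_main,
                       (void *)(intptr_t)value) != 0)
        return -1;
    if (pthread_join(thread, &result) != 0)
        return -1;
    return (int)(intptr_t)result;
}
```

With these helpers, run_child_and_reap(7) hands back the child's exit status through waitpid(), and run_thread_and_join(42) hands back the thread's return value through pthread_join(). The hang discussed in this bug is on the pthread_join() side, so a SIGCHLD handler would not affect it.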
(In reply to comment #13)
> with these two returns added, i've been able to launch, use and close claws ten
> times with pgp plugins loaded without being able to reproduce the deadlock. It
> seems to fix the problem, but inconditionally blaming all *BSD pthreads
> implementation and directly returning is not a good idea imho.
In fact I had that idea after googling for "hang _thread_kern_sched_state_unlock". Mozilla, openldap, mysqld, apache, ethereal...
> Maybe calling
> etpan_thread_manager_join() only if !defined *BSD would be better ? If i only
> comment those two calls and let the *_main_done() function terminate, the
> deadlock doesn't happen. But btw i'm open to a better fix...
I used return; on purpose, to avoid freeing things that the unstopped thread might still access :)
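The early-return idea discussed above can be sketched as follows. This is only an illustration under stated assumptions, not the committed patch: struct worker_manager, manager_start(), manager_done() and skip_join_by_default() are hypothetical names, and the BSD check simply follows the "!defined *BSD" suggestion from the earlier comment. The point it shows is that when the join is skipped, the function returns before freeing anything, so the still-running thread never touches freed memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a thread manager; not the libetpan API. */
struct worker_manager {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t wake;
    bool stop_requested;
    int *shared_state;      /* data the worker may touch while it runs */
};

static void *worker_main(void *arg)
{
    struct worker_manager *m = arg;

    pthread_mutex_lock(&m->lock);
    while (!m->stop_requested) {
        (*m->shared_state)++;                   /* worker uses shared data */
        pthread_cond_wait(&m->wake, &m->lock);  /* sleep until signalled */
    }
    pthread_mutex_unlock(&m->lock);
    return NULL;
}

struct worker_manager *manager_start(void)
{
    struct worker_manager *m = calloc(1, sizeof(*m));
    if (m == NULL)
        return NULL;
    m->shared_state = calloc(1, sizeof(*m->shared_state));
    if (m->shared_state == NULL) {
        free(m);
        return NULL;
    }
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->wake, NULL);
    if (pthread_create(&m->thread, NULL, worker_main, m) != 0) {
        pthread_cond_destroy(&m->wake);
        pthread_mutex_destroy(&m->lock);
        free(m->shared_state);
        free(m);
        return NULL;
    }
    return m;
}

/* Assumption for illustration: skip the join on the BSDs only. */
bool skip_join_by_default(void)
{
#if defined(__OpenBSD__) || defined(__FreeBSD__) || defined(__NetBSD__)
    return true;
#else
    return false;
#endif
}

/* Returns true if the worker was joined and everything was freed,
 * false if the join was skipped and nothing was touched. */
bool manager_done(struct worker_manager *m, bool skip_join)
{
    assert(m != NULL);

    if (skip_join)
        return false;   /* early return: free nothing the thread may use */

    pthread_mutex_lock(&m->lock);
    m->stop_requested = true;
    pthread_cond_signal(&m->wake);
    pthread_mutex_unlock(&m->lock);

    pthread_join(m->thread, NULL);  /* safe to free only after this */
    pthread_cond_destroy(&m->wake);
    pthread_mutex_destroy(&m->lock);
    free(m->shared_state);
    free(m);
    return true;
}
```

A caller at exit time would use manager_done(m, skip_join_by_default()). The skipped path deliberately leaks the manager and leaves the thread running, on the assumption that the process is about to exit and the operating system will reclaim both.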
(In reply to comment #16)
> (In reply to comment #13)
> > with these two returns added, i've been able to launch, use and close claws ten
> > times with pgp plugins loaded without being able to reproduce the deadlock. It
> > seems to fix the problem, but inconditionally blaming all *BSD pthreads
> > implementation and directly returning is not a good idea imho.
>
> In fact I had that idea after googling for "hang
> _thread_kern_sched_state_unlock". Mozilla, openldap, mysqld, apache,
> ethereal...
Indeed.
> > Maybe calling
> > etpan_thread_manager_join() only if !defined *BSD would be better ? If i only
> > comment those two calls and let the *_main_done() function terminate, the
> > deadlock doesn't happen. But btw i'm open to a better fix...
>
> I did return; on purpose to avoid freeing things possibly accessed by the
> unstopped thread :)
Aaah, yes, you're right... then I think it's "the best" solution, or the least bad one.
Changes related to this bug have been committed. Please check latest CVS and update the bug accordingly.
You can also get the patch from: http://www.colino.net/claws-mail/
2008-01-24 [colin] 3.2.0cvs67
    * src/etpan/imap-thread.c
    * src/etpan/nntp-thread.c
        Fix bug 1348, 'Hang ups at exit time with pgp plugin since 3.0.2'
I dislike that 'fix' too, but it'll still be better than hanging. :)