Bug 1348 - Hang ups at exit time with pgp plugin since 3.0.2
Summary: Hang ups at exit time with pgp plugin since 3.0.2
Status: RESOLVED FIXED
Alias: None
Product: Claws Mail (GTK 2)
Classification: Unclassified
Component: Folders/IMAP
Version: 3.0.2
Hardware: PC OpenBSD
Importance: P3 normal
Assignee: users
URL:
Depends on:
Blocks:
 
Reported: 2007-10-05 07:41 UTC by Pierre-Yves Ritschard
Modified: 2008-01-24 16:43 UTC

See Also:


Attachments
debug log (304.58 KB, application/octet-stream)
2007-11-19 10:04 UTC, Landry Breuil
gdb trace for 3.0.2 (2.42 KB, application/octet-stream)
2007-11-26 12:45 UTC, Landry Breuil
gdb trace for 3.1.0 (3.18 KB, application/octet-stream)
2007-11-26 12:46 UTC, Landry Breuil
gdb trace for 3.2.0cvs58 (2.69 KB, text/plain)
2008-01-22 12:16 UTC, Landry Breuil
possible fix (1.31 KB, patch)
2008-01-23 19:39 UTC, Colin Leroy

Description Pierre-Yves Ritschard 2007-10-05 07:41:57 UTC
Since I updated to 3.0.2, whenever I attempt to shut down Claws, either through an Unmap request in the window manager or by selecting Exit in Claws, the process hangs, leaving the window on screen (without refreshes), and SIGKILL has to be sent to the claws process to remove it.

I only get the problem when pgp-core is loaded. It doesn't seem to be triggered by the plugins that depend on it (pgp-inline and pgp-mime), since I get the hangs even when only pgp-core is loaded.

For now I have disabled PGP processing and can run fine with trayicon, vcalendar and rssyl.
Comment 1 Colin Leroy 2007-10-05 08:04:47 UTC
Can you send a --debug log?
Comment 2 Pierre-Yves Ritschard 2007-10-05 08:31:22 UTC
(In reply to comment #1)
> Can you send a --debug log?
> 

Before sending the log I moved my .claws-mail folder and started a simple configuration (mbox file) + loaded pgp-{core,inline,mime}.

No hangups anymore, I will track down what triggers the hang in my configuration.
In the meantime this bug can be closed.

Thanks.
Comment 3 Landry Breuil 2007-11-19 10:04:36 UTC
Created attachment 512
debug log

Can you reopen this bug?
If I enable pgp-core, I get this hangup on close too, and I have to pkill -9 claws. The log is not really significant; the hangup seems to occur at plugin close/destroy. I have checked, and this didn't happen with 3.0.1.
And I confirm this doesn't happen with an empty configuration (simple mbox only).
Comment 4 Colin Leroy 2007-11-19 10:10:56 UTC
Can you run through gdb, and when it hangs at quit, do a Ctrl-C followed by 
"thread apply all bt"

Thanks,
Comment 5 Landry Breuil 2007-11-26 12:45:45 UTC
Created attachment 513
gdb trace for 3.0.2
Comment 6 Landry Breuil 2007-11-26 12:46:30 UTC
Created attachment 514
gdb trace for 3.1.0

I confirm this still happens with 3.1.0, and seems related to a thread deadlock issue..
Comment 7 Landry Breuil 2007-12-25 19:08:10 UTC
(In reply to comment #6)
> Created an attachment (id=514)
> gdb trace for 3.1.0
> 
> I confirm this still happens with 3.1.0, and seems related to a thread deadlock
> issue..
> 

And it still happens with 3.2.0, but not every time... What can have changed in thread handling since 3.0.1, the last version where it worked fine? What can we test to see what triggers the hang?
Comment 8 Colin Leroy 2007-12-26 08:31:49 UTC
(In reply to comment #7)

> And it still happens with 3.2.0, but not repeatedly... what can have changed in
> thread handling since 3.0.1, last time it worked fine ? What can we test to see
> what triggers the hang ?
 
I have no idea - it never happened to me despite heavy usage of pgp/* plugins... Nothing has changed in thread management since 3.0.1 that I remember, and the pgp plugins only use a temporary thread to do the signature checks...
Comment 9 Colin Leroy 2008-01-21 17:55:47 UTC
OK, maybe some news. Can you look at bug #1478, and try the attached patch and/or snapshots (3.2.0cvs57 or greater; the RSSyl, VCalendar and GtkHtml2Viewer plugins from CVS or snapshots too)?

If the hangups appeared at 3.0.2, it may not be the solution, but if they appeared at 3.0.0 but you upgraded from 2.10.0 to 3.0.2, that may be the reason.
Comment 10 Landry Breuil 2008-01-22 12:16:06 UTC
Created attachment 540
gdb trace for 3.2.0cvs58

IIRC the deadlock has been happening since 3.0.2; it wasn't triggered in 3.0.1.
And I'm sorry to say that with this snapshot, it still happens (see attached thread bt).
Running it three times in gdb didn't trigger the deadlock, but when launching claws normally it deadlocked on the second try, so I attached gdb to it to get the trace.
Comment 11 Colin Leroy 2008-01-23 19:39:21 UTC
Created attachment 547
possible fix

Could you try the attached patch? It just avoids stopping the imap manager threads at exit.
Comment 12 Colin Leroy 2008-01-23 19:41:03 UTC
(this is in imap, but it looks like a deadlock in the pthreads library)
Comment 13 Landry Breuil 2008-01-24 12:33:59 UTC
With these two returns added, I've been able to launch, use and close claws ten times with the pgp plugins loaded without reproducing the deadlock. It seems to fix the problem, but unconditionally blaming all *BSD pthreads implementations and returning directly is not a good idea imho. Maybe calling etpan_thread_manager_join() only if !defined *BSD would be better? If I only comment out those two calls and let the *_main_done() function terminate, the deadlock doesn't happen. But I'm open to a better fix...
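
The alternative raised in this comment (joining the worker thread only on platforms where the join is believed to be safe) could be sketched roughly as below. This is not the Claws Mail or libetpan code: `worker_manager`, `manager_start`, `manager_shutdown` and the `SKIP_THREAD_JOIN` macro are hypothetical names, and the choice of platform macros is an assumption made for illustration.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: these compiler-predefined macros mark the platforms
 * on which we would rather not join the worker at exit. */
#if defined(__OpenBSD__) || defined(__FreeBSD__) || defined(__NetBSD__)
#define SKIP_THREAD_JOIN 1
#else
#define SKIP_THREAD_JOIN 0
#endif

/* Hypothetical stand-in for a thread manager. */
struct worker_manager {
    pthread_t thread;
    bool started;
    bool joined;
};

/* Trivial worker body; a real one would service requests. */
static void *worker_main(void *arg)
{
    (void)arg;
    return NULL;
}

/* Start the worker thread. Returns 0 on success, -1 on failure. */
int manager_start(struct worker_manager *m)
{
    m->joined = false;
    m->started = (pthread_create(&m->thread, NULL, worker_main, NULL) == 0);
    return m->started ? 0 : -1;
}

/* Returns true if the worker was joined, false if the join was skipped. */
bool manager_shutdown(struct worker_manager *m)
{
    if (!m->started)
        return false;
#if SKIP_THREAD_JOIN
    /* Leave the thread alone; process exit will tear it down. */
    return false;
#else
    m->joined = (pthread_join(m->thread, NULL) == 0);
    return m->joined;
#endif
}
```

The caller can use the return value to decide whether it is safe to release data the worker might still touch.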
Comment 14 Michael Rasmussen 2008-01-24 13:41:12 UTC
I have seen similar differences between the Linux and OpenBSD thread implementations before.

A greylist daemon for Postfix always left zillions of zombie children in memory when running under OpenBSD but not under Linux. Maybe something similar is causing problems with claws-mail?

The solution was the following:

signal(SIGCHLD,NoZombies);

/* Headers needed by the handler below and by the signal() call above */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/************************************************/
/*                                              */
/* NoZombies: prevents the creation of zombies  */
/* when forking on System V                     */
/*                                              */
/************************************************/
/*                                              */
/*      NOTHING                                 */
/*                                              */
/************************************************/
/*                                              */
/*      NOTHING                                 */
/*                                              */
/************************************************/
// Original documentation (in French) is by Salim Gasmi

void NoZombies(int sig)
{
	(void)sig;
	/* Reap every child that has already exited, without blocking */
	while(waitpid(-1, NULL, WNOHANG) > 0);
}
Comment 15 Holger Berndt 2008-01-24 13:57:18 UTC
(In reply to comment #14)
> A greylist daemon for Postfix always left zillions of zombie children in memory
> when running under OpenBSD but not under Linux.

Child processes and threads are two completely different things.
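
To illustrate the distinction, here is a minimal sketch assuming a POSIX system. The function names `run_child_and_reap` and `run_thread_and_join` are made up for this example. A forked child is collected with waitpid(), while a thread is collected with pthread_join() and never shows up as something waitpid() can reap.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread body: store a marker value through the pointer we were given. */
static void *thread_main(void *arg)
{
    *(int *)arg = 42;
    return NULL;
}

/* Fork a child that exits with `code`, then reap it.
 * Returns the exit code observed through waitpid(), or -1 on error. */
int run_child_and_reap(int code)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: terminate immediately */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Run a thread to completion and join it, then ask waitpid() for any
 * child process. Returns the errno set by waitpid() (ECHILD when there
 * is no child to wait for), 0 if a child was unexpectedly present,
 * or -1 if the thread could not be created or joined. */
int run_thread_and_join(int *result)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, thread_main, result) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    /* A finished thread is not a child process, so a SIGCHLD-style
     * reaper has nothing to collect here. */
    if (waitpid(-1, NULL, WNOHANG) == -1)
        return errno;
    return 0;
}
```

A SIGCHLD handler that loops on waitpid() therefore has no effect on a thread that is stuck or waiting to be joined.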
Comment 16 Colin Leroy 2008-01-24 14:22:51 UTC
(In reply to comment #13)
> with these two returns added, i've been able to launch, use and close claws ten
> times with pgp plugins loaded without being able to reproduce the deadlock. It
> seems to fix the problem, but unconditionally blaming all *BSD pthreads
> implementation and directly returning is not a good idea imho.

In fact I had that idea after googling for "hang _thread_kern_sched_state_unlock". Mozilla, openldap, mysqld, apache, ethereal...

> Maybe calling
> etpan_thread_manager_join() only if !defined *BSD would be better ? If i only
> comment those two calls and let the *_main_done() function terminate, the
> deadlock doesn't happen. But btw i'm open to a better fix... 

I did return; on purpose to avoid freeing things possibly accessed by the unstopped thread :)
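
The reasoning behind the early return can be sketched as follows. The names `shared_state`, `create_state` and `shutdown_state` are hypothetical and not taken from the Claws Mail sources. If the worker thread has not been joined, the cleanup function deliberately leaves the shared data allocated, because freeing it could leave a still-running thread with a dangling pointer.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical data shared between the main thread and a worker. */
struct shared_state {
    pthread_mutex_t lock;
    int pending;
};

/* Allocate and initialise the shared state; returns NULL on failure. */
struct shared_state *create_state(void)
{
    struct shared_state *s = calloc(1, sizeof *s);

    if (s != NULL && pthread_mutex_init(&s->lock, NULL) != 0) {
        free(s);
        s = NULL;
    }
    return s;
}

/* Release the shared state only when the worker is known to be gone.
 * Returns true if the state was freed, false if it was left allocated. */
bool shutdown_state(struct shared_state **state, bool worker_joined)
{
    if (!worker_joined) {
        /* The worker may still dereference *state: return early and
         * let process exit reclaim the memory instead. */
        return false;
    }
    pthread_mutex_destroy(&(*state)->lock);
    free(*state);
    *state = NULL;
    return true;
}
```

The cost is a deliberate leak at exit, which is harmless because the operating system reclaims the process memory anyway.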
Comment 17 Landry Breuil 2008-01-24 14:45:38 UTC
(In reply to comment #16)
> (In reply to comment #13)
> > with these two returns added, i've been able to launch, use and close claws ten
> > times with pgp plugins loaded without being able to reproduce the deadlock. It
> > seems to fix the problem, but unconditionally blaming all *BSD pthreads
> > implementation and directly returning is not a good idea imho.
> 
> In fact I had that idea after googling for "hang
> _thread_kern_sched_state_unlock". Mozilla, openldap, mysqld, apache,
> ethereal...

Indeed.

> > Maybe calling
> > etpan_thread_manager_join() only if !defined *BSD would be better ? If i only
> > comment those two calls and let the *_main_done() function terminate, the
> > deadlock doesn't happen. But btw i'm open to a better fix... 
> 
> I did return; on purpose to avoid freeing things possibly accessed by the
> unstopped thread :)

Aaah, yes, you're right... then I think it's "the best" solution... or the least bad.
Comment 18 users 2008-01-24 16:39:59 UTC
Changes related to this bug have been committed.
Please check latest CVS and update the bug accordingly.
You can also get the patch from:
http://www.colino.net/claws-mail/

2008-01-24 [colin]	3.2.0cvs67

	* src/etpan/imap-thread.c
	* src/etpan/nntp-thread.c
		Fix bug 1348, 'Hang ups at exit time with 
		pgp plugin since 3.0.2'
Comment 19 Colin Leroy 2008-01-24 16:43:24 UTC
I dislike that 'fix' too, but it'll still be better than hanging. :)
