Bug 3979 - Hang (with killing needed) during action which extracts attachments
Summary: Hang (with killing needed) during action which extracts attachments
Status: RESOLVED FIXED
Alias: None
Product: Claws Mail
Classification: Unclassified
Component: UI/Actions (show other bugs)
Version: 3.16.0
Hardware: PC Linux
: P3 normal
Assignee: users
URL:
Depends on:
Blocks:
 
Reported: 2018-03-03 22:25 CET by George
Modified: 2019-01-04 19:56 CET (History)
0 users

See Also:


Attachments
debug log (22.06 KB, text/plain)
2018-03-03 22:26 CET, George
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description George 2018-03-03 22:25:23 CET
STR:

1. "Download messages" from an IMAP folder with 3000+ DMARC reports, each one attached as a .zip or .gz file

2. Execute action:

export METAMAIL_TMPDIR=/tmp/download; metamail -w -x -q %f > /tmp/export.log

EXPECTED
All attachments should be extracted in /tmp/download

ACTUAL
The program hangs during execution with an empty "Error" dialog and becomes unresponsive - needs killing of the process. In /tmp/download there are only 645 files.

Attaching debug log with hidden specific hex and other numbers as I don't know what they mean and what info they may contain (privacy concern).
Comment 1 George 2018-03-03 22:26:01 CET
Created attachment 1850 [details]
debug log
Comment 2 Andrej Kacian 2018-03-04 01:03:33 CET
Are you sure there is just the one empty error dialog? When I try something like this, I first get an error dialog with:

Could not fork to execute the following command: ...
Too many open files

Only after I close that one, CM continues, and eventually it hangs, sometimes with an empty error dialog.


Colin, is there a particular reason why you used g_spawn_async() and not g_spawn_sync() when reworking the actions in commit 52516eabc ? Originally, it was done with fork()+waitpid() combo, which would block the whole program - not ideal, but I don't think the underlying code handles asynchronous children well.
Comment 3 George 2018-03-04 09:06:57 CET
Yes, I am sure.
Comment 4 Colin Leroy 2018-03-04 15:42:35 CET
Andrej: I don't really remember.
Comment 5 Andrej Kacian 2019-01-03 23:12:20 CET
I managed to reproduce the described issue. GDB gives following backtrace:

[...]
action.c:982:spawning echo "/home/ticho/.claws-mail/imapcache/vala/ticho/ML/claws/users/3431" >`mktemp -u /tmp/zz/XXXXXX`: 1
imap.c:1571:trying to fetch cached /home/ticho/.claws-mail/imapcache/vala/ticho/ML/claws/users/3436
imap.c:1581:message 3436 has been already fully cached.
[Detaching after fork from child process 23466]
action.c:982:spawning echo "/home/ticho/.claws-mail/imapcache/vala/ticho/ML/claws/users/3436" >`mktemp -u /tmp/zz/XXXXXX`: 1
imap.c:1571:trying to fetch cached /home/ticho/.claws-mail/imapcache/vala/ticho/ML/claws/users/3437
imap.c:1581:message 3437 has been already fully cached.
[Detaching after fork from child process 23468]
action.c:982:spawning echo "/home/ticho/.claws-mail/imapcache/vala/ticho/ML/claws/users/3437" >`mktemp -u /tmp/zz/XXXXXX`: 1
imap.c:1571:trying to fetch cached /home/ticho/.claws-mail/imapcache/vala/ticho/ML/claws/users/3438
imap.c:1581:message 3438 has been already fully cached.
action.c:982:spawning echo "/home/ticho/.claws-mail/imapcache/vala/ticho/ML/claws/users/3438" >`mktemp -u /tmp/zz/XXXXXX`: 0
alertpanel.c:253:Creating alert panel dialog...
alertpanel.c:211:called inc_lock (lock count 1)
action.c:1541:Catching grand child's output.
action.c:1170:Freeing children data 0x5555570ebcd0
action.c:1541:Catching grand child's output.
action.c:1170:Freeing children data 0x5555570ebcd0
free(): double free detected in tcache 2

Thread 1 "claws-mail" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff587a85b in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff5865535 in __GI_abort () at abort.c:79
#2  0x00007ffff58bc718 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff59c729a "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007ffff58c2e3a in malloc_printerr (str=str@entry=0x7ffff59c8f60 "free(): double free detected in tcache 2") at malloc.c:5382
#4  0x00007ffff58c491d in _int_free (av=0x7ffff59fec40 <main_arena>, p=0x5555561998b0, have_lock=<optimized out>) at malloc.c:4193
#5  0x0000555555688023 in free_children (children=0x5555570ebcd0) at action.c:1172
#6  0x0000555555687d88 in wait_for_children (children=0x5555570ebcd0) at action.c:1094
#7  0x0000555555689859 in catch_output (data=0x5555570e8560, source=25, cond=G_IO_HUP) at action.c:1615
#8  0x00005555558eddcf in claws_io_invoke (source=0x55555603c330, condition=G_IO_HUP, data=0x5555570e0440) at gtkutils.c:1651
#9  0x00007ffff6809ae8 in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007ffff6809ed8 in  () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00007ffff6809f6c in g_main_context_iteration () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#12 0x00007ffff7a61b01 in gtk_main_iteration () at /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
#13 0x000055555569bc99 in alertpanel_show () at alertpanel.c:213
#14 0x000055555569b79c in alertpanel_message
    (title=0x55555594a17b "Error", message=0x7fffffffab50 "Could not fork to execute the following command:\necho \"/home/ticho/.claws-mail/imapcache/vala/ticho/ML/claws/users/3438\" >`mktemp -u /tmp/zz/XXXXXX`\nToo many open files", type=3) at alertpanel.c:131
#15 0x000055555569ba3e in alertpanel_error (format=0x555555948c88 "Could not fork to execute the following command:\n%s\n%s") at alertpanel.c:170
#16 0x0000555555687947 in fork_child (cmd=0x555557182760 "echo \"/home/ticho/.claws-mail/imapcache/vala/ticho/ML/claws/users/3438\" >`mktemp -u /tmp/zz/XXXXXX`", msg_str=0x5555570e1e50 "", children=0x5555570ebcd0) at action.c:987
#17 0x000055555568746a in execute_actions (action=0x5555562e394e "echo %f >`mktemp -u /tmp/zz/XXXXXX`", msg_list=0x55555689a890 = {...}, text=0x555555ea4e60, body_pos=0, partinfo=0x0, callback=0x0, data=0x0) at action.c:879
#18 0x0000555555686c34 in message_actions_execute (msgview=0x555555f86a00, action_nb=3, msg_list=0x55555689a890 = {...}) at action.c:690
#19 0x0000555555686790 in mainwin_actions_execute (mainwin=0x555555d074b0, action_nb=3, widget=0x0) at action.c:629
#20 0x0000555555686747 in mainwin_actions_execute_cb (widget=0x555555d2a220, data=0x555555d074b0) at action.c:620
#21 0x00007ffff68e9b6d in g_closure_invoke () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#22 0x00007ffff68fc8f3 in  () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#23 0x00007ffff6905882 in g_signal_emit_valist () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#24 0x00007ffff6905ecf in g_signal_emit () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#25 0x00007ffff7b79a4a in gtk_widget_activate () at /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
#26 0x00007ffff7a764bd in gtk_menu_shell_activate_item () at /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
#27 0x00007ffff7a7678e in  () at /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
#28 0x00007ffff7a641eb in  () at /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
#29 0x00007ffff68e9b6d in g_closure_invoke () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#30 0x00007ffff68fc124 in  () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#31 0x00007ffff6904f43 in g_signal_emit_valist () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#32 0x00007ffff6905ecf in g_signal_emit () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#33 0x00007ffff7b7acac in  () at /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
#34 0x00007ffff7a6248c in gtk_propagate_event () at /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
#35 0x00007ffff7a6287b in gtk_main_do_event () at /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
#36 0x00007ffff78d5bac in  () at /usr/lib/x86_64-linux-gnu/libgdk-x11-2.0.so.0
#37 0x00007ffff6809c3e in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#38 0x00007ffff6809ed8 in  () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#39 0x00007ffff680a1d2 in g_main_loop_run () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#40 0x00007ffff7a618e7 in gtk_main () at /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
#41 0x0000555555713292 in main (argc=1, argv=0x7fffffffe168) at main.c:1626
Comment 6 Andrej Kacian 2019-01-03 23:23:44 CET
The problem seems to be that by the time the code gets to wait_for_children(), the Children struct is already freed. Valgrind shows a series of following invalid reads and invalid writes, all differing only in the size read or written and first line of the first backtrace (all of them within the same function). This means that wait_for_children() tries to access various members of the struct when it's already freed.

==23766== Invalid write of size 4
==23766==    at 0x23BD2B: wait_for_children (action.c:1085)
==23766==    by 0x23D858: catch_output (action.c:1615)
==23766==    by 0x4A1DCE: claws_io_invoke (gtkutils.c:1651)
==23766==    by 0x5F88AE7: g_main_context_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5800.1)
==23766==    by 0x5F88ED7: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5800.1)
==23766==    by 0x5F88F6B: g_main_context_iteration (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5800.1)
==23766==    by 0x4BCAB00: gtk_main_iteration (in /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0.2400.32)
==23766==    by 0x24FC98: alertpanel_show (alertpanel.c:213)
==23766==    by 0x24F79B: alertpanel_message (alertpanel.c:131)
==23766==    by 0x24FA3D: alertpanel_error (alertpanel.c:170)
==23766==    by 0x23B946: fork_child (action.c:987)
==23766==    by 0x23B469: execute_actions (action.c:879)
==23766==  Address 0x1055e774 is 100 bytes inside a block of size 120 free'd
==23766==    at 0x48369AB: free (vg_replace_malloc.c:530)
==23766==    by 0x23C0CF: free_children (action.c:1186)
==23766==    by 0x23BD87: wait_for_children (action.c:1094)
==23766==    by 0x23D858: catch_output (action.c:1615)
==23766==    by 0x4A1DCE: claws_io_invoke (gtkutils.c:1651)
==23766==    by 0x5F88AE7: g_main_context_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5800.1)
==23766==    by 0x5F88ED7: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5800.1)
==23766==    by 0x5F88F6B: g_main_context_iteration (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5800.1)
==23766==    by 0x4BCAB00: gtk_main_iteration (in /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0.2400.32)
==23766==    by 0x24FC98: alertpanel_show (alertpanel.c:213)
==23766==    by 0x24F79B: alertpanel_message (alertpanel.c:131)
==23766==    by 0x24FA3D: alertpanel_error (alertpanel.c:170)
==23766==  Block was alloc'd at
==23766==    at 0x4837B65: calloc (vg_replace_malloc.c:752)
==23766==    by 0x5F8E698: g_malloc0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5800.1)
==23766==    by 0x23B2C3: execute_actions (action.c:841)
==23766==    by 0x23AC33: message_actions_execute (action.c:690)
==23766==    by 0x23A78F: mainwin_actions_execute (action.c:629)
==23766==    by 0x23A746: mainwin_actions_execute_cb (action.c:620)
==23766==    by 0x5EF5B6C: g_closure_invoke (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5800.1)
==23766==    by 0x5F088F2: ??? (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5800.1)
==23766==    by 0x5F11881: g_signal_emit_valist (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5800.1)
==23766==    by 0x5F11ECE: g_signal_emit (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5800.1)
==23766==    by 0x4CE2A49: gtk_widget_activate (in /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0.2400.32)
==23766==    by 0x4BDF4BC: gtk_menu_shell_activate_item (in /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0.2400.32)

At the very end, valgrind reports the same double-free that can be seen in my previous GDB backtrace:

==23766== Invalid free() / delete / delete[] / realloc()
==23766==    at 0x48369AB: free (vg_replace_malloc.c:530)
==23766==    by 0x23C0CF: free_children (action.c:1186)
==23766==    by 0x23BD87: wait_for_children (action.c:1094)
==23766==    by 0x23D858: catch_output (action.c:1615)
==23766==    by 0x4A1DCE: claws_io_invoke (gtkutils.c:1651)
==23766==    by 0x5F88AE7: g_main_context_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5800.1)
==23766==    by 0x5F88ED7: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5800.1)
==23766==    by 0x5F88F6B: g_main_context_iteration (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5800.1)
==23766==    by 0x4BCAB00: gtk_main_iteration (in /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0.2400.32)
==23766==    by 0x24FC98: alertpanel_show (alertpanel.c:213)
==23766==    by 0x24F79B: alertpanel_message (alertpanel.c:131)
==23766==    by 0x24FA3D: alertpanel_error (alertpanel.c:170)
==23766==  Address 0x1055e710 is 0 bytes inside a block of size 120 free'd
==23766==    at 0x48369AB: free (vg_replace_malloc.c:530)
==23766==    by 0x23C0CF: free_children (action.c:1186)
==23766==    by 0x23BD87: wait_for_children (action.c:1094)
==23766==    by 0x23D858: catch_output (action.c:1615)
==23766==    by 0x4A1DCE: claws_io_invoke (gtkutils.c:1651)
==23766==    by 0x5F88AE7: g_main_context_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5800.1)
==23766==    by 0x5F88ED7: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5800.1)
==23766==    by 0x5F88F6B: g_main_context_iteration (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5800.1)
==23766==    by 0x4BCAB00: gtk_main_iteration (in /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0.2400.32)
==23766==    by 0x24FC98: alertpanel_show (alertpanel.c:213)
==23766==    by 0x24F79B: alertpanel_message (alertpanel.c:131)
==23766==    by 0x24FA3D: alertpanel_error (alertpanel.c:170)
==23766==  Block was alloc'd at
==23766==    at 0x4837B65: calloc (vg_replace_malloc.c:752)
==23766==    by 0x5F8E698: g_malloc0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5800.1)
==23766==    by 0x23B2C3: execute_actions (action.c:841)
==23766==    by 0x23AC33: message_actions_execute (action.c:690)
==23766==    by 0x23A78F: mainwin_actions_execute (action.c:629)
==23766==    by 0x23A746: mainwin_actions_execute_cb (action.c:620)
==23766==    by 0x5EF5B6C: g_closure_invoke (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5800.1)
==23766==    by 0x5F088F2: ??? (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5800.1)
==23766==    by 0x5F11881: g_signal_emit_valist (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5800.1)
==23766==    by 0x5F11ECE: g_signal_emit (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5800.1)
==23766==    by 0x4CE2A49: gtk_widget_activate (in /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0.2400.32)
==23766==    by 0x4BDF4BC: gtk_menu_shell_activate_item (in /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0.2400.32)
Comment 7 users 2019-01-04 19:54:03 CET
Changes related to this bug have been committed.
Please check latest Git and update the bug accordingly.
You can also get the patch from:
http://git.claws-mail.org/

++ ChangeLog	2019-01-04 19:54:02.968507323 +0100
http://git.claws-mail.org/?p=claws.git;a=commitdiff;h=b249609f1b0639c075f4089555cf31eaa82fa104
Author: Andrej Kacian <ticho@claws-mail.org>
Date:   Fri Jan 4 19:50:08 2019 +0100

    Make children tracking in execute_actions() more async-aware
    
    We need to update the children count in children->nb
    after starting each child process instead of after
    starting all of them, because we use that count
    in wait_for_children(), which is run asynchronously
    for each child.
    
    Fixes bug #3979: Hang (with killing needed) during
    action which extracts attachments