Discussion:
Debug information for lyx crash with data loss
José Matos
2014-06-29 11:01:08 UTC
Hi,
a Fedora user submitted the following report:
https://bugzilla.redhat.com/show_bug.cgi?id=1114263

I hope that some of the debug information provided helps to find the cause of the crash. :-(

Regards,
--
José Abílio
Jean-Marc Lasgouttes
2014-06-29 12:06:58 UTC
What are the contents of the truncated document?

JMarc
Post by José Matos
Hi,
a Fedora user submitted the following report:
https://bugzilla.redhat.com/show_bug.cgi?id=1114263
I hope that some of the debug information provided helps to find the cause of the crash. :-(
Regards,
--
José Abílio
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Richard Heck
2014-06-29 13:32:17 UTC
What are the contents of the truncated document?
Here's a longer backtrace:

#0 0x0000000000a99af0 in lyx::to_utf8(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&) ()
No symbol table info available.
#1 0x00000000005c6577 in lyx::Paragraph::write(std::ostream&, lyx::BufferParams const&, unsigned long&) const ()
No symbol table info available.
#2 0x00000000005ed142 in lyx::Text::write(std::ostream&) const ()
No symbol table info available.
#3 0x0000000000830afa in lyx::Tabular::write(std::ostream&) const ()
No symbol table info available.
#4 0x00000000005c6bab in lyx::Paragraph::write(std::ostream&, lyx::BufferParams const&, unsigned long&) const ()
No symbol table info available.
#5 0x00000000005ed142 in lyx::Text::write(std::ostream&) const ()
No symbol table info available.
#6 0x00000000005c6bab in lyx::Paragraph::write(std::ostream&, lyx::BufferParams const&, unsigned long&) const ()
No symbol table info available.
#7 0x00000000005ed142 in lyx::Text::write(std::ostream&) const ()
No symbol table info available.
#8 0x000000000048bfa9 in lyx::Buffer::write(std::ostream&) const ()
No symbol table info available.
#9 0x000000000048e86b in lyx::Buffer::writeFile(lyx::support::FileName const&) const ()
No symbol table info available.
#10 0x000000000049bbac in lyx::Buffer::emergencyWrite() ()
No symbol table info available.
#11 0x00000000004bc335 in lyx::BufferList::emergencyWriteAll() ()
No symbol table info available.
#12 0x0000000000576650 in lyx::emergencyCleanup() ()
No symbol table info available.
#13 0x0000000000576781 in error_handler ()
No symbol table info available.
#14 <signal handler called>
No locals.
#15 0x0000000000a99af0 in lyx::to_utf8(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&) ()
No symbol table info available.
#16 0x00000000005c6577 in lyx::Paragraph::write(std::ostream&, lyx::BufferParams const&, unsigned long&) const ()
No symbol table info available.
#17 0x00000000005ed142 in lyx::Text::write(std::ostream&) const ()
No symbol table info available.
#18 0x0000000000830afa in lyx::Tabular::write(std::ostream&) const ()
No symbol table info available.
#19 0x00000000005c6bab in lyx::Paragraph::write(std::ostream&, lyx::BufferParams const&, unsigned long&) const ()
No symbol table info available.
#20 0x00000000005ed142 in lyx::Text::write(std::ostream&) const ()
No symbol table info available.
#21 0x00000000005c6bab in lyx::Paragraph::write(std::ostream&, lyx::BufferParams const&, unsigned long&) const ()
No symbol table info available.
#22 0x00000000005ed142 in lyx::Text::write(std::ostream&) const ()
No symbol table info available.
#23 0x000000000048bfa9 in lyx::Buffer::write(std::ostream&) const ()
No symbol table info available.
#24 0x000000000048e86b in lyx::Buffer::writeFile(lyx::support::FileName const&) const ()
No symbol table info available.
#25 0x00000000004a1b2d in lyx::Buffer::autoSave() const ()
No symbol table info available.
#26 0x00000000008aacfb in lyx::frontend::GuiView::GuiViewPrivate::autosaveAndDestroy(lyx::Buffer const*, lyx::Buffer*) ()
No symbol table info available.

Note #25: autoSave. And we get the same crash twice: Once when writing
the file the first time, then again when doing the emergency save.

The key part is:

#15 0x0000000000a99af0 in lyx::to_utf8(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&) ()
No symbol table info available.
#16 0x00000000005c6577 in lyx::Paragraph::write(std::ostream&, lyx::BufferParams const&, unsigned long&) const ()
No symbol table info available.
#17 0x00000000005ed142 in lyx::Text::write(std::ostream&) const ()
No symbol table info available.
#18 0x0000000000830afa in lyx::Tabular::write(std::ostream&) const ()
No symbol table info available.

So the crash actually is within a table, but it's happening in the call
to to_utf8. The only such calls in Paragraph::write are at the very
beginning and the very end, the latter hidden in the call to flushString
(which may be inlined).

Is invalid data being passed to to_utf8?

Richard
Richard Heck
2014-06-29 14:58:53 UTC
Not sure how important it is, but looking at

#15 0x0000000000a99af0 in lyx::to_utf8(std::basic_string<wchar_t,
std::char_traits<wchar_t>, std::allocator<wchar_t> > const&) ()
#16 0x00000000005c6577 in lyx::Paragraph::write(std::ostream&,
lyx::BufferParams const&, unsigned long&) const ()
#17 0x00000000005ed142 in lyx::Text::write(std::ostream&) const ()
#18 0x0000000000830afa in lyx::Tabular::write(std::ostream&) const ()
#19 0x00000000005c6bab in lyx::Paragraph::write(std::ostream&,
lyx::BufferParams const&, unsigned long&) const ()
#20 0x00000000005ed142 in lyx::Text::write(std::ostream&) const ()
#21 0x00000000005c6bab in lyx::Paragraph::write(std::ostream&,
lyx::BufferParams const&, unsigned long&) const ()
#22 0x00000000005ed142 in lyx::Text::write(std::ostream&) const ()
#23 0x000000000048bfa9 in lyx::Buffer::write(std::ostream&) const ()

It looks as if this table is inside some other sort of inset: We are
three calls deep to Text::write.

And what I said before was wrong: there are several flushString calls in
Paragraph::write, so the crashing call to to_utf8 could come from any of
them. The crashing call certainly isn't the first one, since
"\begin_inset Tabular" does actually get written (and it's hard to see
how that could crash, anyway).
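To make the point about several flushString calls concrete, here is a heavily simplified, hypothetical model of a paragraph writer. The types `Element` and `Paragraph`, the `conversions` counter, and the inset markup are illustrative assumptions, not LyX's actual classes or file format. The sketch only reproduces the shape suggested by the backtrace: text is buffered, flushed before every nested inset and again at the end of the paragraph, and each inset recurses back into the text writer, so the conversion step is reached several times and at several nesting depths.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, heavily simplified document model (not LyX's classes):
// a paragraph is a run of elements, each either a plain character or a
// nested inset whose content is itself a list of paragraphs.
struct Paragraph;

struct Element {
    char ch = 0;                                   // used when `inset` is null
    std::shared_ptr<std::vector<Paragraph>> inset; // nested text, if any
};

struct Paragraph {
    std::vector<Element> elements;
};

int conversions = 0; // counts calls that would reach to_utf8 in the real code

// Stand-in for flushString(): write out the buffered text, then clear it.
void flushString(std::ostream & os, std::string & buf)
{
    if (buf.empty())
        return;
    ++conversions;
    os << buf;
    buf.clear();
}

void writeText(std::ostream & os, std::vector<Paragraph> const & text, int depth);

void writeParagraph(std::ostream & os, Paragraph const & par, int depth)
{
    std::string buf;
    for (Element const & e : par.elements) {
        if (!e.inset) {
            buf += e.ch; // plain text is only buffered here
            continue;
        }
        flushString(os, buf);                // flush before every inset,
        os << "\n\\begin_inset (depth " << depth + 1 << ")\n";
        writeText(os, *e.inset, depth + 1);  // recurse into its text,
        os << "\n\\end_inset\n";
    }
    flushString(os, buf);                    // and once more at the end
}

void writeText(std::ostream & os, std::vector<Paragraph> const & text, int depth)
{
    for (Paragraph const & par : text)
        writeParagraph(os, par, depth);
}
```

For a paragraph `ab[inset: cd]ef` this performs three conversions: `ab` before the inset, `cd` inside it, and `ef` at the end of the outer paragraph.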

Richard
Pavel Sanda
2014-06-30 22:54:31 UTC
Post by Richard Heck
Paragraph::write, so the crashing call to to_utf8 could come in any of
them. The crashing call certainly isn't the first one, since "\begin_inset
This smells a little bit like a concurrency problem inside to_utf8.
We might try doing an autosave while using to_utf8 somewhere in the UI?

p
Richard Heck
2014-07-01 13:37:47 UTC
Post by Pavel Sanda
Post by Richard Heck
Paragraph::write, so the crashing call to to_utf8 could come in any of
them. The crashing call certainly isn't the first one, since "\begin_inset
This smells a little bit like a concurrency problem inside to_utf8.
We might try doing an autosave while using to_utf8 somewhere in the UI?
I wondered about something like this, too: there was the very big change
in to_utf8, namely that we now use QThreadStorage to have per-thread
storage for the IconvProcessor. That said, if you look at the full
backtrace, there's no real activity in the other threads. And people are
seeing this on normal save, too (i.e., no cloning). There's also the fact
that we always seem to see this with tables....

I don't know if you noticed (I missed it the first time), but it's
reporting a segfault in to_utf8. What could be causing that? I.e., where
is the invalid access?

Richard
Richard Heck
2014-07-01 15:36:25 UTC
Post by Pavel Sanda
Post by Richard Heck
Paragraph::write, so the crashing call to to_utf8 could come in any of
them. The crashing call certainly isn't the first one, since "\begin_inset
This smells a little bit like a concurrency problem inside to_utf8.
We might try doing an autosave while using to_utf8 somewhere in the UI?
Is it possible that, somehow, the use of per-thread storage is causing
crashes now in exactly the same cases where we previously got file corruption?

Richard
Georg Baum
2014-07-01 18:38:25 UTC
Post by Richard Heck
Post by Pavel Sanda
Post by Richard Heck
Paragraph::write, so the crashing call to to_utf8 could come in any of
them. The crashing call certainly isn't the first one, since
"\begin_inset
This smells a little bit like a concurrency problem inside to_utf8.
to_utf8() should not have any concurrency problem anymore. Each thread uses
its own data, and in one thread you cannot be twice in to_utf8 at any given
time.
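A minimal sketch of the per-thread idea, assuming C++11 `thread_local` as a standard-library stand-in for `QThreadStorage` (which needs Qt). `Converter`, `threadConverter()`, and `eachThreadIsolated()` are hypothetical names, not LyX code. Each thread obtains its own `Converter`, so the threads can use it without any locking and none of them can observe another thread's state.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical per-thread conversion state. In LyX the per-thread object is an
// IconvProcessor kept in QThreadStorage; thread_local is used here as a
// standard-library stand-in with the same "one instance per thread" property.
struct Converter {
    long uses = 0;
};

Converter & threadConverter()
{
    thread_local Converter conv; // one instance per thread
    return conv;
}

// Starts `n` threads; each one increments its own converter `calls` times
// without any locking. Returns true if every thread saw exactly its own
// increments, i.e. no thread's Converter was touched by another thread.
bool eachThreadIsolated(int n, long calls)
{
    std::atomic<int> isolated{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&isolated, calls] {
            Converter & conv = threadConverter();
            for (long k = 0; k < calls; ++k)
                ++conv.uses;
            if (conv.uses == calls)
                ++isolated;
        });
    for (std::thread & t : threads)
        t.join();
    return isolated.load() == n;
}
```

Calling `eachThreadIsolated(8, 100000)` should return true, and the main thread's own `threadConverter().uses` stays at 0 because the worker threads never touch it.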
Post by Richard Heck
Post by Pavel Sanda
We might try doing an autosave while using to_utf8 somewhere in the UI?
Is it possible that, somehow, the use of per-thread storage is causing
crashes now in exactly the same cases where we previously got file corruption?
I am pretty sure that our own code related to iconv is completely thread
safe now. However, if there was a bug in QThreadStorage, or we misunderstood
how QThreadStorage works, then this could indeed be the case.


Georg
Richard Heck
2014-07-01 18:47:41 UTC
Post by Georg Baum
Post by Richard Heck
Post by Pavel Sanda
Post by Richard Heck
Paragraph::write, so the crashing call to to_utf8 could come in any of
them. The crashing call certainly isn't the first one, since "\begin_inset
This smells a little bit like a concurrency problem inside to_utf8.
to_utf8() should not have any concurrency problem anymore. Each thread uses
its own data, and in one thread you cannot be twice in to_utf8 at any given
time.
Post by Richard Heck
Post by Pavel Sanda
We might try doing an autosave while using to_utf8 somewhere in the UI?
Is it possible that, somehow, the use of per-thread storage is causing
crashes now in exactly the same cases where we previously got file corruption?
I am pretty sure that our own code related to iconv is completely thread
safe now. However, if there was a bug in QThreadStorage, or we misunderstood
how QThreadStorage works, then this could indeed be the case.
That was what I was thinking, too. Just guessing at this point.

rh
Georg Baum
2014-07-01 20:36:12 UTC
Post by Richard Heck
That was what I was thinking, too. Just guessing at this point.
Yes. It would be really nice if we had a more reliable way to reproduce the
crash. OTOH it is good that it does not occur very often;-)

I went over the code again and found and fixed a memory leak (and a missing
iconv_close() call caused by it), but I doubt that this has anything to do
with the crash: an IconvProcessor instance is only copied when inserting a
new one into the map in ucs4To8bitProcessors(), but to_utf8 uses its own
dedicated IconvProcessor, which is never copied.
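Georg's actual fix is not shown in the thread. As a hedged illustration of the general ownership rule involved, here is a hypothetical RAII wrapper (`IconvHandle`, not LyX's `IconvProcessor`) in which every object owns exactly one iconv descriptor: a copy opens its own descriptor, and the destructor always calls `iconv_close()`. It uses only the POSIX `iconv_open()`/`iconv_close()` calls and assumes the encoding names "UTF-8" and "UCS-4LE" are available, as they are in glibc.

```cpp
#include <cassert>
#include <cstdint>
#include <iconv.h>
#include <string>
#include <utility>

// Hypothetical RAII wrapper around one iconv descriptor (NOT LyX's
// IconvProcessor): each object owns exactly one descriptor, a copy opens its
// own, and the destructor always calls iconv_close().
class IconvHandle {
public:
    static int open_count; // descriptors currently open (for testing only)

    IconvHandle(std::string tocode, std::string fromcode)
        : to_(std::move(tocode)), from_(std::move(fromcode)), cd_(openDescriptor())
    {}

    // Copying must not share cd_: two owners would close it twice, and an
    // owner that forgets to close it leaks it. Open a fresh descriptor instead.
    IconvHandle(IconvHandle const & other)
        : to_(other.to_), from_(other.from_), cd_(openDescriptor())
    {}

    // Copy-and-swap: `other` is a fresh copy; after the swap it holds our old
    // descriptor and closes it when it goes out of scope.
    IconvHandle & operator=(IconvHandle other)
    {
        std::swap(to_, other.to_);
        std::swap(from_, other.from_);
        std::swap(cd_, other.cd_);
        return *this;
    }

    ~IconvHandle()
    {
        if (valid()) {
            iconv_close(cd_);
            --open_count;
        }
    }

    bool valid() const { return cd_ != invalid(); }

private:
    // iconv_open() reports failure by returning (iconv_t)-1.
    static iconv_t invalid()
    {
        return reinterpret_cast<iconv_t>(static_cast<std::intptr_t>(-1));
    }

    iconv_t openDescriptor() const
    {
        iconv_t cd = iconv_open(to_.c_str(), from_.c_str());
        if (cd != invalid())
            ++open_count;
        return cd;
    }

    std::string to_;
    std::string from_;
    iconv_t cd_;
};

int IconvHandle::open_count = 0;
```

The `open_count` counter exists only so that a test can check that every descriptor opened is eventually closed, including across copy construction and copy assignment.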


Georg
Richard Heck
2014-07-02 15:53:17 UTC
Post by Georg Baum
Post by Richard Heck
That was what I was thinking, too. Just guessing at this point.
Yes. It would be really nice if we had a more reliable way to reproduce the
crash. OTOH it is good that it does not occur very often;-)
I went over the code again and found and fixed a memory leak (and a missing
iconv_close() call caused by it), but I doubt that this has anything to do
with the crash: an IconvProcessor instance is only copied when inserting a
new one into the map in ucs4To8bitProcessors(), but to_utf8 uses its own
dedicated IconvProcessor, which is never copied.
Do you think that should go to 2.1.1? Or does it need testing?

Richard
Georg Baum
2014-07-02 19:37:10 UTC
Post by Richard Heck
Post by Georg Baum
Post by Richard Heck
That was what I was thinking, too. Just guessing at this point.
Yes. It would be really nice if we had a more reliable way to reproduce
the crash. OTOH it is good that it does not occur very often;-)
I went over the code again and found and fixed a memory leak (and a missing
iconv_close() call caused by it), but I doubt that this has anything to
do with the crash: an IconvProcessor instance is only copied when
inserting a new one into the map in ucs4To8bitProcessors(), but to_utf8
uses its own dedicated IconvProcessor, which is never copied.
Do you think that should go to 2.1.1? Or does it need testing?
I think it would need testing. I also think that it probably does not affect
the bug, so I'd rather keep it out.


Georg
Georg Baum
2014-07-01 20:50:24 UTC
Post by Richard Heck
That was what I was thinking, too. Just guessing at this point.
Another wild guess: since the backtrace shows autoSave(), and autoSave()
runs in a separate thread and operates on the cloned buffer, could the
non-thread-safety of the cloning business be the real problem? At least I
saw some comments about that in Buffer.cpp; maybe there are more places
where the cloned buffer is used in an unsafe way?

It may be possible to verify whether this guess is true by doing autoSave()
in a loop, and then trying to do any other editing operations at the same
time.
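The experiment described above could be sketched as follows, under loudly stated assumptions: `Document`, `Editor`, `serialize()`, `isConsistent()`, and `stressAutosave()` are hypothetical and far simpler than LyX's `Buffer` and its cloning. The sketch shows the discipline the cloning has to respect: the clone is a deep copy taken while edits are excluded, and only that private copy is written by the autosave thread. Reproducing the suspected bug would mean letting the clone share some state with the live document and checking whether the snapshots stop being consistent.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical, greatly simplified buffer; not LyX's Buffer or its cloning.
struct Document {
    std::vector<std::string> paragraphs;
};

class Editor {
public:
    void edit(int i)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        doc_.paragraphs.push_back("paragraph " + std::to_string(i));
    }

    // Deep copy taken while edits are excluded; the copy shares no state with
    // the live document, so it can be written without holding the lock.
    Document clone() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return doc_;
    }

private:
    mutable std::mutex mutex_;
    Document doc_;
};

// Serializes a (private) clone, one paragraph per line.
std::string serialize(Document const & doc)
{
    std::ostringstream os;
    for (std::string const & par : doc.paragraphs)
        os << par << '\n';
    return os.str();
}

// A snapshot is consistent if it holds "paragraph 0" .. "paragraph k-1" in order.
bool isConsistent(std::string const & snapshot)
{
    std::istringstream in(snapshot);
    std::string line;
    int k = 0;
    while (std::getline(in, line))
        if (line != "paragraph " + std::to_string(k++))
            return false;
    return true;
}

// Autosave in a loop on one thread while the calling thread keeps editing.
bool stressAutosave(int edits)
{
    Editor editor;
    std::atomic<bool> done{false};
    bool ok = true; // written only by the autosave thread until join()
    std::thread autosaver([&] {
        do {
            ok = isConsistent(serialize(editor.clone())) && ok;
        } while (!done);
    });
    for (int i = 0; i < edits; ++i)
        editor.edit(i);
    done = true;
    autosaver.join();
    return ok && isConsistent(serialize(editor.clone()));
}
```

With the deep copy in place, `stressAutosave(20000)` should return true on every run; a version whose clone shares state would be expected to fail, or to crash, only intermittently, which matches how hard the real crash is to reproduce.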


Georg
Richard Heck
2014-07-02 15:07:19 UTC
Post by Georg Baum
Post by Richard Heck
That was what I was thinking, too. Just guessing at this point.
Another wild guess: since the backtrace shows autoSave(), and autoSave()
runs in a separate thread and operates on the cloned buffer, could the
non-thread-safety of the cloning business be the real problem? At least I
saw some comments about that in Buffer.cpp; maybe there are more places
where the cloned buffer is used in an unsafe way?
It may be possible to verify whether this guess is true by doing autoSave()
in a loop, and then trying to do any other editing operations at the same
time.
There was a report from someone who did not have autosave enabled.

Richard
Jean-Marc Lasgouttes
2014-07-02 17:50:00 UTC
Post by Richard Heck
There was a report from someone who did not have autosave enabled.
The other possibility is to look at changes that occurred in Tabular
code. The main difference that I see is new support for
insert/duplicate-row. Do we have indications that people have used these
functions in their editing session?

I find the code for CellData::operator= quite weird:

Tabular::CellData & Tabular::CellData::operator=(CellData cs)
{
	swap(cs);
	return *this;
}

void Tabular::CellData::swap(CellData & rhs)
{
	std::swap(cellno, rhs.cellno);
	std::swap(width, rhs.width);
	...

Is it possible that this code does weird things when duplicating a row?
I do not think that an operator= implementation with a parameter passed
by value is very standard. And I do not know how this mixes with the
semantics of shared_ptr (used by the inset member).
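For reference, a by-value `operator=` followed by `swap()` is the well-known copy-and-swap idiom, so the signature itself is legitimate: all the copying happens in the copy constructor. With a `shared_ptr` member, the real question is what that copy constructor does, since a defaulted one would make two cells share a single inset. The sketch below uses hypothetical stand-ins (`CellInset` and a reduced `CellData`) and assumes the copy constructor is meant to deep-copy the inset; whether LyX's `CellData` actually does so is not shown in this thread.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the inset held by a table cell.
struct CellInset {
    std::string text;
};

// Reduced, hypothetical version of a cell holding its inset via shared_ptr.
struct CellData {
    int cellno = 0;
    int width = 0;
    std::shared_ptr<CellInset> inset;

    CellData() : inset(std::make_shared<CellInset>()) {}

    // Deep copy: a duplicated cell gets its own inset. A defaulted copy
    // constructor would copy the shared_ptr, so both cells would share it.
    CellData(CellData const & cs)
        : cellno(cs.cellno), width(cs.width),
          inset(cs.inset ? std::make_shared<CellInset>(*cs.inset) : nullptr)
    {}

    // Copy-and-swap: `cs` is already a deep copy made by the constructor
    // above; after the swap, `cs` owns our old state and releases it.
    CellData & operator=(CellData cs)
    {
        swap(cs);
        return *this;
    }

    void swap(CellData & rhs)
    {
        std::swap(cellno, rhs.cellno);
        std::swap(width, rhs.width);
        inset.swap(rhs.inset);
    }
};
```

After `b = a`, the two cells hold equal but distinct insets, so editing `b.inset->text` leaves `a` untouched; self-assignment is also safe because the copy is made before anything is swapped.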

JMarc
Scott Kostyshak
2014-07-02 18:40:00 UTC
Post by Richard Heck
There was a report from someone who did not have autosave enabled.
The other possibility is to look at changes that occurred in Tabular code.
The main difference that I see is new support for insert/duplicate-row. Do
we have indications that people have used these functions in their editing
session?
Also move-row-left/right/down/up, and even if users don't remember,
they might have accidentally pressed "alt + right/left/up/down", which
is a shortcut for it.

Scott
Paul A. Rubin
2014-07-02 20:29:53 UTC
On Wed, Jul 2, 2014 at 1:50 PM, Jean-Marc Lasgouttes <lasgouttes <at>
Post by Richard Heck
There was a report from someone who did not have autosave enabled.
The other possibility is to look at changes that occurred in Tabular code.
The main difference that I see is new support for insert/duplicate-row. Do
we have indications that people have used these functions in their editing
session?
Also move-row-left/right/down/up, and even if users don't remember,
they might have accidentally pressed "alt + right/left/up/down", which
is a shortcut for it.
Scott
Assuming this is the bug that bit me, I definitely did not insert a
duplicate row, and I saw no changes to any tables prior to the crash (so
Scott's accidental-shortcut scenario is unlikely in my case).

Paul
Richard Heck
2014-07-02 20:47:50 UTC
Post by Paul A. Rubin
On Wed, Jul 2, 2014 at 1:50 PM, Jean-Marc Lasgouttes <lasgouttes <at>
Post by Richard Heck
There was a report from someone who did not have autosave enabled.
The other possibility is to look at changes that occurred in Tabular code.
The main difference that I see is new support for insert/duplicate-row. Do
we have indications that people have used these functions in their editing
session?
Also move-row-left/right/down/up, and even if users don't remember,
they might have accidentally pressed "alt + right/left/up/down", which
is a shortcut for it.
Scott
Assuming this is the bug that bit me, I definitely did not insert a
duplicate row, and I saw no changes to any tables prior to the crash (so
Scott's accidental-shortcut scenario is unlikely in my case).
I'm not surprised. It's hard to see why this kind of thing would cause a
crash in to_utf8. You'd expect problems of the sort Scott and JMarc
mentioned to cause a crash in the Tabular code itself.

Richard

Georg Baum
2014-07-02 19:40:14 UTC
Post by Jean-Marc Lasgouttes
Post by Richard Heck
There was a report from someone who did not have autosave enabled.
Maybe there are two different triggers? When I find some time, I'll play a
bit with autosave.
Post by Jean-Marc Lasgouttes
The other possibility is to look at changes that occurred in Tabular
code. The main difference that I see is new support for
insert/duplicate-row. Do we have indications that people have used these
functions in their editing session?
Tabular::CellData & Tabular::CellData::operator=(CellData cs)
{
swap(cs);
return *this;
}
void Tabular::CellData::swap(CellData & rhs)
{
std::swap(cellno, rhs.cellno);
std::swap(width, rhs.width);
...
Is it possible that this code does weird things when duplicating a row?
I do not think that an operator= implementation with a parameter passed
by value is very standard. And I do not know how this mixes with the
semantics of shared_ptr (used by the inset member).
This looks strange indeed. I'd definitely change this to the standard
signature in trunk, but from a quick glance I don't see any immediate
problem with this code.


Georg
Richard Heck
2014-07-02 19:46:40 UTC
Post by Georg Baum
Post by Richard Heck
There was a report from someone who did not have autosave enabled.
Maybe there are two different triggers? When I find some time, I'll play a
bit with autosave.
The other report came with a normal save. So it's saving that triggers
it, and the "randomness" of the crash, as reported by most people, is
explained by autosave.

Richard
Georg Baum
2014-06-29 17:52:50 UTC
Post by Richard Heck
#15 0x0000000000a99af0 in lyx::to_utf8(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&) ()
No symbol table info available.
#16 0x00000000005c6577 in lyx::Paragraph::write(std::ostream&, lyx::BufferParams const&, unsigned long&) const ()
No symbol table info available.
#17 0x00000000005ed142 in lyx::Text::write(std::ostream&) const ()
No symbol table info available.
#18 0x0000000000830afa in lyx::Tabular::write(std::ostream&) const ()
No symbol table info available.
So the crash actually is within a table, but it's happening in the call
to to_utf8. The only such calls in Paragraph::write are at the very
beginning and the very end, the latter hidden in the call to flushString
(which may be inlined).
Well, flushString() is called several times.
Post by Richard Heck
Is invalid data being passed to to_utf8?
Either that, or ucs4_to_utf8() returned invalid data, or some memory
corruption happened earlier (or in ucs4_to_utf8()) which now kicks in when
new objects are allocated. I can't see any other possibility for a crash in
the two lines of to_utf8().
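To illustrate what "invalid data" can mean here, a minimal, hypothetical UCS-4 to UTF-8 encoder (`ucs4ToUtf8()`, not LyX's `ucs4_to_utf8()`, which goes through iconv) that rejects surrogates and values above U+10FFFF. It takes a `std::u32string`, whereas the backtrace shows LyX's strings as `std::basic_string<wchar_t>` (wchar_t is 32 bits on Linux). Validation of this kind catches bad code points but not a corrupted string object; that case needs a memory checker such as Valgrind or AddressSanitizer.

```cpp
#include <cassert>
#include <string>

// Hypothetical UCS-4 -> UTF-8 encoder (NOT LyX's ucs4_to_utf8()).
// Returns false on invalid input: surrogates (U+D800..U+DFFF) or values
// above U+10FFFF; `out` may then hold a partial result.
bool ucs4ToUtf8(std::u32string const & in, std::string & out)
{
    out.clear();
    for (char32_t c : in) {
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return false;
        if (c < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(c);
        } else if (c < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {                              // 4 bytes
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return true;
}
```

For example, U+00E9 encodes to the two bytes C3 A9, while a lone surrogate such as U+D800 is rejected instead of being turned into bytes.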


Georg