Discussion:
Possible Fix For Mystery Crash (Bug 9049)
Richard Heck
2014-08-07 19:12:40 UTC
Permalink
I'm attaching a patch which I'm hoping will fix the mystery crash.
Here's the log message:

Investigation of bug #9236 showed that crash to be due to a Paragraph's
holding a
dangling pointer to an old and deleted Layout after the DocumentClass
was reset.
Since the backtraces look almost identical, it seems likely that we have
the same
problem here. Since this crash seems almost always to involve tables, I
looked at
the code in switchBetweenClasses() and found that the Paragraphs that
belong to
"hidden" table cells are not seen by the initial recursion using a
ParIterator:
It skips right over them. This was confirmed by test code suggested by
Enrico,
with results reported in Trac.

The present patch attempts to deal with this problem in the second
recursion,
over Insets. When we see an InsetTabular, we call a new routine that
recurses
through the cells, looking for hidden ones. If it finds a hidden one, it
then
resets the Layout for the cell's Paragraphs (there should be only one,
but we
do not make any assumptions) to the PlainLayout that belongs to the new
DocumentClass. This is good enough, since such cells never have content.

Note that this does not, by itself, fix #9236, since the problem in that
case
was that we never called switchBetweenClasses, anyway.

Unfortunately, I can't really test this. I still can't trigger the
crash, which makes
me wonder if this is really the cause, actually. But I think we should
be doing
this, anyway.

Comments welcome.

Richard
Pavel Sanda
2014-08-08 02:34:06 UTC
Permalink
problem here. Since this crash seems almost always to involve tables, I looked at
the code in switchBetweenClasses() and found that the Paragraphs that belong to
This description looks as something ancient in the code. Does it make any sense
that we see this crash only in 2.1? P
Richard Heck
2014-08-08 03:53:58 UTC
Permalink
Post by Pavel Sanda
problem here. Since this crash seems almost always to involve tables, I looked at
the code in switchBetweenClasses() and found that the Paragraphs that belong to
This description looks as something ancient in the code. Does it make any sense
that we see this crash only in 2.1?
Yes, the crash is really a result of several changes.

(i) Since we have had modules, the TextClass associated with a Buffer is
not fixed at startup but is static; this is now a DocumentClass object.
As a result, the Layout object that is associated with a given Paragraph
actually changes whenever the DocumentClass is updated, since a Layout
belongs to a TextClass.

(ii) We used to store all DocumentClass objects in a container, so that
we would not see crashes when, e.g., Paragraphs in the cut stack still
held pointers to them. This, however, caused us to use a lot of memory,
so, for 2.1.x, we stopped storing all the DocumentClass objects and used
shared_ptr instead.

(iii) When a DocumentClass is deleted, so are all its contained Layout
objects. If a Paragraph holds a pointer to one of them, then, it will
become invalid. This shouldn't be a problem, since the Layout objects
are supposed to be reset in switchBetweenClasses. But, for #9236, we
forgot to run that routine, and for #9049, we are missing some
Paragraphs, namely, those that are PART_OF_MULTI*.

(iv) The fact that we miss those Paragraphs has to do with weirdnesses
in how multirow and multicolumn cells are handled. A 3x3 table /always/
has nine cells, whether we use multi* or not. But these are not seen by
the ParIterator in switchBetweenClasses.

Richard
Pavel Sanda
2014-08-10 05:29:34 UTC
Permalink
Post by Richard Heck
Yes, the crash is really a result of several changes.
Thanks for explanation. Pavel
Jürgen Spitzmüller
2014-08-08 07:01:23 UTC
Permalink
Post by Richard Heck
I'm attaching a patch which I'm hoping will fix the mystery crash.
Looks good.

Nitpick: To this

Tabular::CellData const & cdata = tabular.cellInfo(idx);
if (cdata.multicolumn != Tabular::CELL_PART_OF_MULTICOLUMN
&& cdata.multirow != Tabular::CELL_PART_OF_MULTIROW)
continue;

I'd prefer

if (!tabular.isPartOfMultiColumn(r, c) && !tabular.isPartOfMultiRow(r, c))
continue;
Post by Richard Heck
Unfortunately, I can't really test this. I still can't trigger the
crash, which makes
me wonder if this is really the cause, actually. But I think we should
be doing
this, anyway.
I suppose the only way to assure this is to release LyX with this fix and look
if people still get the crash.
Post by Richard Heck
Comments welcome.
I'd say: push it into master and branch, let us test it some days, and then
release.

Jürgen
Post by Richard Heck
Richard
Richard Heck
2014-08-08 13:59:45 UTC
Permalink
Post by Jürgen Spitzmüller
Post by Richard Heck
I'm attaching a patch which I'm hoping will fix the mystery crash.
Looks good.
Nitpick: To this
Tabular::CellData const & cdata = tabular.cellInfo(idx);
if (cdata.multicolumn != Tabular::CELL_PART_OF_MULTICOLUMN
&& cdata.multirow != Tabular::CELL_PART_OF_MULTIROW)
continue;
I'd prefer
if (!tabular.isPartOfMultiColumn(r, c) && !tabular.isPartOfMultiRow(r, c))
continue;
Yes, that is better. I don't really know the tabular code.
Post by Jürgen Spitzmüller
Post by Richard Heck
Unfortunately, I can't really test this. I still can't trigger the
crash, which makes me wonder if this is really the cause, actually. But I think we should be doing this, anyway.
I suppose the only way to assure this is to release LyX with this fix and look
if people still get the crash.
I have an idea how to reproduce it. Let me try in a bit.

Richard
Richard Heck
2014-08-08 15:14:41 UTC
Permalink
Post by Richard Heck
Post by Jürgen Spitzmüller
Post by Richard Heck
I'm attaching a patch which I'm hoping will fix the mystery crash.
Looks good.
Nitpick: To this
Tabular::CellData const & cdata = tabular.cellInfo(idx);
if (cdata.multicolumn != Tabular::CELL_PART_OF_MULTICOLUMN
&& cdata.multirow != Tabular::CELL_PART_OF_MULTIROW)
continue;
I'd prefer
if (!tabular.isPartOfMultiColumn(r, c) &&
!tabular.isPartOfMultiRow(r, c))
continue;
Yes, that is better. I don't really know the tabular code.
Post by Jürgen Spitzmüller
Post by Richard Heck
Unfortunately, I can't really test this. I still can't trigger the
crash, which makes me wonder if this is really the cause, actually.
But I think we should be doing this, anyway.
I suppose the only way to assure this is to release LyX with this fix and look
if people still get the crash.
I have an idea how to reproduce it. Let me try in a bit.
I still can't trigger the crash. I suppose it must depend upon some
details about when certain areas of memory are over-written, etc. But I
think I can confirm the diagnosis. I put this code:

LYXERR0(d->layout_->name() << ": " << d->layout_);

into Paragraph::write(), just after:

os << "\n\\begin_layout " << to_utf8(d->layout_->name()) << '\n';

Then using a file with multi* cells, I first save it, getting:

Paragraph.cpp (1625): Standard: 0x2cd0670
Paragraph.cpp (1625): Plain Layout: 0x2d01380
Paragraph.cpp (1625): Plain Layout: 0x2d01380
Paragraph.cpp (1625): Plain Layout: 0x2d01380
Paragraph.cpp (1625): Plain Layout: 0x2d01380
Paragraph.cpp (1625): Plain Layout: 0x2d01380
Paragraph.cpp (1625): Plain Layout: 0x2d01380
Paragraph.cpp (1625): Plain Layout: 0x2d01380
Paragraph.cpp (1625): Plain Layout: 0x2d01380
Paragraph.cpp (1625): Plain Layout: 0x2d01380
Paragraph.cpp (1625): Plain Layout: 0x2d01380
Paragraph.cpp (1625): Plain Layout: 0x2d01380
Paragraph.cpp (1625): Plain Layout: 0x2d01380
Paragraph.cpp (1625): Standard: 0x2cd0670

If I then change the document class and re-save, I get:

Paragraph.cpp (1625): Standard: 0x23cd4f0
Paragraph.cpp (1625): Plain Layout: 0x23cd110
Paragraph.cpp (1625): Plain Layout: 0x23cd110
Paragraph.cpp (1625): Plain Layout: 0x23cd110
Paragraph.cpp (1625): Plain Layout: 0x23cd110
Paragraph.cpp (1625): Plain Layout: 0x23cd110
Paragraph.cpp (1625): Plain Layout: 0x23cd110
Paragraph.cpp (1625): Plain Layout: 0x2d01380
Paragraph.cpp (1625): Plain Layout: 0x2d01380
Paragraph.cpp (1625): Plain Layout: 0x23cd110
Paragraph.cpp (1625): Plain Layout: 0x23cd110
Paragraph.cpp (1625): Plain Layout: 0x23cd110
Paragraph.cpp (1625): Plain Layout: 0x23cd110
Paragraph.cpp (1625): Standard: 0x23cd4f0

Note how two of the cells still have pointers to the old layout. These
are the hidden cells.

Richard
Jürgen Spitzmüller
2014-08-08 16:36:34 UTC
Permalink
Post by Richard Heck
Note how two of the cells still have pointers to the old layout. These
are the hidden cells.
Excellent analysis! I suppose you tested that with your patch all pointers are
updated?

Jürgen
Richard Heck
2014-08-08 18:14:40 UTC
Permalink
Post by Jürgen Spitzmüller
Post by Richard Heck
Note how two of the cells still have pointers to the old layout. These
are the hidden cells.
Excellent analysis! I suppose you tested that with your patch all pointers are
updated?
Amazingly, it had not occurred to me to do that, and it was not working
properly. It did not occur to me that:
tabular.cellInset(r, c);
is not the same as:
tabular.cellInset(tabular.cellIndex(r, c));
I fixed that, and it does seem to work now.

Still, it bothers me that I cannot get it to crash, even though I can
see what ought to be a dangling pointer.

Richard
Jürgen Spitzmüller
2014-08-09 06:02:56 UTC
Permalink
Post by Richard Heck
tabular.cellInset(r, c);
tabular.cellInset(tabular.cellIndex(r, c));
Strange. Why is that?

Jürgen
Richard Heck
2014-08-09 16:28:04 UTC
Permalink
Post by Jürgen Spitzmüller
Post by Richard Heck
tabular.cellInset(r, c);
tabular.cellInset(tabular.cellIndex(r, c));
Strange. Why is that?
It appears that cellIndex returns the index of the VISIBLE cell to which
the row and column refer. So if it's part of a multi* cell, you get a
different answer.

Richard
Jürgen Spitzmüller
2014-08-09 16:56:33 UTC
Permalink
Post by Richard Heck
It appears that cellIndex returns the index of the VISIBLE cell to which
the row and column refer. So if it's part of a multi* cell, you get a
different answer.
I see. I think this difference should be documented, then.

Jürgen
Richard Heck
2014-08-09 17:13:37 UTC
Permalink
Post by Jürgen Spitzmüller
Post by Richard Heck
It appears that cellIndex returns the index of the VISIBLE cell to which
the row and column refer. So if it's part of a multi* cell, you get a
different answer.
I see. I think this difference should be documented, then.
Done.

rh
Enrico Forestieri
2014-08-11 01:28:48 UTC
Permalink
Post by Richard Heck
Post by Jürgen Spitzmüller
Post by Richard Heck
Note how two of the cells still have pointers to the old layout. These
are the hidden cells.
Excellent analysis! I suppose you tested that with your patch all pointers are
updated?
Amazingly, it had not occurred to me to do that, and it was not
tabular.cellInset(r, c);
tabular.cellInset(tabular.cellIndex(r, c));
I fixed that, and it does seem to work now.
Still, it bothers me that I cannot get it to crash, even though I
can see what ought to be a dangling pointer.
Actually, I am not able to get a dangling pointer, even if the hidden
cells are not updated. See the attached patch. The warning never triggers
for me.
--
Enrico
Richard Heck
2014-08-11 15:16:21 UTC
Permalink
Post by Enrico Forestieri
Post by Richard Heck
Post by Jürgen Spitzmüller
Post by Richard Heck
Note how two of the cells still have pointers to the old layout. These
are the hidden cells.
Excellent analysis! I suppose you tested that with your patch all pointers are
updated?
Amazingly, it had not occurred to me to do that, and it was not
tabular.cellInset(r, c);
tabular.cellInset(tabular.cellIndex(r, c));
I fixed that, and it does seem to work now.
Still, it bothers me that I cannot get it to crash, even though I
can see what ought to be a dangling pointer.
Actually, I am not able to get a dangling pointer, even if the hidden
cells are not updated. See the attached patch. The warning never triggers
for me.
This isn't terribly surprising. My suspicion is that we are holding a
reference to the
DocumentClass somewhere. There was something weird like this with the
other bug,
too. To get it to crash, I had to have a certain amount of stuff in
Local Layout, even
though none of the layouts defined there appeared in the document
itself. So it may
be, e.g., that there are other things that have to happen to make this
pointer dangle.

What certainly is true is that the pointer *can* dangle under these
sorts of
circumstances. Are there others?

Maybe for safety this should be made a smart pointer?

Richard
Enrico Forestieri
2014-08-11 21:07:35 UTC
Permalink
Post by Richard Heck
Post by Enrico Forestieri
Post by Richard Heck
Post by Jürgen Spitzmüller
Post by Richard Heck
Note how two of the cells still have pointers to the old layout. These
are the hidden cells.
Excellent analysis! I suppose you tested that with your patch all pointers are
updated?
Amazingly, it had not occurred to me to do that, and it was not
tabular.cellInset(r, c);
tabular.cellInset(tabular.cellIndex(r, c));
I fixed that, and it does seem to work now.
Still, it bothers me that I cannot get it to crash, even though I
can see what ought to be a dangling pointer.
Actually, I am not able to get a dangling pointer, even if the hidden
cells are not updated. See the attached patch. The warning never triggers
for me.
This isn't terribly surprising. My suspicion is that we are holding
a reference to the
DocumentClass somewhere. There was something weird like this with
the other bug,
too. To get it to crash, I had to have a certain amount of stuff in
Local Layout, even
though none of the layouts defined there appeared in the document
itself. So it may
be, e.g., that there are other things that have to happen to make
this pointer dangle.
In that case, it was reliably crashing for me even without a local layout.
Post by Richard Heck
What certainly is true is that the pointer *can* dangle under these
sorts of
circumstances. Are there others?
Maybe for safety this should be made a smart pointer?
I don't know. However, I suggest not to take countermeasures without
a clear understanding of the problem. The missing update of hidden
cells is surely an issue that has to be corrected, though.

However, things may be more complicated than that. For example, when
applying the layout3.diff patch attached to bug 9236, I always get
a warning when copying or cutting something, meaning that a dangling
pointer is in existance. It may very well be that this pointer is later
updated and there is no problem, actually. But if another thread tries
to use that pointer before it is updated (like autosave, maybe), then
things may still go awry. This kind of races are difficult to pinpoint.

Maybe we could actually implement a signature in the Layout class and
check that before using the layout_ pointer. If the signature is not
there because the class instance has been destroyed, we could try to
collect all possible info about this case and write it to a file,
telling the user to include that in a bug report if LyX actually
crashes (or we could simply avoid doing things that we know would
probably lead to a crash).
--
Enrico
Richard Heck
2014-08-12 01:40:16 UTC
Permalink
Post by Enrico Forestieri
Post by Richard Heck
Post by Enrico Forestieri
Post by Richard Heck
Post by Jürgen Spitzmüller
Post by Richard Heck
Note how two of the cells still have pointers to the old layout. These
are the hidden cells.
Excellent analysis! I suppose you tested that with your patch all pointers are
updated?
Amazingly, it had not occurred to me to do that, and it was not
tabular.cellInset(r, c);
tabular.cellInset(tabular.cellIndex(r, c));
I fixed that, and it does seem to work now.
Still, it bothers me that I cannot get it to crash, even though I
can see what ought to be a dangling pointer.
Actually, I am not able to get a dangling pointer, even if the hidden
cells are not updated. See the attached patch. The warning never triggers
for me.
This isn't terribly surprising. My suspicion is that we are holding
a reference to the
DocumentClass somewhere. There was something weird like this with
the other bug,
too. To get it to crash, I had to have a certain amount of stuff in
Local Layout, even
though none of the layouts defined there appeared in the document
itself. So it may
be, e.g., that there are other things that have to happen to make
this pointer dangle.
In that case, it was reliably crashing for me even without a local layout.
Post by Richard Heck
What certainly is true is that the pointer *can* dangle under these
sorts of circumstances. Are there others?
Maybe for safety this should be made a smart pointer?
I don't know. However, I suggest not to take countermeasures without
a clear understanding of the problem. The missing update of hidden
cells is surely an issue that has to be corrected, though.
However, things may be more complicated than that. For example, when
applying the layout3.diff patch attached to bug 9236, I always get
a warning when copying or cutting something, meaning that a dangling
pointer is in existance.
This is only without my patch or also with it?
Post by Enrico Forestieri
It may very well be that this pointer is later updated and there is no problem, actually. But if another thread tries to use that pointer before it is updated (like autosave, maybe), then things may still go awry. This kind of races are difficult to pinpoint.
I thought maybe copy-paste was part of the problem, as things get put on
the cut stack, taken off, etc, and this sort of updating does go on
there. It may be that we need more cleanup in copySelectionHelper. A lot
of that goes on already.

It would help just to add to layout3.diff a report on what the pointer
address is. Then we might have some idea where it originated.

I can try to play with this tomorrow.

Richard
Enrico Forestieri
2014-08-12 10:07:45 UTC
Permalink
Post by Richard Heck
Post by Enrico Forestieri
However, things may be more complicated than that. For example, when
applying the layout3.diff patch attached to bug 9236, I always get
a warning when copying or cutting something, meaning that a dangling
pointer is in existance.
This is only without my patch or also with it?
It makes no difference. The involved pointers are different from the
ones involved in the hidden cells of a tabular, seemingly.
However, the patch attached to 9049 does not really work, as you
already noted ealier in this thread. I used a modified version
taking into account Jürgen suggestion and your directions.
I suggest that you update the patch on track.
Post by Richard Heck
Post by Enrico Forestieri
It may very well be that this pointer is later updated and there is
no problem, actually. But if another thread tries to use that pointer
before it is updated (like autosave, maybe), then things may still go
awry. This kind of races are difficult to pinpoint.
I thought maybe copy-paste was part of the problem, as things get
put on the cut stack, taken off, etc, and this sort of updating does
go on there. It may be that we need more cleanup in
copySelectionHelper. A lot of that goes on already.
It would help just to add to layout3.diff a report on what the
pointer address is. Then we might have some idea where it
originated.
I attach here an updated version of layout3.diff that also reports
the pointer address, and a modified version of your patch for 9049.
--
Enrico
Richard Heck
2014-08-12 13:59:11 UTC
Permalink
Post by Enrico Forestieri
Post by Richard Heck
Post by Enrico Forestieri
However, things may be more complicated than that. For example, when
applying the layout3.diff patch attached to bug 9236, I always get
a warning when copying or cutting something, meaning that a dangling
pointer is in existance.
This is only without my patch or also with it?
It makes no difference. The involved pointers are different from the
ones involved in the hidden cells of a tabular, seemingly.
However, the patch attached to 9049 does not really work, as you
already noted ealier in this thread. I used a modified version
taking into account Jürgen suggestion and your directions.
I suggest that you update the patch on track.
Yes, right, of course. Done.

Richard
Richard Heck
2014-08-12 17:17:47 UTC
Permalink
Post by Enrico Forestieri
Post by Richard Heck
Post by Enrico Forestieri
However, things may be more complicated than that. For example, when
applying the layout3.diff patch attached to bug 9236, I always get
a warning when copying or cutting something, meaning that a dangling
pointer is in existance.
This is only without my patch or also with it?
It makes no difference. The involved pointers are different from the
ones involved in the hidden cells of a tabular, seemingly.
However, the patch attached to 9049 does not really work, as you
already noted ealier in this thread. I used a modified version
taking into account Jürgen suggestion and your directions.
I suggest that you update the patch on track.
I fixed a stupid thinko that was revealed by your debugging patch:
copying a layout instead of using a const ref. Probably not related, though.

I can get a crash on my 9049.lyx file just when copying the table, with
or without my patch. It's unpredictable, though. If I hold down ctrl-C
for long enough, though, it will crash.

I'm not sure how helpful the backtrace (below) is, but it is happening
in an InsetTableCell: In the destructor of its contained Paragraph. I
wonder if there is something wrong with the copying of pargraphs that
happens when we copy the table, e.g., we miss the paragraphs in the
hidden cells for some reason, and so we double delete or something?

Richard

=====

0 std::_Rb_tree... 0x639c75
1 clear stl_tree.h 860 0x63050d
2 clear stl_map.h 789 0x63050d
3 clear map.h 372 0x63050d
4 lyx::Paragraph::deregisterWords Paragraph.cpp 3784 0x63050d
5 lyx::Paragraph::~Paragraph Paragraph.cpp 1599 0x630a47
6 destroy new_allocator.h 133 0x46931b
7 std::__cxx1998::_List_base<lyx::Paragraph,
std::allocator<lyx::Paragraph> >::_M_clear list.tcc 77 0x46931b
8 ~_List_base stl_list.h 378 0x91adfb
9 ~list stl_list.h 438 0x91adfb
10 ~list list 120 0x91adfb
11 ~RandomAccessList RandomAccessList.h 41 0x91adfb
12 ~ParagraphList ParagraphList.h 23 0x91adfb
13 ~Text Text.h 41 0x91adfb
14 ~InsetText InsetText.h 33 0x91adfb
15 ~InsetTableCell InsetTabular.h 52 0x91adfb
16 lyx::InsetTableCell::~InsetTableCell InsetTabular.h 52 0x91adfb
17 operator() shared_ptr.h 285 0x918b17
18 std::tr1::_Sp_counted_base_impl<lyx::InsetTableCell*,
std::tr1::_Sp_deleter<lyx::InsetTableCell>,
(__gnu_cxx::_Lock_policy)2>::_M_dispose shared_ptr.h 257 0x918b17
19 _M_release shared_ptr.h 141 0x9199b7
20 ~__shared_count shared_ptr.h 341 0x9199b7
21 ~__shared_ptr shared_ptr.h 541 0x9199b7
22 ~shared_ptr shared_ptr.h 985 0x9199b7
23 ~CellData InsetTabular.h 618 0x9199b7
24 _Destroy<lyx::Tabular::CellData> stl_construct.h 93 0x9199b7
25 __destroy<lyx::Tabular::CellData*> stl_construct.h 103 0x9199b7
26 _Destroy<lyx::Tabular::CellData*> stl_construct.h 126 0x9199b7
27 _Destroy<lyx::Tabular::CellData*, lyx::Tabular::CellData>
stl_construct.h 151 0x9199b7
28 ~vector stl_vector.h 415 0x9199b7
29 ~vector vector 144 0x9199b7
30 _Destroy<std::__debug::vector<lyx::Tabular::CellData> >
stl_construct.h 93 0x9199b7
31 __destroy<std::__debug::vector<lyx::Tabular::CellData>*>
stl_construct.h 103 0x9199b7
32 _Destroy<std::__debug::vector<lyx::Tabular::CellData>*>
stl_construct.h 126 0x9199b7
33 _Destroy<std::__debug::vector<lyx::Tabular::CellData>*,
std::__debug::vector<lyx::Tabular::CellData> > stl_construct.h 151
0x9199b7
34 ~vector stl_vector.h 415 0x9199b7
35 ~vector vector 144 0x9199b7
36 lyx::Tabular::~Tabular InsetTabular.h 126 0x9199b7
37 lyx::InsetTabular::~InsetTabular InsetTabular.cpp 3471 0x8ffc62
38 lyx::InsetTabular::~InsetTabular InsetTabular.cpp 3474 0x8ffcb3
39 lyx::InsetList::~InsetList InsetList.cpp 75 0x71c0d8
40 lyx::Paragraph::Private::~Private Paragraph.cpp 593 0x630944
41 lyx::Paragraph::~Paragraph Paragraph.cpp 1600 0x630a58
42 destroy new_allocator.h 133 0x46931b
43 std::__cxx1998::_List_base<lyx::Paragraph,
std::allocator<lyx::Paragraph> >::_M_clear list.tcc 77 0x46931b
44 ~_List_base stl_list.h 378 0x843e1a
45 ~list stl_list.h 438 0x843e1a
46 ~list list 120 0x843e1a
47 ~RandomAccessList RandomAccessList.h 41 0x843e1a
48 ~ParagraphList ParagraphList.h 23 0x843e1a
49 ~Text Text.h 41 0x843e1a
50 ~InsetText InsetText.h 33 0x843e1a
51 lyx::InsetCollapsable::~InsetCollapsable InsetCollapsable.cpp
65 0x843e1a
52 ~InsetFloat InsetFloat.h 50 0x86146b
53 lyx::InsetFloat::~InsetFloat InsetFloat.h 50 0x86146b
54 lyx::InsetList::~InsetList InsetList.cpp 75 0x71c0d8
55 lyx::Paragraph::Private::~Private Paragraph.cpp 593 0x630944
56 lyx::Paragraph::~Paragraph Paragraph.cpp 1600 0x630a58
57 destroy new_allocator.h 133 0x46931b
58 std::__cxx1998::_List_base<lyx::Paragraph,
std::allocator<lyx::Paragraph> >::_M_clear list.tcc 77 0x46931b
59 ~_List_base stl_list.h 378 0x4693de
60 ~list stl_list.h 438 0x4693de
61 ~list list 120 0x4693de
62 lyx::RandomAccessList<lyx::Paragraph>::~RandomAccessList
RandomAccessList.h 41 0x4693de
63 ~ParagraphList ParagraphList.h 23 0x535bd1
64 lyx::(anonymous namespace)::pasteSelectionHelper
CutAndPaste.cpp 432 0x535bd1
65 lyx::(anonymous namespace)::putClipboard CutAndPaste.cpp 509
0x538825
66 lyx::cap::copySelection CutAndPaste.cpp 997 0x539b3e
67 lyx::cap::copySelection CutAndPaste.cpp 898 0x539c01
68 lyx::Text::dispatch Text3.cpp 1310 0x67ab97
69 lyx::InsetText::doDispatch InsetText.cpp 313 0x922548
70 lyx::Inset::dispatch Inset.cpp 319 0x81ad3c
71 lyx::Cursor::dispatch Cursor.cpp 399 0x52b158
72 lyx::frontend::GuiView::dispatchToBufferView GuiView.cpp
3279 0x97ddd6
73 lyx::frontend::GuiView::dispatch GuiView.cpp 3828 0x9959e1
74 lyx::frontend::GuiApplication::dispatch GuiApplication.cpp
1987 0x96285c
75 lyx::frontend::GuiApplication::dispatch GuiApplication.cpp
1337 0x9536a6
76 lyx::dispatch LyX.cpp 1303 0x5c9b3e
77 processFuncRequest GuiApplication.cpp 2125 0x955951
78 lyx::frontend::GuiApplication::processKeySym GuiApplication.cpp
2119 0x955951
79 lyx::frontend::GuiWorkArea::processKeySym GuiWorkArea.cpp
505 0x9a7e4f
80 lyx::frontend::GuiWorkArea::keyPressEvent GuiWorkArea.cpp
1080 0x9aa5df
81 QWidget::event(QEvent*) /lib64/libQtGui.so.4 0x3ad481dbc5
82 QFrame::event(QEvent*) /lib64/libQtGui.so.4 0x3ad4bd444e
83 QAbstractScrollArea::event(QEvent*) /lib64/libQtGui.so.4
0x3ad4c57873
84 lyx::frontend::GuiWorkArea::event GuiWorkArea.cpp 721 0x9aac8a
85 QApplicationPrivate::notify_helper(QObject*, QEvent*)
/lib64/libQtGui.so.4 0x3ad47cae5c
86 QApplication::notify(QObject*, QEvent*) /lib64/libQtGui.so.4
0x3ad47d2a66
87 lyx::frontend::GuiApplication::notify GuiApplication.cpp
2491 0x963d5b
88 QCoreApplication::notifyInternal(QObject*, QEvent*)
/lib64/libQtCore.so.4 0x3ad41868fd
89 QKeyMapper::sendKeyEvent(QWidget*, bool, QEvent::Type, int,
QFlags<Qt::KeyboardModifier>, QString const&, bool, int, unsigned int,
unsigned int, unsigned int, bool*) /lib64/libQtGui.so.4 0x3ad4869f5a
90 QKeyMapperPrivate::translateKeyEvent(QWidget*, _XEvent const*,
bool) /lib64/libQtGui.so.4 0x3ad486a319
91 QApplication::x11ProcessEvent(_XEvent*) /lib64/libQtGui.so.4
0x3ad48453ff
92 x11EventSourceDispatch(_GSource*, int (*)(void*), void*)
/lib64/libQtGui.so.4 0x3ad486cac4
93 g_main_context_dispatch /lib64/libglib-2.0.so.0 0x334fe492a6
94 g_main_context_iterate.isra /lib64/libglib-2.0.so.0 0x334fe49628
95 g_main_context_iteration /lib64/libglib-2.0.so.0 0x334fe496dc
96
QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>)
/lib64/libQtCore.so.4 0x3ad41b541e
97
QGuiEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>)
/lib64/libQtGui.so.4 0x3ad486cc46
98 QEventLoop::processEvents(QFlags<QEventLoop::ProcessEventsFlag>)
/lib64/libQtCore.so.4 0x3ad418538f
99 QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>)
/lib64/libQtCore.so.4 0x3ad41856dd
100 QCoreApplication::exec() /lib64/libQtCore.so.4 0x3ad418ada9
101 lyx::frontend::GuiApplication::exec GuiApplication.cpp 2289
0x951077
102 lyx::LyX::exec LyX.cpp 375 0x5d3d21
103 main main.cpp 42 0x43a118
Enrico Forestieri
2014-08-13 00:57:49 UTC
Permalink
Post by Richard Heck
I can get a crash on my 9049.lyx file just when copying the table,
with or without my patch. It's unpredictable, though. If I hold down
ctrl-C for long enough, though, it will crash.
I'm not sure how helpful the backtrace (below) is, but it is
happening in an InsetTableCell: In the destructor of its contained
Paragraph. I wonder if there is something wrong with the copying of
pargraphs that happens when we copy the table, e.g., we miss the
paragraphs in the hidden cells for some reason, and so we double
delete or something?
I am able to get a crash while trying to copy a table by holding down
Ctrl-C for long enough even with a simple table (no hidden cells).
It crashes more easily if a "Format 48" local layout is present and I
also got the following in the terminal before a crash (it only
happened once, and I could not reproduce afterwards):

Buffer.cpp (1404): /home/ef/9049-table.lyx.emergency
Traceback (most recent call last):
File "/usr/local/src/lyx/lyx-devel/lib/scripts/layout2layout.py", line 1110, in <module>
main(sys.argv)
File "/usr/local/src/lyx/lyx-devel/lib/scripts/layout2layout.py", line 1098, in main
format = convert(lines)
File "/usr/local/src/lyx/lyx-devel/lib/scripts/layout2layout.py", line 259, in convert
re_TocLevel = re.compile(r'^(\s*)(TocLevel)(\s+)(\S+)', re.IGNORECASE)
File "/usr/lib/python2.7/re.py", line 190, in compile
return _compile(pattern, flags)


It actually created a (complete) emergency save, but I don't understand
why it was trying a layout conversion.

Something very weird is happening here and it seems that tables work as
catalysts for the crashes.
--
Enrico
Richard Heck
2014-08-13 03:54:39 UTC
Permalink
Post by Enrico Forestieri
Post by Richard Heck
I can get a crash on my 9049.lyx file just when copying the table,
with or without my patch. It's unpredictable, though. If I hold down
ctrl-C for long enough, though, it will crash.
I'm not sure how helpful the backtrace (below) is, but it is
happening in an InsetTableCell: In the destructor of its contained
Paragraph. I wonder if there is something wrong with the copying of
pargraphs that happens when we copy the table, e.g., we miss the
paragraphs in the hidden cells for some reason, and so we double
delete or something?
I am able to get a crash while trying to copy a table by holding down
Ctrl-C for long enough even with a simple table (no hidden cells).
It crashes more easily if a "Format 48" local layout is present and I
also got the following in the terminal before a crash (it only
Buffer.cpp (1404): /home/ef/9049-table.lyx.emergency
File "/usr/local/src/lyx/lyx-devel/lib/scripts/layout2layout.py", line 1110, in <module>
main(sys.argv)
File "/usr/local/src/lyx/lyx-devel/lib/scripts/layout2layout.py", line 1098, in main
format = convert(lines)
File "/usr/local/src/lyx/lyx-devel/lib/scripts/layout2layout.py", line 259, in convert
re_TocLevel = re.compile(r'^(\s*)(TocLevel)(\s+)(\S+)', re.IGNORECASE)
File "/usr/lib/python2.7/re.py", line 190, in compile
return _compile(pattern, flags)
It actually created a (complete) emergency save, but I don't understand
why it was trying a layout conversion.
I believe this has to do with the way we put things on the clipboard. We
copy the paragraphs we are copying to a temporary Buffer and then export
from that Buffer. This means recreating the DocumentClass in at least
some cases, which means re-reading layouts, modules, etc. But I could be
wrong about that.
Post by Enrico Forestieri
Something very weird is happening here and it seems that tables work as
catalysts for the crashes.
It does seem as if this is some sort of race condition. Or something
similarly hard to track down.

Do you still operate with threaded export, etc, disabled? I don't think
that's an issue here.....

Richard
Enrico Forestieri
2014-08-13 13:27:35 UTC
Permalink
Post by Richard Heck
Post by Enrico Forestieri
Something very weird is happening here and it seems that tables work as
catalysts for the crashes.
It does seem as if this is some sort of race condition. Or something
similarly hard to track down.
Do you still operate with threaded export, etc, disabled? I don't
think that's an issue here.....
It is a while that I don't play with those settings. I don't even know
whether it is still possible compiling without threaded export.

However, what bothers me is that, seemingly, I can only trigger the
crash when the debugging code (the layout3b.diff patch) is in place.
And I always get different backtraces. I am attaching here three of them.
--
Enrico
Enrico Forestieri
2014-08-14 12:00:54 UTC
Permalink
Post by Enrico Forestieri
However, what bothers me is that, seemingly, I can only trigger the
crash when the debugging code (the layout3b.diff patch) is in place.
And I always get different backtraces. I am attaching here three of them.
I think I understand what is happening. The dangling pointer is actually
updated later, and the debugging code dutily decrements its refcount.
Unfortunately, it does so trough the pointer itself, which is now pointing
to memory holding something else and actually causing corruption.
So, holding Ctrl-C for long enough, we randomly overwrite extraneous
memory until it crashes. This also explain the always different backtraces.

Thus, this crash is a red herring. However, the existence of a dangling
pointer is worrisome, how also demonstrated by this case.

Anyway, the positive side of the thing is that it caused a detailed
audit of the code and now I think that there's a high chance that your
patch actually solves #9049.
--
Enrico
Jean-Marc Lasgouttes
2014-08-14 18:35:18 UTC
Permalink
Congratulation to all for the bug hunting. However, it makes me think that the pointer should go and the paragraph should only hold the name of the layout, like insets do.

I am not sure that the cost in terms of performance would be so high.

JMarc
Post by Enrico Forestieri
Post by Enrico Forestieri
However, what bothers me is that, seemingly, I can only trigger the
crash when the debugging code (the layout3b.diff patch) is in place.
And I always get different backtraces. I am attaching here three of
them.
I think I understand what is happening. The dangling pointer is
actually
updated later, and the debugging code dutily decrements its refcount.
Unfortunately, it does so trough the pointer itself, which is now pointing
to memory holding something else and actually causing corruption.
So, holding Ctrl-C for long enough, we randomly overwrite extraneous
memory until it crashes. This also explain the always different
backtraces.
Thus, this crash is a red herring. However, the existence of a dangling
pointer is worrisome, how also demonstrated by this case.
Anyway, the positive side of the thing is that it caused a detailed
audit of the code and now I think that there's a high chance that your
patch actually solves #9049.
--
Enrico
--
Envoyé de mon téléphone Android avec K-9 Mail. Excusez la briÚveté.
Richard Heck
2014-08-14 20:46:58 UTC
Permalink
Post by Jean-Marc Lasgouttes
Congratulation to all for the bug hunting. However, it makes me think
that the pointer should go and the paragraph should only hold the name
of the layout, like insets do.
I am not sure that the cost in terms of performance would be so high.
Yes, I thought about this, too, but I wasn't sure how to get at the
actual layout. The paragraph does not know what Buffer it is in. Do we
have to go through the inset_owner_? I.e., do something like:

Layout const & getLayout() const
{
LBUFERR(d->inset_owner_);
DocumentClass const & dc =
d->inset_owner_->buffer.params().documentClass();
return dc[layout_name_];
}

Or is there a better way?

This seems like a big change for stable. Perhaps my other patch should
go there, and we can do something more involved like this for devel.

Richard
Post by Jean-Marc Lasgouttes
However, what bothers me is that, seemingly, I can only
trigger the crash when the debugging code (the layout3b.diff
patch) is in place. And I always get different backtraces. I
am attaching here three of them.
I think I understand what is happening. The dangling pointer is actually
updated later, and the debugging code dutily decrements its refcount.
Unfortunately, it does so trough the pointer itself, which is now pointing
to memory holding something else and actually causing corruption.
So, holding Ctrl-C for long enough, we randomly overwrite extraneous
memory until it crashes. This also explain the always different backtraces.
Thus, this crash is a red herring. However, the existence of a dangling
pointer is worrisome, how also demonstrated by this case.
Anyway, the positive side of the thing is that it caused a detailed
audit of the code and now I think that there's a high chance that your
patch actually solves #9049.
--
Envoyé de mon téléphone Android avec K-9 Mail. Excusez la briÚveté.
Jean-Marc Lasgouttes
2014-08-16 20:10:13 UTC
Permalink
Sure, this is not a proposal for stable.

JMarc
Post by Richard Heck
Post by Jean-Marc Lasgouttes
Congratulation to all for the bug hunting. However, it makes me think
that the pointer should go and the paragraph should only hold the
name
Post by Jean-Marc Lasgouttes
of the layout, like insets do.
I am not sure that the cost in terms of performance would be so high.
Yes, I thought about this, too, but I wasn't sure how to get at the
actual layout. The paragraph does not know what Buffer it is in. Do we
Layout const & getLayout() const
{
LBUFERR(d->inset_owner_);
DocumentClass const & dc =
d->inset_owner_->buffer.params().documentClass();
return dc[layout_name_];
}
Or is there a better way?
This seems like a big change for stable. Perhaps my other patch should
go there, and we can do something more involved like this for devel.
Richard
Post by Jean-Marc Lasgouttes
On 14 août 2014 14:00:54 UTC+02:00, Enrico Forestieri
On Wed, Aug 13, 2014 at 03:27:35PM +0200, Enrico Forestieri
However, what bothers me is that, seemingly, I can only
trigger the crash when the debugging code (the layout3b.diff
patch) is in place. And I always get different backtraces. I
am attaching here three of them.
I think I understand what is happening. The dangling pointer is
actually
Post by Jean-Marc Lasgouttes
updated later, and the debugging code dutily decrements its
refcount.
Post by Jean-Marc Lasgouttes
Unfortunately, it does so trough the pointer itself, which is now
pointing
Post by Jean-Marc Lasgouttes
to memory holding something else and actually causing corruption.
So, holding Ctrl-C for long enough, we randomly overwrite
extraneous
Post by Jean-Marc Lasgouttes
memory until it crashes. This also explain the always different
backtraces.
Post by Jean-Marc Lasgouttes
Thus, this crash is a red herring. However, the existence of a dangling
pointer is worrisome, how also demonstrated by this case.
Anyway, the positive side of the thing is that it caused a
detailed
Post by Jean-Marc Lasgouttes
audit of the code and now I think that there's a high chance that
your
Post by Jean-Marc Lasgouttes
patch actually solves #9049.
--
Envoyé de mon téléphone Android avec K-9 Mail. Excusez la briÚveté.
--
Envoyé de mon téléphone Android avec K-9 Mail. Excusez la briÚveté.
Jean-Marc Lasgouttes
2014-08-16 21:23:17 UTC
Permalink
Post by Jean-Marc Lasgouttes
Sure, this is not a proposal for stable.
In a slightly more elaborate answer, your proposal is OK, modulo some error checking and maybe a basic cache for repetitive calls with the same layout name.

On a slightly related subject, we could investigate the use of std::unordered_map instead of std::map in c++11 in terms of performance.

JMarc

Richard Heck
2014-08-12 18:54:59 UTC
Permalink
Post by Enrico Forestieri
However, things may be more complicated than that. For example, when
applying the layout3.diff patch attached to bug 9236, I always get a
warning when copying or cutting something, meaning that a dangling
pointer is in existance. It may very well be that this pointer is
later updated and there is no problem, actually. But if another thread
tries to use that pointer before it is updated (like autosave, maybe),
then things may still go awry. This kind of races are difficult to
pinpoint. Maybe we could actually implement a signature in the Layout
class and check that before using the layout_ pointer. If the
signature is not there because the class instance has been destroyed,
we could try to collect all possible info about this case and write it
to a file, telling the user to include that in a bug report if LyX
actually crashes (or we could simply avoid doing things that we know
would probably lead to a crash).
The warning that triggers here seems to happen duing the last step in
this sequence of operation:

static Buffer * staticbuffer = theBufferList().newInternalBuffer(
tempfile.name().absFileName());
Buffer * buffer = staticbuffer->cloneBufferOnly();
doc_class_ = const_pointer_cast<DocumentClass>(tc);

(I put a breakpoint on the warning code.) So we clone (copy) this static
Buffer, which we use for various operations, then reset its
DocumentClass to match what we're copying.

I added more debug code and found that what is being deleted here is the
default DocumentClass that buffer got on cloning. The layout whose
refcount is bad is the Standard layout had by the only paragraph in this
Buffer. I'm not sure why that would be, but I have occasionally seen
similar messages when closing a Buffer.

Richard
Richard Heck
2014-08-12 19:45:24 UTC
Permalink
Post by Richard Heck
Post by Enrico Forestieri
However, things may be more complicated than that. For example, when
applying the layout3.diff patch attached to bug 9236, I always get a
warning when copying or cutting something, meaning that a dangling
pointer is in existance. It may very well be that this pointer is
later updated and there is no problem, actually. But if another
thread tries to use that pointer before it is updated (like autosave,
maybe), then things may still go awry. This kind of races are
difficult to pinpoint. Maybe we could actually implement a signature
in the Layout class and check that before using the layout_ pointer.
If the signature is not there because the class instance has been
destroyed, we could try to collect all possible info about this case
and write it to a file, telling the user to include that in a bug
report if LyX actually crashes (or we could simply avoid doing things
that we know would probably lead to a crash).
The warning that triggers here seems to happen duing the last step in
static Buffer * staticbuffer = theBufferList().newInternalBuffer(
tempfile.name().absFileName());
Buffer * buffer = staticbuffer->cloneBufferOnly();
doc_class_ = const_pointer_cast<DocumentClass>(tc);
(I put a breakpoint on the warning code.) So we clone (copy) this
static Buffer, which we use for various operations, then reset its
DocumentClass to match what we're copying.
I understand now why this happens. When we reset the DocumentClass via
the evil
buffer->params().setDocumentClass(docclass);
which is what leads to the assignment:
doc_class_ = const_pointer_cast<DocumentClass>(tc);
we are /only/ resetting the DocumentClass. We do not do anything to
change the layout pointers of the various paragraphs. So when the old
DocumentClass is destroyed, its layouts get destroyed, as well, but
there are still pointers to them.

I think this is harmless, however, as we immediately go on to overwrite
all the paragraphs in this Buffer. And this is the only place
setDocumentClass is called from.

Richard
Liviu Andronic
2014-08-08 14:29:56 UTC
Permalink
Post by Jürgen Spitzmüller
Post by Richard Heck
I'm attaching a patch which I'm hoping will fix the mystery crash.
Looks good.
Nitpick: To this
Tabular::CellData const & cdata = tabular.cellInfo(idx);
if (cdata.multicolumn != Tabular::CELL_PART_OF_MULTICOLUMN
&& cdata.multirow != Tabular::CELL_PART_OF_MULTIROW)
continue;
I'd prefer
if (!tabular.isPartOfMultiColumn(r, c) && !tabular.isPartOfMultiRow(r, c))
continue;
Post by Richard Heck
Unfortunately, I can't really test this. I still can't trigger the
crash, which makes
me wonder if this is really the cause, actually. But I think we should
be doing
this, anyway.
I suppose the only way to assure this is to release LyX with this fix and look
if people still get the crash.
At least for Ubuntu Linux users it is fairly easy to test
bleeding-edge branch and trunk using the daily builds:
https://launchpad.net/~lyx-devel/+archive/ubuntu/daily

Liviu
Post by Jürgen Spitzmüller
Post by Richard Heck
Comments welcome.
I'd say: push it into master and branch, let us test it some days, and then
release.
Jürgen
Post by Richard Heck
Richard
--
Do you think you know what math is?
http://www.ideasroadshow.com/issues/ian-stewart-2013-08-02
Or what it means to be intelligent?
http://www.ideasroadshow.com/issues/john-duncan-2013-08-30
Think again:
http://www.ideasroadshow.com/library
Loading...