Post by Scott Kostyshak
I think I understand more now. But why in this commit did you choose a
mutex? Why is it obvious that "a deadlock can't happen"? And why "it
should give better performance"?
A deadlock would mean that thread x has the lock and waits for thread y,
while thread y waits for thread x. This can't happen, because
a) Formats::isZippedFile() does not call any method or function that
communicates in any way with other threads, and
b) it does not use any shared resource. Even if one constructs a
situation where the file being queried in thread x is opened
exclusively by another thread or process while thread y enters
isZippedFile(), thread x may get wrong information about the zipped
status, but no deadlock. The wrong information is caused by the missing
error handling in guessFormatFromContents(), and that problem would
exist even in a single-threaded LyX.
Therefore, if one thread enters isZippedFile(), it will eventually finish
without waiting for any other thread, so the worst thing that happens if two
threads enter isZippedFile() is that one has to wait.
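To make a) and b) a bit more concrete, here is a minimal sketch of such a
mutex-protected cache. This is not the actual LyX code: the names
(readZippedStatusFromDisk, zipped_cache) and the magic-byte check are made
up for illustration, and it has the same missing error handling mentioned
above.

    #include <cstdio>
    #include <map>
    #include <mutex>
    #include <string>

    // Stand-in for the expensive content check; the real code goes
    // through guessFormatFromContents().
    bool readZippedStatusFromDisk(std::string const & filename)
    {
        std::FILE * f = std::fopen(filename.c_str(), "rb");
        if (!f)
            return false; // no real error handling, as noted above
        unsigned char magic[2] = {0, 0};
        std::size_t const n = std::fread(magic, 1, 2, f);
        std::fclose(f);
        if (n < 2)
            return false;
        // gzip starts with 0x1f 0x8b, zip with "PK"
        return (magic[0] == 0x1f && magic[1] == 0x8b)
            || (magic[0] == 'P' && magic[1] == 'K');
    }

    static std::mutex cache_mutex;
    static std::map<std::string, bool> zipped_cache;

    bool isZippedFile(std::string const & filename)
    {
        // The whole lookup-or-compute step runs under one lock: a second
        // thread arriving at the same time waits briefly and then reuses
        // the cached result. Nothing inside the lock communicates with
        // other threads, so no deadlock is possible.
        std::lock_guard<std::mutex> lock(cache_mutex);
        auto it = zipped_cache.find(filename);
        if (it != zipped_cache.end())
            return it->second;
        bool const zipped = readZippedStatusFromDisk(filename);
        zipped_cache.emplace(filename, zipped);
        return zipped;
    }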
The better performance is only my guess. My line of thought was this:
getFormatFromFile() is expensive (otherwise the whole cache would not be
needed at all), the probability that two threads arrive in
isZippedFile() at the same time is low, and the cost of obtaining an
uncontended lock is low as well => share the result of
getFormatFromFile() between threads.
As always when you want to know something about performance, you should
measure, but this can be quite difficult, and we are not talking about an
inner loop of a complicated numerical procedure, so I was lazy and did not
measure.
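If one wanted to check at least the "uncontended lock is cheap" part of
that guess, a few lines outside of LyX already give a rough number
(nothing LyX-specific here, and the result of course depends on platform
and standard library):

    #include <chrono>
    #include <cstdio>
    #include <mutex>

    int main()
    {
        std::mutex m;
        int const n = 1000000;
        auto const start = std::chrono::steady_clock::now();
        for (int i = 0; i < n; ++i) {
            // lock and immediately unlock, with no other thread competing
            std::lock_guard<std::mutex> lock(m);
        }
        auto const end = std::chrono::steady_clock::now();
        auto const ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
            end - start).count();
        std::printf("%.1f ns per uncontended lock/unlock\n", double(ns) / n);
        return 0;
    }

On typical desktop hardware this is in the range of tens of nanoseconds
per iteration, far below the cost of opening and reading a file.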
Post by Scott Kostyshak
In terms of efficiency: in general, if you can use either a mutex or
thread-local storage, then for a light object thread-local storage seems
more efficient, because no thread would ever have to wait for access to
the shared object. But if the object is heavy, then maybe the overhead
of creating it in multiple threads is more than the time saved by not
having threads wait their turn. Is that at all correct?
Basically yes, but you need to take the cost of computing the object
into account as well: the object might be very light (in this case it is
a bool), but the computation might be quite expensive. Memory-wise it
would be no problem to use thread-local data, but computation-time-wise
you would see a difference.
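A thread-local variant of the cache from the first sketch would look
roughly like this (again just an illustration, not LyX code, reusing the
made-up readZippedStatusFromDisk): the bool per file costs almost nothing
in memory, but the expensive check is repeated once in every thread that
asks about a given file.

    #include <map>
    #include <string>

    bool readZippedStatusFromDisk(std::string const & filename); // as above

    bool isZippedFileThreadLocal(std::string const & filename)
    {
        // Each thread gets its own cache: no mutex and no waiting, but
        // also no sharing of results between threads.
        thread_local std::map<std::string, bool> cache;
        auto it = cache.find(filename);
        if (it != cache.end())
            return it->second;
        bool const zipped = readZippedStatusFromDisk(filename);
        cache.emplace(filename, zipped);
        return zipped;
    }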
In LyX we have a quite comfortable situation: it is a small project (so
in many cases you can check for deadlocks easily), and it has only a few
performance-critical areas, so often it does not matter if we make the
wrong decision performance-wise ;-)
Georg