Discussion:
[GSoC 2014]ODT-->LyX Conversion
Prannoy Pilligundla
2014-07-03 15:06:28 UTC
Hi Everyone,

I am almost done with the LyX to ODT part of my project and have started work
on the roundtrip, i.e. the ODT to LyX part. For LyX to ODT I convert LyX to
LaTeX first and then use tex4ht to convert LaTeX to ODT. For math I also store
the LaTeX expression as it is in the ODT file (using the annotation property)
as metadata.

To get back to LyX from ODT I am thinking of writing a Python script that does
the conversion. For now I plan to use the expat XML parser library
<https://docs.python.org/2/library/pyexpat.html>. I also had a look at the
lyx2lyx scripts. I tried Writer2LaTeX, but it doesn't suit our purpose of
getting back the same LyX file. I am not very clear on how to go ahead from
this juncture. For example, I am not clear on how I will get information about
all the packages used from the ODT file. I feel I am not aware of many issues
that need to be taken care of during this conversion. It would be great to hear
your comments and suggestions on how to go ahead.
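
A minimal sketch of the expat approach, for illustration only: it opens the ODT as a zip archive, reads content.xml, and collects headings with their outline levels. The function name read_headings is hypothetical, and it assumes the headings arrive as text:h elements with a text:outline-level attribute, which may not match what tex4ht actually writes.

```python
import io
import zipfile
import xml.parsers.expat

# ODF text namespace; with namespace processing enabled, expat reports
# names as "<namespace-uri> <local-name>".
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
HEADING = TEXT_NS + " h"
OUTLINE_LEVEL = TEXT_NS + " outline-level"


def read_headings(odt_bytes):
    """Return a list of (outline_level, title) pairs from an ODT file."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as archive:
        xml_data = archive.read("content.xml")

    headings = []
    current = {"level": None, "text": []}

    def start(name, attrs):
        if name == HEADING:
            # Assumption: a missing outline level means a top-level heading.
            current["level"] = int(attrs.get(OUTLINE_LEVEL, "1"))
            current["text"] = []

    def chars(data):
        # Collect text only while inside a heading (including nested spans).
        if current["level"] is not None:
            current["text"].append(data)

    def end(name):
        if name == HEADING:
            headings.append((current["level"], "".join(current["text"])))
            current["level"] = None

    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_data, True)
    return headings
```

Called as read_headings(open("document.odt", "rb").read()), this would give the document outline, which is only the first step; paragraphs, lists, and math would need their own handlers.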

Thanks and Regards
Prannoy Pilligundla
Liviu Andronic
2014-07-03 15:10:19 UTC
Hello Prannoy,

On Thu, Jul 3, 2014 at 5:06 PM, Prannoy Pilligundla
Post by Prannoy Pilligundla
Hi Everyone,
I am almost done with LyX to ODT part of my project
This sounds most promising! A lot of LyX users are dying to have a
useful way to convert LyX documents to something Word-compatible.
Post by Prannoy Pilligundla
and I started work on
the roundtrip, i.e. the ODT to LyX part of the project. For LyX to ODT I converted
LyX to LaTeX first and then used tex4ht for converting LaTeX to ODT. In the case
of math I am also storing the LaTeX expression as it is in the ODT
file (using the annotation property) as metadata.
How can we test the feature, and where is the repo located? Also, are
there some test documents available?

Thanks,
Liviu
Prannoy Pilligundla
2014-07-03 16:57:27 UTC
Hi Liviu,
Post by Liviu Andronic
How can we test the feature, and where is the repo located? Also, are
there some test documents available?
I am attaching the test report; please have a look. I am using the LyX
GSoC repo. Everything is up to date on the branch tex4htTesting. There are
many test documents in the tests folder inside the repo.
Dr Eberhard Lisse
2014-07-03 16:04:32 UTC
Prannoy,

what's wrong with the LaTeX Export from ODT?

el

Prannoy Pilligundla
2014-07-03 17:07:57 UTC
Hi Eberhard,
Post by Dr Eberhard Lisse
Prannoy,
what's wrong with the LaTeX Export from ODT?
I used Writer2LaTeX for this. Many things are interpreted wrongly. For
example, math that was originally inline is now put into an equation
environment. The align and eqnarray environments used in the initial file
don't come back exactly as they were. Some sections are recognized as
subsections, and so on. There are many issues like these.
Dr Eberhard W Lisse
2014-07-03 19:40:08 UTC
Did you look at the source code? Or at Perl/Python postprocessing?

This strikes me as solvable...

el

Cyrille Artho
2014-07-03 22:48:50 UTC
Hi Prannoy,
I guess Stefano has probably already told you that, but for information
that is not correctly converted back from ODT to LyX, there are two
different cases:

(1) The data is "correct" in ODT (at least in a form that makes sense to
us) but does not get converted back in the right way.
As others said, this can be solved by fixing the converter, even though
it may be difficult in some cases.

(2) The data is not correct, or the mapping from LyX to ODT is not a
bijection. You mentioned the equation environment in another e-mail. If the
two different modes in LyX/LaTeX (inline and paragraph mode) get mapped to
one mode in ODT, then the best way to maintain the necessary information is
by having an extra "back-annotation", maybe as a comment or extra attribute
that is not visible in the ODT document.
I'm not sure if such a feature exists, but it would solve problems with
features that cannot be accurately represented in the ODT file.
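
As a rough illustration of the back-annotation idea in case (2): if the original LaTeX, delimiters included, is kept in a MathML annotation element, the importer can recover whether a formula was inline, display, or a named environment. This is only a sketch. The helper names annotated_latex and math_kind are hypothetical, and it assumes the annotation holds the LaTeX source verbatim, which may differ from what the exporter really stores.

```python
import re
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def annotated_latex(mathml):
    """Return the text of the first <annotation> element, or None."""
    root = ET.fromstring(mathml)
    node = root.find(".//{%s}annotation" % MATHML_NS)
    if node is None or node.text is None:
        return None
    return node.text.strip()


def math_kind(latex):
    """Classify stored LaTeX as an environment name, 'display', or 'inline'."""
    match = re.match(r"\\begin\{([A-Za-z]+\*?)\}", latex)
    if match:
        return match.group(1)  # e.g. 'align' or 'eqnarray'
    # Check '$$' before '$' so display math is not mistaken for inline math.
    if latex.startswith(("\\[", "$$")):
        return "display"
    if latex.startswith(("$", "\\(")):
        return "inline"
    return "unknown"
```

With this, a formula annotated as $x^2$ would be restored as inline math, while one annotated with \begin{align} would be restored as an align environment, regardless of how ODT displays it.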
--
Regards,
Cyrille Artho - http://artho.com/
If you can, help others. If you can't, at least don't hurt others.
-- the Dalai Lama
Georg Baum
2014-07-05 08:54:32 UTC
Post by Prannoy Pilligundla
Hi Everyone,
I am almost done with the LyX to ODT part of my project and I started work on
the roundtrip, i.e. the ODT to LyX part of the project. For LyX to ODT I
converted LyX to LaTeX first and then used tex4ht for converting LaTeX to
ODT. In the case of math I am also storing the LaTeX expression as it is in
the ODT file (using the annotation property) as metadata.
Very nice!
Post by Prannoy Pilligundla
Now for getting back to LyX from ODT I am thinking of writing a Python
script which does this conversion. As of now I am planning to use the XML
parser library expat <https://docs.python.org/2/library/pyexpat.html>. I
also had a look at the lyx2lyx scripts. I tried using Writer2LaTeX but it
doesn't suit our purpose of getting back the same LyX file. I am not very
clear on how to go ahead from this juncture. For example, I am not clear on
how I will get information about all the packages used from the ODT file.
I feel I am not aware of many issues which need to be taken care of during
this conversion process. It would be great to hear your comments and
suggestions on how to go ahead.
Did you have a look at the old discussions we had before your GSoC project
started? There were many ideas floating around, but I don't remember the
details anymore.

I think the most important goal would be to get the structure right. Then it
becomes complicated: you need to find out whether certain modules are needed
for a certain feature (packages are LaTeX stuff; these should be irrelevant if
you go from ODT to LyX). This can be quite challenging, since you basically
need to use LyX to get the available modules etc. At this point, you would
need to reimplement a lot of LyX infrastructure in Python.
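
To make "getting the structure right" concrete, here is a small sketch that turns an ODT heading level into a LyX layout block. The level-to-layout table is an assumption that fits an article-like document class, the function name heading_to_lyx is hypothetical, and the output is only a body fragment, not a complete LyX file. Titles containing backslashes or other special characters would need escaping, which is left out here.

```python
# Assumed mapping from ODT outline levels to LyX layout names
# (article-like class; other classes start at Part or Chapter).
LAYOUT_BY_LEVEL = {
    1: "Section",
    2: "Subsection",
    3: "Subsubsection",
    4: "Paragraph",
    5: "Subparagraph",
}


def heading_to_lyx(level, title):
    """Return a LyX layout block for one heading; unknown levels become Standard."""
    layout = LAYOUT_BY_LEVEL.get(level, "Standard")
    return "\\begin_layout %s\n%s\n\\end_layout\n" % (layout, title)
```

For example, heading_to_lyx(2, "Results") yields a Subsection block. Keeping the table in one place makes it easy to shift all levels if a converter reports sections as subsections.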

Another approach would be to interface with the LyX server, query it for
modules, and let it write the LyX file. Then you would not need to care
about changing file formats. However, it might be necessary to extend the LyX
server interface; I don't know if it is powerful enough.
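
For reference, a sketch of what talking to the LyX server could look like from Python. It assumes the line-based pipe protocol described in the LyX Customization manual, where a client writes LYXCMD:client:function:argument to the input pipe and reads an INFO or ERROR line back; please check the manual before relying on this. The helper names and the client name "odt2lyx" are hypothetical, and whether any existing function can list the available modules is exactly the open question.

```python
def format_command(client, function, argument=""):
    """Build one request line for the LyX server input pipe."""
    return "LYXCMD:%s:%s:%s\n" % (client, function, argument)


def parse_reply(line):
    """Split a reply line into (kind, client, function, data).

    kind is expected to be 'INFO' or 'ERROR'. The data part may itself
    contain colons, so the line is split at most three times.
    Raises ValueError if the line has fewer than four fields.
    """
    kind, client, function, data = line.rstrip("\n").split(":", 3)
    return kind, client, function, data
```

The request string would be written to the server's input pipe and the reply read from its output pipe; the pipe location is set in the LyX preferences.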

Finally, it would also be possible to hack tex2lyx to read ODT. You could
use a lot of the existing infrastructure. Maybe it is not so much work to
use a C++ XML parser (e.g. from Qt) to read the document, then process it
and use existing code to write the LyX file?


Georg
