[LyX GSoC/odt2lyx] Updated Tests-Report.lyx and completed documenting parseodt.py which is the main script for ODT to LyX conversion

Discussion:

Scott Kostyshak

2014-08-14 21:19:37 UTC

On Thu, Aug 14, 2014 at 3:45 PM, Prannoy Pilligundla

The branch, odt2lyx, has been updated.
- Log -----------------------------------------------------------------
commit b2c9011b31fd2c398f2376894bca003767bca6be
Date: Fri Aug 15 01:14:05 2014 +0530
Updated Tests-Report.lyx and completed documenting parseodt.py which is the main script for ODT to LyX conversion

Hi Prannoy, I'm interested in your LyX to ODT tests. Do you have it
documented how to run the tests? I have not even attempted to run the
tests myself, so perhaps it is easy. Are the tests integrated into the
Ctests for CMake and the automake test target for autotools? Are the
tests similar to tex2lyx tests, where the conversion is performed and
the result is compared to a saved file to see if they are identical?

There is a document lib/doc/Development.lyx where it would be nice to
add documentation on how to run the tests. I've been meaning to add a
lot of information on the Ctests but have been lazy. I will try to
remember to do that in the coming weeks.

Scott

Prannoy Pilligundla

2014-08-15 12:00:55 UTC

Hi Scott,

On Fri, Aug 15, 2014 at 2:49 AM, Scott Kostyshak <***@lyx.org> wrote:
Post by Scott Kostyshak
Hi Prannoy, I'm interested in your LyX to ODT tests. Do you have it
documented how to run the tests? I have not even attempted to run the
tests myself, so perhaps it is easy. Are the tests integrated into the
Ctests for CMake and the automake test target for autotools? Are the
tests similar to tex2lyx tests, where the conversion is performed and
the result is compared to a saved file to see if they are identical?

The tests I performed here are very different from what happens in
tex2lyx. In this case Stefano had created many test files in LyX spanning
all the environments, math, figures, etc. So I wrote a shell script
which converts LyX to LaTeX first and then runs mk4ht on the generated tex
file. Then I manually inspect the generated ODT file for issues. This way I
fixed the issues found in every test file. You can see the latest results
in the branch tex4htTesting. All the tests have been run inside the tests
directory. The shell script is named convert.sh, so to convert a LyX file
named "First-Test.lyx" I run "./convert.sh First-Test.lyx", and the
output ODT file appears in the same directory. While adding the Export
option inside LyX, I ported convert.sh to a Python script and put it
inside the scripts directory.
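For readers who want to reproduce this, the two-step pipeline can be sketched in Python roughly as below. This is not the actual convert.sh or the ported script in the scripts directory (neither is shown in this thread); it assumes that `lyx --export latex FILE.lyx` writes FILE.tex next to the input and that tex4ht's `mk4ht oolatex FILE.tex` writes FILE.odt in the same directory. The function names `build_commands` and `convert` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(lyx_file):
    """Return the pipeline's commands, in order, as argument lists.

    Hypothetical helper; each command is meant to run from the directory
    that contains lyx_file.
    """
    name = Path(lyx_file).name                      # e.g. First-Test.lyx
    tex_name = Path(name).with_suffix(".tex").name  # e.g. First-Test.tex
    return [
        ["lyx", "--export", "latex", name],  # LyX -> LaTeX (assumed to write <name>.tex)
        ["mk4ht", "oolatex", tex_name],      # LaTeX -> ODT via tex4ht (assumed to write <name>.odt)
    ]


def convert(lyx_file):
    """Run the pipeline next to lyx_file, stopping at the first failing command."""
    workdir = Path(lyx_file).resolve().parent
    for cmd in build_commands(lyx_file):
        subprocess.run(cmd, cwd=workdir, check=True)
    # Path where the ODT is expected to appear (not checked here).
    return workdir / Path(lyx_file).with_suffix(".odt").name


if __name__ == "__main__":
    for lyx_file in sys.argv[1:]:
        print(convert(lyx_file))
```

For example, `convert("tests/First-Test.lyx")` would run both commands inside tests/ and return the path of the expected First-Test.odt; the resulting file still has to be inspected by hand, as described above.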

Thanks and Regards
Prannoy

Scott Kostyshak

2014-08-18 00:31:52 UTC

Post by Prannoy Pilligundla
Hi Scott,

Thanks for the explanation Prannoy! So if I understand correctly, the tests
require manual inspection of the output .odt file, right? Do you have plans
to make automated tests? Is there a reason why we can't do the same as
tex2lyx? And if so, would this require a lot of work? Additionally, is
there any automatic way (command-line option) to check that a .odt file is
valid (e.g. that if I open it in LibreOffice it will not say there is a
syntax error or a corrupted file)? I'd be interested in your thoughts.

Best,

Scott

Prannoy Pilligundla

2014-08-18 15:49:09 UTC

Post by Scott Kostyshak
Thanks for the explanation Prannoy! So if I understand correctly, the
tests require manual inspection of the output .odt file, right?

Yes, we had to inspect the output ODT file manually.

Post by Scott Kostyshak
Do you have plans to make automated tests? Is there a reason why we can't
do the same as tex2lyx? And if so, would this require a lot of work?

I didn't concentrate on writing automatic tests as I thought it would take
up a lot of time and I wanted to focus on the conversion part at this stage
as we are just at the beginning. Actually I know that tests are automated
in tex2lyx, but I am not aware of what happens there exactly.

Post by Scott Kostyshak
Additionally, is there any automatic way (command-line option) to check
that a .odt file is valid (e.g. that if I open it in LibreOffice it will
not say there is a syntax error or a corrupted file)? I'd be interested in
your thoughts.

I went through the OpenOffice help manual, and I don't think there is a
direct option available.
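Short of full validation against the ODF schema, a script can at least catch some structural breakage. The Python sketch below (the helper name `check_odt` is hypothetical) checks that the file is a zip archive, that its first member is an uncompressed `mimetype` entry containing the ODF text media type (as the ODF packaging format recommends), and that `content.xml` and `META-INF/manifest.xml` exist and are well-formed XML. It does no schema validation, so a file that passes may still fail to open, and LibreOffice may tolerate some files this sketch flags.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

ODT_MIMETYPE = b"application/vnd.oasis.opendocument.text"
REQUIRED_XML = ("content.xml", "META-INF/manifest.xml")


def check_odt(path):
    """Return a list of problems; an empty list only means none were detected."""
    try:
        zf = zipfile.ZipFile(path)
    except zipfile.BadZipFile as exc:
        return [f"not a zip archive: {exc}"]
    problems = []
    with zf:
        infos = zf.infolist()
        # ODF packages are expected to start with an uncompressed "mimetype" member.
        if not infos or infos[0].filename != "mimetype":
            problems.append("first member is not 'mimetype'")
        elif infos[0].compress_type != zipfile.ZIP_STORED:
            problems.append("'mimetype' member is compressed")
        elif zf.read("mimetype") != ODT_MIMETYPE:
            problems.append("'mimetype' is not the ODF text media type")
        # Required XML members must exist and be well-formed (no schema check).
        names = set(zf.namelist())
        for name in REQUIRED_XML:
            if name not in names:
                problems.append(f"missing {name}")
                continue
            try:
                ET.fromstring(zf.read(name))
            except ET.ParseError as exc:
                problems.append(f"{name} is not well-formed XML: {exc}")
    return problems


if __name__ == "__main__":
    for odt in sys.argv[1:]:
        for problem in check_odt(odt):
            print(f"{odt}: {problem}")
```

Called from a script or as `python check_odt.py First-Test.odt` (hypothetical file name), it prints one line per detected problem and nothing otherwise; a missing content.xml, for instance, shows up as a `missing content.xml` line.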

Scott Kostyshak

2014-08-18 18:38:59 UTC

On Mon, Aug 18, 2014 at 11:49 AM, Prannoy Pilligundla <

Post by Scott Kostyshak
Thanks for the explanation Prannoy! So if I understand correctly, the
tests require manual inspection of the output .odt file, right?

Yes, we had to inspect the output ODT file manually.

Post by Scott Kostyshak
Do you have plans to make automated tests? Is there a reason why we
can't do the same as tex2lyx? And if so, would this require a lot of work?

I didn't concentrate on writing automatic tests as I thought it would take
up a lot of time and I wanted to focus on the conversion part at this stage
as we are just at the beginning.

Makes sense. I would be surprised if this took a lot of time to set up
though, but maybe there's something I'm missing.

Post by Prannoy Pilligundla
Actually I know that tests are automated in tex2lyx but I am not aware of
what happens there exactly.

The conversion is done and the converted file is compared to a saved file
that we know is the correct file.

Post by Prannoy Pilligundla
I went through the OpenOffice help manual, and I don't think there is a
direct option available.

Thanks for checking.

Best,

Scott

Prannoy Pilligundla

2014-08-20 13:11:39 UTC

The actual problem is the difference between the ODT syntax that tex4ht
uses and the syntax that the latest versions of LibreOffice or OpenOffice
use. If we just open an ODT generated by tex4ht, make a slight change and
save it, there are a large number of differences in all the XML files
before and after saving. So I guess it becomes difficult for us to take an
example file in these kinds of cases and compare against it. And as our
main aim was semantic output, even verifying that the output is semantic
becomes very difficult with all these constraints.

Post by Scott Kostyshak
Did you mean to email this to me and not the list? We try to keep things
on the list as much as possible.

Oh, sorry, I just realized that I only mailed you and not the list. I
wanted to reply to all but maybe I clicked on Reply by mistake. Again, I am
sorry; I will be more careful from now on.

Post by Scott Kostyshak
I think we're talking about different kinds of tests. The kind I have in
mind is the following: suppose you make a change to the ODT export. How
can you be sure that that change doesn't break anything? One way to address
this is to have tests. You would not need to open LibreOffice or in fact
even have it installed. The tests would just check that nothing changed in
the other exports (it would do this just like tex2lyx by comparing a saved
exported file to the new exported file and checking that they're
identical). Of course, it might be expected that the output changes. In
this case, you would want to check the new exported files manually and then
save the new files as the files to compare to. Does that make sense? It
shouldn't take much time to implement (although I know that even a little
time can be hard to find and prioritize). You just run ODT export on a .lyx
file, say test1.lyx, then save that .odt as test1.save.odt. Then suppose
you change ODT export. You would have a script that exports test1.lyx to
test1.odt and then compares test1.odt to test1.save.odt to see if they are
identical. If they are not identical, then manual inspection would be
needed to see if the differences are legitimate. If they are, rename
test1.odt to test1.save.odt (overwriting) and explain the changes in the
commit message.
Does that make sense?
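The workflow described above could be sketched in Python as follows, with hypothetical helper names. One caveat: an .odt file is a zip archive and zip entries record modification timestamps, so two exports of the same document will likely differ byte-for-byte even when every file inside is identical. The sketch therefore compares the archive members rather than the .odt files themselves. The `ignore` parameter is there in case some member embeds a generation date; whether tex4ht output does so has not been checked here.

```python
import os
import sys
import zipfile


def odt_members(path):
    """Map each member name in the .odt (zip) archive to its uncompressed bytes."""
    with zipfile.ZipFile(path) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def changed_members(new_odt, saved_odt, ignore=()):
    """Return the sorted names of members that were added, removed, or changed.

    Only member names and contents are compared, not zip timestamps.
    Members listed in `ignore` are skipped (e.g. a file embedding a date).
    """
    new, saved = odt_members(new_odt), odt_members(saved_odt)
    return sorted(
        name
        for name in new.keys() | saved.keys()
        if name not in ignore and new.get(name) != saved.get(name)
    )


def accept(new_odt, saved_odt):
    """After manual inspection, make the new export the reference (overwriting it)."""
    os.replace(new_odt, saved_odt)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: compare_odt.py test1.odt test1.save.odt  (hypothetical script name)
    diffs = changed_members(sys.argv[1], sys.argv[2])
    print("identical" if not diffs else "inspect: " + ", ".join(diffs))
    sys.exit(1 if diffs else 0)
```

For example, `python compare_odt.py test1.odt test1.save.odt` (hypothetical file name) prints `identical` or lists the changed members and exits with status 1 in the latter case, which a CTest or automake test target would treat as a failure. After manual inspection, `accept("test1.odt", "test1.save.odt")` overwrites the reference file.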

Thanks, now I understand what you meant. Yes, I guess this should not take
much time to implement. I didn't do this sort of testing in LyX to ODT, as
I was not touching any of tex4ht's post-processors. I was only configuring
some new styles and fixing issues with some old ones, so they were more or
less independent changes that don't affect each other (provided mk4ht
doesn't raise any error while running). Whenever I write invalid XML, the
generated ODT doesn't have a content.xml at all, so I used this as feedback
many times. But recently, when I tried converting a real-life LyX document,
the resulting ODT file turned out to be corrupt. I was not able to find out
why, and I am still wondering how to fix these kinds of issues.

stefano franchi

2014-08-20 14:42:20 UTC

Post by Prannoy Pilligundla
Did you mean to email this to me and not the list? We try to keep things
on the list as much as possible.

Oh, sorry, I just realized that I only mailed you and not the list. I
wanted to reply to all but maybe I clicked on Reply by mistake. Again, I am
sorry; I will be more careful from now on.

Hi Prannoy,

are you familiar with test-driven development [1]? This is what Scott is
referring to (or almost).
The basic idea is to write the tests first and then code until your code
passes all the tests (even those you haven't touched in your last
iterations).
We almost did that in the first phase of the project, but in a very
informal manner. I guess Scott is pushing for a more structured and
automated testing procedure (which is indeed what test-driven development
requires to be effective).
I would be very happy to see it happen, at least for the tex4ht part. Let
me know if you need any help.

Cheers,

Stefano

[1] See http://en.wikipedia.org/wiki/Test-driven_development. Or pick any
of the books by Kent Beck for a good introduction. "Test-Driven Development
by Example" is a step by step introduction with lots of examples in Python.
If you are willing to learn a bit of Smalltalk (strongly recommended), the
free book "Pharo by Example" (http://pharobyexample.org/) has a short
section on unit testing that you may find useful.
--
__________________________________________________
Stefano Franchi
Associate Research Professor
Department of Hispanic Studies Ph: +1 (979) 845-2125
Texas A&M University Fax: +1 (979) 845-6421
College Station, Texas, USA

***@tamu.edu
http://stefano.cleinias.org

Prannoy Pilligundla

2014-08-20 17:45:36 UTC

Post by stefano franchi
Hi Prannoy,
are you familiar with test-driven development [1]? This is what Scott is
referring to (or almost).

Yes, I am aware of test-driven development. I had previously written some
tests for RoR (Ruby on Rails).

Post by stefano franchi
The basic idea is to write the tests first and then code until your code
passes all the tests (even those you haven't touched in your last
iterations).

Yes, I understand this part. What I was saying is that since we are mainly
fixing styling issues and configuring new environments, one part of the
code mostly doesn't interfere with the other parts. But I guess I have
been thinking in the wrong direction and neglecting good coding practices.
I also agree with you and Scott: it would be good to have such a thing, and
it would definitely give structure to our otherwise manual testing process.

Scott Kostyshak

2014-08-20 22:51:17 UTC

Post by Prannoy Pilligundla
But I guess I have been thinking in the wrong direction and neglecting
good coding practices. I also agree with you and Scott: it would be good to
have such a thing, and it would definitely give structure to our otherwise
manual testing process.

I think we all agree. It's then just a question of priorities. It's hard to
spend time working on tests when you feel pressured to make other progress
on the code. I just wanted to add my opinion that I don't think it would
take that much time and I think the benefit/time ratio would be high. I'm
not convinced I'm right, especially since I know nothing about .odt files.
But I wanted to be a little pushy until I understood if there is a reason
why it wouldn't make sense.

I'm just a novice programmer so don't take anything I say too seriously.

Thanks for your replies and explanations Prannoy!

Scott

Cyrille Artho

2014-08-20 22:41:47 UTC

On Wed, Aug 20, 2014 at 8:11 AM, Prannoy Pilligundla

The challenge here is that we cannot really verify that the output document
is correct (because that would require a detailed specification). However,
we can inspect the output manually once, and then use it as a reference
against unexpected changes, as suggested earlier.

This may not sound as convincing as actual output verification, but it is
really very helpful in practice, and easy to set up.

--
Regards,
Cyrille Artho - http://artho.com/
The human mind treats a new idea the way the body treats
a strange protein -- it rejects it.
-- P. Medawar

Scott Kostyshak

2014-08-20 22:51:50 UTC

Post by Cyrille Artho
The challenge here is that we cannot really verify that the output document
is correct (because that would require a detailed specification). However,
we can inspect the output manually once, and then use it as a reference
against unexpected changes, as suggested earlier.
This may not sound as convincing as actual output verification, but it is
really very helpful in practice, and easy to set up.

Thanks for clarifying. This is exactly what I had in mind.

Scott

Scott Kostyshak

2014-08-20 22:50:47 UTC

On Wed, Aug 20, 2014 at 8:11 AM, Prannoy Pilligundla <

Post by Prannoy Pilligundla
Did you mean to email this to me and not the list? We try to keep things
on the list as much as possible.

Oh, sorry, I just realized that I only mailed you and not the list. I
wanted to reply to all but maybe I clicked on Reply by mistake. Again, I am
sorry; I will be more careful from now on.

Post by stefano franchi
Hi Prannoy,
are you familiar with test-driven development [1]? This is what Scott is
referring to (or almost).

Thanks for the help clarifying things, Stefano.

Post by stefano franchi
The basic idea is to write the tests first and then code until your code
passes all the tests (even those you haven't touched in your last
iterations).

Not necessarily the best idea. Because of the semantics issue that Prannoy
brought up, I imagine it would be extremely difficult to guess ahead of
time what the .odt file should look like.

Post by stefano franchi
I guess Scott is pushing for a more structured and automated testing
procedure (which is indeed what test-driven development requires to be
effective).

Yes, but I'm also fine with others pushing back :)

Best,

Scott