Hi Steve,
I am bringing this back to the lyx-devel list, where it belongs (I
responded privately by mistake).
On Thu, May 29, 2014 at 9:14 PM, Steve Litt <***@troubleshooters.com>
wrote:
> On Thu, 29 May 2014 14:55:02 -0500
> stefano franchi <***@gmail.com> wrote:
>
> > On Thu, May 29, 2014 at 2:35 PM, Steve Litt
> > <***@troubleshooters.com> wrote:
> >
> > > On Thu, 29 May 2014 14:36:29 -0500
> > > "Alex Vergara Gil" <***@cphr.edu.cu> wrote:
> > >
> > > > Hello Lyxers
> > > >
> > > > I wonder why LyX is not available to process little pieces of
> > > > python code within its own framework, like ipython notebook for
> > > > instance?? This feature allows us to have beautiful graphics such
> > > > the one produced by matplotlib package. I know there already
> > > > exists a similar binding for R through knitr module, so why not a
> > > > binding for python too??
> > > >
> > > > Is there a way, like modules or whatever, to achieve the same
> > > > functionality or at least some basic functionality of ipython
> > > > notebook within LyX??
> > >
> > > Oh, if we're going to consider requests for difficult additions to
> > > handle a small subset of needs like beautiful graphics, how about
> > > filling the GAPING HOLE that there's no practical way to export to
> > > ePub, without massive human intervention and end-user programming?
> > > None of LyX's HTML and xHtml exports are remotely suitable for
> > > flowing-text eBook production, especially because different people
> > > have different ideas of how eBooks should be built.
> > >
> > > Personally, I think the easiest way forward on this is to take the
> > > current half XML half something else native format, and make it well
> > > formed XML. No doing favors of renaming graphics files with
> > > arbitrary numbers, no doing favors of making obvious hierarchy into
> > > <div>, just make the native format XML and let anyone with Python
> > > and lxml.etree have his way with the native LyX file.
> > >
> > > I know that two years ago I railed against XML native format, but
> > > parsers have gotten better, and right now we have the human
> > > unreadability of XML, combined with the unparsability of a more TeX
> > > like language. Well formed XML can only be an improvement.
> > >
> > > If well formed XML native format is not practical in the near
> > > future, perhaps somebody could make a program that exports the
> > > current native format into well-formed XML, once again without
> > > renumbering, throwing away structure, etc. Basically, just pass the
> > > environments through: No need to map Part to <h1>, once it's XML,
> > > we can trivially do that ourselves, the way we want to. Given that
> > > most eBooks don't do a lot of bibliography stuff, you could even
> > > have an initial version that has hooks for the bibliography stuff
> > > but doesn't actually do it. Put in the bibliographies next time. If
> > > it's not perfect with math, that's fine: I can't think of anything
> > > more frustrating than trying to read a math book on a small device.
> > >
> > > Because of LyX's inability to author flowing text eBooks in any
> > > reasonable way, I haven't used LyX in 9 months: I'm authoring with
> > > Bluefish now. Slow, difficult, crashy, but at least my source
> > > document can produce both print, PDF and ePub.
> > >
> > > Maybe it's just me, but if there's one feature LyX should really
> > > have in 2014, I think that one feature is a reasonable export
> > > mechanism to something that can be turned into ePub, without
> > > undoing all the BS the current LyX (x)Html converters throw into
> > > the export.
> > >
> > > Mark my words: Within two years, beautiful graphics won't matter one
> > > bit if the document can't be read on a small device.
> > >
> > >
> >
> > Steve,
> >
> > would the ODT converter that is the topic of one of our GSOC projects
> > work for the task you are pushing for?
> > After all, ODT *is* XML (for some reading of XML). I know too little
> > about the XML and e-pub formats to answer that question, but perhaps
> > you could take a look at the project description on the wiki [1] and
> > let us know what you think?
> > The project currently targets a subset of LyX's full functionalities
> > (only one class, only a subset of graphic formats, etc.). I would be
> > curious to know if these limitations prevent it from being used for epub
> > conversion (direct or otherwise).
> >
>
> Hi Stefano,
>
> Thanks for asking. I appreciate it.
>
> I'll give four answers to your question:
> * The precise answer
> * A speculative answer
> * What's really needed
> * Comments
>
>
> THE PRECISE ANSWER
> I don't know whether the exported ODT file will serve as a practical
> intermediate step for making an ePub. I won't know that until the
> software's been finished in August.
>
>
Fair enough.
>
> THE SPECULATIVE ANSWER
> I doubt the exported ODT file would be helpful in building ePubs. I've
> explored ODT, and although it's XML, its purpose is to work with
> LibreOffice, not to convey semantic information. If ODT were a
> database, it would be universally cursed as the most denormalized
> database ever. Change something one place, you need to change it in five
> more. And speaking of that, it's not even a single file: it's several
> XML files depending on each other. What could *possibly* go wrong.
>
> The LyX ODT exporter is based on tex4ht, which hasn't been updated for a
> couple years and, due to the death of its originator/maintainer, has an
> iffy future. I'd say that makes the future of the LyX ODT exporter
> iffy, especially because once the GSOC guy gets a great job, he'll be
> long gone.
>
> The tex4ht package is *very* complex, and I couldn't hope to get
> competent with it in a week, and I don't have that week, but I *did*
> download and install it, and after exporting a 1 chapter simple LyX
> book to LaTeX, I ran the following command on it:
>
> mk4ht htlatex test.tex
>
> A redacted version of the resulting HTML file and the resulting full CSS
> file are attached to this email. Take a look at these files with a text
> editor, and you'll see they are so cluttered with appearance
> information that the needed semantic information is obscured. Look at
> the CSS file it produced: That's crazy! And sadly, most of that
> appearance info is irrelevant in an ePub. I would suspect that building
> an ePub from an ODT created through tex4ht would be like trying to
> write a book by screen-scraping a PDF file.
>
> The bottom line is, to make an ePub out of an ODT created from tex4ht,
> by far the toughest task would be to wade through all the garbage to
> glean the rare relevant semantic information. And that's a shame,
> because take it from me, making a program to create an ePub is no easy
> thing: I've done it.
>
>
I agree with all you said and with the results of your test, but you didn't
give tex4ht a fair chance.
If you ask for html it will indeed try to format the output. Things are
better, in my opinion, if you shoot for
odt (or even xml). Try compiling your test chapter with
mk4ht oolatex test.tex
and then take a look at the resulting .odt file. Results may (or may not) be
more encouraging.
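
If you would rather not open the file in LibreOffice, here is a minimal
sketch of one way to see how much styling ends up in it. An .odt file is a
zip archive whose main document is content.xml, so the sketch simply counts
the paragraph and heading style names used there. The function name
paragraph_styles and the file name test.odt are my own assumptions (I have
not checked what oolatex names its output), and it uses only the Python
standard library rather than lxml.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# OpenDocument "text" namespace: paragraphs, headings and their style names.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def paragraph_styles(odt_file):
    """Count the style names attached to paragraphs and headings.

    odt_file may be a path or a binary file-like object (assumed to be
    a valid .odt archive containing content.xml).
    """
    with zipfile.ZipFile(odt_file) as odt:
        root = ET.fromstring(odt.read("content.xml"))
    counts = Counter()
    for tag in ("p", "h"):  # text:p = paragraph, text:h = heading
        for elem in root.iter(f"{{{TEXT_NS}}}{tag}"):
            style = elem.get(f"{{{TEXT_NS}}}style-name", "(no style)")
            counts[style] += 1
    return counts


# Example use (file name is an assumption):
#   for style, n in paragraph_styles("test.odt").most_common():
#       print(n, style)
```

A short list of meaningful style names would be a good sign; a long list of
automatically generated ones would suggest the same clutter you found in
the HTML.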
>
> WHAT'S REALLY NEEDED
> What's really needed is the words, pictures, environments, and
> character styles, in a parsable format. Nothing less, and *certainly*
> nothing more. Life's too short to pick through the extraneous 90% to
> get to the 10% that's needed.
>
>
I could not agree more. And this is exactly what we are trying to do with
the LyX converter:
keep the semantics and discard as much formatting as possible. However, it
turns out there is *a lot* of semantics that needs to be encoded, not just
words and pictures (and environments). Take a look at the test document we
prepared for the LyX conversion project (it is on the git repo [1]).
I tried to stick to the essentials, and yet it turns out to be rather
complex...
> Don't even pass me the definitions of the environments: Making them for
> ePub is a ten minute CSS task. Don't bother with indented and
> non-indented paragraphs: that's an implementation detail I can easily do
> myself. All I need is the environment and char style names
> surrounding the applied material. A separate list of environments and
> charstyles used in the doc would be handy for creation of stub CSS, but
> not necessary.
>
> In other words, pass me the words, pictures, and semantic information,
> and I'll take care of the rest. Anything else is just clutter.
>
>
Agreed. That is what we are trying to do, which is why I suggested you
take a look.
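
To check that I understand the request, here is a minimal sketch of what
"names surrounding the applied material" could look like and how little
code the consumer side would then need. The <layout>/<charstyle> format
below is purely hypothetical: it is neither an existing LyX format nor the
output of our converter, and the names to_xhtml, stub_css and BLOCK_TAGS
are made up for illustration. It uses the standard library's ElementTree,
whose basic API lxml.etree also follows.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: NOT an existing LyX format, just environments and
# character styles wrapped around the text they apply to.
SAMPLE = """<document>
  <layout name="Chapter">Getting started</layout>
  <layout name="Standard">Plain text with <charstyle name="Emph">emphasis</charstyle> inside.</layout>
</document>"""

# Mapping chosen by whoever does the conversion, not by the exporter.
BLOCK_TAGS = {"Chapter": "h1", "Section": "h2"}


def to_xhtml(source):
    """Return (body markup, sorted names used) for the flat sample format.

    Assumes layouts are not nested and that their only child elements
    are <charstyle> elements containing plain text.
    """
    used = set()
    body = ET.Element("body")
    for layout in ET.fromstring(source).iter("layout"):
        name = layout.get("name")
        used.add(name)
        # Unknown environments fall back to <p>; the name survives as class.
        block = ET.SubElement(body, BLOCK_TAGS.get(name, "p"), {"class": name})
        block.text = layout.text
        for child in layout:
            used.add(child.get("name"))
            span = ET.SubElement(block, "span", {"class": child.get("name")})
            span.text = child.text
            span.tail = child.tail  # text following the styled run
    return ET.tostring(body, encoding="unicode"), sorted(used)


def stub_css(names):
    """Emit one empty CSS rule per environment or character style name."""
    return "\n".join(f".{name} {{ }}" for name in names)
```

The list of names returned next to the markup is the "separate list of
environments and charstyles" you mention: stub_css turns it into empty
rules to be filled in by hand. Real documents would of course need nesting,
figures, and names that are not valid CSS class names handled as well.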
> And, due to the power of Python plus lxml.etree, by far the handiest
> way to deliver this semantic information is via well formed XML.
>
> Of course, the best solution for me would be a well formed XML LyX
> native format. Then I could simply parse the LyX file to create the
> ePub, and perhaps along the way of the conversion I could drop
> intermediate files so people who want different conversion criteria
> could change individual parts of the process.
>
>
There are two problems with this approach:
1. the XML native format route has been discussed several times, but
support for it is not unanimous among developers, as far as I can tell, and
even if it were, shortage of manpower is a serious issue. So I would not
hold my breath.
2. The second problem is actually tougher to crack, in the general case.
Since LyX relies (most often) on LaTeX, and LaTeX can add content to a tex
file through its own processing (think of bibliographies, procedurally
generated graphics, indices, and so on and so forth), it follows that a LyX
file (if LaTeX-oriented) is, in principle, information-incomplete. In other
words, processing the file with LaTeX and exporting it directly to ePub may
produce seriously different outputs (semantic-wise).
- One solution is to emulate TeX processing in the export filter (as many
LaTeX-to-HTML|XML exporters do), but the emulation will never be perfect,
and most often is not even satisfactory.
- The other solution is to go through TeX itself, which is what tex4ht
does. The tradeoff is complexity, as you have discovered.
- The third "solution" is to forbid XML export from using any LaTeX
semantic-enriching packages. Or better yet, decouple LyX from LaTeX
altogether. Personally, this route would not be very appealing, but I can
see its merits. Perhaps this is what you have in mind?
[...snip]
> LyX has the potential to be a write once, read everywhere tool. It
> could be *much more* than a front end for LaTeX. It's the fastest
> authoring environment I've ever used. It was heartbreaking when I had to
> leave it behind. But the days of selling print books are over: Every
> time I've offered a PDF version of one of my print books, the PDF
> outsold the print version 10 to 1, with a much higher profit margin.
> But more and more people read on small devices, and PDF files *suck* on
> smaller devices for anyone without Superman vision. Horizontal
> scrolling is the kiss of death: flowing text is where it's at.
>
> Right now the exodus from LyX isn't conspicuous, but it's there. I
> hardly ever post, and haven't used LyX in months, after 12 years of
> LyX being an integral part of my business. Rob Oakes hasn't posted on
> the LyX list since November 2012. Sure, there will always be a need for
> LyX amongst those writing educational theses, at least until their
> professors start reading on smaller devices. But right now inability to
> write ePubs is driving people away from LyX, and within two years this
> inability will be the kiss of death. LyX, the fastest and easiest
> authoring environment ever, will be relegated to grad students, and
> that too will eventually go away.
>
>
I agree in principle on all this, even though it does not apply to me or to
any of the people I am in daily contact with (students, colleagues, etc.).
That is: I am (we are) still PDF-centric. In fact, I myself print out a PDF
before reading it, and so does everyone around me. But you could rightfully
say we are dinosaurs.
At any rate, I hope you will be able to provide some feedback on the
project when the LyX-->ODT part is completed (which should happen by mid
June). Who knows, perhaps the situation then won't be as dire as it looks
now.
Cheers,
Stefano
[1]
http://git.lyx.org/?p=gsoc.git;a=tree;f=tests;h=00ee494cc993fca7ef135a8d26946b9101240399;hb=refs/heads/tex4htTesting
--
__________________________________________________
Stefano Franchi
Associate Research Professor
Department of Hispanic Studies Ph: +1 (979) 845-2125
Texas A&M University Fax: +1 (979) 845-6421
College Station, Texas, USA
***@tamu.edu
http://stefano.cleinias.org