This post might be a bit outside the usual context of this blog, but I thought it couldn’t hurt to say a few words about a recent coding project of mine. As you might have noticed, I put some of my open source projects on GitLab after migrating away from GitHub last year (and moved on to Codeberg in September 2021). Most of the things there are quite specialized and probably only of interest to a few people. Recently, however, I was looking for a way to generate LibreOffice Writer documents in the PHP-based server part of a web application and thought that there must be at least half a dozen projects in public repositories with a solution for me. To my surprise, this was not the case, so I set out to put a library together on my own and open source it so that others can benefit as well.
One project I found that would have partly solved my problem is ‘Tiny But Strong’. The library offers a convenient way to fill text and graphics into existing LibreOffice documents that contain placeholders. That sounds great at first, but an important requirement of mine was to insert not only plain text that takes on the style of the paragraph its placeholder sits in, but also text with different styles such as bold, italics, superscript and color. It turns out that Tiny But Strong can’t do that.
So I started writing my own library. At first I thought that this would be a simple thing, as a LibreOffice Writer .odt file is basically a ZIP file with a number of files inside, one being ‘content.xml’, which contains the actual document encoded in XML. As my input text is encoded in HTML, I hoped that the translation would be straightforward. After all, HTML is very similar to XML. But it quickly turned out that the description of styles and paragraphs in LibreOffice XML differs significantly from HTML, not only in the kinds of tags that are used but also in the actual ‘grammar’, so to speak. So a simple 1:1 conversion was not possible. The solution I finally came up with handles simple tags, tags with parameters and spans that describe color separately, and creates the non-repeating conditional formatting style combinations that are used in LibreOffice. The result works pretty well for me, and even input text that generates an output document of well over 700 pages is converted without a hiccup.
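To illustrate why a 1:1 conversion does not work, here is a minimal sketch in Python (not the PHP code of the library, and not its actual algorithm). It assumes a tiny, hypothetical tag mapping in `TAG_PROPS`, ignores tags with parameters and color spans entirely, and uses made-up style names such as `T1`. Where HTML nests `<b>` and `<i>`, the OpenDocument format wants a single `text:span` that points to a named style, so each distinct combination of properties becomes exactly one style that is reused wherever it occurs again.

```python
import zipfile
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Assumed, deliberately tiny mapping from HTML tags to ODF text properties.
TAG_PROPS = {
    "b": ("fo:font-weight", "bold"),
    "i": ("fo:font-style", "italic"),
    "sup": ("style:text-position", "super 58%"),
}


class HtmlToOdf(HTMLParser):
    """Collects text runs and one style per distinct property combination."""

    def __init__(self):
        super().__init__()
        self.stack = []   # formatting tags that are currently open
        self.styles = {}  # frozenset of properties -> style name
        self.out = []     # generated XML fragments

    def handle_starttag(self, tag, attrs):
        if tag in TAG_PROPS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in TAG_PROPS and tag in self.stack:
            # Close the innermost open occurrence of this tag.
            idx = len(self.stack) - 1 - self.stack[::-1].index(tag)
            del self.stack[idx]

    def handle_data(self, data):
        text = escape(data)
        props = frozenset(TAG_PROPS[t] for t in self.stack)
        if not props:
            self.out.append(text)
            return
        # Reuse the style if this combination was seen before.
        name = self.styles.setdefault(props, f"T{len(self.styles) + 1}")
        self.out.append(
            f'<text:span text:style-name="{name}">{text}</text:span>'
        )


def convert(html):
    """Return (paragraph XML, list of automatic style definitions)."""
    parser = HtmlToOdf()
    parser.feed(html)
    parser.close()
    body = "<text:p>" + "".join(parser.out) + "</text:p>"
    style_xml = []
    for props, name in parser.styles.items():
        attrs = " ".join(f'{key}="{value}"' for key, value in sorted(props))
        style_xml.append(
            f'<style:style style:name="{name}" style:family="text">'
            f"<style:text-properties {attrs}/></style:style>"
        )
    return body, style_xml


def read_content_xml(odt_path):
    """An .odt file is an ordinary ZIP archive; content.xml holds the text."""
    with zipfile.ZipFile(odt_path) as archive:
        return archive.read("content.xml").decode("utf-8")


if __name__ == "__main__":
    body, styles = convert("plain <b>bold <i>both</i></b> and <b>bold again</b>")
    print(body)
    print("\n".join(styles))
```

In this example the two separate bold runs share one style and the bold-italic run gets a second one, so only two style definitions are produced for three formatted runs. A real document additionally needs the style definitions placed in the automatic styles section of ‘content.xml’ and the result packed back into the ZIP file, which the sketch leaves out.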
One thing that surprised me was the overall processing speed of the code, which is, after all, not compiled but interpreted to some extent. Previously, I generated HTML output with special paragraph markers, copied it into LibreOffice and then used a LibreOffice Basic macro to convert the paragraph markers into the respective paragraph styles. Running through the 700-page document took the Basic macro 4-5 minutes, i.e. roughly 240 to 300 seconds. The PHP LibreOffice export code does the whole thing in 0.4 seconds, which makes it roughly 600 to 750 times faster. Slightly more efficient, I would say.
So if you found this post with a search engine because you are looking for a solution just like this, head over to the repository and have a closer look.