11:15 pm, 20 Jun 03
Summary: It’s possible, and really not that hard.
So XSL (the S is for “style”) appears to have two major subparts. http://www.w3.org/TR/xsl/ says:

    This specification defines the features and syntax for the Extensible Stylesheet Language (XSL), a language for expressing stylesheets. It consists of two parts:
    1. a language for transforming XML documents, and
    2. an XML vocabulary for specifying formatting semantics.

In practice, those are known as:

1. XSLT (T = transformations), for transforming one XML document into another. This is the one I was looking at in the past.
2. FO (Formatting Objects), an HTML-like language for describing printed material. It includes the concept of a “page” (for example, different layouts for even and odd pages) and higher-quality typesetting features like kerning.

I’ve been playing with the latter. You can combine the two: take an XML export of your journal, XSLT it into an FO document, and then render the FO to a PDF. (Really, an S2 style should create the FO document.)

There are a few systems for rendering FO:

- some commercial ones (including the top few links whenever I tried Googling for information on doing this on Linux);
- a Java one from Apache (FOP);
- http://xmlroff.sourceforge.net/, which uses Pango(!) plus PDFlib or Gnome-Print for rendering;
- http://www.tei-c.org.uk/Software/passivetex/, which uses TeX for rendering.

The last one is in Debian. It’d be neat to hook this up to some web-page export, but even rendering two pages of output takes a few seconds on my computer, so some sort of delayed response (submit, then come back later for the results) would work better.

Creating good styles is the same problem we have with pushing S2, but it’d be trivial to copy the style of a printed journal. I was looking at the free preview of the Diary of Anne Frank on Amazon, for example, and I think I could copy that in a few hours if I knew FO better. (Are there any copyright issues here?)
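To make the “page” idea a bit more concrete, here is a hypothetical Python sketch (standard library `xml.etree.ElementTree` only) that builds a minimal FO document with mirrored margins for odd and even pages, roughly the way a printed book is laid out. The page size, margins, and the `build_fo`/`page_master` names are all my own illustrative assumptions; in the real pipeline this markup would come out of the XSLT (or S2) step, not Python.

```python
# Hypothetical sketch: a minimal XSL-FO document whose odd and even pages
# get mirrored margins. Page size and margin values are arbitrary assumptions.
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialize elements with the usual "fo:" prefix


def fo(tag):
    """Return the namespaced (Clark notation) name of an FO element."""
    return f"{{{FO_NS}}}{tag}"


def page_master(parent, name, left, right):
    """Add an fo:simple-page-master with the given left/right margins."""
    spm = ET.SubElement(parent, fo("simple-page-master"), {
        "master-name": name,
        "page-width": "148mm",   # A5, chosen only for illustration
        "page-height": "210mm",
        "margin-top": "15mm",
        "margin-bottom": "15mm",
        "margin-left": left,
        "margin-right": right,
    })
    ET.SubElement(spm, fo("region-body"))
    return spm


def build_fo(paragraphs):
    """Build an FO document string with one fo:block per paragraph."""
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    # Odd (right-hand) pages: wider margin on the left, toward the binding.
    page_master(masters, "odd", left="25mm", right="15mm")
    # Even (left-hand) pages: the mirror image.
    page_master(masters, "even", left="15mm", right="25mm")

    # Alternate between the two masters depending on page parity.
    seq_master = ET.SubElement(masters, fo("page-sequence-master"),
                               {"master-name": "journal"})
    alts = ET.SubElement(seq_master, fo("repeatable-page-master-alternatives"))
    for parity in ("odd", "even"):
        ET.SubElement(alts, fo("conditional-page-master-reference"),
                      {"master-reference": parity, "odd-or-even": parity})

    # The content flows into the body region of whichever page is current.
    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "journal"})
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})
    for text in paragraphs:
        block = ET.SubElement(flow, fo("block"), {"space-after": "6pt"})
        block.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_fo(["First entry.", "Second entry."]))
```

The output is just a string of FO markup; save it to a `.fo` file and hand it to whichever renderer you use. How faithfully each renderer honors the odd/even alternation is something I haven't checked.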
The other hard part brings us back to Pie (or whatever): the journal content also needs to be transformed into FO. For example, here’s a trivial transformation that maps <b> tags to the <fo:inline> tag (the equivalent of HTML’s “span”), shown inside a minimal stylesheet so the namespaces are declared:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <!-- Map an HTML <b> element to a bold fo:inline, keeping its children. -->
      <xsl:template match="b">
        <fo:inline font-weight="bold">
          <xsl:apply-templates/>
        </fo:inline>
      </xsl:template>
    </xsl:stylesheet>

(As you can see here and elsewhere, FO feels a lot like CSS in its design.)

However, that only works if the journal content is available as part of the XML tree. It isn’t for us, because journal entries aren’t guaranteed to be well-formed XML. It seems to me that we could just run the HTML cleaner on the entries before the XML export is generated, and then we’re good to go. (Alternatively, we could try to generate FO directly from the malformed entries, but FO is written in XML, so we’d still need a well-formed hierarchy.)

Attached are a test PDF and the XSL used to generate it from a LogJam-exported month. I snipped out most of the entries for testing, and it doesn’t look very good because I don’t know much about FO. There are also still some HTML-isms in the XSL, but there’s enough there to get the gist of it.

To build it yourself:

    apt-get install xsltproc passivetex xmltex
    xsltproc monthpdf.xsl logjam-xml-file.xml > output.fo
    pdfxmltex output.fo
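The HTML-cleaning step suggested above could look something like this rough sketch. It uses Python’s standard `html.parser` rather than the actual HTML cleaner (whose interface I’m not assuming), and goes straight from a possibly malformed entry to well-formed FO: `<b>`/`<strong>` and `<i>`/`<em>` become `<fo:inline>` with bold or italic, unknown tags are dropped while their text is kept, stray close tags are ignored, and anything left open is closed at the end. The `INLINE_STYLES` table, `EntryToFO` class, and `entry_to_fo_block` function are hypothetical names.

```python
# Hypothetical sketch: turn a possibly malformed HTML journal entry into
# well-formed XSL-FO using only the standard library. This is NOT the
# site's actual HTML cleaner; the tag table below is an assumption.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

FO_NS = "http://www.w3.org/1999/XSL/Format"

# Assumed mapping from inline HTML tags to fo:inline attributes.
INLINE_STYLES = {
    "b": {"font-weight": "bold"},
    "strong": {"font-weight": "bold"},
    "i": {"font-style": "italic"},
    "em": {"font-style": "italic"},
}


class EntryToFO(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the generated FO markup
        self.stack = []  # HTML tags currently open as fo:inline

    def handle_starttag(self, tag, attrs):
        style = INLINE_STYLES.get(tag)  # html.parser lowercases tag names
        if style is None:
            return  # unknown tag: drop it, but its text still comes through
        attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in style.items())
        self.out.append(f"<fo:inline{attr_text}>")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray close tag with no matching open: ignore it
        # Close everything opened after `tag` (mis-nesting), then `tag` itself.
        while self.stack:
            self.out.append("</fo:inline>")
            if self.stack.pop() == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # re-escape &, <, > for XML

    def finish(self):
        self.close()  # flush any buffered text
        self.out.extend("</fo:inline>" for _ in self.stack)  # close leftovers
        self.stack.clear()
        return "".join(self.out)


def entry_to_fo_block(html):
    """Wrap one cleaned-up entry in an fo:block."""
    parser = EntryToFO()
    parser.feed(html)
    return f'<fo:block xmlns:fo="{FO_NS}">{parser.finish()}</fo:block>'


if __name__ == "__main__":
    fo_xml = entry_to_fo_block("Hello <b>bold <i>both</b> plain</i> & done")
    ET.fromstring(fo_xml)  # raises ParseError if the result were not well-formed
    print(fo_xml)
```

Mis-nested markup like `<b>x<i>y</b>z</i>` is repaired by closing the inner tag early, so `z` loses its italics; a real cleaner would probably reopen the tag instead. Block-level tags such as `<p>` and `<br>` are simply dropped here, where a fuller version would map them to `fo:block`.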