This page contains tips on typesetting DITA documents using two popular open-source packages, Apache FOP and DITA Open Toolkit, plus the commercial editor Syncro Soft Oxygen (version 13.2). The discussion assumes Oxygen’s directory setup and GUI integration, but most points also apply to the stand-alone versions of Apache FOP and DITA OT.
All the information presented here comes from my own attempts to figure out these fairly complex packages. Please note that I have since abandoned my attempts to get decent typesetting out of DITA, so I can’t say if and how these tips apply to newer versions of the involved software packages.
Oxygen is an all-purpose XML editor that supports a broad variety of XML-based technologies, including “transformation scenarios” that turn DITA or DocBook input into HTML or PDF output. Oxygen comes bundled with various open-source projects that perform these transformations, all preconfigured and conveniently accessible from an integrated GUI.
Here we’re interested in two particular methods of creating PDF documents using Oxygen XML Editor and the following technologies: XSL-FO (Formatting Objects), Apache FOP (Formatting Objects Processor), DITA (Darwin Information Typing Architecture), and DITA OT (Open Toolkit).
Write an XSL-FO document (file extension .fo) and run it through the “FO PDF” transformation scenario that relies on Apache FOP. The XSL-FO language defines visual markup, not semantic markup, giving you precise control over the appearance of each page. We cover this case along with DITA transformations due to their close kinship in DITA OT.
Write a DITA document (file extension .dita) and run it through the “DITA PDF” or “DITA Map PDF” transformation scenario that relies on DITA Open Toolkit. DITA OT first converts your DITA document into XSL-FO and then runs the temporary FO document through Apache FOP, as above.
Both methods rely on Apache FOP, and version 1.0 of this software has some serious shortcomings: FOP cannot use OpenType fonts with PostScript outlines, nor embed PDF images in PDF output files. Fortunately both can be worked around with the help of additional software. Another download will enable automatic hyphenation, and lastly we’ll tweak PDF formatting in the DITA Open Toolkit. But before we get to that I’d like to point out two more problems you should be aware of.
Apache FOP 1.0 still lacks support for many XSL-FO objects, as you can see in this compliance chart. One notorious example is the attribute font-variant, which enables small caps. Since Apache FOP does not recognize this attribute, you won’t get small-caps formatting, either real or simulated. You’ll have to simulate it yourself, e.g. using text-transform="uppercase" font-size="smaller", or alternatively use a font that replaces normal characters with their small-caps versions. You’ll see below how to declare new fonts for Apache FOP and DITA OT.
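As a minimal sketch of the simulated small caps described above, the following FO fragment applies both attributes to an inline element. The surrounding block and the sample text are placeholders, not taken from a real document.

```xml
<!-- Hypothetical fragment: simulate small caps, since FOP 1.0 ignores font-variant -->
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  Published by
  <fo:inline text-transform="uppercase" font-size="smaller">Example Press</fo:inline>
</fo:block>
```

The result is only an approximation: scaled-down capitals look lighter than true small caps drawn by the font designer.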
Oxygen and DITA OT generally support bookmap files well. PDF output correctly shows hyperlinked tables of contents, figures, etc. as well as PDF bookmark trees. You should be aware that any phrase markup (e.g. <i>) is dropped from chapter titles when they are shown in the table of contents, but that’s not a big deal.
However, DITA OT 1.5.4 falls far short of the DITA 1.2 standard when it comes to bookmap metadata. DITA 1.2 replaced the old bkinfo element with a much more expansive bookmeta element, but the FO style sheets in DITA OT 1.5.4 still largely refer to bkinfo instead of bookmeta. Only these few elements produce any output:
bookmeta/author for simple author names. The standard href attribute is ignored, but plain-text URLs are at least rendered clickable automatically in PDF output.
bookmeta/shortdesc for a short book description, but inappropriately formatted for topics rather than book front matter. You’ll have to modify the corresponding XSLT style sheets to use this element; I haven’t attempted to do this.
bookmeta/prodinfo/prodname for a product name shown in front of the chapter title on page headers. However, prodname does not appear in front matter output.
That’s it – everything else is ignored, as far as I could determine. I’ve posted a feature request on DITA OT’s SourceForge tracker to create front matter output from the most important bookmeta elements, and Oxygen’s Radu Coravu has submitted another request to populate PDF metadata with bookmeta information. You may wish to comment there if any bookmeta elements are especially important to you.
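To make the list above concrete, here is a small bookmap sketch that uses only the three bookmeta elements I found to produce output. The title, names, URL, and chapter file are invented placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE bookmap PUBLIC "-//OASIS//DTD DITA BookMap//EN" "bookmap.dtd">
<bookmap>
  <booktitle>
    <mainbooktitle>Example User’s Guide</mainbooktitle>
  </booktitle>
  <bookmeta>
    <!-- Formatted like a topic short description, not like front matter -->
    <shortdesc>A short description of the book.</shortdesc>
    <!-- Plain author name; an href attribute here would be ignored -->
    <author>Jane Doe, http://www.example.com/</author>
    <prodinfo>
      <!-- Shown before the chapter title in page headers -->
      <prodname>Example Product</prodname>
      <!-- Included because some DITA DTD versions require it after prodname -->
      <vrmlist><vrm version="1.0"/></vrmlist>
    </prodinfo>
  </bookmeta>
  <chapter href="introduction.dita"/>
</bookmap>
```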
All relative directory paths in the following sections are relative to your Oxygen installation folder unless otherwise specified. On 64-bit Windows, that would be “C:\Program Files (x86)\Oxygen XML Editor 13” for the current version. Note that creating new files or directories within that folder usually requires administrator permissions.
I’m using forward slashes rather than backslashes for paths that are relative to Oxygen. Windows supports either variant, and forward slashes seem more natural for software such as Apache FOP and DITA OT. If you run into any trouble with Windows file tools, just replace forward slashes with backslashes.
I will occasionally advise you to create new Oxygen transformation scenarios, since the default scenarios are read-only. You may be interested to know that these new scenarios are embedded in this monolithic user settings file:
This assumes a Windows 7 file structure, with your Windows user name replacing User, and Oxygen version 13.2.
Apache FOP 1.0 supports several font formats including Adobe Type 1, TrueType, and OpenType with TrueType outlines. One notable format FOP doesn’t support is OpenType with PostScript outlines, a.k.a. OTF/CFF (Compact Font Format). Annoyingly, this means popular fonts that are available only in OTF/CFF, such as Adobe Minion and Myriad, are completely unusable. When FOP encounters them in a document you will see this runtime error: “OpenType fonts with CFF data are not supported, yet.” This defect has been known for years, and nobody seems able or willing to fix it.
The only workaround is to convert OTF/CFF fonts into a supported format, such as TrueType, and then have FOP use the converted fonts. So how to perform this conversion? A free option is to use FontForge which luckily has an unofficial Windows port. FontForge is a gnarly Unix program based on the X11 windowing system, designed before people had quite figured out how windowing systems are supposed to work, but converting font files is reasonably simple.
FontForge doesn’t seem to support Windows drive letters. For this reason, and also to simplify the conversion and avoid damaging your installed Windows fonts, I suggest you do the following:
Create a folder FontForge somewhere you have write permission.
Copy the OTF/CFF font files you want to convert into a subfolder FontForge\fonts within that folder.
Now download the unofficial Windows port of FontForge. I recommend getting the 7-Zip package rather than the unnecessary installer. Extract the package into the FontForge folder you just created. Execute run_fontforge.exe and accept the Windows Firewall request (it’s caused by internal communication within the X11 windowing system).
You should see an Open Font dialog. Double-click fonts and shift-select all the OTF/CFF font files you copied there. Many windows will open, including one that shows lots of errors. Ignore that one, and also ignore any errors and warnings that appear during the next step. FontForge is meticulous about font format specifications whereas most commercial font foundries couldn’t care less, but the converted fonts should work anyway.
Activate any one of the font windows that have popped up. Choose File: Generate Fonts, ensure that TrueType is selected as the output format… and here comes another trick: you must click Options and check the Old style kern option. OTF/CFF fonts may use a new way of specifying kern pairs that FOP doesn’t understand, so if you want proper kerning you must check this option. Now click OK and then Save. Confirm any warnings you may see and eventually you should find the converted TTF file in the
Now you can close the font window of the converted font, activate any of the remaining ones, and repeat the above steps until all font files have been converted. We don’t want to confuse Windows by putting two versions of the same font into C:\Windows\Fonts, so I suggest that you create another fonts directory in your Oxygen installation folder and move all newly created TTF files there.
Apache FOP automatically scans C:\Windows\Fonts but we have to tell it about our new folder. I also found that FOP isn’t that great at identifying fonts on its own; for example, it would confuse italic and regular variants unless they were explicitly declared. So that’s what we’re going to do now.
The configuration file for stand-alone execution of FOP (as opposed to embedded DITA OT execution, see below) is lib/fop-config.xml in your Oxygen installation folder. Create a copy of this file in the same directory, e.g. lib/fop-config-custom.xml, and open that copy in a text editor. Near the top you’ll find a comment block explaining the <font-base> element. Directly below, add the following line:
This declares the base directory for the custom font declarations we’re going to add, assuming you created
fonts directly below your Oxygen installation folder as I suggested. You’ll need to adjust the line for a different Oxygen installation or
Now scroll down past the <fonts> tag in the block <renderer mime="application/pdf">. There’s another comment block explaining the <font> element here, and that’s just what we’re going to add: one <font> element for each converted TTF font file. Obviously, the exact contents will depend on your font files, but here’s an example for the basic Minion Pro variants:
<font kerning="yes" embed-url="MinionPro-Regular.ttf">
  <font-triplet name="Minion" style="normal" weight="normal"/>
</font>
<font kerning="yes" embed-url="MinionPro-It.ttf">
  <font-triplet name="Minion" style="italic" weight="normal"/>
</font>
<font kerning="yes" embed-url="MinionPro-Bold.ttf">
  <font-triplet name="Minion" style="normal" weight="bold"/>
</font>
<font kerning="yes" embed-url="MinionPro-BoldIt.ttf">
  <font-triplet name="Minion" style="italic" weight="bold"/>
</font>
The name attribute of the <font-triplet> element declares the font family name that FO documents will recognize. You are free to define any family name you like. For fonts that lack explicit declarations, FOP will extract the family name from the font files.
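For orientation, here is a stripped-down sketch of how the pieces discussed so far nest inside the custom configuration file. The font-base path is an assumption derived from the default 64-bit installation folder and a fonts subfolder; the exact URL form (including how spaces are written) is untested, so adjust it to whatever works on your system. All other settings of the real file are omitted.

```xml
<?xml version="1.0"?>
<!-- Sketch of lib/fop-config-custom.xml; all other settings omitted -->
<fop version="1.0">
  <!-- Assumed path: adjust to your own Oxygen installation and fonts folder -->
  <font-base>file:///C:/Program Files (x86)/Oxygen XML Editor 13/fonts/</font-base>
  <renderers>
    <renderer mime="application/pdf">
      <fonts>
        <!-- embed-url is resolved relative to font-base -->
        <font kerning="yes" embed-url="MinionPro-Regular.ttf">
          <font-triplet name="Minion" style="normal" weight="normal"/>
        </font>
      </fonts>
    </renderer>
  </renderers>
</fop>
```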
One task remains, and that’s getting Oxygen to use our new FOP configuration file when processing FO documents. Bring up the Preferences dialog (Options: Preferences), then select the node XML: XSLT-FO-XQuery: FO Processors in the tree view on the left, and finally enter the path to your custom configuration file in the field Configuration file, like so:
You can set this option globally or per-project, using the radio buttons at the bottom. Click OK when done. Transforming an FO document using the default “FO PDF” scenario should now give you access to your converted fonts.
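A quick way to check the setup is a tiny FO document that requests the new family by the name declared in the font-triplet elements. This is only a test sketch; the page size, margins, and sample text are arbitrary choices.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
        page-width="210mm" page-height="297mm" margin="25mm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <!-- "Minion" must match the name attribute of the font-triplet elements -->
      <fo:block font-family="Minion" font-size="11pt">
        Regular text,
        <fo:inline font-style="italic">italic text</fo:inline>, and
        <fo:inline font-weight="bold">bold text</fo:inline>.
      </fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

If the PDF shows a fallback font instead, the family name or the font-base path most likely doesn’t match the configuration file.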
Annoyingly, font customization in the DITA Open Toolkit uses a different FOP configuration file, plus a DITA-specific “font mappings” file. This section describes how to apply the requisite modifications to copies of each file, and then tell Oxygen how to locate these copies. We assume that any desired OTF/CFF fonts have been converted and saved as described in Apache FOP and OTF/CFF Fonts.
All file paths in this section are relative to frameworks/dita/DITA-OT/demo/fo/ in the Oxygen installation folder. This folder contains all XSL-FO build scripts, style sheets, and configuration files used by DITA OT, including anything related to Apache FOP.
DITA OT uses
fop/conf/fop.xconf to configure FOP. You could just edit this file in place and save yourself a lot of work, but for the sake of a proper tutorial we’ll use the official customization folder instead. Copy the file to
Now edit your copy of
fop.xconf and replicate all changes you made to
fop-config.xml. This includes
<font-base> with the same path as before – just put it below
<base>. Comments in
fop.xconf claim that its original contents have no effect since they reflect default settings, so you might as well delete anything we don’t need, such as the other
fop.xconf lacks the element
<auto-detect/> within the
<fonts> element for the PDF renderer. You might wish to add this element if you want DITA OT to use any fonts in your
Finally, Oxygen needs to know about our new configuration file. Find the scenario “DITA PDF” under “DITA OT transformation” in the Transformation Scenarios sidebar, right-click and select Duplicate, and change the name to “DITA PDF (Custom)” or the like. Now select the tab Parameters, select the parameter args.fo.userconfig, click Edit, and enter the following Value:
Click OK twice to save your new transformation scenario. Should you wish to use the new fonts in other “DITA OT transformation” scenarios, you must copy & edit them likewise. You still cannot actually use your new fonts in a DITA document at this point, because DITA OT only knows the fonts declared in a separate font mappings file.
DITA OT’s font management throws another complication into the mix. All fonts declared by the FOP configuration file are considered physical fonts, whereas DITA OT typesetting requires specially declared logical fonts. The file cfg/fo/font-mappings.xml performs the necessary mapping of physical to logical fonts. Every font you wish to use in a DITA style sheet or document must be declared as a logical font in this file.
Copy the font mappings file to Customization/fo/font-mappings.xml, right alongside the file fop.xconf you created earlier. Now it’s time to ponder what you want. DITA normally uses only three different font families in a document. These are declared as the logical fonts Sans, Serif, and Monospaced, which are mapped to the physical fonts Helvetica (or Arial), Times (New), and Courier (New), respectively.
If you wish to outright replace the default physical font for any or all of these logical fonts, you simply add the font name you declared in fop.xconf to the corresponding <font-face> element, preceding any other options. The element with char-set="default" is usually the only one you want to change, unless you have fonts with South-East Asian characters in mind. An example with Minion:
<logical-font name="Serif">
  <physical-font char-set="default">
    <font-face>Minion, Times New Roman, Times</font-face>
  </physical-font>
  …
If you wish to use an additional physical font in your DITA documents, you have a problem. Declaring a new logical font is easy enough: just copy one of the existing <logical-font> elements with any desired char-set (again, usually just default) and change the logical name and physical fonts as desired. However, all the default DITA style sheets still only reference the three default logical fonts, so in order to use your new font you will have to change the requisite XSLT style sheets. The easiest way to achieve this is attribute customization, which we’ll do later on.
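A new logical font declaration might look like the following sketch, placed next to the existing <logical-font> elements in your copy of the font mappings file. The logical name Display and the physical font Myriad are hypothetical choices for illustration.

```xml
<!-- Hypothetical addition to Customization/fo/font-mappings.xml -->
<logical-font name="Display">
  <physical-font char-set="default">
    <!-- "Myriad" must match a font-triplet name declared in fop.xconf -->
    <font-face>Myriad, Arial, Helvetica</font-face>
  </physical-font>
</logical-font>
```

Nothing in the default style sheets refers to this name yet, so the declaration has no visible effect until a customized attribute set requests it as a font family.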
Activating our custom font mappings requires one more step. Copy the file
Customization/catalog.xml, and uncomment the line following “FontMapper configuration override entry” in your copy. You don’t need to do anything else in Oxygen, as this customization catalog is read automatically when you use any “DITA OT transformation” scenario. Should you wish to use multiple different customization catalogs, you must create new customization folders and transformation scenarios, and set the parameter customization.dir in each scenario accordingly. (At least I think that’s how it’s supposed to work!)
Note: You may notice an element called override-size in the existing entries for the “SymbolsSuperscript” character set. You might think this lets you easily adjust the size of any given physical font. Sadly, you’d be wrong – at least I couldn’t get it to work. Eventually I used the font-size attribute in the transformation style sheets instead, as mentioned in Customizing DITA Attributes.
Apache FOP can embed a variety of image formats in PDF output, but not other PDF files. Fortunately, there’s an easy solution which is briefly mentioned in the Oxygen help. This section describes the steps to implement this solution, with one caveat: it only works for DITA OT (DITA documents), not for stand-alone Apache FOP (FO documents). We need to load a custom Java plug-in for FOP, and weirdly Oxygen provides the requisite customization point only in DITA scenarios, but not in FO scenarios. The Oxygen help suggests writing a custom batch file for FO scenarios to call FOP with the required plug-in, but I couldn’t get that to work on Windows.
So with this caveat, here’s what you need to do. First, download the PDF Image Support Plug-In for Apache FOP by Jeremias Märki. The current version at the time of this writing was 2.0, which works just fine. Unpack the file contents (minus the javadocs folder unless you’re a Java developer) to some empty directory of your choosing; I suggest fop-pdf-images in your Oxygen installation folder.
Now we just need to tell Oxygen to use this plug-in when creating PDF output from DITA documents. This requires editing a DITA PDF transformation scenario. If you have already created a “DITA PDF (Custom)” scenario in section DITA OT and OTF/CFF Fonts, you can select that one and choose Edit. Otherwise, find the scenario “DITA PDF” under “DITA OT transformation” in the Transformation Scenarios sidebar, right-click and select Duplicate, and change the name to “DITA PDF (Custom)” or the like.
With your “DITA PDF (Custom)” scenario open for editing, select the tab Advanced, click Libraries and then Add. Click the Browse button to the right and select the folder fop-pdf-images you just created. Make sure to select only the folder, not any individual files stored within. The resulting URL should look similar to this:
Click OK twice to save your new transformation scenario. Should you wish to use PDF images in other “DITA OT transformation” scenarios, you must copy & edit them likewise. You can now enter image links such as <image href="MyImage.pdf"/> in DITA documents, and they should render correctly in PDF output, although the Oxygen editor preview still won’t show them.
One final note: the PDF image plug-in needlessly copies the original PDF image to the output folder and also creates big temporary files during transformation. Remember to delete the redundant copies and clean out your temporary files folder when you’re done.
Apache FOP supports automatic hyphenation but does not ship with the requisite hyphenation patterns, and neither do DITA OT or Oxygen. This is due to licensing issues described in Apache FOP: Hyphenation. You can get more information and download the actual patterns at FOP XML Hyphenation Patterns.
The installation procedure is simple. Download the current binary package (version 2.0 at the time of this writing) and copy the included file fop-hyph.jar into the folder that contains the file fop.jar of your Apache FOP. For Oxygen users, that is the directory lib in your Oxygen installation folder. Apache FOP will now automatically use the hyphenation library, during both FO and DITA OT transformations.
Annoyingly, hyphenation is disabled by default, so you need to manually enable it. This requires setting a language so that Apache FOP knows which patterns to use, and then requesting automatic hyphenation itself. First, here is the procedure for FO documents:
Add an xml:lang attribute such as xml:lang="en-US" to any element that contains text in that language.
Add hyphenate="true" to any element for which automatic hyphenation should be enabled.
Both attribute values are inherited by all nested elements, and can also be repeated in nested elements to select a different language or hyphenation mode.
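The two steps and the inheritance rule can be sketched in a single FO fragment. The text content and the identifier are made up for illustration.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
          xml:lang="en-US" hyphenate="true" text-align="justify">
  This paragraph is hyphenated using the American English patterns.
  <!-- Nested elements inherit both values unless they override them -->
  <fo:inline hyphenate="false">ThisIdentifierIsNeverHyphenated</fo:inline>
  <fo:inline xml:lang="de">Dieser Abschnitt verwendet deutsche Trennmuster.</fo:inline>
</fo:block>
```

In practice you would set both attributes once on a high-level element, such as the flow or the outermost block, and override them only where needed.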
In order to automatically hyphenate DITA documents via Apache FOP, the same attributes must be placed on the intermediate FO documents generated during transformation. This requires some further preparation which also enables a broad variety of layout customizations, as discussed below in Customizing DITA Attributes.
Note: DITA OT adds the default attribute xml:lang="en-US" during all FOP transformations, so you only need to specify xml:lang if you require a different language.
Unsurprisingly, automatic hyphenation can have trouble with short lines and difficult words, such as code identifiers. You can attempt to enforce a specific line break by inserting a regular hyphen plus space in the offending word, but there’s also a subtler method.
The Apache FOP hyphenation engine recognizes Unicode discretionary hyphens or “soft hyphens” (U+00AD). They are entered in XML as &#xAD; and don’t appear in display or print output. Theoretically, words should be hyphenated exactly where you insert a soft hyphen, just as with the equivalent feature in a word processor. Sadly, this is not what happens in practice.
In my experience, the hyphenation engine rarely breaks lines exactly at a soft hyphen, but the mere presence of a soft hyphen within a word “encourages” Apache FOP to hyphenate the word somewhere else when it would otherwise not get hyphenated at all! This strange behavior does allow one neat trick: because the precise location of the &#xAD; code doesn’t matter, you can insert it where it won’t impede plain text search, e.g. after the . of a namespace qualifier or before the < of a generic type parameter.
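Here is a small sketch of that trick in a DITA paragraph. The type name is just an example; note that the angle brackets of the type parameter must be escaped in XML.

```xml
<!-- Hypothetical DITA paragraph; &#xAD; is the soft hyphen U+00AD -->
<p>Results are stored in a
  <codeph>System.&#xAD;Collections.Generic.Dictionary&#xAD;&lt;TKey, TValue&gt;</codeph>
  instance.</p>
```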
DITA OT provides a simple way to customize the XSLT style sheets used by Apache FOP without having to modify the default files. You can provide two XSLT style sheets, one for attribute sets and one for templates, that contain nothing but your customizations. They are merged with and override the default styles during FOP transformation. We’ll use the first option to make a couple of basic layout changes.
All file paths in this section are relative to
frameworks/dita/DITA-OT/demo/fo/ in the Oxygen installation folder. Copy the file
Customization/catalog.xml if you haven’t already, and uncomment the line following “Custom attributes entry” in your copy. Now copy the file
Customization/fo/attrs/custom.xsl – this is where you put all your custom attributes and other transformation parameters.
To simplify matters, I’ve put up my own copy of custom.xsl for download. All entries are grouped by the default XSLT style sheets containing the original values, and commented to explain the effect of the modifications. Our custom attribute-set definitions only state the modified attributes; any unchanged values are simply inherited from the default definitions.
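To give an idea of what such a file looks like, here is a minimal sketch with two overrides. It is not my actual custom.xsl; the attribute set names p and codeph are assumed to match the default style sheets of your DITA OT version, so check the originals before relying on them, and keep whatever version attribute your copy of the file already declares.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                version="2.0">

  <!-- Assumed default name "p": justify and hyphenate paragraphs -->
  <xsl:attribute-set name="p">
    <xsl:attribute name="text-align">justify</xsl:attribute>
    <xsl:attribute name="hyphenate">true</xsl:attribute>
  </xsl:attribute-set>

  <!-- Assumed default name "codeph": scale inline code to 85% -->
  <xsl:attribute-set name="codeph">
    <xsl:attribute name="font-size">85%</xsl:attribute>
  </xsl:attribute-set>

</xsl:stylesheet>
```

Because attribute sets with the same name are merged, every attribute not mentioned here keeps its default value.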
The rest of this section gives an overview of all the customizations in my custom.xsl file. An older version of the Tektosyne User’s Guide (PDF) demonstrates DITA bookmap formatting with these settings. For comparison, the guide’s current version is based on LaTeX with MiKTeX.
This section summarizes my simpler layout customizations. More complex changes are discussed below.
pretend to be rather big and bulky for their nominal point size, so I scaled them down to 85%. (This should work via font mappings but sadly doesn’t!)
li) and table cells are automatically hyphenated (see Automatic Hyphenation). Paragraphs and definitions (element
dd) are justified as well.
Consecutive paragraphs (element p) are set with vertical spacing by default, but I prefer an indented first line and no spacing. This requires a somewhat complex bit of conditional XSLT to do the following:
Moreover, vertical spacing is always omitted if the parent is the definition part (element dd) of a definition list entry. This is due to a peculiarity of definition list typesetting which we’ll cover next.
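The general shape of such conditional attributes can be sketched as follows. This is an illustrative guess rather than the code from my custom.xsl: it assumes the paragraph attribute set is named p, that it is evaluated with the DITA p element as context node, and that an indent of 1 em and a default spacing of 0.6 em are wanted.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:attribute-set name="p">
    <!-- Indent the first line only when the previous sibling is also a paragraph -->
    <xsl:attribute name="text-indent">
      <xsl:choose>
        <xsl:when test="preceding-sibling::*[1][contains(@class, ' topic/p ')]">1em</xsl:when>
        <xsl:otherwise>0em</xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
    <!-- Drop vertical spacing after another paragraph or inside a dd element -->
    <xsl:attribute name="space-before">
      <xsl:choose>
        <xsl:when test="preceding-sibling::*[1][contains(@class, ' topic/p ')]
                        or parent::*[contains(@class, ' topic/dd ')]">0em</xsl:when>
        <xsl:otherwise>0.6em</xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
  </xsl:attribute-set>
</xsl:stylesheet>
```

Testing the class attribute rather than the element name follows the usual DITA convention, so that specialized paragraph elements are covered as well.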
Definition lists (element dl) superficially look like other list types (sl), except that each entry (dlentry) is split into a term (dt) and a definition (dd) part. So far, the natural way to typeset a definition list would be to e.g. replace a regular list’s bullet with the bolded term.
Unfortunately, that wasn’t good enough for DITA. The standard also allows a definition list to have a table-like heading (dlhead) and, worse, multiple terms and definitions per entry! This turns a useful extension of the list model into a useless restriction of the table model. Even though most definition lists have no heading and only one term and definition per entry, DITA OT uses table typesetting for definition lists because that’s the only general way to handle their content model.
This has unfortunate consequences for the vertical spacing of definition list elements. Normally, DITA OT picks the greater of the space-after attributes specified by two adjacent blocks to determine their spacing. However, since dd elements are typeset as table cells, their space-after values apply individually to the borders of each invisible table cell! As far as I could puzzle out, the two spacing mechanisms work like this:
The left diagram shows ordinary block spacing between a paragraph and a list item: both space… values are overlapped, and the larger value determines overall spacing. The right diagram shows table spacing for definition list entries: each dd element resides in its own invisible table cell, and each space… value is individually applied from the border of that cell. Adjacent space… values are added rather than overlapped. Similarly, if any horizontal indentation were specified it would apply from the vertical border of each element’s cell.
So if you’re using definition lists at all, you will have to take this layout mechanism into account and ensure that all potential contents specify exactly the desired spacing. The DITA OT default styles for definition lists are no help here, they merely inherit an inappropriate table layout. Our customization uses the following overrides:
dd element. This is one-half of the standard block spacing of 0.6 em because the spacing of adjacent table cells is additive. The table as a whole uses the regular maximum spacing rule, so those 0.3 em are automatically replaced by 0.6 em (or more) on the outer borders.
dd elements, as noted above. This means definitions can contain multiple paragraphs without inappropriate extra spacing.
dt elements. There are no visible table cell borders that would require an indentation, but terms should observe a minimum distance from each other and from definitions.
The table nature of definition lists has one other nasty consequence: terms always reserve a fixed space to the left of definitions, rather than appearing above definitions or running into them, like a heading. The reserved space is fully half the width of the entire list by default, which is far too large. We reduce it to 40 mm which is just about big enough for most purposes.
This concludes my small collection of tips to get you started with PDF typesetting using Oxygen, DITA Open Toolkit, and Apache FOP. There are many more parameters and style sheets that you can customize – I recommend that you check out Scriptorium for ideas. As you will have realized by now, this tool chain is both quite complex and rather immature, so be prepared to spend many hours in order to get the desired output… and also prepare for disappointment when that isn’t possible. Good luck!