Publishing Markup Mysteries: Bill Kasdorf’s Take

In a post on the Apex Content Solutions blog, Bill Kasdorf clears up some common confusion about publishing markup. He acknowledges that some people find the concept of “markup” itself confusing. Let’s take a look at two of his examples below.

Isn’t EPUB just a form of XML?

Well, yes and no. The content documents in an EPUB (the words you’re reading on your ereader or phone) are XML. But EPUB itself is a file format: a package that contains the many components that make up a publication. That means not just the content documents, but also the images, media, and other features of the publication, the CSS stylesheets and fonts that govern how they look, and the metadata and navigation files that make it all work. All this good stuff is gathered up in a systematic package called an EPUB.

Because its current packaging is a .zip file, an EPUB looks like, and is, a single file. That leads people to think it’s just a file like an XML file. Nope. It’s way more than that.
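To make the "package" idea concrete, here is a minimal Python sketch that builds a toy EPUB-style .zip in memory and lists what is inside. It is an illustration, not a valid or complete EPUB: the file names under OEBPS/ and the placeholder contents are assumptions made up for this example, and the helper names (build_toy_epub, list_components) are hypothetical. Only the "mimetype" entry and "META-INF/container.xml" are fixed names in a real EPUB.

```python
import io
import zipfile

# Hypothetical components for illustration. Real EPUBs fix only the
# "mimetype" and "META-INF/container.xml" names; the rest is up to the
# publisher. All contents here are placeholders, not valid documents.
TOY_FILES = {
    "META-INF/container.xml": "<container/>",  # points readers to the package document
    "OEBPS/package.opf": "<package/>",         # metadata and list of components
    "OEBPS/nav.xhtml": "<html/>",              # navigation
    "OEBPS/chapter1.xhtml": "<html/>",         # a content document
    "OEBPS/style.css": "p { margin: 0; }",     # stylesheet
    "OEBPS/cover.jpg": b"",                    # image (empty placeholder)
}


def build_toy_epub() -> bytes:
    """Return the bytes of a toy EPUB-style zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry is written first and stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, data in TOY_FILES.items():
            zf.writestr(name, data)
    return buf.getvalue()


def list_components(epub_bytes: bytes) -> list[str]:
    """Return the names of every entry in the archive, in stored order."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        return zf.namelist()


if __name__ == "__main__":
    for name in list_components(build_toy_epub()):
        print(name)
```

Running the script prints one line per component, which shows that the single .zip file is really a container for many different kinds of files, only some of which are XML.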

HTML is not the same as XML. Except when it is.

Those XML content documents in an EPUB aren’t just any XML. They’re XML using a very specific vocabulary: HTML5. Or, to say that the other way around, they’re HTML5 using XML syntax. That’s often referred to as XHTML, but it’s not the old XHTML 1.1 of a few years back.
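A short Python sketch can show what "HTML5 using XML syntax" means in practice. The two sample documents below are invented for this example, and is_well_formed_xml is a hypothetical helper. The first document uses HTML5 elements such as section but follows XML rules (every tag is closed, and the root carries the XHTML namespace), so a plain XML parser accepts it. The second has the same content written in ordinary HTML syntax with an unclosed br tag, which an XML parser rejects.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# HTML5 vocabulary in XML syntax: every element is closed, including <br/>.
XHTML_DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body>
    <section>
      <h1>Chapter 1</h1>
      <p>First line<br/>second line</p>
    </section>
  </body>
</html>"""

# Similar content in ordinary HTML syntax: <br> is left unclosed, which is
# fine for an HTML parser but not well-formed XML.
HTML_SYNTAX_DOC = """<!DOCTYPE html>
<html>
  <head><title>Chapter 1</title></head>
  <body>
    <p>First line<br>second line</p>
  </body>
</html>"""


def is_well_formed_xml(markup: str) -> bool:
    """Return True if a standard XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed_xml(XHTML_DOC))
    print(is_well_formed_xml(HTML_SYNTAX_DOC))

    # Because the XHTML document is XML, generic XML tools can query it.
    root = ET.fromstring(XHTML_DOC)
    heading = root.find(f".//{{{XHTML_NS}}}h1")
    print(root.tag, heading.text)
```

The check only tests well-formedness. It does not validate the document against the EPUB or HTML specifications.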

If you want to learn more about markup for publishers, go read the Apex Content Solutions blog.