The BookGlutton conversion library for PHP is now open source! Download or fork it on GitHub!

STEP 1 OF 3 - UPLOAD .ZIP FILE

  1. Upload an HTML book produced from our specifications (see below).
  2. Run a pre-flight check to make sure it's up to specs.
  3. Download a production-ready EPUB for free.
SAVE YOURSELF SOME TIME - EXAMPLE ZIP

Along the way, you'll receive feedback on your source files, and a validation report on the generated EPUB.

ABOUT STEP 1

Before you upload your ZIP you MUST format your content as XHTML 1.1. You can do this with Dreamweaver (File > Convert > XHTML 1.1), Amaya, TextMate HTMLTidy or other web design tools - your file will still have a .html or .htm extension. We recommend saving each chapter in its own HTML file, making sure its encoding is set to UTF-8 (Dreamweaver > Ctrl + J > Title Encoding). Also make sure that:

  • YOUR CHAPTER FILES ARE VALID XML - THIS MEANS XHTML, OR HTML 5 WRITTEN WITH XML SYNTAX
  • YOU INCLUDE AN INDEX FILE FOR THE TABLE OF CONTENTS (SEE BELOW)

You can tweak your author info and metadata, too. To see how, check out the Example ZIP.

QUICK START GUIDE

Here's the fastest way to get things converted.

  1. Start with a folder that contains your book in HTML format. It can include up to 4 MB of images.
  2. Save each chapter as a separate .html file - not required, but easiest. As you save these .html files make sure their formatting is set to XHTML 1.1 - discussed above. To avoid getting question marks in your files, make sure they're set to UTF-8 (Dreamweaver > Ctrl + J > Title Encoding).
  3. Download our example file and use that index.html file as a template for your own by copying it into your folder. It includes detailed instructions about how to modify it - put in your own title, author, description and more.
  4. Create the table of contents using the default list in the index file, again following the comments. When you're done, test the index file in a browser, then zip up the folder and upload.
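
If you'd rather script the last step than use the right-click menu, the following is a minimal sketch in Python 3. It is not part of the converter: the function name build_source_zip and the list of image extensions are assumptions made for illustration, and the 4 MB image allowance is read here as 4 x 1024 x 1024 bytes.

```python
import zipfile
from pathlib import Path

# Assumed set of image types; adjust to whatever your book actually uses.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".svg"}
# The 4 MB image allowance, interpreted as 4 MiB (an assumption).
IMAGE_LIMIT_BYTES = 4 * 1024 * 1024


def build_source_zip(folder, zip_path):
    """Zip the contents of `folder` so index.html sits at the top level of the archive."""
    folder = Path(folder)
    if not (folder / "index.html").is_file():
        raise FileNotFoundError("index.html is missing from the book folder")

    files = sorted(p for p in folder.rglob("*") if p.is_file())
    image_bytes = sum(p.stat().st_size for p in files if p.suffix.lower() in IMAGE_SUFFIXES)
    if image_bytes > IMAGE_LIMIT_BYTES:
        raise ValueError(f"images total {image_bytes} bytes, over the 4 MB limit")

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store each file under its path relative to the folder, with forward slashes.
            archive.write(path, path.relative_to(folder).as_posix())
    return zip_path
```

Calling build_source_zip("my-book", "my-book.zip") writes the archive next to the folder. Paths are stored relative to the folder itself, so index.html ends up at the top level rather than inside a my-book/ directory.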

ADVANCED INSTRUCTIONS

This tool accepts ZIP archives containing one or more HTML documents (with related assets) and generates EPUB files as HTTP response output.

Summary of recent changes (as of August 2009):

  • If you have a standalone HTML file as source, you'll need to ZIP compress it before converting. On modern versions of Mac OS X, Windows and Linux, this is built into the right-click menu.
  • All metadata (title, author, etc) must now be included in the ZIP archive itself. See example below for guidelines.
  • This converter now emphasizes validation. However, it's a two-way street: the converter can only produce valid output from valid input, so there's no guarantee of valid output.

To ensure a valid EPUB with the navigation structure you want, you'll need to follow some guidelines for your ZIP archive. They are much simpler than the requirements for EPUB files, which are also ZIP archives. All you need is a working knowledge of XHTML and a reasonably good HTML editor (Amaya and Dreamweaver are two of the many options).

Guidelines for creating a ZIP source

  • Use a website structure: Follow the best practices and standards used in creating a simple website. EPUB is very closely related to this structure, so using it as a source makes a lot of sense. Start with a single folder containing your content documents and all your assets. Create and edit using standard web development tools. Preview and refine your site with a standards-compliant browser such as Safari, Chrome or Firefox. Then create your archive file from the final directory structure.
  • Use an index.html file: Create a file called index.html at the top level of your archive.
  • Use an images directory: You can store images in whatever folder you want, but a folder named images has become common practice.
  • Use a css directory: You can use CSS files, best stored in a directory named css.
  • Use embedded fonts with caution: Currently, font files may produce unpredictable results. We'd like to hear about your results so we can better support fonts.
  • Include a table of contents: This should be part of your index.html file. See example below.
  • Include metadata: Specify all metadata in the head of your index.html file, using Dublin Core for HTML. Title, author and language are required. EPUB files also require an identifier, which you can specify if you want, but if you don't, the converter will generate a unique ID for you. For cover images, in the head of your index file, use: <meta name="UBO.cover" content="path_to/file_name_of_your_cover.png" /> , where the content attribute has a relative path to your cover image file. This path should reference the file from the same level that you reference all your content docs.
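
To check the metadata guideline before uploading, here is a minimal sketch in Python 3 that lists the meta tags in an index.html file and reports which required terms are missing. It is an illustration only, not the converter's own check: the function name read_index_metadata is hypothetical, and it assumes the file is already well-formed XHTML in the XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
# Title, author and language are the required items named in the guidelines.
REQUIRED_TERMS = ("DC.title", "DC.creator", "DC.language")


def read_index_metadata(index_source):
    """Return (metadata, missing).

    metadata maps each meta name to its list of content values;
    missing lists the required terms that were not found.
    """
    root = ET.fromstring(index_source)
    metadata = {}
    for meta in root.iter(XHTML_NS + "meta"):
        name = meta.get("name")
        if name:  # skips http-equiv tags, which have no name attribute
            metadata.setdefault(name, []).append(meta.get("content", ""))
    missing = [
        term for term in REQUIRED_TERMS
        # A qualified name such as DC.creator.aut counts as DC.creator.
        if not any(n == term or n.startswith(term + ".") for n in metadata)
    ]
    return metadata, missing
```

A cover image, if present, shows up under the "UBO.cover" key, so you can confirm its relative path points at a file that really is in your folder.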

Why The Change?

The converter is stricter about input for two reasons. One is the positive impact on overall output quality. Because this is a paid service, the goal is production-ready EPUBs. The other reason is ease of authoring. For those actually producing e-books, the ability to preview a book in a browser as well as an EPUB renderer is a huge benefit. It vastly speeds up production, requires only well-tested and widely used tools, and allows testing book layout against actual implementations of open standards instead of proprietary implementations.

Valid Source==Valid EPUB

The files in the source archive must be UTF-8 encoded XHTML 1.1, pre-validated with W3C validation tools. This is now a strict prerequisite for this converter. It requires a bit more know-how but ultimately simplifies the process and offers much more control over the resulting EPUB structure, content and metadata. Both Amaya and Dreamweaver offer options for converting HTML to XHTML and for generating validation reports.
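
As a quick local pre-check before running the W3C tools, the sketch below (Python 3) walks a book folder and flags any .html or .htm file that is not UTF-8 or not well-formed XML. It is an assumption-laden illustration, not the converter's pre-flight check: the function name preflight is hypothetical, and well-formedness is a weaker test than full XHTML 1.1 validation, so a clean result here does not replace the validator.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def preflight(folder):
    """Return a list of (file name, problem) pairs for the HTML files under `folder`."""
    folder = Path(folder)
    problems = []
    html_files = sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in {".html", ".htm"}
    )
    for path in html_files:
        name = path.relative_to(folder).as_posix()
        raw = path.read_bytes()
        try:
            raw.decode("utf-8")  # encoding check only; the result is not needed
        except UnicodeDecodeError as err:
            problems.append((name, f"not UTF-8: {err}"))
            continue
        try:
            ET.fromstring(raw)  # raises ParseError on unclosed tags, bad nesting, etc.
        except ET.ParseError as err:
            problems.append((name, f"not well-formed XML: {err}"))
    return problems
```

An empty list means every file decoded as UTF-8 and parsed as XML. Anything else gives you the file name and the parser's message, which usually includes a line and column number.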

A detailed technical exploration of the key differences between the flavor of XHTML that a valid EPUB requires and the kind of XHTML most people are used to is here. In short, the most common problems in converting are:

Structure and nesting are more constrained
For example, in XHTML 1.1, you can't place things like anchor tags, image tags or br tags at the top level of the document (i.e. as direct children of the body element). They must be nested in other block elements, such as divs. A quick fix for HTML 4 documents that may have hundreds of invalidly nested elements is to enclose everything in the body of the document in a div wrapper.
Some previously acceptable attributes and tags are no longer allowed
The "name" attribute, very commonly used as an identifier on elements such as anchors in much in-the-wild HTML, and the "lang" attribute, also very common, are no longer used. Instead of "name", use only "id". Instead of "lang", use "xml:lang". The <center> tag is no longer allowed, and neither is the "target" attribute.
HTML entities are not allowed, mostly
This is another big problem when converting HTML documents to XHTML. The only named entities that every XML parser recognizes without a DTD are the ones XML itself predefines: &gt;, &lt;, &amp;, &quot; and &apos;. Replace any other named entity with its numeric character reference; for example, replace &nbsp; with &#160;. You may want a list of other useful conversions. You can read more about characters in XML.
Some previously optional attributes are now required
The alt attribute is required for img elements, and some other elements have stricter requirements on attributes geared toward fallback content, better accessibility, or machine-readability.
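
Two of the fixes above are mechanical enough to script. The following is a rough sketch in Python 3, not a tool we ship: the function names replace_named_entities and wrap_body_in_div are hypothetical. It swaps HTML named entities for numeric references, leaving alone the five entities that XML itself predefines, and it applies the div-wrapper quick fix using simple pattern matching. Always re-validate the result, since pattern matching cannot understand every document.

```python
import re
from html.entities import name2codepoint

# Entities predefined by XML itself; these never need converting.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_named_entities(markup):
    """Turn HTML named entities such as &nbsp; into numeric references such as &#160;."""
    def to_numeric(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML's own entities and unknown names untouched
        return f"&#{name2codepoint[name]};"
    return ENTITY.sub(to_numeric, markup)


def wrap_body_in_div(markup):
    """Enclose everything inside the body element in a single div wrapper."""
    # Insert <div> right after the first opening body tag ...
    markup = re.sub(r"(<body\b[^>]*>)", r"\1<div>", markup, count=1, flags=re.IGNORECASE)
    # ... and </div> right before the first closing body tag.
    return re.sub(r"(</body\s*>)", r"</div>\1", markup, count=1, flags=re.IGNORECASE)
```

Unknown names are left as they are on purpose, so a typo in an entity still surfaces as an error during validation instead of being silently changed.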

Sample index.html file

This sample index.html file will help you understand how the source archive works. The file is included in the example ZIP archive, but here's a commented version that explains things:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<!--
     DOCTYPE must be the first line of the file
     and if not set to XHTML 1.1, will be replaced
     with an identifier for that.
-->
<!-- 



BOOKGLUTTON SAMPLE SOURCE FILE FOR EPUB OUTPUT


This file is meant to generate a production-ready EPUB file
from HTML sources. This file must be named index.html and
included somewhere in your archive directory structure.

The comments here are intended as guidelines to understanding
how the converter uses this file, and to the limitations and
possibilities.

Please direct all support inquiries to travis@bookglutton.com.

-->


<!--

use http://validator.w3.org/ to validate all the html docs in
your HTML archive!

for stricter html5 validation, try http://html5.validator.nu/

-->

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<!-- 
      none of the metadata items included in this index
      file need to be repeated in the other content
      files.
-->

<!-- the profiles used for metadata in this doc. -->
<head profile="http://dublincore.org/documents/dcq-html/ http://www.bookglutton.com/formats/ubo/">
<!-- Encoding. must be utf-8! keep this tag at the top of the head -->
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
<!-- title-will not be used in the generated EPUB, but is a required element -->
<title>Unbound Book Object 1.0</title>
<!--reference the schemas for this profile: required-->
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.UBO" href="http://www.bookglutton.com/formats/ubo/2009/" />
<!-- Conversion meta tags. EPUB metadata will be generated from these tags, only
      in the index file of the archive. Only the first index file encountered in the
      zip will be used for metadata. All other files called index will be ignored.
      We use DC meta data terms and values, see (http://www.ietf.org/rfc/rfc2731.txt)
     The difference is that the qualifiers on these terms, eg the 'aut' on the DC.creator
     term, will appear as opf attributes in the generated package. These qualifiers do not
     need to correspond to any recommended dublin core qualifiers -->
<!--  uncomment the following tag to set your own primary unique id.
      if you don't, none of the other identifiers will be primary,
      and bookglutton will generate a primary id for you using a UUID scheme 
      (eg. urn:uuid:cb1f6a60-e76c-3184-fd36-60dcd3865471 - universally unique
      identifier). leaving this out is the recommended approach. -->     
<!--
<meta name="UBO.primaryId" scheme="URL" content="http://www.bookglutton.com/catalog/book/0" />
-->

<!-- additional identifiers are allowed, but not required -->
<meta name="DC.identifier" scheme="URN" content="urn:bookglutton:catalog:book:0" />
<meta name="DC.identifier" scheme="ISBN" content="1-56592-149-6" />

<!-- language IS a required item for EPUB files -->
<meta name="DC.language" content="en-US" />

<!-- title is a required item for EPUB, and you can optionally have more than one -->
<meta name="DC.title" content="First title" />
<meta name="DC.title" content="Second title" />

<!-- creator is a required metadata item for EPUB. the .aut specifies this is the primary author -->
<meta name="DC.creator.aut" xml:lang="en" content="aaron miller" />

<!-- you can list as many additional authors as you want -->
<meta name="DC.creator" xml:lang="en" content="travis alber" />

<!-- you can list other kinds of contributors, but not required -->
<meta name="DC.contributor.art" xml:lang="en" content="travis alber" />

<!-- the date of ops publication is optional -->
<meta name="DC.date.ops-publication" content="2001-07-18" />

<!-- see http://www.w3.org/TR/NOTE-datetime -->
<meta name="DC.date.original-publication" content="2001-07-18" />

<!-- a description is specified like this. remember it has
     to be compliant as an XML attribute value, or you'll 
     get errors -->
<meta name="DC.description" content="Description goes here" />

<!-- publisher, imprint or publishing group is specified here -->
<meta name="DC.publisher" content="Bookglutton Digital Press" />

<!-- you can have as many rights coverage items as you want -->
<meta name="DC.coverage" content="Global" />

<!-- there are many other kinds of data you can specify with Dublin Core.
     please request that we add something if you don't see it here. for now
     these are the only values we support with this converter -->


<link rel="stylesheet" type="text/css" href="css/style.css" />

</head>
<body>

<h1>Table of Contents</h1>

<p>This is the table of contents. Reading systems may look in this file for an ordered list element at the top level, which serves as a machine-readable, structured list of navigation points. Other elements and text in this file may be ignored by reading systems. This file will not be included in the EPUB's structural navigation or spine unless you explicitly include a self-reference to it here.</p>

<!-- In the following list, each list item corresponds to a navPoint element in its functionality. Within it, an anchor link element is used, and this corresponds both to the content and label elements of the navPoint. In the anchor, the text contained between tags represents the content of a navPoint label, and the value of href attribute is equiv. to the value of the src attribute on the navPoint's content element. 

required:

1. each li must contain one anchor
2. each li may also contain one ol but is not required to
3. li containing an ol must have class set to 'section'
4. li containing no ol must have class set to 'chapter'

the example below is not the simplest case, but it shows how to
use sections as well as chapters. you can nest as deep as you want.

we recommend you use a flat structure (no nesting ) for maximum
reading-system compatibility.

-->
<!--

*markup inside anchors will not display

-->

<ol class="toc">
      <li class="section">
        <a href="ch1.html">Part 1</a>
        <ol>
             <li class="chapter">
                  <a href="ch1.html#top">Top</a>
             </li>
             <li class="chapter">
                  <a href="ch1.html#bottom">Bottom</a>
             </li>
        </ol>
      </li>
      <li class="section">
        <a href="ch2.html">Part 2</a>
        <ol>
             <li class="chapter">
                  <a href="ch2.html#top">Top</a>
             </li>
             <li class="chapter">
                  <a href="ch2.html#bottom">Bottom</a>
             </li>
        </ol>
      </li>
      <li class="chapter">
        <a href="ch3.html">Part 3</a>
      </li>
</ol>

<!-- 
     any other ordered lists will be ignored by the converter.
     only the first one in the document is used to specify structure.
-->

</body>



</html>
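
To see how the table of contents rules in the sample translate into navigation points, here is a minimal sketch in Python 3. It is our reading of the rules in the comments above, not the converter's actual code: the function names read_toc and _nav_points and the dictionary layout are assumptions for illustration. It takes the first ordered list at the top level of the body, enforces the section/chapter class rules, and returns each item's label and src.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def read_toc(index_source):
    """Return the first top-level ol in the body as a list of nested nav point dicts."""
    root = ET.fromstring(index_source)
    body = root.find(XHTML_NS + "body")
    # find() returns the first ol that is a direct child of body; later lists are ignored.
    toc = body.find(XHTML_NS + "ol") if body is not None else None
    if toc is None:
        raise ValueError("no ordered list found at the top level of the body")
    return _nav_points(toc)


def _nav_points(ol):
    points = []
    for li in ol.findall(XHTML_NS + "li"):
        anchors = li.findall(XHTML_NS + "a")
        sublists = li.findall(XHTML_NS + "ol")
        if len(anchors) != 1 or len(sublists) > 1:
            raise ValueError("each li needs exactly one anchor and at most one ol")
        expected = "section" if sublists else "chapter"
        if li.get("class") != expected:
            raise ValueError(
                f"li for {anchors[0].get('href')!r} should have class {expected!r}"
            )
        points.append({
            # Markup inside the anchor is dropped; only its text becomes the label.
            "label": "".join(anchors[0].itertext()).strip(),
            "src": anchors[0].get("href"),
            "children": _nav_points(sublists[0]) if sublists else [],
        })
    return points
```

Run against the sample file, this yields three top-level entries (Part 1, Part 2, Part 3), the first two with Top and Bottom as children. A flat list simply produces entries with empty children.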


	

Validation

Use the W3C validator to validate all the HTML documents in your archive.

We no longer filter input files with Tidy, so results will vary when the output is run against an EPUB validator like epubcheck. The best way to ensure valid output is to pass XHTML 1.1 as input (though reading systems should be tolerant of EPUB files with relaxed XHTML). We appreciate your feedback for improving the tool's compliance. See the EPUB specifications for more details.

API

We do still provide this as a remote API, for those interested. For batches, we offer in-house and custom production solutions. If you're interested, contact BookGlutton from the footer link below.