Converting PDF to ePub: A Christmas Carol

How many times have you bought or downloaded an ebook and then found yourself leafing through a book full of flaws? Let me explain: have you ever noticed how many small caps and italics go missing? Not to mention skipped words or letters, tables of contents that link to the wrong pages, page numbers stranded here and there in the text flow… Well, if paranoia has now taken hold of you, you are ready to read the rest of the post.
Typos like these would never be accepted in a printed book, where they would not escape careful proofreading… but unfortunately they are not uncommon in the ebook world (I say this on behalf of both readers and publishers).
Starting from the conversion of a PDF to ePub, a non-editable PDF to be precise, I will show you what results you can get, and above all which ones deserve a “no thanks”.

As the starting file I used a text from the Internet Archive.

[spoiler show=”more” hide=”less”]The mission of this digital library is to provide “universal access to knowledge,” and thanks to the tremendous work it has done, a multitude of masterpieces can be enjoyed. I mention this because I do not mean to say that their work was done badly. On the contrary: it is a nonprofit organization that lets us access its huge archive from anywhere in the world. So if the files it makes available are not flawless, no great harm done; what matters is its purpose. But the errors found in its files are the same ones most commonly found in ebooks that people pay for…[/spoiler]

The starting PDF used to illustrate the work is A Christmas Carol (available to view or download for free).
In the following pictures you can see, on the left, the ePub made by the Internet Archive as displayed in Adobe Digital Editions; on the right, the ePub made by point-sharp.


[singlepic id=16 w=540 float=none]

The cover image was not handled properly. As a result, the thumbnail at the top left of the ePub shows only part of the cover, and in the reading pane, where the text flows, the cover is also cropped and not displayed in full. But with a little trick this is easily solved (see right).
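The post does not say what the trick is. One approach that is often used for ePub covers, and which I show here purely as an assumption, is to wrap the image in an SVG element whose `viewBox` matches the image size, so the reading system scales the whole cover instead of cropping it. The function name `cover_page` and the file name in the usage line are hypothetical.

```python
from xml.sax.saxutils import quoteattr


def cover_page(image_href: str, width: int, height: int) -> str:
    """Build an XHTML cover page that scales the whole image to the viewport.

    width and height are the pixel dimensions of the cover image.
    This is an illustrative sketch, not the method used in the post.
    """
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Cover</title></head>
<body>
  <svg xmlns="http://www.w3.org/2000/svg"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       width="100%" height="100%"
       viewBox="0 0 {width} {height}"
       preserveAspectRatio="xMidYMid meet">
    <image width="{width}" height="{height}" xlink:href={quoteattr(image_href)}/>
  </svg>
</body>
</html>
"""


if __name__ == "__main__":
    # Hypothetical file name and dimensions.
    print(cover_page("images/cover.jpg", 600, 800))
```

With `preserveAspectRatio="xMidYMid meet"` the image is centered and shrunk to fit, so nothing is cut off. How a given reading system renders it can still vary.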

[singlepic id=2 w=540 float=none]


[singlepic id=15 w=540 float=none]

In this case the title page image was left exactly as it came out of the scan (with the sepia color of the paper). In an ePub, though, it makes no sense to keep that color, especially on an e-ink screen (like those of the Kindle and Sony readers). Here too, a little image optimization is enough to make the content fully legible.
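To give an idea of what such an optimization involves, here is a minimal sketch of the arithmetic only: converting sepia pixels to gray levels and stretching the contrast so the paper becomes white and the ink black. The luma weights are the common ITU-R BT.601 ones; the sample pixel values and function names are made up for illustration, and a real workflow would use an image editor or an imaging library rather than plain lists.

```python
def to_gray(rgb):
    """Convert one (R, G, B) pixel to a gray level using BT.601 luma weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def stretch(levels):
    """Linearly rescale gray levels so the darkest is 0 and the lightest is 255."""
    lo, hi = min(levels), max(levels)
    if hi == lo:
        # Flat image: nothing to stretch.
        return list(levels)
    return [round((v - lo) * 255 / (hi - lo)) for v in levels]


if __name__ == "__main__":
    # Hypothetical sample pixels: yellowed paper and faded brown ink.
    paper, ink = (222, 200, 160), (60, 45, 30)
    grays = [to_gray(paper), to_gray(ink)]
    print(grays, "->", stretch(grays))
```

On a real scan a plain min/max stretch is sensitive to stray dark or light specks, so clipping a small percentage at both ends is usually a better choice.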

[singlepic id=3 w=540 float=none]


[singlepic id=5 w=540 float=none]

In the list of illustrations, besides some classic OCR errors, there are other mistakes: the formatting was not preserved (for example small caps and italics) and, above all, the interactivity of the ePub was not exploited by adding direct links to the images. A further touch (right image) that makes reading even more pleasant is to use a distinctive font, like the one I used.
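As a rough sketch of what “direct links to the images” means in practice, the snippet below builds a list of illustrations in which every title is a link to the anchor of its figure. The titles, file names, and anchor ids are placeholders, not the ones in the actual ePub, and the function name `illustration_list` is hypothetical.

```python
from html import escape


def illustration_list(entries):
    """Build an XHTML list of illustrations.

    entries is an iterable of (title, href) pairs, where href points to the
    anchor of the figure inside the book (for example "chapter1.xhtml#fig01").
    """
    items = "\n".join(
        f'  <li><a href="{escape(href, quote=True)}">{escape(title, quote=False)}</a></li>'
        for title, href in entries
    )
    return f'<ul class="illustrations">\n{items}\n</ul>'


if __name__ == "__main__":
    # Placeholder entries for illustration only.
    print(illustration_list([
        ("Marley's Ghost", "chapter1.xhtml#fig01"),
        ("Mr. Fezziwig's Ball", "chapter2.xhtml#fig02"),
    ]))
```

Each target document then needs an element with the matching `id` (here `fig01`, `fig02`) around the image.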

[singlepic id=6 w=540 float=none]


[singlepic id=4 w=540 float=none]

In these images I set up a game of spot the difference: the most obvious differences between the two versions are highlighted in green. In addition to the font and the color of the image, I preserved the formatting and the headings that the OCR had mistakenly deleted.

[singlepic id=1 w=540 float=none]


[singlepic id=12 w=540 float=none]

A small touch of style: the styles of inset text are often destroyed, so the difference between an ordinary paragraph and an inset passage is no longer clear… and reading becomes difficult. But a little extra attention is all it takes for the text to regain its original shape.

[singlepic id=11 w=540 float=none]


[singlepic id=15 w=540 float=none]

One of the advantages of an ebook over paper is the “search” tool. But if the text is embedded in an image, as in the picture at the lower left, our search will be fruitless. Treating the caption as text, extracted from the image, solves the problem. Then a touch-up to make the plate lighter and more consistent, and the eye is satisfied too.
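The point about search can be illustrated with a small sketch: a figure whose caption is real text, plus a helper that extracts the searchable text of a piece of markup, roughly as a reader's search tool would see it. The markup structure, class names, file name, and caption are assumptions made for the example.

```python
from html import escape
from html.parser import HTMLParser


def figure_block(image_href, caption):
    """Build figure markup with the caption as live text below the image."""
    return (
        '<div class="figure">\n'
        f'  <img src="{escape(image_href, quote=True)}" alt="{escape(caption, quote=True)}"/>\n'
        f'  <p class="caption">{escape(caption, quote=False)}</p>\n'
        '</div>'
    )


class _TextOnly(HTMLParser):
    """Collect only the character data, ignoring tags and attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def searchable_text(markup):
    """Return the visible text of the markup with whitespace normalized."""
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


if __name__ == "__main__":
    # Hypothetical image file and caption.
    with_caption = figure_block("images/plate03.jpg", "The Last of the Spirits")
    image_only = '<img src="images/plate03.jpg"/>'
    print(repr(searchable_text(with_caption)))  # caption is found
    print(repr(searchable_text(image_only)))    # nothing to search
```

When the caption exists only as pixels inside the image, the second case applies and a search for it finds nothing.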

[singlepic id=17 w=540 float=none]

Images, if treated properly, are a big plus in a book. But if they are mistreated and appear badly distorted, they risk becoming a burden (in terms of both kilobytes/megabytes and annoyance). Cleaned up and reworked for the new format, they regain their “lost dignity.”

[singlepic id=13 w=540 float=none]

Last but not least… The ePub must be valid so that it does not cause reading problems on the various e-readers. And this is the litmus test for an ePub: proving that the (digital) house of cards holds. And guess what the outcome was for the ePub made by point-acute? 🙂
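The post does not say which validator was used; full validation is normally done with a dedicated tool such as EPUBCheck. Just to show the kind of rule a validator enforces, here is a minimal sketch that checks only two container requirements: the `mimetype` file must be the first entry in the zip, stored uncompressed, and contain `application/epub+zip`, and `META-INF/container.xml` must exist. The function name and the file name in the usage line are hypothetical, and passing these checks does not make an ePub valid.

```python
import zipfile


def basic_epub_checks(path):
    """Return a list of container problems; an empty list means these checks passed."""
    problems = []
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            problems.append("mimetype must be the first entry in the archive")
        else:
            if infos[0].compress_type != zipfile.ZIP_STORED:
                problems.append("mimetype must be stored uncompressed")
            if zf.read("mimetype") != b"application/epub+zip":
                problems.append("mimetype must contain exactly application/epub+zip")
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_epub.py book.epub  (hypothetical file names)
    for problem in basic_epub_checks(sys.argv[1]):
        print("problem:", problem)
```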

ePub Valid
