Choosing between PDF Clown and iText (by IT Curious)

Friday, 10 April 2009

I was looking for a Java library to manipulate files in PDF (Portable Document Format). The two obvious candidates seemed to be:

PDF Clown http://www.stefanochizzolini.it/en/projects/clown/
iText http://www.lowagie.com/iText/

BTW, this isn't a detailed comparison but a general impression of the two products (a gut feeling)... Hmm, decisions, decisions...

I downloaded both JARs and checked out the documentation. I soon realised that there was a big difference in the quantity of documentation available: iText has a lot more. That makes sense, since Bruno Lowagie (iText's creator) has been working on the library since about 1999, while PDF Clown is relatively new.

I started out by adding the JAR files to an Eclipse project and trying out a simple example from each library. Unfortunately I hit a small problem with the PDF Clown example (SerializationModeEnum.Compact, mentioned in the User Guide, didn't seem to exist), while the iText example worked fine.

I then also read the following note about PDF Clown: "This project is young, so that it's to be considered UNSTABLE. Feel free to experiment with it, but DO NOT use it in a production environment (so: beware!)" (see http://www.stefanochizzolini.it/en/projects/clown/downloads.html#License). Needless to say, it didn't really fill me with confidence in PDF Clown ;-)

On the other hand, iText has been used successfully by numerous commercial and open source applications: Macromedia ColdFusion (now owned by Adobe), JasperReports, Eclipse BIRT, Google Calendar, etc.

For these reasons I decided to go with iText.

The main drawback of iText is that it doesn't properly follow the MVC (Model-View-Controller) pattern, meaning that it doesn't let you cleanly separate content from presentation.
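To make the point concrete, here is a minimal sketch of the kind of separation I mean, done in application code rather than by the library. All the class names (ReportModel, ReportView, PlainTextView, ReportController) are hypothetical and are not part of iText or PDF Clown; the plain-text view is only a stand-in so that the sketch runs without any PDF library on the classpath. In a real project, a PDF-producing view would presumably sit behind the same interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Model: plain content, with no knowledge of layout or of any PDF library.
// (Hypothetical class, for illustration only.)
final class ReportModel {
    final String title;
    final List<String> paragraphs;

    ReportModel(String title, List<String> paragraphs) {
        this.title = title;
        // Defensive copy so the model cannot be changed behind the view's back.
        this.paragraphs = Collections.unmodifiableList(new ArrayList<>(paragraphs));
    }
}

// View: the only place that decides how content is presented.
// A PDF-backed implementation would be the only class importing the PDF library.
interface ReportView {
    String render(ReportModel model);
}

// Stand-in view that renders plain text: upper-case title, indented paragraphs.
final class PlainTextView implements ReportView {
    @Override
    public String render(ReportModel model) {
        StringBuilder out = new StringBuilder();
        out.append(model.title.toUpperCase(Locale.ROOT)).append('\n');
        for (String paragraph : model.paragraphs) {
            out.append("  ").append(paragraph).append('\n');
        }
        return out.toString();
    }
}

// Controller: wires a model to whichever view the caller selects.
final class ReportController {
    private final ReportView view;

    ReportController(ReportView view) {
        this.view = view;
    }

    String publish(ReportModel model) {
        return view.render(model);
    }
}

public class MvcSketch {
    public static void main(String[] args) {
        List<String> body = new ArrayList<>();
        body.add("Sales rose.");
        body.add("Costs fell.");
        ReportModel model = new ReportModel("Quarterly summary", body);
        System.out.print(new ReportController(new PlainTextView()).publish(model));
    }
}
```

With this layout, changing the presentation means writing another ReportView, while the code that builds the ReportModel stays untouched.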

The fact that I chose iText does not mean that PDF Clown is without merit: I like the object-oriented approach Stefano Chizzolini has taken. I simply feel that the library needs a bit more time before truly becoming mature.

If, after reading this article, you are interested in using iText, I would advise getting Bruno Lowagie's book "iText in Action" (http://www.1t3xt.com/docs/book.php), which is pretty much the definitive guide.

10 comments:

Unknown, 13 April 2009 at 18:57
Hi Martin

I'm PDF Clown's lead developer.

Your review seems quite objective (although, as you fairly explained, it's not an in-depth analysis).

I'd just like to complete it with some considerations.

1) Instability
I have to admit that my disclaimer sounds more worrisome than it should. To properly interpret the term 'UNSTABLE', you have to consider what it really means in the open-source community: it does not necessarily mean that there are problems; rather, that enhancements or changes have been made to the software that have not undergone rigorous testing, and that more changes are expected to be imminent (see [1]).
I use 'UNSTABLE' to inform users that the library is still in the development stage, so its API could change as it evolves, and I cannot guarantee that they won't need to update their own code to keep up with the next releases. Anyway, PDF Clown's evolution is typically incremental (NOT disruptive), so no one has to worry about major changes that would force them to throw their code away! :-)
Furthermore, I want to stress that I always take meticulous care to release only well-tested versions, free of any common-case bug. Actually, reported bugs have typically been about fringe cases, i.e. particular uses that touched the implementation frontier (the 'hic sunt leones' of the library domain, the boundary between what has already been implemented and what hasn't yet).

2) Documentation
As you pointed out, PDF Clown's documentation is still partial: I appreciate any suggestions about the most important topics to treat from a user's point of view.

3) Examples
Whenever you stumble upon a problem, please let me know through the public forums (see [3]) so that I can fix it or suggest a correct approach. I take every user request into consideration.

4) Comparison
PDF Clown is much younger than iText, so it's simply a fact that it's not as mature; on the other hand, its newness gives it a vantage point from which to avoid some design and implementation pitfalls that previous projects may have encountered.

Despite both dealing with the Portable Document Format, PDF Clown and iText originate from different philosophies and approaches.
iText was initially conceived as a multi-format (PDF along with HTML, RTF etc.) generator, analogous to some other efforts like standard XSL-FO engines (see Apache FOP [2] as a popular implementation); later it was retrofitted with editing capabilities, such as encryption, annotations and a whole bunch of nice things.
PDF Clown has been designed from scratch to smoothly combine generation, reading and editing capabilities framed inside a cohesive, robust and flexible model. Just to mention the latest developments, I'm currently working on the text extraction capability (you cannot find it in iText) that will allow users to retrieve page text along with rich information about its graphic location (coordinates) and style (font, font size, font color and so on). This isn't a retrofit: it's just a coherent result of what I have envisioned since the beginning, as it works upon a common versatile set of layers which serves disparate functionalities.
Flexibility and simplicity also reveal themselves when you need to extend the library; I can cite, for example, the case of an IT Senior Consultant who needed to extract some file attachments from a PDF document but found that a particular codec filter was missing from PDF Clown's current implementation. Well, it took him just a few hours to figure out how to implement a new codec filter and send me his contribution (by the way, it will be part of the next release, 0.0.8).

This is obviously not the appropriate context for a neutral critique of the respective merits of PDF Clown and iText; I would just suggest that users compare the consistency of their object models, their flexibility and the cleanliness of their designs, looking from both a black-box perspective (the API usability) and a white-box perspective (the code implementation behind the API).

In conclusion, due to its maturity, iText is the undisputed winner in terms of the richness of its feature set (for example, you cannot find encryption or digital signature support in PDF Clown); nonetheless, if you are also concerned with the elegance of the solution (for example, fully object-oriented traversal of PDF document contents and metacontents to perform advanced reading/editing operations without awkward tricks), I suggest you give PDF Clown a try.

Thank you
Stefano

[1] http://en.wikipedia.org/wiki/Software_release_life_cycle#Stable_or_unstable
[2] http://xmlgraphics.apache.org/fop/
[3] http://sourceforge.net/forum/forum.php?forum_id=607163

Unknown, 13 April 2009 at 19:12
Correction: iText has recently added a text extraction feature.

Stefano

IT Curious, 14 April 2009 at 00:29
Hi Stefano

It is a great honour to get a comment from one of the two original authors :-)

My comparison was based on the present state of the two libraries: for example, the fact that PDF Clown has not yet reached version 1.0. With time, I am sure that choosing between the two will become more and more difficult.

I have got to say that in both cases I was touched by the generosity of the approaches and believe that both libraries are very useful to the Java community.

Jack, 18 March 2011 at 19:58
Thanks!

lauren, 19 March 2011 at 11:10
I am used to iText and feel that it serves its purpose very well. I have never used PDF Clown, but your blog gives me a good impression of it. I too feel that a drawback of iText is that it doesn't properly implement the MVC (Model-View-Controller) pattern.

Ondrej Medek, 30 March 2011 at 08:52
Hi, another problem with iText is its license. It now uses the AGPL license, and the commercial license is a little bit expensive. Many open source tools like JasperReports have to use an old version of iText (2.1.7), since it was the last release under the MPL/LGPL license. So the MPL/LGPL version of iText is effectively "unmaintained".

PDF Clown can be an iText replacement for those who do not want to use the AGPL version of iText.

Unknown, 19 September 2011 at 16:59
Wow, I can't believe Stefano commented on your post! Do you even realize how cool that is? He's like one of the leads in converting xml to pdf. I wish I could invite him to my birthday party.

dpr, 3 January 2016 at 12:26
This discussion was started in 2009. It is 2016, so I thought I might add something.

PdfClown 0.1.2 is great. It is better than iText# 5.5.6 with respect to text extraction, which is the feature I used most.

You can convert a PDF to HTML as an almost exact replica.