Date: Tue, 03 Aug 93 16:32:46 PDT
From: gcosimin@mv.us.adobe.com
Message-Id: <9307037444.AA744420766@signum.mv.us.adobe.com>
To: Richard W Wiggins <WIGGINS@msu.edu>
Subject: Re: Hello World.PDF
Hello all,
This is an interesting series of comparisons. As the unofficial "rule of
thumb" author, I'd like to toss in a few crumbs.
PDF overhead for a single page document seems to run about 1 kilobyte if no
fonts beyond the "LaserWriter 14" base set are used. You have to start
somewhere if you want structure! Acrobat PDF files are composed of a series of
objects, located through a cross-reference table and trailer at the end of
each PDF file. The
Portable Document Format Reference Manual (ISBN 0-201-62628-4; $24.95)
published by Addison-Wesley describes the PDF format in great detail and I
highly recommend it.
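To make that end-of-file structure concrete: in an uncompressed file it is a cross-reference ("xref") table of byte offsets, reached through a startxref pointer just before the final %%EOF. The Python sketch below follows that pointer and returns the offset of each object. It is an illustration, not Adobe code and not a complete parser. It assumes a single classic xref table with fixed-width entries. Incrementally updated files, or files using later features such as cross-reference streams, need more than this. The name read_xref is hypothetical.

```python
import re

def read_xref(pdf: bytes) -> dict[int, tuple[int, int]]:
    """Map each in-use object number to (byte offset, generation).

    Sketch only: assumes one classic 'xref' table, no cross-reference streams.
    """
    # The last 'startxref' near the end of the file holds the table's offset.
    found = re.findall(rb"startxref\s+(\d+)", pdf[-1024:])
    if not found:
        raise ValueError("no startxref keyword near end of file")
    pos = int(found[-1])
    if not pdf.startswith(b"xref", pos):
        raise ValueError("startxref does not point at an xref table")
    pos += len(b"xref")

    subsection = re.compile(rb"\s*(\d+)\s+(\d+)[ \t]*\r?\n")  # "first count" header
    entry = re.compile(rb"\s*(\d{10}) (\d{5}) ([nf])")       # one table entry
    objects: dict[int, tuple[int, int]] = {}
    # Read subsections until 'trailer' (where no subsection header matches).
    while (head := subsection.match(pdf, pos)) is not None:
        first, count = int(head.group(1)), int(head.group(2))
        pos = head.end()
        for i in range(count):
            e = entry.match(pdf, pos)
            if e is None:
                raise ValueError(f"malformed xref entry near byte {pos}")
            if e.group(3) == b"n":  # 'n' = object in use, 'f' = free entry
                objects[first + i] = (int(e.group(1)), int(e.group(2)))
            pos = e.end()
    return objects
```

To use it, read a distilled file as bytes (with open(path, "rb") as f: read_xref(f.read())). Each offset should land on an "N 0 obj" line. This is how a viewer can jump straight to any object without reading the whole file.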
The PDF file representation is fairly straightforward. There are several rules
of thumb for determining PDF file size, but as you might expect they depend on
the nature of the PostScript being converted. Once you have some idea of the
contents of the source PostScript, it's not hard to deduce how large the
resulting Acrobat file is going to be. The best way to learn how to do this is
to a) read the book and b) distill some sample files and look at them. I
suggest turning off LZW compression from time to time if you want a full
exposition of the page contents as converted into PDF.
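With compression turned off, the page descriptions are plain text and can be pulled out with a few lines of code. The Python sketch below is a regular-expression approximation, not a real PDF parser. It assumes every stream object is written as "N G obj << ... >> stream ... endstream". It treats a stream whose dictionary mentions /Filter as still compressed. The name content_streams is hypothetical.

```python
import re

# One stream object: "N G obj << dict >> stream <data> endstream".
# The tempered (?!endobj) loop keeps a dictionary from running into the next object.
STREAM_OBJECT = re.compile(
    rb"(\d+)\s+(\d+)\s+obj\s*(<<(?:(?!endobj).)*?>>)\s*stream\r?\n(.*?)\r?\n?endstream",
    re.DOTALL,
)

def content_streams(pdf: bytes):
    """Yield (object number, still_compressed, stream bytes) for each stream."""
    for m in STREAM_OBJECT.finditer(pdf):
        still_compressed = b"/Filter" in m.group(3)  # e.g. /Filter /LZWDecode
        yield int(m.group(1)), still_compressed, m.group(4)
```

Printing the streams whose still_compressed flag is False shows text operators such as BT ... Tj ... ET, which is what Distiller wrote for each page.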
The simplest rule of thumb for file sizes is that converting text pages to
PDF adds about 25% overhead for the PDF form of PostScript. Because PDF files
are static descriptions of pages, they need no procedure-definition header
(prolog) of the kind that accompanies every PostScript file. PostScript pages
are dynamic: each PostScript page is a program, but Acrobat pages are simply
descriptions.
When the PDF created from simple text is saved with Distiller's default LZW
compression, a savings of 50% is usually realized. Therefore, <<RULE OF THUMB>>
a PDF file of straight text is about 60% of the uncompressed ASCII original
(1.25 x 0.5 = 0.625). This includes the trivial overhead for structure
described above.
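The same arithmetic can be written as a small Python sketch. The 25% overhead and 50% LZW savings are the rule-of-thumb figures from this message. They are exposed as parameters because real documents will vary. The function name is hypothetical.

```python
def text_pdf_size(ascii_bytes: float,
                  pdf_overhead: float = 0.25,   # "adds about 25% overhead"
                  lzw_savings: float = 0.50) -> float:  # "a savings of 50%"
    """Rule-of-thumb size, in bytes, of a distilled text-only PDF."""
    uncompressed_pdf = ascii_bytes * (1 + pdf_overhead)
    return uncompressed_pdf * (1 - lzw_savings)

print(text_pdf_size(100_000))  # 100,000 bytes of ASCII text -> 62500.0
```

With the defaults, the estimate is 62.5% of the ASCII size. That is the "about 60%" above.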
Now, if the PostScript created by some application is incredibly inefficient
(can't happen!), then the PostScript (and therefore the PDF) file might deviate
somewhat from the norm. My rule of thumb applies to text created from typical
programs that print through Mac or Windows printer drivers and most desktop
publishing applications.
When one uses fonts creatively, or adds graphics or scanned images, file size
grows in a predictable proportion. In general, for each font with ISO-Latin1
encoding used in a document, add 2 kilobytes for font metric data. Embedded
fonts, such as symbolic fonts or intentionally embedded fonts, add about 30
kilobytes each, the size of the font outline program.
Vector graphics, whether imaged as Encapsulated PostScript or pasted into an
application from QuickDraw or GDI, are expressed in PDF in a format very
similar to that used in the Adobe Illustrator 3.0 file format, a very compact
way of representing such data. The vector information is then LZW compressed by
default, again generally resulting in a 2:1 reduction.
Raster images are compressed and/or resampled according to the parameters set
in the Acrobat Distiller, and resulting object sizes are exactly what you'd
expect.
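These rules can be combined into a rough estimator. The Python sketch below adds up the figures quoted above:

- 25% overhead, then 2:1 LZW, on text
- 2 KB of metrics per extra ISO-Latin1 font
- about 30 KB per embedded font
- 2:1 LZW on vector data

For images, it computes the raw sample size after any downsampling and divides by a compression ratio you supply. The function names, the 1 KB = 1024 bytes convention, treating vector input as uncompressed PDF-form data, and the image compression ratio are all assumptions for illustration. None of this is Distiller's actual behavior.

```python
from typing import Optional

KB = 1024  # assumption: 1 kilobyte = 1024 bytes

def raster_bytes(width_px: int, height_px: int, components: int = 3,
                 bits_per_component: int = 8, source_dpi: float = 300,
                 target_dpi: Optional[float] = None,
                 compression_ratio: float = 1.0) -> float:
    """Image data size after optional downsampling and an assumed compression ratio."""
    scale = 1.0
    if target_dpi is not None and target_dpi < source_dpi:
        scale = target_dpi / source_dpi  # downsampling shrinks both axes
    width, height = round(width_px * scale), round(height_px * scale)
    raw = width * height * components * bits_per_component / 8
    return raw / compression_ratio

def estimate_pdf_bytes(ascii_text_bytes: float, metric_fonts: int = 0,
                       embedded_fonts: int = 0, vector_pdf_bytes: float = 0,
                       image_bytes: float = 0) -> float:
    """Sum of the rule-of-thumb figures from the message (hypothetical helper)."""
    text = ascii_text_bytes * 1.25 * 0.5             # +25% PDF form, then ~2:1 LZW
    fonts = metric_fonts * 2 * KB + embedded_fonts * 30 * KB
    vector = vector_pdf_bytes * 0.5                  # ~2:1 LZW on vector data
    return text + fonts + vector + image_bytes
```

Take a document with 10,000 bytes of text, two extra metric-only fonts, and one embedded font. The estimate comes out at 41,066 bytes, roughly three quarters of it the embedded font. This illustrates why font choices dominate small files.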
For more detailed explanations, please refer to the Manual mentioned above. I'd
also be happy to field any specific questions or forward them to the appropriate
party at Adobe.
Sincerely,
Gary Cosimini
Business Development Manager, Publishing Market
Adobe Systems NY
gcosimin@adobe.com
-----------------------------------------------------------------------------
>Message was resent -- Original recipients were:
To: www-talk@nxoc01.cern.ch
Cc: Paul Holbrook <holbrook@cic.net>, uri@bunyip.com
-----------------------------------------------------------------------------
An Acrobat newsgroup should be formed for discussion and rumor
control. Alt.acrobat would be a good start.
/rich
----------------------------Original message----------------------------
This is hardly a fair metric! It's like saying a C compiler is
inefficient 'cause it produces binaries that are 1200K in size
just to say Hello World, so let's use Microsoft Basic instead.
I have taken a 5 page document including a couple of diagrams and
run it through the Distiller. The resulting file was 57K in size.
The original file in MS Word was 82K. A flat Postscript version of
the same document was 118K. Word counts 13.5K words in the document;
when saved as flat ASCII text it's 14K long.
The point is that you cannot extrapolate from a trivial example to
anything meaningful. A rule of thumb from Adobe is that documents
without a lot of fancy diagrams are only about 25% larger than the
underlying flat ASCII files; I haven't measured that with text-only
documents.
And of course the stuff is not human readable -- that's not the purpose,
and it'd be impossible to provide this kind of functionality in
readable text. JPEG files aren't readable either.
The Acrobat Starter kit is relatively cheap, especially for educational
institutions. I've begun playing with the stuff and feel I've got a
lot to learn before any evaluations can be made. I'd urge other sites
to get hold of the kit or wait for careful evaluations in the trade
rags before jumping to conclusions.
/Rich Wiggins, Gopher Coordinator, Michigan State U
PS -- Has an Acrobat news group started anywhere? Rather than spilling
this discussion into a bunch of newsgroups that already have plenty
to talk about, should we find one or create alt.acrobat?
----------------------------Original message----------------------------
Below is a PDF file I created on a Macintosh. I did this because many
people probably haven't seen the heart of the PDF beast yet ;> It doesn't
get much simpler. All the file contained was "Hello World" without the
quotes to be printed on an 8.5" x 11" page. I made the typeface Courier so
that the PDF file wouldn't contain lots of font metric information; had I
used a TrueType font, the file would have been much larger (about 2500 bytes
for a scalable font like TrueType Geneva). The point here is that PDF is not
simple: it is like a PostScript superset, oriented towards presentation, not
content. The files are fairly large; in this example the 11 characters of
Hello World turned into a 942-character PDF file, about 86 times the size of
the original. Of course, I didn't use LZW encoding for the text :). Like
PostScript, this is not very human readable.