Digitizing Initiatives
Methods and Costs
From 2005:
- There were, as of 2005, three mass-production options
for scanning books, according to Dustin Goot in a Wired article titled
"3 Ways to Scan a Library." The first method is to remove the
spines and use "machines [that] cost $25,000 and churn through 90
black-and-white pages per minute, front and back." Second, libraries
can have "workers in India, China, and the Philippines earn about
40 cents an hour to manually turn pages that are zapped by $15,000 overhead
scanners... Carnegie Mellon's Million Book Project [see above] alone employs
more than 100 Indians for this activity." Third, libraries or publishers
can employ automated systems such as the Kirtas Technologies system,
which scans 1,200 pages per hour from bound books.
-
- When the end purpose of digitization is the publishing of converted
material onto the Internet, art books and journal articles present a special
challenge for conversion of the analog material to digital files. Since
images of art objects are frequently embedded in pages containing text,
.pdf digital output for Internet publication is usually not feasible due
to the time and expense needed for copyright clearance of the images of
art objects.
-
- In addition to the mass scanning methods, organizations can manually
scan text in bound books page by page. Resource Library recently
estimated that manually scanning a 400-word page of text in a bound book,
deleting artwork images, proofreading and converting to HTML averages
6 minutes per page while maintaining 99.995% accuracy. The combined direct labor
cost is estimated at $2.50 per page, or $25 per hour for 10 pages. A 5,000-word
essay (12.5 pages) would therefore cost about $31 to process in direct labor. Capital
equipment and overhead costs need to be added to direct labor costs to
arrive at total cost. See the Content presentation
guidelines from Resource Library
for further information on its text presentation conventions.
-
- In 2006, TFAO conducted research to assess the feasibility of outsourcing
its text conversion process to service bureaus. TFAO would provide final
processing into .htm files for online publication. Assumptions for quotes:
- estimate of a minimum input of 3 documents per week x 36 weeks / year;
each bound document = 2,000 to 100,000 words. Average 10,000 words. All
source document printing is black ink on white paper.
- output accuracy at 99.995%.
- paper documents in good condition (not fragile): 1. sent by sources
direct to service bureau for processing, or, 2. scanned at museum and converted
there to .pdf files, or .pdf and .doc files. For a .pdf image of a sample
paper source document see this page with a link to a .pdf of a 50 page
catalogue: < http://www.tfaoi.org/aa/6aa/6aa418.htm >.
- if documents are scanned at a service bureau, all source documents are in English
and scannable on an 8 1/2 x 14 inch scanner.
- batch processing ok with quarterly turnaround
- source paper documents not returned to sources or to TFAO -- if sent
to a service bureau by a museum
- output .doc file formatted to Resource Library text presentation
conventions and a .pdf file showing the image of each source document page
both sent by email to TFAO
- For a discussion of the cost of reading "open access
publishing" articles vs. subscription-based articles, see "The
Cost per Article Reading of Open Access Articles" by Jonas Holmström,
Research Assistant, Swedish School of Economics and Business Administration.
-
- For a comparison of the costs involved in operating a paper vs. virtual
library, see "Comparing
Library Resource Allocations for the Paper and the Digital Library"
by Lynn Silipigni Connaway, Research Scientist, Office of Research, OCLC
Online Computer Library Center, Inc. and Stephen R. Lawrence, Associate
Professor of Operations Management, Leeds School of Business, University
of Colorado. Also see "The
Return on Investment of Electronic Journals - It Is a Matter of Time"
by Jonas Holmström, Swedish School of Economics and Business Administration,
Helsinki, Finland.
-
- At an image resolution of 300 to 500 dpi, Kirtas estimated in 2005
that their automated method costs "as low as $.03" per page ($36
per hour at 1,200 pages per hour), while manual scanning, at a rate of 100 to 150 pages per hour,
costs "$.35 to $1.50" per page. (This cost quote is probably
not applicable to Resource Library's text conversion and text presentation
conventions requiring 99.995% accuracy.)
-
- A November 9, 2005 Wall Street Journal article by David Kesmodel
and Vauhini Vara discussed costs connected with the book digitizing program
of the Internet Archive, a San Francisco
nonprofit group that is spearheading the Open Content Alliance, a consortium
of business and educational groups. Employees manually scan out-of-copyright
books in five-hour shifts, four times a week. Pay is just over $10 per
hour. The article says that the Archive has digitized around 2,800 books,
at a cost of about $108,000, which is about $38.57 per book. It costs "about
10 cents a page to get a book online, taking into account equipment, labor
and the cost of hosting the pages on the Internet Archive's Web servers."
Each special scanning machine costs $20,000 to $40,000. It takes around
one hour to scan 500 pages, or about 8 1/3 pages per minute. (This cost
quote is probably not applicable to Resource Library's text conversion
and text presentation conventions requiring 99.995% accuracy.)
-
- A December 12, 2005 article in the Wall Street Journal by Jeffrey
A. Trachtenberg and Kevin J. Delaney said that a major publisher was recently
told that "it costs as much as 10 cents per page to scan, digitize
and tag a book, which means a 300-page novel would cost $30." (This
cost quote is probably not applicable to Resource Library's text
conversion and text presentation conventions requiring 99.995% accuracy.)
-
- A December 14, 2004 announcement
by Google that the firm will collaborate with institutional libraries to
digitize large quantities of books spawned numerous articles in the media.
Digitizing expenses were quoted from $10 to $20 per book. For instance,
a December 14, 2004 Reuters article by Lisa Baertlein titled "Google
Bets Big on Bringing Libraries to Web" said "Librarians and non
profits already involved in scanning books for other projects say it costs
around $20 to do a 300-page book, but that the cost should soon fall to
around $10 per book." At $20 that is about 7 cents per page, and at $10 it is
about 3 cents. (This cost quote is probably not applicable to Resource Library's
text conversion and text presentation conventions requiring 99.995%
accuracy.)
-
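The per-page and per-book figures quoted in the 2005 items above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only: the inputs are the figures quoted in the text, while the variable names are assumptions introduced here and do not come from any of the cited sources.

```python
# Cross-check of the 2005 cost figures quoted above.
# Inputs come from the text; all names are illustrative assumptions.

# Resource Library manual conversion: 6 minutes per 400-word page, $25 per hour.
manual_pages_per_hour = 60 / 6                         # pages processed per hour
manual_cost_per_page = 25.00 / manual_pages_per_hour   # direct labor per page
essay_pages = 5000 / 400                               # pages in a 5,000-word essay
essay_labor_cost = essay_pages * manual_cost_per_page  # direct labor for the essay

# Kirtas automated scanning: 1,200 pages per hour at $0.03 per page.
kirtas_cost_per_hour = 1200 * 0.03

# Internet Archive: about $108,000 for about 2,800 books; 500 pages per hour.
archive_cost_per_book = 108_000 / 2_800
archive_pages_per_minute = 500 / 60

# Figures quoted by Reuters: $20 or $10 to scan a 300-page book.
google_cents_per_page_high = 20 / 300 * 100
google_cents_per_page_low = 10 / 300 * 100

print(f"Manual labor per page:         ${manual_cost_per_page:.2f}")
print(f"Manual labor, 5,000-word essay: ${essay_labor_cost:.2f}")
print(f"Kirtas cost per hour:          ${kirtas_cost_per_hour:.2f}")
print(f"Internet Archive per book:     ${archive_cost_per_book:.2f}")
print(f"Internet Archive pages/minute: {archive_pages_per_minute:.2f}")
print(f"Google, cents per page:        {google_cents_per_page_low:.1f} to "
      f"{google_cents_per_page_high:.1f}")
```

Under these inputs the essay comes to 12.5 pages and $31.25 in direct labor, and the Internet Archive figure works out to roughly $38.57 per book. Both are close to the rounded figures quoted above. None of these numbers include the capital equipment and overhead costs that the text says must be added.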
From 2006:
- During 2006 TFAO received quotes from firms to provide text conversion
service.
-
- One firm's subcontractor offered 99.995% accuracy with the following
pricing for .doc output files:
-
- -- Bound bitone scanning up to 8.5" x 11" = $0.72/each
- -- Bound bitone scanning up to 11" x 17" = $1.02/each
- -- OCR bitone images = $0.18/each
- -- Proofing and formatting = $1.17 per 1,000 characters (later reduced
to 80 cents in a 2007 requote)
- -- CD-R masters = $10.00/each (optional)
- -- Shipping = at cost
-
- Assuming a 10,000-word essay with 5.3 characters per word, there would
be 53,000 characters in the document. The (2007) proofreading cost = $42.40.
If there are 600 words per page, the document has about 16.7 pages and the
scanning = $12 at $0.72 per page. Adding a CD-R master
brings the total cost to $64.40.
-
- Firms quoted $42, $156 and $200 for proofreading and formatting service
only for an equivalent document.
-
- For proofreading there are a number of specialty service
bureaus. For example, Canyouproofthis.com
charges a minimum of $50 as of November, 2006. They provide an online
rate calculator. Wordsru.com provided
an "instant estimate" of $78 for a 5,000 word document.
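The 10,000-word estimate above can be reproduced with a short calculation. The sketch below is a hedged illustration, not the vendor's pricing formula: the function name `conversion_quote` and its parameter names are hypothetical, and it follows the arithmetic in the text, which leaves OCR ($0.18 per image) and shipping out of the total.

```python
def conversion_quote(words, chars_per_word=5.3, words_per_page=600,
                     scan_rate_per_page=0.72, proof_rate_per_1000_chars=0.80,
                     cd_master=10.00):
    """Return cost components, in dollars, for one converted document.

    Defaults are the figures quoted in the text (2007 proofing rate,
    8.5" x 11" bound bitone scanning, one optional CD-R master).
    OCR and shipping are left out, as in the text's own total.
    """
    characters = words * chars_per_word
    pages = words / words_per_page
    proofing = characters / 1000 * proof_rate_per_1000_chars
    scanning = pages * scan_rate_per_page
    return {
        "characters": characters,
        "pages": pages,
        "proofing": proofing,
        "scanning": scanning,
        "total": proofing + scanning + cd_master,
    }


quote = conversion_quote(10_000)
for item, value in quote.items():
    print(f"{item:>10}: {value:,.2f}")
```

Passing `proof_rate_per_1000_chars=1.17` gives the proofing cost at the original 2006 rate instead of the 2007 requote.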
From 2007:
- During 2007 TFAO received a quote for scanning, formatting, proofreading
and emailing of a resultant .doc file at 80 cents per 1,000 characters.
The source has a $100 minimum, so for maximum efficiency, TFAO would send
to the contractor 125,000 characters, equivalent to about 23,500 words of text
at 5.3 characters per word.
-
- A sample AAR article converted in 2007 has 1,840 words in five pages,
or 368 words per page. At that rate, making full use of the $100 minimum
takes about 64 pages of AAR articles (23,500 words divided by 368 words per page).
-
- These quotes are based on adherence to TFAO's text
presentation conventions.
-
- rev. 6/4/07
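The 2007 minimum-order arithmetic above can be sketched as follows. This is an illustration under stated assumptions: the constant names are invented here, and the 5.3 characters-per-word figure is carried over from the 2006 estimate rather than stated in the 2007 quote.

```python
MIN_CHARGE = 100.00          # contractor's minimum order, in dollars
RATE_PER_1000_CHARS = 0.80   # quoted 2007 rate, in dollars
CHARS_PER_WORD = 5.3         # assumption carried over from the 2006 estimate
SAMPLE_WORDS = 1_840         # sample AAR article
SAMPLE_PAGES = 5

# Characters that exactly use up the minimum charge.
break_even_chars = MIN_CHARGE / RATE_PER_1000_CHARS * 1000
# Equivalent words and source pages at the sample article's density.
break_even_words = break_even_chars / CHARS_PER_WORD
words_per_page = SAMPLE_WORDS / SAMPLE_PAGES
break_even_pages = break_even_words / words_per_page

print(f"Characters covered by minimum: {break_even_chars:,.0f}")
print(f"Equivalent words:              {break_even_words:,.0f}")
print(f"Words per page (sample):       {words_per_page:.0f}")
print(f"Pages needed:                  {break_even_pages:.1f}")
```

The unrounded word count is slightly above the 23,500 used in the text, so the page count comes out just over 64 rather than just under it. Either way, about 64 pages fill the minimum order.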
Individual pages in this study will be
amended as TFAO adds content, corrects errors and reorganizes sections for
improved readability. Refreshing or reloading pages enables readers to view
the latest updates. Links to sources of information outside of our web site
are provided only as referrals for your further consideration. Please use
due diligence in judging the quality of information contained in these and
all other Web sites and in employing referenced consultants or vendors.
Information from linked sources may be inaccurate or out of date. Traditional
Fine Arts Organization, Inc. neither recommends nor endorses these referenced
organizations. Although Traditional Fine Arts Organization, Inc. includes
links to other web sites, it takes no responsibility for the content or
information contained on those other sites, nor exerts any editorial or
other control over those other sites. For more information on evaluating
web pages see Traditional Fine Arts Organization, Inc.'s General Resources section in Online Resources for Collectors and Students of
Art History.
Copyright 2012 Traditional Fine Arts Organization, Inc., an Arizona nonprofit corporation. All rights
reserved.