Digitizing Initiatives
 
 
Methods and Costs
From 2005:
  
There were, as of 2005, three mass-production options
  for scanning books, according to Dustin Goot in a Wired article titled
  "3 Ways to Scan a Library." The first method is to remove the
  spines and use "machines [that] cost $25,000 and churn through 90
  black-and-white pages per minute, front and back." Second, libraries
  can send the work offshore, where "workers in India, China, and the Philippines earn about
  40 cents an hour to manually turn pages that are zapped by $15,000 overhead
  scanners... Carnegie Mellon's Million Book Project [see above] alone employs
  more than 100 Indians for this activity." Third, libraries or publishers
  can use automated systems such as the one from Kirtas
  Technologies, which scans 1,200 pages per hour from
  bound books.
  
- When the end purpose of digitization is the publishing of converted
  material onto the Internet, art books and journal articles present a special
  challenge for conversion of the analog material to digital files. Since
  images of art objects are frequently embedded in pages containing text,
  .pdf digital output for Internet publication is usually not feasible due
  to the time and expense needed for copyright clearance of the images of
  art objects.
  
  
- In addition to the mass scanning methods, organizations can manually
  scan text in bound books page by page. Resource Library recently
  estimated that manually scanning a 400-word page of text in a bound book,
  deleting artwork images, proofreading and converting the text to HTML takes an average of
  6 minutes per page while maintaining 99.995% accuracy. The combined direct labor
  cost is estimated at $2.50 per page, or $25 per hour for 10 pages. A 5,000
  word essay (12.5 pages) would therefore cost about $31 in direct labor. Capital
  equipment and overhead costs need to be added to direct labor costs to
  arrive at total cost. (See the Content presentation
  guidelines from Resource Library
  for further information on its text presentation conventions.)
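
The direct labor arithmetic above can be sketched in a few lines of Python. This is only an illustration using the figures quoted in the paragraph (400 words per page, 6 minutes per page, $2.50 per page); the function name is hypothetical, and the sketch counts fractional pages, which gives $31.25 for a 5,000 word essay. Rounding up to 13 whole pages would give $32.50 instead.

```python
# Figures quoted in the paragraph above.
WORDS_PER_PAGE = 400
MINUTES_PER_PAGE = 6
LABOR_COST_PER_PAGE = 2.50  # dollars of direct labor


def direct_labor_estimate(word_count):
    """Return (pages, hours, dollars) of direct labor for a document.

    Hypothetical helper: fractional pages are counted as-is,
    and capital equipment and overhead are not included.
    """
    pages = word_count / WORDS_PER_PAGE
    hours = pages * MINUTES_PER_PAGE / 60
    cost = pages * LABOR_COST_PER_PAGE
    return pages, hours, cost


pages, hours, cost = direct_labor_estimate(5000)
print(f"{pages:.1f} pages, {hours:.2f} hours, ${cost:.2f} direct labor")
```

The same function applied to the 10,000 word average document assumed in the 2006 research would return 25 pages and $62.50 of direct labor.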
  
  
- In 2006, TFAO conducted research to assess the feasibility of outsourcing
  its text conversion process to service bureaus. TFAO would provide final
  processing into .htm files for online publication. Assumptions for quotes:
      
  - estimate of a minimum input of 3 documents per week x 36 weeks / year;
  each bound document = 2,000 to 100,000 words. Average 10,000 words. All
  source document printing is black ink on white paper.
  
 - output accuracy at 99.995%.
  
 - paper documents in good condition (not fragile): 1. sent by sources
  direct to service bureau for processing, or, 2. scanned at museum and converted
  there to .pdf files, or .pdf and .doc files. For a .PDF image of a sample
  paper source document see this page with a link to a .pdf of a 50 page
  catalogue: < http://www.tfaoi.org/aa/6aa/6aa418.htm >.
  
 - if documents are scanned at a service bureau, all source documents are in English
  and scannable on an 8 1/2 x 14 inch scanner.
  
 - batch processing ok with quarterly turnaround
  
 - source paper documents not returned to sources or to TFAO -- if sent
  to a service bureau by a museum
  
 - output .doc file formatted to Resource Library text presentation
  conventions and a .pdf file showing the image of each source document page
  both sent by email to TFAO
 
  - For a discussion on the costs related to reading of "open access
  publishing" vs. subscription based articles see "The
  Cost per Article Reading of Open Access Articles" by Jonas Holmström,
  Research Assistant, Swedish School of Economics and Business Administration.
  
  
- For a comparison of costs involved with operating a paper vs. virtual
  library see "Comparing
  Library Resource Allocations for the Paper and the Digital Library"
  by Lynn Silipigni Connaway, Research Scientist, Office of Research, OCLC
  Online Computer Library Center, Inc. and Stephen R. Lawrence, Associate
  Professor of Operations Management, Leeds School of Business, University
  of Colorado. Also see "The
  Return on Investment of Electronic Journals - It Is a Matter of Time"
  by Jonas Holmström, Swedish School of Economics and Business Administration,
  Helsinki, Finland.
  
  
- At an image resolution of 300 to 500 dpi, Kirtas estimated in 2005
  that its automated method costs "as low as $.03" per page ($36
  per hour at 1,200 pages per hour), while manual scanning, at a rate of 100 to 150 pages per hour,
  costs "$.35 to $1.50" per page. (This cost quote is probably
  not applicable to Resource Library's text conversion and text presentation
  conventions requiring 99.995% accuracy.)
  
  
- A November 9, 2005 Wall Street Journal article by David Kesmodel
  and Vauhini Vara discussed costs connected with the book digitizing program
  of Internet Archive, a San Francisco
  nonprofit group that is spearheading the Open Content Alliance, a consortium
  of business and educational groups. Employees manually scan out-of-copyright
  books in five-hour shifts, four times a week. Pay is just over $10 per
  hour. The article says that the Archive has digitized around 2,800 books,
  at a cost of about $108,000, which is about $38.57 per book. It costs "about
  10 cents a page to get a book online, taking into account equipment, labor
  and the cost of hosting the pages on the Internet Archive's Web servers."
  Each special scanning machine costs $20,000 to $40,000. It takes around
  one hour to scan 500 pages or about 8 1/3 pages per minute. (This cost
  quote is probably not applicable to Resource Library's text conversion
  and text presentation conventions requiring 99.995% accuracy.)
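
As a rough check on the Internet Archive figures, the short sketch below recomputes the per-book cost and the scanning rate from the numbers quoted above. The last line is an inference, not a figure from the article: it assumes the $108,000 total and the 10 cents per page figure cover the same cost items, which would imply an average book of a little under 400 pages.

```python
# Figures quoted from the November 9, 2005 Wall Street Journal article.
books_digitized = 2800
total_cost = 108_000      # dollars
cost_per_page = 0.10      # dollars, "about 10 cents a page"
pages_per_hour = 500

cost_per_book = total_cost / books_digitized
pages_per_minute = pages_per_hour / 60

# Assumption: total cost and per-page cost cover the same items,
# so their ratio approximates the average book length.
implied_pages_per_book = cost_per_book / cost_per_page

print(f"cost per book: ${cost_per_book:.2f}")
print(f"scanning rate: {pages_per_minute:.2f} pages per minute")
print(f"implied average book length: {implied_pages_per_book:.0f} pages")
```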
  
  
- A December 12, 2005 article in the Wall Street Journal by Jeffrey
  A. Trachtenberg and Kevin J. Delaney said that a major publisher was recently
  told that "it costs as much as 10 cents per page to scan, digitize
  and tag a book, which means a 300-page novel would cost $30." (This
  cost quote is probably not applicable to Resource Library's text
  conversion and text presentation conventions requiring 99.995% accuracy.)
  
  
- A December 14, 2004 announcement
  by Google that the firm would collaborate with institutional libraries to
  digitize large quantities of books spawned numerous articles in the media.
  Digitizing expenses were quoted at $10 to $20 per book. For instance,
  a December 14, 2004 Reuters article by Lisa Baertlein titled "Google
  Bets Big on Bringing Libraries to Web" said "Librarians and non
  profits already involved in scanning books for other projects say it costs
  around $20 to do a 300-page book, but that the cost should soon fall to
  around $10 per book." At $20 that is about 7 cents per page, and at $10 it is
  about 3 cents. (This cost quote is probably not applicable to Resource Library's
  text conversion and text presentation conventions requiring 99.995%
  accuracy.)
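
The 2005 per-page figures quoted above can be put side by side with a short script. This is a sketch for comparison only: the labels are informal, the 300-page book length is the one used in the Reuters and Wall Street Journal examples, and the quotes do not cover the same scope of work (the Resource Library figure is direct labor for scanning, proofreading and HTML conversion at 99.995% accuracy, while the others are mainly image scanning).

```python
# Per-page figures quoted in the 2005 items above, in dollars.
PAGES_PER_BOOK = 300  # book length used in the Reuters and WSJ examples

quotes = [
    ("Kirtas automated, low end", 0.03),
    ("Kirtas estimate for manual scanning, low", 0.35),
    ("Kirtas estimate for manual scanning, high", 1.50),
    ("Internet Archive, all-in", 0.10),
    ("Publisher quote (WSJ), high end", 0.10),
    ("Reuters, $20 per book", 20 / PAGES_PER_BOOK),
    ("Reuters, projected $10 per book", 10 / PAGES_PER_BOOK),
    ("Resource Library, direct labor only", 2.50),
]


def per_book(per_page, pages=PAGES_PER_BOOK):
    """Scale a per-page cost to a whole book of the given length."""
    return per_page * pages


# Print the quotes from cheapest to most expensive per page.
for source, per_page in sorted(quotes, key=lambda q: q[1]):
    print(f"{source:<42} ${per_page:.2f}/page  ${per_book(per_page):7.2f}/book")
```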
  
            
From 2006:
  - During 2006 TFAO received quotes from firms to provide text conversion
  service.
  
  
- One firm's subcontractor offered 99.995% accuracy, with the following pricing for
  .doc output files:
  
  - Bound bitone scanning up to 8.5" x 11" = $0.72 each
  
  - Bound bitone scanning up to 11" x 17" = $1.02 each
  
  - OCR of bitone images = $0.18 each
  
  - Proofing and formatting = $1.17 per 1,000 characters (later reduced
  to 80 cents in a 2007 requote)
  
  - CD-R masters = $10.00 each (optional)
  
  - Shipping = at cost
  
  
- Assuming a 10,000 word essay with 5.3 characters per word, there would
  be 53,000 characters in the document. The proofreading cost at the 2007 rate = $42.40.
  If there are 600 words per page, the scanning cost = $12. Adding a CD-R master
  brings the total cost to $64.40.
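
The worked example above can be reproduced with the following sketch, which uses the quoted rates and the stated assumptions of 5.3 characters per word and 600 words per page. It follows the total as given, which appears to leave out the $0.18 per image OCR charge and shipping; the variable names are illustrative only.

```python
# Assumptions stated in the worked example above.
WORDS = 10_000
CHARS_PER_WORD = 5.3
WORDS_PER_PAGE = 600

# Quoted rates, in dollars.
SCAN_PER_PAGE = 0.72          # bound bitone scanning, up to 8.5" x 11"
PROOF_PER_1000_CHARS = 0.80   # 2007 requote (originally $1.17)
CD_R_MASTER = 10.00           # optional

characters = WORDS * CHARS_PER_WORD
proofing = characters / 1000 * PROOF_PER_1000_CHARS
scanning = WORDS / WORDS_PER_PAGE * SCAN_PER_PAGE  # fractional pages, as in the text
total = proofing + scanning + CD_R_MASTER

print(f"characters: {characters:,.0f}")
print(f"proofing ${proofing:.2f}, scanning ${scanning:.2f}, total ${total:.2f}")
```

Swapping PROOF_PER_1000_CHARS back to 1.17 shows the effect of the original 2006 proofing rate on the same document.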
  
  
- Quotes from firms offering proofreading and formatting service only, for an equivalent
  document, were $42, $156 and $200.
  
  
- For proofreading there are a number of specialty service
  bureaus. For example, Canyouproofthis.com
  charged a minimum of $50 as of November 2006 and provides an online
  rate calculator. Wordsru.com provided
  an "instant estimate" of $78 for a 5,000 word document.
                
 
From 2007:
  - During 2007 TFAO received a quote for scanning, formatting, proofreading
  and emailing of a resultant .doc file at 80 cents per 1,000 characters.
  The contractor has a $100 minimum, so to make full use of it TFAO would send
  the contractor at least 125,000 characters, equivalent to about 23,500 words of text.
  
  
- A sample AAR article converted in 2007 has 1,840 words in five pages,
  or 368 words per page. At that rate, making full use of the
  $100 minimum with AAR articles requires 23,500 words divided by 368 words per page = 64 pages.
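
The 2007 minimum-order arithmetic can be sketched as follows. The 5.3 characters per word figure is an assumption carried over from the 2006 example, since the 2007 items do not state it; with that assumption 125,000 characters come to roughly 23,585 words, which the text rounds to 23,500. The page count uses the rounded figure so that it matches the text.

```python
import math

# Quoted terms, kept in whole cents to avoid floating-point rounding.
MINIMUM_CENTS = 100 * 100            # $100 minimum order
RATE_CENTS_PER_1000_CHARS = 80       # 80 cents per 1,000 characters
CHARS_PER_WORD = 5.3                 # assumption, from the 2006 example

# Characters needed before the per-character rate exceeds the minimum.
min_chars = MINIMUM_CENTS * 1000 // RATE_CENTS_PER_1000_CHARS
min_words = min_chars / CHARS_PER_WORD

# Sample AAR article: 1,840 words in five pages.
words_per_page = 1840 / 5
pages_needed = math.ceil(23_500 / words_per_page)  # rounded word count from the text

print(f"{min_chars:,} characters, about {min_words:,.0f} words")
print(f"{words_per_page:.0f} words per page, {pages_needed} pages needed")
```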
  
  
- These quotes are based on adherence to TFAO's text
  presentation conventions.
  
  
- rev. 6/4/07
  
       
 
Individual pages in this study will be
amended as TFAO adds content, corrects errors and reorganizes sections for
improved readability. Refreshing or reloading pages enables readers to view
the latest updates. Links to sources of information outside of our web site
are provided only as referrals for your further consideration. Please use
due diligence in judging the quality of information contained in these and
all other Web sites and in employing referenced consultants or vendors.
Information from linked sources may be inaccurate or out of date. Traditional
Fine Arts Organization, Inc. neither recommends nor endorses these referenced
organizations. Although Traditional Fine Arts Organization, Inc. includes
links to other web sites, it takes no responsibility for the content or
information contained on those other sites, nor exerts any editorial or
other control over those other sites. For more information on evaluating
web pages see Traditional Fine Arts Organization, Inc.'s General Resources section in Online Resources for Collectors and Students of
Art History.
Copyright 2012 Traditional Fine Arts Organization, Inc., an Arizona nonprofit corporation. All rights
reserved.