There are many benefits to digitizing your collections, such as increasing usage and access and creating captivating marketing materials to publicize your institution. If you decide to take “matters of digitization” into your own hands, make sure you follow archival standards and guidelines.
The North Carolina Digital Heritage Center (DigitalNC) is a great resource. It provides guidelines on scanning specifications (hardware, resolution, and so on), file storage and organization, and workflow procedures. You may use the information below to create your own digitization procedures.
Most of the content below has been taken directly from DigitalNC’s Digitization Guidelines and reflects the processes the Center follows when digitizing collections. It has been altered slightly for general use. For more information regarding digitization services and best practices, please contact a representative of the Digital Heritage Center.
Scanning Specifications
These scanning specifications were chosen to accommodate the Digital Heritage Center’s digitization philosophy while also adhering to commonly accepted imaging best practices, particularly those developed by the Federal Agencies Digitization Guidelines Initiative.
| Item Type | PPI | Hardware | Example Item |
|---|---|---|---|
| Photographs | 600 | Epson Expression 10000XL | Classmates in Albemarle |
| Negatives or Slides | 1200-1600 | Epson Expression 10000XL | Horse-Drawn Carriage in Parade |
| Fragile Looseleaf or Bound Items | 400 | Zeutschel | Copy of Robert E. Lee’s Farewell Address |
| Very Fragile and/or Oversized Items | 300 minimum | PhaseOne | The Full Moon |
| Uniform and Sturdy Looseleaf | 300 | Fujitsu Sheetfed Scanner | Hickory Library Business Vertical Files |
| Bound and Sturdy Publications | 300-450 | Internet Archive Scribe Book Scanner | Lamp and Shield |
| 3-D Objects | 300 minimum | Nikon Digital Camera | Child’s Doll |
The Digital Heritage Center contracts to digitize microfilm off site. Contact them directly for more details about this process.
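To get a feel for what the PPI values in the table mean in practice, the sketch below converts an item's physical size into pixel dimensions and a rough uncompressed file size. It is only an illustration: the function names are made up for this example, and the size estimate assumes 24-bit color (three channels at 8 bits each), which the guidelines above do not specify.

```python
# Lower-bound PPI values copied from the table above.
MIN_PPI = {
    "photograph": 600,
    "negative_or_slide": 1200,
    "fragile_looseleaf_or_bound": 400,
    "very_fragile_or_oversized": 300,
    "uniform_sturdy_looseleaf": 300,
    "bound_sturdy_publication": 300,
    "3d_object": 300,
}


def pixel_dimensions(width_in, height_in, ppi):
    """Return (width_px, height_px) for an item scanned at the given PPI."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_size_mb(width_px, height_px, channels=3, bits_per_channel=8):
    """Approximate uncompressed image size in megabytes (10**6 bytes).

    The 3-channel, 8-bit default is an assumption, not part of the guidelines.
    """
    total_bytes = width_px * height_px * channels * bits_per_channel // 8
    return total_bytes / 1_000_000


# Example: an 8 x 10 inch photograph at the photograph PPI from the table.
width_px, height_px = pixel_dimensions(8, 10, MIN_PPI["photograph"])
print(width_px, height_px, uncompressed_size_mb(width_px, height_px))
```

An estimate like this can help when planning storage before a large scanning project, since preservation files are much larger than access copies.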
File Formats Used by the Digital Heritage Center
Below are the file formats most typically created and retained at the Digital Heritage Center:
Access copies:
- JPEG2000 and JPEG (most common)
- PDF (less common – usually city directories, some scrapbooks)
- MP4 (moving images)
- MP3 (audio)
Preservation copies that the Digital Heritage Center retains:
- JPEG2000 (newspaper images)
- TIF (non-newspaper images)
- MP4 (moving images)
- MP3 (audio)
Preservation copies available to partners immediately after digitization:
- TIF (print newspapers)
Digitization Overview
- Unpack materials
- Assess materials for condition issues and the presence of metadata
- Separate out materials by format and/or subject matter
- Decide which hardware to use for digitization
- Calibrate hardware
- Digitize materials
- Review images and apply image rotation if necessary
- Run images through OCR (optical character recognition) software, if appropriate
- Create image metadata in spreadsheet
- Upload images and metadata to your content management system
- Quality control your work
- Re-package and return materials to storage
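The "create image metadata in spreadsheet" step can be as simple as writing one row per image to a CSV file. The following is a minimal sketch using Python's standard `csv` module. The column names, the list of required fields, and the sample record are all hypothetical placeholders; use the fields defined in your own metadata dictionary.

```python
import csv
import io

# Hypothetical column names; replace them with the fields in your own
# metadata dictionary.
FIELDS = ["filename", "title", "date", "institution", "description"]
REQUIRED = ["filename", "title", "institution"]


def write_metadata(rows, out):
    """Write one CSV row per image, rejecting rows that lack a required field."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for number, row in enumerate(rows, start=1):
        missing = [field for field in REQUIRED if not row.get(field)]
        if missing:
            raise ValueError(f"row {number} is missing: {', '.join(missing)}")
        writer.writerow(row)


# Placeholder values, not real catalog records.
rows = [
    {
        "filename": "example_scrapbook_0001.tif",
        "title": "Example Scrapbook, page 1",
        "date": "",
        "institution": "Example County Library",
        "description": "Front cover",
    },
]

# An in-memory buffer keeps the example self-contained; to write a real file,
# use open("metadata.csv", "w", newline="") instead.
buffer = io.StringIO()
write_metadata(rows, buffer)
print(buffer.getvalue())
```

Checking required fields at this stage catches gaps before upload, which makes the later quality control step easier.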
Description of Scanning Practices
Staff use the following best practices when scanning materials:
- Handling
  - Items are handled as little as possible and are stored in secure locations when not in use.
  - Bound materials are opened gently and scanned lying flat. Props and book cradles are used when necessary.
  - Glass is used to flatten pages only when it poses no danger to the materials and does not inhibit image quality.
  - Unless requested by the owning institution, paper materials are handled with bare hands to facilitate dexterity. Photographs, slides, negatives, and museum objects are handled with cotton gloves.
  - To avoid shadows in the gutter for bound items being shot with digital cameras, the volumes may be turned perpendicular to the light source.
- Hardware Calibration
  - Monitors are calibrated for color consistency.
  - When using digital cameras, the hardware is calibrated based on the size and type of object. This calibration is specific to the type of hardware.
  - Targets are used to ensure color fidelity and accurate focus for each item captured.
- What To Skip
  - Blank pages
  - The backs of documents and photos if there is no unique or helpful content
  - Spines of bound objects
- About Scrapbooks
  - Unfold and scan unique multi-page items that are inside of scrapbooks (like small pamphlets or booklets)
  - When scanning multi-page items inside of a scrapbook, scan the entire scrapbook page each time to maintain context.
  - Items in enclosures are carefully removed for scanning if no damage will result.
  - Do not lift or remove the plastic page protectors that adhere to scrapbook pages.
  - Scrapbooks bound by posts are sometimes disassembled, if doing so provides a better scan and/or gentler handling, and if no damage will result.
- Color – Scan in color, using the sRGB colorspace.
Image Manipulation
Following digitization best practices, do minimal manual image manipulation.
- No “touchups,” intensive color correction, or dust and scratch cleanup
- Images may be deskewed or rotated
- Images are cropped to leave a small, uniform border of space around each item.
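The cropping rule above comes down to a little arithmetic: expand the item's bounding box by the same number of pixels on every side, without running past the edge of the scan. The sketch below shows one way to compute that box. It assumes you already know where the item sits in the scan (found by eye or with other software), and the function name and border width are illustrative only.

```python
def crop_box_with_border(item_box, image_size, border_px):
    """Expand an item's bounding box by a uniform border, clamped to the image.

    item_box is (left, top, right, bottom) in pixels; image_size is
    (width, height). The result uses the same (left, top, right, bottom) order.
    """
    left, top, right, bottom = item_box
    width, height = image_size
    return (
        max(0, left - border_px),
        max(0, top - border_px),
        min(width, right + border_px),
        min(height, bottom + border_px),
    )


# Example: a scan of 5100 x 6600 pixels with the item well inside the frame.
print(crop_box_with_border((100, 120, 4900, 6120), (5100, 6600), 60))
```

The returned tuple is in the (left, upper, right, lower) order that imaging libraries such as Pillow expect for a crop box, so it can be passed along to whichever tool performs the actual crop.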
Image Description
Items are described using three sources of information:
- Written information directly on the original item
- Information provided by the owning institution
- Information that can be gleaned from looking at the original item
Use a metadata dictionary and several broadly adopted metadata authorities to help ensure consistency across item descriptions.
Digital Publishing
DigitalNC.org is built on three software packages that exchange information via APIs:
- WordPress (www.digitalnc.org) – Hosting and delivery of blog, information pages, cross-collection searching and management of program-wide data: contributors, counties, titles, collections, and exhibits.
- Invenio hosted by TIND (lib.digitalnc.org) – Hosting and delivery of non-newspaper digitized content. A few types of content (audio, moving images) are hosted by the Internet Archive and delivered via TIND.
- Open-ONI (newspapers.digitalnc.org) – Hosting and delivery of newspaper content.
DigitalNC.org also uses Disqus to manage comments across WordPress and CONTENTdm content.
Items scanned and hosted by the Internet Archive are featured in CONTENTdm using the Internet Archive’s BookReader.
Files and Storage
Images are given intelligent identifiers – names that give some sense of what the item is about and/or where it’s from. Files are stored in directories named after the item’s owning institution, on servers supported by the UNC-Chapel Hill Libraries.
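As an illustration of this kind of naming, the sketch below builds a storage path from an institution name, an item name, and a page number. The exact scheme (lowercase words joined by underscores, a zero-padded page number, a `.tif` extension, and the `/scans` root) is an assumption made for this example and is not the Digital Heritage Center's actual convention.

```python
import re
from pathlib import PurePosixPath


def slug(text):
    """Lowercase the text and replace runs of other characters with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def storage_path(root, institution, item_name, page, ext="tif"):
    """Build <root>/<institution>/<item>_<page>.<ext> (hypothetical scheme)."""
    filename = f"{slug(item_name)}_{page:04d}.{ext}"
    return PurePosixPath(root) / slug(institution) / filename


# "Example County Library" and the /scans root are placeholders.
print(storage_path("/scans", "Example County Library", "Lamp and Shield", 12))
```

Whatever scheme you choose, generating names with a small script rather than typing them by hand helps keep them consistent across a project.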
Project and Workflow Management
Multiple staff members are involved in digitization and digital publishing at the Digital Heritage Center. They use Trello for managing workflows.
Spreadsheets and documentation on Google Drive help in sharing information and tracking project progress.