Digitization Project
The Amador Digitization Project is a complex undertaking with many moving parts. Read more about the methodology, standards, and technical aspects of the project below.
Scope, Duration, Methodology, and Work Plan
Scope & Duration
The scope of the Amador family correspondence digitization project includes scanning, creating metadata, and establishing an online digital collection for 15,000 pages of correspondence. The project has utilized six professional staff, seven student technicians, and an advisory committee of five subject experts. The duration of the project, from initial organization and scanning to metadata creation, will be three years.
Student-Centered Approach
One of the goals of this project is to promote a wider understanding of cultural heritage and its relationship to the local community and society as a whole. Approximately 58% of NMSU students are Hispanic. The students hired for this project will learn about curation and preservation of Hispanic cultural heritage through hands-on experience. Our team guides students and involves them in all stages of the Amador family correspondence digitization project, so that they learn about archives and primary source materials; professional standards; the handling of rare, historical materials; digitization processes; creation of metadata for access and preservation; cultural heritage curation; and digital humanities scholarship. NMSU student employees have been a vital part of the NMSU Library community for decades. The library has a track record of training student employees and implementing detailed workflows with them to ensure efficiency and high-quality work outcomes. Due to their positive work experience at the NMSU Library, many student employees have pursued careers in librarianship, archival studies, and cultural heritage.
Methodology and Standards
The Amador family correspondence is written in Spanish, with some letters in English and a limited number in French. The letters vary in length from one to several pages. Similarly, the size of base media varies from small note cards to stationery approximately 9x14 inches. Carefully preserved for generations and housed in the NMSU library for more than 50 years, the letters remain in good physical condition. They are stored unfolded and upright in acid-free and lignin-free boxes and folders, according to archival best practices. Particularly fragile items are housed individually in clear, archival polyester sleeves to prevent damage from handling. The materials are kept on standard library shelving, in a dark environment, at 67 degrees Fahrenheit with relative humidity between 35 and 45 percent. The detailed finding aid and the letters themselves serve as primary sources for the multilingual descriptive metadata that are created for this project. During the scanning process, all archival materials are handled and cared for according to relevant standards. Student technicians have been trained in these standards by professional library staff.
Professional Standards
NMSU Library follows the Principles for Digital Content adopted by the Council of the American Library Association in 2007. These principles address fostering the true value of digital assets; equitable access to and use of digital content with appropriate recognition of intellectual property rights; collaboration on long-term sustainability and preservation of digital collections; their international scope; and the importance of building digital assets on open standards and best practices that advance their usefulness for diverse communities of users.
Accordingly, NMSU Library has applied the required standards both in the planning and implementation phases of the Amador family correspondence digitization project. In order to document and streamline the digitization process of the Amador collection, NMSU Library has adhered to the Federal Agencies Digital Guidelines Initiative (FADGI) Digitization Activities: Project Planning and Management Outline. In line with its recommendations, the project has been divided into four major phases that involve concurrent and overlapping activities. These phases are project planning, pre-digitization, digitization, and post-digitization, all governed by appropriate standards.
For digital image capture, including production of master and derivative files, file naming conventions, formats, storage, and image quality control, the project team has complied with the FADGI Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Files, specifically with the FADGI recommendations for handling manuscripts and other rare and special materials.
To ensure that the Amador collection is easily discoverable and accessible by an international scholarly audience, the NMSU Library has adhered to the Dublin Core Metadata Initiative schema and terms, and utilized appropriate controlled vocabularies from the Library of Congress Authorities, the Virtual International Authority File (VIAF), and the Getty Vocabularies. These complementary sources of controlled vocabularies are useful for describing the topics of the letters, the places and people mentioned, and various concepts related to hybrid cultural heritages. In addition, the project has followed the Best Practices for Multilingual Access to Digital Libraries developed by the Europeana Network, which provide a number of strategies for making data, metadata, and the user interface multilingual.
To ensure the long-term authenticity and usability of the Amador family collection’s digitized content, the project team has followed the Library of Congress’ PREMIS (Preservation Metadata: Implementation Strategies) standard as well as the PRONOM online file format registry. Maintenance, security, and preservation of the digital content have adhered to information technology standards with scheduled regular backups and redundancy of the storage system. In particular, our Network Attached Storage (NAS) has been designed with a Redundant Array of Independent Disks (RAID), a data storage virtualization technology that protects data from individual drive failures. In addition, a secondary off-site backup of all data has been utilized to guard against data loss.
The NMSU Library is in the process of developing a comprehensive long-term preservation policy for unique digital collections created or collected by the library. This policy will comply with the Open Archival Information System (OAIS) Reference Model and other appropriate digital preservation standards and practices. NMSU Library also collaborates with LibNova, the vendor of the digital preservation platform LibSafe Go.
Intellectual Property
The Amador family correspondence dates from 1856 to 1949, with the bulk of the material now in the public domain. All digital assets created for the Amador digitization project are freely available for research, learning, and creative activity. The NMSU Library is committed to providing open access to and reuse of its archival resources, including digital collections data.
Technical Environment
Hardware and software available for this project are listed below:
Hardware
- 3 Epson Perfection V700 flatbed scanners
- 4 Epson Expression 12000XL flatbed scanners
- Dell PowerEdge servers connected to Dell PowerVault storage provide 20TB of redundant network-attached storage
- Buffalo TeraStation WSH5610DN NAS Storage System - Intel Celeron J1900 2 GHz - 6 x HDD Supported - 6 x HDD Installed - 48 TB Installed HDD Capacity - 8 GB RAM DDR3 SDRAM - Serial ATA/300 Controller (2 to be purchased)
- Laptop, docking station, monitor, keyboard, mouse - 7 sets
Software
- Epson Scan scanning software
- Adobe Creative Cloud Suite (for image editing)
- CONTENTdm platform
- OpenRefine (for metadata quality control)
- R and RStudio (for data mining, analysis, and visualizations)
Project Workflow
In line with the FADGI recommendations, the Amador family correspondence digitization project has proceeded through four concurrent phases. Such an arrangement improves flexibility in scheduling related activities. During the first phase, project planning, new equipment was purchased, set up, and tested. Also, the project specifications and production workflow along with student training materials were finalized. In addition, the hiring process for student technicians took place.
During the second phase, pre-digitization, following the arrangement documented in the collection’s finding aid, the physical documents have been selected, evaluated for their condition, and catalogued in a project inventory. The inventory includes administrative metadata, such as collection name and number, series, box/folder, copy and access rights, and file names. The inventory also contains multilingual descriptive metadata. The creation of the multilingual metadata schema has been informed by the Europeana standards mentioned above and by recommendations of the collection’s users, including our subject experts, who are especially interested in tracing the vast linguistic, social, economic, geographical, and political networks of the Amadors.
Both multilingual metadata in the Dublin Core standard and a multilingual user interface are supported by the CONTENTdm platform, which the NMSU Library uses for online public access to its digital collections. Accordingly, the metadata schema will inform the design of the CONTENTdm user interface, including searching and browsing options as well as the display of search results. Throughout the project, metadata collected in the project inventory will be assessed and refined regularly using OpenRefine, an open-source software application for data manipulation and cleanup.
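As a rough illustration of how one bilingual Dublin Core record can carry language tags, the sketch below serializes a record with `xml:lang` attributes using only the Python standard library. The `record` wrapper element, the helper name `build_record`, and the placeholder titles are assumptions made for illustration; they do not represent the project's actual schema or the CONTENTdm export format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"       # Dublin Core element set
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace behind xml:lang
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Serialize (dc_element, language_code_or_None, value) tuples as XML."""
    record = ET.Element("record")  # hypothetical wrapper element
    for element, lang, value in fields:
        node = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        if lang:
            # Tag the value with its language so each version can be told apart.
            node.set(f"{{{XML_NS}}}lang", lang)
        node.text = value
    return ET.tostring(record, encoding="unicode")


# Placeholder values only; the titles are not taken from the collection.
example = build_record([
    ("title", "en", "Letter (placeholder title)"),
    ("title", "es", "Carta (título de ejemplo)"),
    ("language", None, "spa"),
    ("identifier", None, "Ms0004_B01_F02_18760531_P1R"),
])
print(example)
```

Repeating an element once per language, rather than packing both languages into one field, keeps each value individually searchable and displayable.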
The second phase of the project also involved assembling and preparing original materials for digitization, while ensuring appropriate handling, arrangement retention, and secure storage. After a final review of project specifications, we conducted comprehensive training for staff and student technicians. In addition to standards, equipment procedures, and workflows, the training included detailed instructions on handling archival documents provided by the university archivist, as well as workshops on basic paleography offered by our subject experts.
The third phase, digitization, involved performing digital conversion according to the relevant FADGI standards for manuscript collections. In addition, the Association for Library Collections and Technical Services (ALCTS) Preservation and Reformatting Section emphasizes that although the research value of most textual materials lies in their content, in the case of manuscripts, the inks, pen strokes, base media, colors, stains, annotations, and markings provide additional contextual information that needs to be depicted accurately in the digital image. Accordingly, correspondence has been scanned to include the total document area at a minimum resolution of 400 ppi, in 24-bit color. To maintain image clarity and integrity, the master files have been saved as uncompressed TIFFs. The access files have been cropped and edited as needed to ensure derivative image quality. Derivative files have been saved in JPEG and PDF file formats. JPEG files typically are requested for reproduction, while PDF is the most effective and user-friendly format in terms of the interactive functionality available for engaging with textual documents in the CONTENTdm platform. Unique file names reflect the name of the collection, box number, folder number, date of the letter, page number, and an indication of recto or verso (e.g., Ms0004_B01_F02_18760531_P1R.tif). Derivative files keep the same naming convention. Derivative PDF files uploaded to CONTENTdm also have been given a unique and persistent URL, assuring their accessibility over the long term.
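The naming convention can be sketched in code. The example below is a minimal sketch that assumes the field widths seen in the single sample name (a four-digit collection number, two-digit box and folder numbers, and a date written as YYYYMMDD); the function names and the list of derivative extensions are hypothetical, and letters with partial or missing dates would need a rule that the text does not describe.

```python
import re
from datetime import date

# Pattern inferred from the one example file name; field widths are assumptions.
FILENAME_RE = re.compile(
    r"^(?P<collection>Ms\d{4})_B(?P<box>\d{2})_F(?P<folder>\d{2})"
    r"_(?P<date>\d{8})_P(?P<page>\d+)(?P<side>[RV])\.(?P<ext>tif|jpg|pdf)$"
)


def build_name(collection, box, folder, letter_date, page, side, ext="tif"):
    """Compose a file name from its parts; side is 'R' (recto) or 'V' (verso)."""
    if side not in ("R", "V"):
        raise ValueError("side must be 'R' (recto) or 'V' (verso)")
    return (
        f"{collection}_B{box:02d}_F{folder:02d}"
        f"_{letter_date:%Y%m%d}_P{page}{side}.{ext}"
    )


def parse_name(name):
    """Split a file name into its parts, or raise ValueError if it does not fit."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {name}")
    fields = match.groupdict()
    for key in ("box", "folder", "page"):
        fields[key] = int(fields[key])
    return fields  # the date stays a string so partial dates are not rejected


name = build_name("Ms0004", 1, 2, date(1876, 5, 31), 1, "R")
print(name)
print(parse_name(name))
```

Validating names against a single pattern before files are moved to storage catches transposed box and folder numbers early, when renaming is still cheap.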
Several measures have been taken to ensure the integrity of digital content. All files have been stored securely on NMSU Library servers with RAID (Redundant Array of Independent Disks) backups and offsite drives, following the 3-2-1 backup rule (three copies, two different storage media, one of them located off-site). Access to master files is restricted to minimize the potential for loss or destruction of data. In addition, regular maintenance of storage media and auditing of files are scheduled. Auditing includes a combination of techniques, such as checksums and digital signatures. Complete sets of the digital content have been stored in an offline “dark archive” environment, on two RAID enclosures physically located in each of our two libraries.
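A checksum audit of the kind mentioned above can be sketched as follows. This is an illustrative sketch, not the project's audit tooling: the choice of SHA-256, the function names, and the in-memory manifest format are all assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large TIFF masters are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, pattern="*.tif"):
    """Map each matching file (path relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob(pattern))
    }


def audit(root, manifest):
    """Return (relative_path, problem) pairs for missing or altered files."""
    root = Path(root)
    problems = []
    for relative_path, expected in manifest.items():
        path = root / relative_path
        if not path.is_file():
            problems.append((relative_path, "missing"))
        elif sha256_of(path) != expected:
            problems.append((relative_path, "checksum mismatch"))
    return problems
```

In use, a manifest would be built once when the master files are written, stored apart from the files, and passed to `audit` at each scheduled check; an empty list means every file is present and unchanged.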
During the digitization phase, technical metadata generated during image capture have been added to the inventory: file type, file size, and creation date/time. In addition, preservation metadata, such as the type of scanner used, original scanning resolution, and image editing specifications, have been recorded in the project inventory. After the inventory metadata are assessed for accuracy, consistency, completeness, and context, missing or incorrect values are added or corrected using OpenRefine, an open-source software application. Eventually, these refined metadata along with all files are ingested into the trusted digital repository serviced by LibNova.
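The technical metadata fields named above can be read directly from the file system. The sketch below is one possible approach, with hypothetical function and column names; it uses the file's modification time as a portable stand-in for creation time, since a true creation timestamp is not available on every platform.

```python
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def technical_metadata(path):
    """Collect file type, size, and a timestamp for one image file."""
    path = Path(path)
    info = path.stat()
    file_type, _encoding = mimetypes.guess_type(path.name)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "file_name": path.name,
        "file_type": file_type or "unknown",  # guessed from the extension only
        "file_size_bytes": info.st_size,
        # Modification time, in UTC, standing in for creation date/time.
        "timestamp_utc": modified.isoformat(timespec="seconds"),
    }
```

Each returned dictionary corresponds to one inventory row, so the results could be written out with `csv.DictWriter` and then loaded into OpenRefine for review.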
The fourth phase, post-digitization, has started with transferring boxes with digitized original manuscripts back to the NMSU Archives and Special Collections for secure storage under the supervision of the NMSU archivist. Special care has been given to reviewing storage conditions of original items, replacing folders and sleeves as needed, and addressing any preservation concerns for documents. At the same time, metadata and data collected in the project inventory have been transferred to CONTENTdm for access. After an assessment of uploaded metadata records for accuracy and completeness, the updated collection has been published online, and the maintenance audits have been scheduled and documented. Similarly, conditions for storing data and backup procedures for disaster recovery have been reviewed and documented. The appropriate data maintenance audits and updates have been scheduled. As all of the planned activities reach their completion, the digitization project will be evaluated for its workflow effectiveness, drawbacks, teamwork, and final outcomes.
Digital Collections as Open Online Archives and Data Repositories
Hosted on CONTENTdm via the NMSU Library website, the Amador family correspondence digital collection has been readily accessible to local and international users. In addition, CONTENTdm supports the harvesting of metadata into WorldCat via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), and it offers a search engine optimization (SEO) service to increase traffic to the collection website and improve its position in search engine results for relevant keywords. These strategies increase the visibility and use of the Amador family digital correspondence.
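For readers unfamiliar with OAI-PMH, the sketch below shows how a harvester might form a `ListRecords` request for Dublin Core metadata and pull titles out of the XML response. The base URL and set name are placeholders, not the collection's real endpoint, and the helper names are hypothetical. The code makes no network call and does not handle the `resumptionToken` paging that the protocol uses for large result sets.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core namespace in oai_dc records


def list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # restrict the harvest to one collection
    return f"{base_url}?{urlencode(params)}"


def titles_from_response(xml_text):
    """Return every dc:title value found in an OAI-PMH response document."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(f"{DC}title")]


# Placeholder endpoint and set name, for illustration only.
url = list_records_url("https://example.org/oai/oai.php", set_spec="amador")
print(url)
```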
The Amador family correspondence digital collection has also created an opportunity for innovative data exploration and experimentation, and it will foster the development of new research questions and methodologies. To this end, data visualizations will be added to the landing page for quick content overviews and insights into the Amador family’s networks. The collection’s aggregated open metadata eventually will be stored on GitHub and be available for computational re-use in data mining, data visualization, and topic modeling projects.
Work Plan
The Amador family correspondence digitization project will last three years, from August 2022 through August 2025. The work has been divided into four co-occurring phases: project planning, pre-digitization, digitization, and post-digitization. The project team includes six library staff and seven student technicians who are capturing and editing images and assisting with metadata creation. Because student technicians play a vital role in this project, the timeline is built around the academic year. The first two phases (project planning and pre-digitization) have focused on purchasing equipment, developing project documentation and workflows, reviewing original documents, creating the correspondence inventory, and hiring and training student technicians. The third phase, digitization, occupies the first two years of the project (90 weeks), with seven students each scanning and editing. After quality checks, master and derivative image files are transferred to secure storage. Technical and bilingual descriptive metadata are created and recorded in the correspondence spreadsheets after scanning. Staff and student technicians continue to develop the bilingual descriptive metadata for each letter along with appropriate controlled vocabularies. During the final phase, post-digitization, original documents are reviewed carefully and transferred back to the archives under the supervision of the project archivist. All metadata are assessed for quality and completeness and then uploaded to CONTENTdm along with PDF files. The collection is reviewed thoroughly again before it is published online. All files and metadata are transferred to secure LibSafe Go archives for long-term digital preservation. The final evaluation of the digitization project will be carried out and documented in detail.
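The figures stated on this page (15,000 pages, 90 scanning weeks, seven students) imply an average scanning pace, worked out in the short sketch below. It assumes, purely for illustration, that pages are spread evenly across all weeks and students; the real schedule follows the academic year and will vary.

```python
TOTAL_PAGES = 15_000   # pages of correspondence in scope
SCANNING_WEEKS = 90    # length of the digitization phase
STUDENTS = 7           # student technicians scanning and editing

# Even-split assumption: no allowance for breaks, training, or rescans.
pages_per_week = TOTAL_PAGES / SCANNING_WEEKS
pages_per_student_per_week = pages_per_week / STUDENTS

print(f"{pages_per_week:.1f} pages per week overall")
print(f"{pages_per_student_per_week:.1f} pages per student per week")
```

Under that assumption the team would average about 167 pages per week, or roughly 24 pages per student per week, which leaves time within each shift for editing, quality checks, and metadata work.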