Beyond the Digital Color Line: About The Digital Colored American Magazine
Brian Sweeney and Eurie Dahn, Project Directors
In recent decades, mass digitization projects like Google Books (2004-present) and more specialized and curated digital libraries like the Modernist Journals Project (1995-present) have expanded access to rare books and periodicals once confined to the reading rooms of research libraries. The era of digitization has fueled scholarship in numerous traditional disciplines as well as in interdisciplinary fields such as print culture studies and periodical studies.
As professors of literature and periodical studies at a small liberal arts college, we have benefited from digitization not only as scholars but as teachers. Digital libraries have enabled us to introduce our students to archival materials and research methods that once would have been available only to students at universities with world-class archival collections.
However, while digitization has expanded and democratized access to rare books and periodicals, access remains extremely uneven. For example, while decades of consecutive issues of Harper’s, the Century, and the Atlantic Monthly are freely available as part of Cornell University’s Making of America archive, other significant periodicals like Godey’s Lady’s Book are available on the “free web” only piecemeal, and full runs are ensconced behind the paywalls of proprietary subscription databases. As Benjamin Fagan has observed, this unevenness is especially pronounced for African American periodicals. Black print, he writes, has been comparatively neglected by many of the paywall-free digital archives on which teachers and researchers outside elite institutions depend. And while important black newspapers and magazines have been digitized and added to various subscription databases, few libraries outside elite institutions have the resources to subscribe to more than a handful of such databases. As a result, many important black periodicals remain largely inaccessible to researchers, students, and teachers, as well as to the general public (10).
The Colored American Magazine has been more fortunate than most black periodicals of the nineteenth and early-twentieth centuries in that a full run of the magazine is hosted by the subscription-free HathiTrust Digital Library. However, only members of HathiTrust partner institutions (most of them elite research universities or liberal arts colleges) have unfettered ability to download or print multiple pages at a time. This creates a significant barrier to use for individuals unaffiliated with member institutions.
One impetus for The Digital Colored American Magazine project was our belief that digital humanists need to create alternatives to proprietary databases that re-enclose the commons of public domain print, and in particular black print. We agree with Fagan that scholars should resist a state of affairs where early black print is “treated as the property of private corporations” (12). The Digital Colored American Magazine aims to make a digital version of this important magazine that is useful, accessible, and free to all.
In addition to a commitment to open access, The Digital Colored American Magazine arises from our concerns about mass digitization projects, such as Google Books, that treat books as containers of text to be aggregated and data-mined. Such “big data” digitization projects favor high-contrast black-and-white scans that lend themselves to greater accuracy in optical character recognition (OCR) but at the cost of distorting the aesthetic and material character of the original. Moreover, because the goal of such projects is to digitize vast quantities of print, the quality often suffers, resulting in scanning errors, quality-control issues, and unreliable OCR.
These problems of mass digitization are evident in the Google Books scan of the Colored American Magazine available through HathiTrust. The Google scan was based not on original issues of the magazine but on black-and-white reprints of bound volumes published by the Negro Universities Press in 1969. This digitization of a printed reproduction of bound volumes, on which much scholarly work on the Colored American has depended, is flawed in all the ways one might expect. The vibrant reds, blues, and greens that enliven the covers and advertising pages of the magazine have been bleached away. Text is difficult to read and at times illegible. Faded images lend the magazine a spectral, immaterial quality.
The weaknesses of the Google Books version of the Colored American Magazine bear out concerns that literary scholars have increasingly raised about digitization. As Brian Connolly warns in his essay “Against Accumulation,” mass digitization projects like Google Books threaten to undo the work of print culture studies by re-dematerializing texts, converting them into text-searchable images ripe for “neoempiricist” computer analysis (177). All that is solid melts into data.
Convinced of the inadequacy of the Google Books digitization of the Colored American Magazine, we approached our project with the guiding principle to make available digital reproductions of the periodical that, as far as possible, would combine the benefits of digital accessibility with fidelity to the material character of the originals. While it may seem paradoxical to embrace digitization as a way to enable greater engagement with the materiality of print, that is what we aim to do. Ryan Cordell recently cautioned scholars against digital projects that “attempt to replicate the reading room,” for “considered only as a surrogate for the physical newspaper or magazine, the digitized periodical can only disappoint. The digitized periodical, like the digitized book, constrains its original to the size of the computer screen, smooths its textures, and even ‘plasticizes’ its aroma” (4). Cordell is indubitably correct, but it is all too easy for scholars to forget the unevenness of access to “the reading room.” While it is true no digital reproduction can fully “replace embodied encounters with texts in physical archives” (Hyde and Rezek 157), we believe that responsible digitization can better translate into the digital medium the material character of the printed object.
Nor is this concern for materiality mere book-fetishism. The most dramatic moment in the history of the Colored American Magazine, its transformation when it moved from Boston to New York City in 1904, is told as much through the changing nature of its contents as through changes in its material character–paper quality, printing quality, the presence or absence of color–that are hard to discern in the Google Books reproduction.
The reliance of previous reproductions of the Colored American Magazine on bound volumes poses an additional concern. Unlike modern periodicals, nineteenth- and early twentieth-century genteel magazines like the Colored American restricted advertisements to separately numbered or unnumbered front and back pages. When issues of these magazines were bound for preservation, these advertising pages, evidently deemed insignificant, were routinely torn out and discarded, frequently along with the front and back covers. Such preservation practices have ensured that comparatively few truly complete copies of nineteenth- and early twentieth-century magazines still exist, a state of things Sean Latham and Robert Scholes refer to as “the hole in the archive” (520). The effects of this practice are frequently evident in the Google Books digitization. A comparison of the Google Books version of the May 1901 issue with our version of the same issue reveals that the source volume for the Google Books scan omits not only the cover but numerous pages of advertisements.
Our intention to make freely available a more faithful and complete digital version of the Colored American Magazine led us in 2015 to search for unbound copies of the magazine. After many false starts, our search brought us to the Beinecke Rare Book and Manuscript Library at Yale University, which, as far as we can determine, is the only research library to possess unbound issues of the magazine. The Beinecke generously agreed to photograph every page of the 35 unbound issues of the Colored American Magazine in its collection and provide the images to us for use in this project.
While work on this project was underway, the Beinecke acquired 16 additional issues of the magazine at auction. All but four of these were issues the library did not previously possess. Taken together, the Beinecke now owns 47 distinct issues, slightly fewer than half of all issues of the periodical. The Digital Colored American Magazine will eventually make all 47 of these issues available in reliably searchable PDF format.
Our Digitization Practices
- We work from lossless TIFF images supplied to us by the Beinecke’s Digital Imaging Studio.
- We work from true-color images rather than high-contrast black-and-white ones, and we do not crop out the ragged edges of cut pages. Before OCR, however, page images are cropped at the gutter so that the file, when opened in two-page view, better simulates the form of the printed magazine. Our guiding intention is for the digital copy to convey to some degree the material character of the original.
- Searchable page images are created using the OCR software ABBYY FineReader. The text layer produced by ABBYY is then manually corrected–a time-consuming and painstaking process.
- Consistent with current digitization standards, searchable page images are saved individually in PDF/A format. They are then combined into a single PDF file of the entire issue. The PDF format was selected for its portability: we want users to be able to easily download entire issues for offline use.
- For OCR’d issues, live links are embedded in the issue’s Table of Contents page to allow users to click through to a particular item. These links are invisible, so as to preserve the appearance of the original page image. No other links are embedded in the PDF file.
- As we OCR individual pages, we are also generating a text file for each page, allowing us to work toward a long-term goal, contingent upon funding, of creating a database that will allow users to perform full-text searches across all digitized issues. Until funding is available to support such a database, users can use the search engine in the sidebar to search the text of the website itself. While this tool is not as powerful as a full-text searchable database, it does allow users to search across the Tables of Contents supplied on individual issue pages.
- As stated above, OCR correction is painstaking and time-consuming labor. It will take many months before all issues are available in reliably searchable format. For issues on which OCR work has not been completed, we have made non-searchable PDFs available for download. Over time, these will be replaced by searchable PDFs.
- The site will feature scholarly commentary on selected issues.
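The long-term goal of full-text search across per-page text files can be illustrated with a small sketch. The Python below is a hypothetical example, not the project’s implementation: it assumes an invented file-naming convention (`<issue>_p<page>.txt`, e.g. `1901-05_p012.txt`) and substitutes a simple in-memory inverted index for the database the project envisions. Each word is mapped to the set of pages on which it appears, and a query returns only the pages that contain every query word.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical naming convention (an assumption, not the project's):
# one text file per page, named "<issue>_p<page>.txt", e.g. "1901-05_p012.txt".
PAGE_FILE = re.compile(r"^(?P<issue>.+)_p(?P<page>\d+)\.txt$")
WORD = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return WORD.findall(text.lower())


def load_pages(directory):
    """Read every page text file in `directory` into {(issue, page): text}."""
    pages = {}
    for path in Path(directory).glob("*.txt"):
        match = PAGE_FILE.match(path.name)
        if match:  # skip files that do not follow the assumed naming convention
            key = (match["issue"], int(match["page"]))
            pages[key] = path.read_text(encoding="utf-8")
    return pages


def build_index(pages):
    """Inverted index: map each word to the set of (issue, page) locations containing it."""
    index = defaultdict(set)
    for location, text in pages.items():
        for word in tokenize(text):
            index[word].add(location)
    return index


def search(index, query):
    """Return sorted (issue, page) locations whose text contains every query word."""
    words = tokenize(query)
    if not words:
        return []
    # .get() avoids inserting empty entries into the defaultdict for unknown words
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

For example, after `index = build_index(load_pages("pages/"))` (with `pages/` standing in for a hypothetical directory of page text files), calling `search(index, "Boston New York")` would list every (issue, page) pair whose text contains all three words. A real database would add features this sketch omits, such as phrase search and relevance ranking.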
Connolly, Brian. “Against Accumulation.” J19: The Journal of Nineteenth-Century Americanists, vol. 2, no. 1, Spring 2014, pp. 172-179.
Cordell, Ryan. “What Has the Digital Meant to American Periodicals Scholarship?” American Periodicals: A Journal of History & Criticism, vol. 26, no. 1, 2016, pp. 2-7.
Fagan, Benjamin. “Chronicling White America.” American Periodicals: A Journal of History & Criticism, vol. 26, no. 1, 2016, pp. 10-13.
Hyde, Carrie, and Joseph Rezek. “The Aesthetics of Archival Evidence.” J19: The Journal of Nineteenth-Century Americanists, vol. 2, no. 1, Spring 2014, pp. 155-162.
Latham, Sean, and Robert Scholes. “The Rise of Periodical Studies.” PMLA, vol. 121, no. 2, March 2006, pp. 517-531.