In 2010 the Pennsylvania State Archives got six figures from NHPRC to do MPLP on 26,000 cubic feet, or about 37% of its holdings. Two years later, 11,000 feet had been processed and 4,000 weeded, which is basically herculean. And still barely enough to keep up with growth. By way of really offhand comparison, the much smaller state of Vermont’s records center in 2011 grew by 12,000 cubic feet to 98,000 total (PDF, page 8). In the same year, 6,000 cubic feet met retention.

I think a 1/3 backlog across all our collections is a dramatic understatement. MPLP is barely enough to fight the bulk. Every generation of archivists promises to be the first to slay the backlog, and bulk has always won. Jenkinson was terrified by the impossible bulk of World War I records. Bulk-and-opacity in archives is not a bug, it’s a feature.

Our instinctive reaction to the feature is to apply more science to where science has already failed. Take this really interesting project at Harvard last spring. Flipped, or digitization-first, processing has the capacity to bring item-level (really, face-level, since we’ll be shooting verso and recto a lot) description to smallish collections. But the process presumes that the collections have been curated and all the dross weeded out already. Absent that, we’re filling our digital repository with exactly the same dejecta we filled the bricks-and-mortar one with. Yes, faceted browse will help us narrow any given group of objects, but indiscriminate processing gives us no sense that one part of a collection might be more significant than any other. Or any way to differentiate one collection from another. Or one repository from another.

And for some users, that won’t matter and flipped processing will be hunky-dory. Let’s do an imaginary browse for all collections containing postcards. Some are from the holdings of a dedicated deltiologist. Some are from the intimate correspondence of a Maj. Gen. stationed in Haiti in the 1920s. A researcher interested in the use and dissemination of images of the tropics, or a researcher interested in the style and syntax of what’s written on postcards, would be fine; that is, non-historical research wouldn’t suffer a bit from wading through thousands of decontextualized images. Fragmented collections, shorn of context, are fit for disciplines which see intrinsic attributes in objects.

But for historical research, context is paramount, and no object has intrinsic value. We should, being, you know, people who went to high school, be able to figure out the intrinsic significance of this 19th-century telegram from Savannah:
It’s a piece of text, like the Gettysburg Address, or the Declaration of Independence, or the Letter from Birmingham Jail, which has iconic power. And it’s a wild outlier. Of the tens of millions of pages in the tens of thousands of unprocessed feet at any given repository, a vanishingly small number will have that kind of significance. For everything else, someone will have to guide researchers to the significances of the archival corpus.

When you don’t know what you’re looking at, that’s where history steps in. History lives on third-person-omniscient perspective. Creators and donors virtually never understand what their collections might mean. Even the best researchers only have a parochial, or third-person-limited, perspective, seeing archival collections through the lens of their prior publications. I recently worked on a collection of photographs from Iran of the 1910s. The creator had been interested in economic development and ethnography. The donor was interested in family history and evangelism. What I found were 1″ x 1-1/2″ photographs of executions and war crimes, taken during what I at first assumed was the Persian campaign in World War I. After reading diaries, and finding captions for other similar photographs, and checking the period of time the creator was in-country, it became much more likely that these were images of royalist retribution against supporters of the new Iranian Majlis during the revolution of 1906-1911. It took about three hours to unspool the world-historical significance of this half-a-foot collection.

Creating meaning from archives demands context. Access can be delivered en masse. Context can’t. Access without context makes for opaque bulk. Processing cuts bulk into something legible. Digitize-first is an interesting adjunct to processing, but the idea that we can slap search on top of an undigested bolus of images and walk away and users will pick up the pieces is motherfucking hilarious.

