Tag Archives: Archives–Processing


I want to talk to you about your bodies, O archivists, not because I’m not interested in your minds, but because, as with all forms of labor, it’s your bodies that are at stake. (This is a long hat-tip/dérive from Allana’s work from last year.)

Enki, God of Waters, at peace in the pure land of Dilmun, heard the cries of its own god for water. Enki orders the sun to bring water from the Earth, and the land is awash. He and his consort Ninhursag let flow the “waters of the heart” — Sumerian ab is both water and semen — and in 9 days, Lady Greenery is born.

The other day at work I was helping someone understand OCR, and really reaching the limit of my knowledge swiftly, and this someone asked if manuscript text could be made CTRL+F-able, and I said “Oh heavens no,” and said that non-typescript character recognition was basically in research-and-development, and someone please correct me but it seems like the sheer variability of human handwriting would make any machine-learning exercise too expensive for the use to which it would be put, I mean if you need to build Skynet in order to do a “find” in Lenin’s Paris notebooks, I mean, and they said “Yeah but Ancestry does it” and I was able to explain that what you, Dear Someone, assumed to be the product of a machine because surely SURELY IN THIS MODERN AGE we would never consign human beings to maddening, gut-wrenching, soul-killing piecework, was in fact a 21st-century version of Maelzel’s Chess Player, which was that thing where instead of a robot playing chess you have an actual human hidden inside the cabinet moving the pieces, that is, Ancestry does it using swarms of poorly-paid humans in China and the Philippines, and the Smithsonian uses swarms of volunteers to tag images with their texts, and we let them, mostly because we want to use our bodies to sit in meetings, wherein we govern others’ bodies.

Ninhursag leaves Enki, and he, wandering by the waters, sees a woman resembling her, who accepts him. Neither knows this is incest, and Lady Greenery bears her father’s child, Ninkurra, Lady Fruitfulness. Somehow this happens a third time, and the offspring of the God of Waters and Lady Fruitfulness is Utta, the Weaver, the Spider Goddess, Who Makes The Web of Life. Utta’s grandmother gravely instructs her granddaughter to keep away from the riverbanks, the marshes, anywhere the water-lord travels.

When we think about the labor of archives, we mostly think of its gaps, lacunae, diminutions, and disappearances — we don’t have agency, autonomy, respect, solidarity, hands, heads, or feet. The body of the archivist is not, in the official account, present. So let’s sing for the traffic of bodies in the stacks, cry for their wounds, exult in their power, and by so doing remind the insensate crowd that we’ve been here, burrowing through the sources that make their memes.

The analogous presence of human beings among archival material in the stacks, that is our bodies working among our bodies of work, is the great disavowed other of our profession. (I realize like every other year I find another objet petit a for the archives, so maybe this is where you hop off the bus.) It’s evident from the literature that we replace care and concern for our own bodies and those of others with care and concern for the material bodies on the shelf. Search for “injury” in American Archivist and you will hit an ancient piece on restoration, another aged work on the bindery, a treatise on flattening paper, one on English manuscript repair, und so weiter. We write about the skins and flesh of motion pictures, the pellicules of photographic negatives, the broken spines of bound volumes, the baby goats encasing books who just want to return to being three-dimensional goats instead of goat-skins, the dismemberment of collections, and we couch our writing about this charnel-house in the arid language of the medical inquest, and we seem to have never written about injuries suffered by archivists’ own moving bodies.

Inevitably Enki lusts after Utta, and they have sex, and Utta’s attendant retrieves Enki’s semen from Utta’s womb, and plants it in eight parcels near the riverbank. The seeds become various plants, which fruit. Utta’s attendant shows Enki the sundry new fruits, and Enki eats one of each, which is his own semen. So doing, he falls ill with tumors or pregnancies “in his jaw, his teeth, his mouth, his hip, his throat, his limbs, his side and his rib.” Unable, as a male, to give birth to these swellings, he writhes in exquisite agony while the rest of the gods figure out whether they should do anything.

Instead our concern for bodies is subsumed into our writing about archival description; our bodies and our collections become bodies of work; our presences and their presences are packed into workflows and descriptive standards. Again, as a sample, go looking for “bodies” in AA. I got pieces on appraisal, “theory,” description, processing; in short, answers to the question of putting the whole bodies of collections into the hands of researchers. Habeas corpus. This control of bodies via descriptive regimes of course extends itself into the common prison metaphors for our work: stewards, caretakers, custodians, gatekeepers. And we panic at the idea that the alien bodies under our care are proliferating on their own, unchecked, unchecklisted. Search for “bulk” and you’ll find “reduction.” There is a cure for paper/cancer, and we’re working on it, in 1940, in 1967, in 1978, over and over again, sampling, selecting, appraising, reducing, liposuctioning, and stitching back together the terrifying obesity we’ve shoved into our steel catacombs. So instead of anthropomorphizing the collections and then, with academic fig-leaves, papering over their obscene bulk, I’d like to just think of how our bodies got their bodies onto the shelf.

Reed Group are some fuckers who help SSDI people figure out whether or not to pay out on your disability claim, and here is their comprehensive description of our work. Note here that the chief health risks for us are pregnancy and major depression, and that our work is classed as sedentary:

Exerting up to 10 pounds (4.5 kg) of force occasionally and/or a negligible amount of force frequently or constantly to lift, carry, push, pull, or otherwise move objects, including the human body.

Which is a bunch of hot bullshit, insofar as any archivist tasked with accessioning or processing has to lift 40 pounds by theyself. Which is why there’s a pretty strong correlation between entry-level work and manual labor, why, look at this job listing here, which requires independent lifting of 50 pounds, or this one, which my god I would fail just for the vision portion, or this one for 40 lbs. Lifting is the first task of bringing in a collection, and how we’re capable of moving objects around affects how we represent them to researchers. To in part and haptically answer @meau:

I feel like MPLP has been applied poorly or unevenly in part because our bodies naturally and inevitably limit the size and scope of depeche-mode processing.

For me, to do a quick initial sort and triage on a largish collection (again, for me), say 50 feet, I need a room with six three-by-eight tables. I need to be able to load 50 boxes ranging in weight from 20 to 35 pounds onto carts and then load these boxes onto the tables. I need to be able to open everything at once, stack like items with like, identify oversize stuff, identify media, pitch all the publications, find out what has worms, etc. This involves standing, mainly hunched — we don’t tend to make 3-1/2 foot high folding tables really — for, if I’m lucky and uninterrupted — four hours at a time for eight hours a day.

You’ll note that for a heterogeneous collection any larger than this, MPLP is not scalable. You can cut a giant collection into homogeneous chunks and box and label them 10 feet at a time, but for giant groups with no incoming order, forget it. You’ll also note that MPLP does and should emphasize description of the gestalt or the oeuvre or the corpus, that is work on the whole body, but again, unless you’re working with a collection which came to you already pretty assiduously cared-for by a phalanx of women in central filing (see main image), there is no such thing as work on the whole without a serious bodily commitment. This means repetitive stress injuries to the back, knees, neck, tendinitis in the elbow and wrist, and so on ad infinitum. A thousand tiny indignities welling up into chronic conditions.

The woman in my position before me developed arthritis in both knees, and routinely had surgery on her hands and forearms for carpal tunnel syndrome. (My job initially was basically to serve as her arms and legs.) I pulled the same old lower lumbar muscles I always do right in the middle of writing this blog post, and have had to return to my old regimen of core exercises recommended for 70-year-olds just to maintain. The grande dame archivist of my region has a persistent cough which her pulmonologist attributes to forty years of work in basements, breathing dust and red rot. I have a colleague so sensitive to active mold that he’s our canary in the coal mine: if he’s sneezing, I’ve got to quarantine something.

The gods ask Ninhursag for help, and she relents, again taking ab from Enki, and giving birth to eight gods of healing, one for each of Enki’s afflicted regions. Waters ebb and flow across the land, bringing life to the parched, bearing fruit. The waters bring, along with life, suffering. And for each form of suffering, there is a healing genius.

And here’s where I would pivot from the bodies to the intellectual corpus of work. The archival profession, like Enki floating in the river which is himself, has eaten dire fruits and has abscesses. We have a fetish for conservation science, where we need the god of triage. We prize visualizations of description over the grunt work of tilling the bulk of the 19th and 20th centuries to bear real fruit. We enshrine the rights of donors at the expense of the sovereign powers and rights of users, of society writ large. We have an absolute paranoia about copyright, which can really only be lanced by the goddess of not giving a shit and wishing a motherfucker would. We have work to do, and, unlike poor Enki, no external source of relief, and so we will heal ourselves by ourselves, or languish in our excruciating insufficiency to the given task.


J. J. Audubon, Bluebirds

Why is it a shock that we don’t have good access to this stuff? Think of the Library of Congress’ harvest of Twitter as if it were paper: 24 billion pages of text. Probably 6 million cubic feet. Growing by 14,000 cubic feet a _day_. Created by 140 million authorities. With integrity, chain of custody, privacy and political problems, such as: How do you reveal that a post by a Thai blogger committing lèse majesté against Bhumibol has been suppressed in his homeland by Twitter Co.?

What’s revealing is that the writer’s angle here (Can deleted tweets now be made accessible?) is almost shamanic: “Now, through magic, we can hear the 18 1/2 minutes that Rose Mary Woods erased! Or failing that, we can see Rep. Anthony Weiner’s chest again.” The carrier, UTF-8 instead of paper, has seduced us into thinking that since storage isn’t a problem, intellectual control isn’t a problem. Digital stuff is magic; in the interwebs, access is innate.

But intellectual control and access are built into physical care and handling of paper in a way that we haven’t fully replicated with born-digital collections. And so the Big Twitter Capture totally flouts the cardinal rule of good collections of ephemera: Define Narrowly, and Weed Ruthlessly.

There’s a kind of shallow populism at work here, the kind that believes that appraisal is strictly disciplinary and recapitulates in collections power-dominance over people. This is why we acquire widely: to guarantee that neglected parties have a voice in the susurrus of the archives. Theoretically speaking, this rhetoric of empowerment is bogus; people or groups marginalized from our collections are not themselves without power or voice, it’s just that we haven’t trapped either one of them in amber, we haven’t institutionalized their infra-power. The assumption that we can bring everything in without a solid plan for access, and just leave IT gurus and researchers to make sense of the pile is precisely the opposite of populism.

In 2010 the Pennsylvania state archives got six figures from NHPRC to do MPLP on 26,000 cubic feet, or about 37% of its holdings. Two years later, 11,000 feet had been processed, and 4,000 weeded, which is basically herculean. And still barely enough to keep up with growth. By way of really offhand comparison, the much smaller state of Vermont’s records center in 2011 grew by 12,000 cubic feet to 98,000 total (PDF, page 8). In the same year, 6,000 cubic feet met retention.

I think 1/3 backlog across all our collections is dramatic understatement. MPLP is barely enough to fight the bulk. Every generation of archivists promises to be the first to slay the backlog, and bulk has always won. Jenkinson was terrified by the impossible bulk of World War I records. Bulk-and-opacity in archives is not a bug, it’s a feature.

Our instinctive reaction to the feature is to apply more science to where science has already failed. Take this really interesting project at Harvard last spring. Flipped, or digitization-first processing has the capacity to bring item-level (really, face-level, since we’ll be shooting verso and recto a lot) description to smallish collections. But the process presumes that the collections have been curated and all the dross weeded out already. Absent that, we’re filling our digital repository with exactly the same dejecta we filled the bricks-and-mortar one with. Yes, faceted browse will help us narrow any given group of objects, but indiscriminate processing gives us no sense that one part of a collection might be more significant than any other. Or to differentiate one collection from another. Or one repository from another.

And for some users, that won’t matter and flipped processing will be hunky-dory. Let’s do an imaginary browse for all collections containing postcards. Some are from the holdings of a dedicated deltiologist. Some are from the intimate correspondence of a Maj. Gen. stationed in Haiti in the 1920s. A researcher interested in the use and dissemination of images of the tropics, or a researcher interested in the style and syntax of what’s written on postcards, would be fine; that is, non-historical research wouldn’t suffer a bit from wading through thousands of decontextualized images. Fragmented collections, shorn of context, are fit for disciplines which see intrinsic attributes in objects.

But for historical research, context is paramount, and no object has intrinsic value. We should, being, you know, people who went to high school, be able to figure out the intrinsic significance of this 19th-century telegram from Savannah:

It’s a piece of text, like the Gettysburg Address, or the Declaration of Independence, or the Letter from Birmingham Jail, which has iconic power. And it’s a wild outlier. Of the tens of millions of pages in the tens of thousands of unprocessed feet at any given repository, a vanishingly small number will have that kind of significance. For everything else, someone will have to guide researchers to the significances of the archival corpus.

When you don’t know what you’re looking at, that’s where history steps in. History lives on third-person-omniscient perspective. Creators and donors virtually never understand what their collections might mean. Even the best researchers only have a parochial, or a third-person-limited perspective, seeing archival collections through the lens of their prior publications. I recently worked on a collection of photographs from Iran of the 1910s. The creator had been interested in economic development and ethnography. The donor was interested in family history and evangelism. What I found were 1″ x 1-1/2″ photographs of executions and war crimes, taken during what I at first assumed was the Persian campaign in World War I. After reading diaries, and finding captions for other similar photographs, and checking the period of time the creator was in-country, it became much more likely that these were images of royalist retribution against supporters of the new Iranian Majlis during the revolution of 1906-1911. This half-a-foot collection took about three hours to get its world-historical significance unspooled.

Creating meaning from archives demands context. Access can be delivered en masse. Context can’t. Access without context makes for opaque bulk. Processing cuts bulk into something legible. Digitize-first is an interesting adjunct to processing, but the idea that we can slap search on top of an undigested bolus of images and walk away and users will pick up the pieces is motherfucking hilarious.