Kate Crawford in TNI has written about the lived reality of big data, and it’s great, and there are abundant lessons for archivists, since like the spooks at NSA and GCHQ, we’re collecting broadly in order to generate a political and economic product, and like the spooks, our mission is a kind of cultural preservation, though their fires are fast and our fires are slow, and like the spooks we prefer the politics of covering our ass to the politics of truth and reconciliation.
Like the spooks, we find our modes of comprehending the world breaking down, and we crave synergy with other disciplines. The ironic reversal is that the NSA is bringing in humanities people to winnow its data trove, and humanities people are bringing in data-crafters to handle their culture-troves. But the deepest link between espionage and archivism, or between web-scraping and analysis, is that, per Daston and Galison, “all epistemology begins in fear — fear that the world cannot be threaded by reason, fear that memory fades, fear that authority will not be enough.” We’re two little Fausts.
Epistemology begins in fear, and I’m pretty sure that the people who make The Data Science aren’t the ones with the frantic, insatiable, indiscriminate cravings. Garbage in, garbage out. I feel like they have control of scope — the collection is functional. If one batch of words gets you an English-like automatic transcription, or a cure for cancer, or a heads-up on a dude who bought a bunch of ammonium nitrate, then the collection has worked. So define the function of a collection narrowly, and problem solved, it will stabilize at a size which lets it work. Thus do our friends with the audio-transcriber-machine test blocks of text and scrap useless ones. Thus do archivists bring in 50 feet of grandma’s personal papers from her garage and process them down to 12. THAT WAS A LOT OF MAGAZINES GRANDMA.
That is mainly what archivists’ literature relates, but our literature also relates a secret epistemological fear, only in a different dimension. The NSA may be on some horizontal panic level (we must sweep it all in from everywhere, collecting an inch thick and a mile wide), but archivists manifest vertical panic: we can’t possibly be getting all the records that are out there. I know they’re out there. Hiding from us in attics and basements. Those sons of bitches. COME OUT COME OUT WHEREVER YOU ARE and so here, for example, is Susan Grigg in 1985 [PDF; SAA paywall] writing about the horizontal panic response of the Golden Gophers’ Immigration History Research Center:
Because the collection was founded on the idea that an important segment of historical documentation was generally neglected and needed urgently to be saved, the initial stress was on “gathering in” as much as possible in a short period of time. Notwithstanding the continuing deficiency of all existing collections even after twenty years of effort, it is now clear that except for the early years, a great deal more material has survived than is ever likely to be collected. The gaps in the collection are a challenge to the collecting policy not so much because they are large as because so much is available to fill them.
…and kind of manifesting her own vertical panic response with that “so much is available to fill them.” There’s no doubt that to have a comprehensive collection, an archives can’t collect from all the genres of human knowledge and experience and endeavor, on this you and I and Susan all agree. But it’s rare for us to interrogate the meaning of comprehensive.
Take the embedded cravings of the otherwise ordinary position held by James Fogerty (1983) [SAA paywall], namely that oral histories provide a sound complement to personal papers. Again, there’s nothing wrong with this. It’s transparently correct. But read these lines to yourself as if Fogerty were a CIA analyst:
Collections of personal papers are especially weak in the information they provide on the formative years of their donors–years that often hold the keys to perceptions that influenced their subsequent actions. Even correspondence does not betray the author’s inner thoughts […]
Let’s review. We desire not just a comprehensive corpus of all the works of a given person, family or corporate entity; we’re looking for the anima within her. Not content with the actions she took, or the perceptions that precipitated the actions, we’re hunting for the keys to the perceptions. We’re after the letters which betray their author. Our definition of comprehensive, following Fogerty, drifts way into the realm of psychoanalysis, if not fascist surveillance: if letters won’t betray the truth, surely three hours of interviewing will deliver us the subject!
And so we arrive at the paranoid style in acquisitions. History is powered by secret movers, unknown pleasures, leaving behind them a seam of records so rich no one could ever plumb it, and even if we knew who might have made them, or where they are now, or how to get our hands on them, we’d still be stuck with the intransigent fact that any sense we made of these keys would be our own, contingent, fleeting, tantalizingly incomplete.