I have been thinking a bit this past week or so about books–books as objects, things made of paper; and books as concepts, as long-form written works that might be on paper or a computer screen, or a yet-to-be-invented beautiful electronic reading machine.

I was prompted to think about this in part by the comment that David Lee King wrote on my post Writing and talking about library 2.0.* Here’s the part I’m thinking about:

I don’t denigrate books. I denigrate the container, not the content – two very different things. Books as a format I think will stick around for a very long time. The paper they are printed on? Well… I have a Sony E-Reader in my office right now for staff to play with.

If prodded, I’ll bet David would admit that most paper books on our shelves today will outlast the Sony E-Reader by a few hundred years at least. But that’s being overly specific, and not the real problem. The problem, I believe, is that a book isn’t really a “container.” A book is a book.

That doesn’t mean that electronic books aren’t a worthwhile endeavor, or that it is impossible to make e-books that are worth using. I have read books on my Palm Pilot, and expect to read many more e-books on more usable devices in my lifetime. But if we fail to take into account the “bookishness” of books, we run the risk of making some terrible errors.

I had already been trying to get my thoughts together along these lines when I found Paul Duguid’s recent article for First Monday, “Inheritance and loss? A brief survey of Google Books.” In the article, Duguid uses the Google Books results for Tristram Shandy to see how the project handles a problematic text like Laurence Sterne’s novel.

One would think that scanning the pages would be enough to create a usable e-book, but in the cases that Duguid examines, it just isn’t. Some of the reasons Duguid covers:

  • some scans are simply bad, missing parts of the page or illegible for significant parts of the page, or completely blank;
  • there are mistakes or omissions in the metadata, such as mistaking the list of illustrations for the book’s table of contents, or not clearly identifying the parts of a multi-volume set;
  • Google’s ranking algorithm seems to prefer odd, substandard versions of the work due to copyright or other restrictions.

Some of these things would be less problematic for a less complicated work than Tristram Shandy, but I expect the problems with Shandy are by no means unique.

From Duguid’s conclusion:

Google Books takes books as a storehouse of wisdom to be opened up with new tools. They fail to see what librarians know: books can be obtuse, obdurate, even obnoxious things. As a group, they don’t submit equally to a standard shelf, a standard scanner, or a standard ontology….Even with some of the best search and scanning technology in the world behind you, it is unwise to ignore the bookish character of books. More generally, transferring any complex communicative artifacts between generations of technology is always likely to be more problematic than automatic.

Not incidentally, I believe I first came across a link to Duguid’s article in Dorothea Salo’s del.icio.us stream. Dorothea, as ever, is way ahead of me on this, having been “ranting” about similar topics since 2003 (and possibly before). In her post from that year, No, it really is that hard–a response to a person who thinks that encoding books digitally is a simple, straightforward process–she writes,

Nine times out of ten, these yahoos have utterly forgotten that there’s any book in the world more complicated than, say, a Robert Ludlum novel. (I don’t think these yahoos actually set foot in libraries, though I suppose I could be wrong—they could merely suffer from acute tunnel vision.) The rest of us don’t have that luxury…. We have to sweat over math, art, indexes, tables, links, complex layouts, production workflows, metadata, non-Roman alphabets, digital preservation issues, and all that fun stuff.

(Several of those e-books I mentioned reading on my Palm Pilot were Cory Doctorow’s novels. Dorothea did the HTML markup on them. Which is random, and cool, and sorta beside the point, but I thought I’d mention it here anyway.)

If you find the Duguid article interesting, you might also try these links:

* If it seems like David Lee King is my new bête noire, I don’t really think that’s the case. I believe he and I have a lot of views in common, so the areas where we differ stand out to me, and I find them worth investigating. It’s also evident that he welcomes the discussion.