It is interesting to read an article in the library literature that I feel is well-researched and well-written, but then to disagree completely with its conclusion.

That’s how I felt when I read Xiaotian Chen’s article “Google Scholar’s Dramatic Coverage Improvement Five Years after Debut,” which appears in the December Serials Review. (It is not freely available online but can be found at DOI 10.1016/j.serrev.2010.08.002 for those with ScienceDirect subscriptions.) The article demonstrates that Google Scholar provides 98 to 100 percent coverage of the databases it is allowed to crawl, either because those databases are freely available or because Google has an agreement with the database publisher.

I first learned of Chen’s article through Peter Murray’s post to the Library Society of the World. Early in that discussion, John Dupuis called attention to the last line of the article: “The conclusion cannot be clearer: libraries can seriously consider cancelling a large number of subscription-based abstracts and indexes since their unique contents and value are rapidly evaporating.”

It’s possible that I’m missing an important piece of information that would change my mind, but I really don’t think that conclusion is clear at all.

Google Scholar doesn’t provide the full text of anything. So if libraries want readers to be able to get past the citation at JSTOR or other subscription-based databases, we can’t drop those subscriptions.

So the logical databases to drop would be the ones that provide indexing and abstracting, but not full text. But I can see two problems with that. First, I doubt that those databases would let Google crawl them, so they wouldn’t be duplicated in the Google Scholar database. Second, and more important, the non-full-text abstracting and indexing databases that I’m familiar with in the humanities and social sciences tend to index a lot of works that are not journal articles. And as Chen says in the article, Google Scholar doesn’t do so well with those citations:

It is always possible that a gap exists between Google Scholar and a database that does not allow Google Scholar to crawl. In the 2005 Neuhaus et al. study, databases such as ABI/INFORM, CINAHL, and Historical Abstracts all had low coverage by Google Scholar. Part of the reason was that these databases include some records that Google Scholar does not or cannot index: non-journal records and some records from journals that have ceased publication. Non-journal records include records of newspapers, magazines, trade journals, book chapters, pamphlets, reports, conference proceedings, theses and dissertations. Ceased journals may not have publicly accessible tables of contents on the Web for Google Scholar to index.

So. If we can’t cancel JSTOR and ScienceDirect and so on because that’s where the full text comes from, and we can’t cancel ABI/INFORM, CINAHL, and Historical Abstracts (and MLA International Bibliography and Philosopher’s Index and ATLAS and so on), what is left to cut? Just the databases that do nothing but index articles that are already held in those full-text archives? I don’t know that we subscribe to anything like that.

So I can’t agree with Chen that the impact of Google Scholar on abstracting and indexing databases “cannot be clearer.” I doubt that Google Scholar is a specialty database killer. It almost certainly is a federated search killer. If a library has already decided to sacrifice precise, predictable searching for simple searching and broad results, I’d think it would be much better off foregrounding links to Google Scholar and coming up with a coordinated approach to teaching it to students, rather than sinking time into customizing a vendor’s product and money into paying a vendor’s fees.

But Google Scholar as a replacement for subject-specific A&I databases doesn’t make sense to me.