Open Access Publishing as an Alternative to Extractive AI

Imagining democratic AI in academic publishing.

Aug 01, 2024

Back in May, Informa (parent company of Taylor & Francis) signed a deal with Microsoft, reportedly worth more than $10 million, that grants Microsoft access to researchers’ work, presumably to train an AI system. I don’t want to linger on the specifics of the deal or my feelings about it; much better takes can be found here, here, and here. Instead, I want to consider how open-access publishing might offer an alternative to “academic fracking,” Lance Eaton’s term for the ways academic publishers extract value from academic laborers (mainly researchers) to turn a profit, at the expense of the scholarly environment:

If we understand fracking as an attempt to extract more from an oil well that a company thought was already sucked dry and to do so through a questionable and caustic practice that is environmentally toxic—well, the metaphor fits pretty dang well.
Publishers are taking what they already made money on and trying to get even more from it. In the case of Taylor & Francis, without letting people know about it. This not only creates a (more) toxic environment within academic publishing, where many authors already felt frustrated and disempowered before this but also creates a murky future that is unclear about how scholars’ work will be effectively recognized in ways that count.

Taylor & Francis isn’t the only culprit. Wiley has a deal in place with an as-yet-unnamed technology company, and similar reactions are likely to ensue.

Scopus (owned by Elsevier) has taken a different approach, building its own Scopus AI on top of its scholarly database. This approach is somewhat better because it is designed to direct attention back to scholars’ work, using the same basic data points already represented in traditional Scopus search and adding capabilities (e.g., summaries and concept maps) via generative AI. They have also open-sourced RAG-Fusion, their approach to AI-powered search. Still, they sell Scopus AI at a premium without providing additional, direct monetary value to authors, so while I give them kudos for keeping the operation in-house and working from responsible AI principles, they haven’t fully escaped the extractive model.

So what about truly open-access publishers?

One of my professional roles is as an Associate Publisher for the WAC Clearinghouse, an open-access publisher of original scholarly books and journals, the CompPile database, and other resources in the discipline of rhetoric, composition, and writing studies (RCWS). As of this writing, the Clearinghouse has 221 books, 2,053 articles, 116,934 citations in CompPile, and over 37 million downloads. All books and articles are available free in .pdf or .epub formats, and many are available for purchase in print from Parlor Press or the University Press of Colorado (sales that generate a small amount of revenue for the Clearinghouse).

Screenshot of the WAC Clearinghouse homepage, courtesy of yours truly.

Founding Publisher Mike Palmquist has managed to keep the Clearinghouse running on occasional funds from Colorado State University, fundraising through the Colorado State University Foundation, and a whole lot of volunteer labor (over 200 volunteers, including me). The result is a collaborative model of open-access publishing that is both highly decentralized and high-quality. Many of the published works have received prestigious awards, and Palmquist himself earned the 2024 Exemplar Award from the Conference on College Composition and Communication, not least because of the Clearinghouse’s lasting impact on the field.

To be sure, few, if any, of the Clearinghouse’s volunteers are immune to the high-pressure academic environment that Eaton identifies as an essential part of academic fracking. However, our response to that pressure is not to extract monetary value from scholarly labor. Instead, we add our labor (free of charge) to other academics’ scholarly labor (also typically done free of charge, as Eaton notes) to make our scholarly conversations about writing as open and accessible as possible. What money there is has gone to production (mainly copyediting) or to fellowships that bring emerging scholars into the publishing collaborative. The result is almost idyllic: a community of scholars working together (and yes, sometimes arguing) to produce and disseminate scholarly knowledge in a field they are passionate about. It isn’t a utopia, but it isn’t toxic either.

It may be that the WAC Clearinghouse is possible because RCWS is a relatively tight-knit field compared to, say, the life sciences. Professional relationships, along with scholarly expertise and publishing experience, have enabled our success. Still, my point is that there are models of publishing that are not extractive, should we wish to put our time, energy, and money into supporting them.

And where does AI fit with all this? Well, we haven’t solved every aspect of the extraction problem. Truly open-access scholarly work can still be scraped as training data for LLMs. But at least open-access publishers in the WAC Clearinghouse model won’t be perpetuating academic fracking. Instead, the WAC Clearinghouse has begun publishing resources on AI and writing instruction, thus contributing to scholarly conversations about AI rather than building AI for profit. The publisher has also empowered individual book series and journals to create their own AI policies based on the more general statement about publishing ethics at the organizational level. No doubt, debate about the value and shape of such policies and statements will also continue to be an important part of our work.

To end with some bald speculation: perhaps open-access publishers will begin to make use of AI down the line. If so, they have the opportunity to do it in a much more democratic fashion than the big hitters have. If, for example, the CompPile database were to integrate AI search, the code would need to be open source. Authors would need the option to include or exclude their work from the training data. Users would need a way to flag results that seem biased in favor of one pedagogical or theoretical school over another, and thus help train the model toward better representation. And the tension between these last two (authorial choice and representative results) would need to be an ongoing topic of discussion and negotiation.

In short, a non-extractive, AI-enriched publishing operation will take work. But it may be that that work is worthwhile if it empowers authors, editors, and publishers to decide collaboratively how work is represented and disseminated to the discipline and the wider public.

Lance Eaton, Ph.D.

Aug 2, 2024

really appreciate this take (and the mention)...yes, diamond open access is ideal to go, going forward...I still worry about most of scholarship is current trapped behind paywalls and copyright (e.g still works from Einstein that are behind paywalls!!!)...
