A symptom of strategic inertia at a moment when the industry can least afford it.


When Microsoft unveiled its Publisher Content Marketplace (PCM) in early February 2026, the announcement ricocheted through technology and media circles with predictable enthusiasm.

Search Engine Land, Axios, Windows Central, and a parade of digital media analysts parsed the mechanics, debated the economics, and speculated about competitive responses from Google and OpenAI. In the trade book publishing press, the silence has been deafening.

This absence is not merely an editorial oversight. It reveals a fundamental disconnect between how news publishers and book publishers understand their relationship to the emerging AI economy.

While both industries face the same existential threat from AI-mediated discovery, their responses have diverged sharply.

News publishers, having watched Google erode their traffic for two decades, recognized PCM as a potential lifeline.

Book publishers, still operating within distribution paradigms shaped by Amazon and legacy retail, appear not to have noticed at all.

How the Marketplace Works – And Why It Matters

The Publisher Content Marketplace operates as a two-sided exchange. Publishers upload content with defined licensing conditions and pricing tiers; AI developers browse the catalogue and license material for grounding responses or training models. Publishers receive usage-based compensation and granular reporting.
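Microsoft has not published the marketplace's internal workings, but for the technically minded, here is a rough sketch of how a listing and a usage-based settlement could be represented. Every field name and figure below is my own assumption, not PCM's actual data model or API.

```python
from dataclasses import dataclass
from enum import Enum

class UseType(Enum):
    GROUNDING = "grounding"   # content retrieved to support a specific response
    TRAINING = "training"     # content ingested into a model's training corpus

@dataclass
class Listing:
    """A publisher's offer in a PCM-style catalogue (all fields hypothetical)."""
    publisher: str
    content_id: str
    allowed_uses: set          # e.g. {UseType.GROUNDING}, or both use types
    grounding_fee: float       # price per cited retrieval, in dollars
    training_fee: float        # one-off fee if training is permitted

@dataclass
class UsageReport:
    """The kind of granular, usage-based report a publisher might receive."""
    content_id: str
    grounding_calls: int
    amount_owed: float

def settle_period(listing: Listing, grounding_calls: int) -> UsageReport:
    """Pay-per-use settlement for one reporting period under the listed terms."""
    if UseType.GROUNDING not in listing.allowed_uses:
        grounding_calls = 0
    return UsageReport(listing.content_id,
                       grounding_calls,
                       grounding_calls * listing.grounding_fee)

# Example: a news listing cited 12,000 times at $0.002 per retrieval.
listing = Listing("Example Wire", "art-2026-0214", {UseType.GROUNDING}, 0.002, 0.0)
print(settle_period(listing, 12_000).amount_owed)   # 24.0
```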

Unlike the bilateral licensing deals that have characterised the AI content economy thus far – HarperCollins infamously negotiating $5,000 per title with Microsoft for training rights, academic presses selling backlists to tech firms in bulk – PCM promises scalable infrastructure. Publishers set terms. AI companies select what they need. Usage determines payment.

The early partners include Business Insider, Condé Nast, Hearst, The Associated Press, USA TODAY, and Vox Media – all news and magazine publishers optimised for velocity and volume. Not a single trade book publisher appears on the roster. Not a single academic press.

The structural logic becomes clear when we examine what PCM was designed to reward: fresh, frequently updated, high-churn content that can ground real-time AI responses. The kind of material that ages in weeks, not decades.

Why Book Publishers Weren’t Invited to the Table

The pilot partners were selected for good reason. They produce the commodity PCM needs most: current, authoritative material that AI systems can cite when answering queries about breaking news, emerging trends, or rapidly evolving domains.

At Microsoft’s invite-only Partner Summit in Monaco last September, the company told attendees: “You deserve to be paid on the quality of your IP” – a message aimed squarely at publishers whose content was being ingested without compensation while their referral traffic collapsed.

Book publishers occupy a different value architecture entirely. Their content is deep rather than broad, evergreen rather than ephemeral, structured for sustained argument rather than modular citation. The economics of PCM – designed around per-use compensation tied to specific AI queries – don’t map cleanly onto books, which function as complete intellectual artifacts rather than databases of extractable facts.

More fundamentally, book publishers have been slower to engage with AI licensing because their business model has not yet been catastrophically disrupted by it.

News publishers watched Google’s AI Overviews and ChatGPT’s summarization capabilities gut their traffic through 2025. Book discovery still flows largely through Amazon, physical retail, and social platforms. The immediate crisis feels less acute, even as the long-term threat may be more severe.

The Grounding Versus Training Distinction – And Why It Matters for Books

PCM makes a critical architectural distinction that book publishers must understand: the difference between retrieval for grounding and training for model improvement. Grounding means the AI system pulls specific content to support a particular response – citing an AP article when asked about a recent event. Training means the content becomes part of the model’s foundational knowledge base, influencing how it generates language across all contexts.

For news content, this distinction is relatively straightforward. An AI citing today’s Reuters report about interest rates is grounding. An AI having read 10 million Reuters articles and absorbed journalistic sentence structure is training. News publishers can monetise both: per-query grounding fees and upfront training licenses.

For books, the implications are more complex. A book represents months or years of intellectual labor compressed into a coherent argument. When an AI grounds a response by citing a specific chapter, the value extraction is contained. When an AI is trained on thousands of books and absorbs their rhetorical patterns, narrative structures, and conceptual frameworks, the value transfer is total and irreversible. Training commodifies the entire editorial investment of a catalogue.

This is why the Authors Guild has argued forcefully that AI training rights remain with authors unless explicitly negotiated, noting that training is “not a new book format, not a new market, not a new distribution mechanism” but an entirely separate right.

That’s arguable, but for another time. I’m just glad to see a rational argument, rather than the kneejerk nonsense from the Authors Guild’s Luddite UK counterpart, the Society of Authors.

Here’s the thing: HarperCollins’ approach – seeking individual author permission for Microsoft’s training deal at $5,000 per title, split 50-50 – represents one model. But PCM proposes something different: a marketplace where usage patterns determine value dynamically, and where grounding and training may ultimately blur.

What Book Publishers Stand to Lose

The strategic danger for book publishers is not that they were excluded from PCM’s pilot phase. It’s that the infrastructure being built may not accommodate their content model at all.

Consider the marketplace mechanics. PCM rewards content that performs in the AI discovery layer – material that AI systems want to cite frequently because it’s timely, specific, and audience-aligned. News fits this model perfectly. An AI asked about inflation policy wants yesterday’s Federal Reserve analysis, not a 2019 economics textbook. But an AI asked to explain monetary theory might well draw on that textbook without ever surfacing it to the user. The value is extracted invisibly.

This creates a perverse incentive structure. Publishers whose content serves as foundational knowledge – the books that shape how AI systems understand history, science, philosophy, business – may generate less measurable “usage” than publishers whose content sits atop the citation layer. PCM’s usage-based compensation model, designed for programmatic media buying, may systematically undervalue depth in favor of immediacy.

Academic and educational publishers face an even sharper version of this problem. Their content is expensive to produce, slow to monetise, and essential to authoritative AI responses. A medical AI trained on peer-reviewed journals or an engineering AI built on technical manuals depends entirely on that specialised corpus. Yet the usage that generates revenue may be invisible: the AI doesn’t cite the textbook, it simply knows because it read the textbook during training.

The Economic Mismatch

Book publishers also operate under rights structures that complicate marketplace participation. News organisations typically hold clear, unencumbered rights to their own content. Book publishers negotiate complex webs of author contracts, territorial splits, subsidiary rights, and reversion clauses. Licensing a book for AI training requires author consent, agent negotiation, and often amendments to decades-old contracts that never contemplated machine learning.

Many existing publishing agreements include clauses stating that rights not expressly granted are reserved to the author. This means publishers can’t simply upload their backlists to PCM without securing individual permissions – a logistical nightmare for houses with catalogues running to thousands of titles. News publishers, by contrast, can make platform-wide decisions and move quickly.

The revenue question is equally vexed. HarperCollins’ $5,000-per-title deal suggests one valuation framework, but that’s for training rights sold in bulk, not per-use grounding fees. What should a book earn when an AI cites a single paragraph? When it synthesises ideas from five different books without explicit attribution? When it’s trained on a novel and later generates prose in a similar style? PCM’s usage-based reporting promises transparency, but the pricing models remain opaque, and book-specific benchmarks don’t yet exist.

Microsoft Versus the Alternatives

It’s worth noting that Microsoft is not the only player building AI content marketplaces. Startups abound – I won’t name them for fear of accusations of bias to those I remember and malice to those I forget. But publishers generally view Microsoft favorably in this space, appreciating the company’s messaging that “you deserve to be paid on the quality of your IP” and its efforts to build what could be a functioning information economy. OpenAI, Google, and Meta have pursued bilateral licensing deals but have shown less interest in open marketplace infrastructure.

If Microsoft succeeds, it could establish the de facto standard for how AI systems compensate content creators. If it fails, or if book publishers remain outside the system as it scales, the result may be a bifurcated AI economy: news and magazine content flowing through structured, compensated channels while book content remains in the legal gray zone of scraping, fair use arguments, and class-action lawsuits.

Scenarios for Trade Books in the AI Marketplace

Looking forward, several pathways seem plausible:

Scenario 1: Microsoft Expands PCM to Books

Microsoft could adapt PCM to accommodate book-specific economics – longer licensing terms, higher per-work valuations, hybrid training/grounding models. This would require solving the rights clearance problem, developing new pricing mechanisms, and convincing book publishers that the revenue potential justifies the administrative burden. Possible, but not imminent.

Scenario 2: A Competitor Builds a Book-Specific Marketplace

OpenAI, Google, or a specialised entrant could design a marketplace optimised for deep content rather than real-time news. This might look more like music licensing – blanket deals with royalty flows based on model usage – than programmatic ad tech. The challenge is that large language models have already been trained on millions of books scraped from the internet (not all illegally). Why pay now?

Scenario 3: Regulatory Intervention Forces Licensing

The UK's Competition and Markets Authority, the EU's AI Act, or legislative action in the U.S. could mandate that AI companies license training data rather than rely on fair use defenses. This would transform scraping from a low-cost default into a legal liability, creating sudden demand for structured licensing. But Executive Order aside, regulators are unlikely to force a separation between search indexing and AI training on a timeline that matters for 2026, meaning any regulatory solution remains years away.

Scenario 4: Book Publishers Stay on the Sidelines – And Lose

If book publishers wait for the perfect marketplace to emerge, they may find the AI economy has moved on without them. Training data harvested via scraping becomes the norm. Grounding happens through unlicensed retrieval. Revenue flows only to publishers who acted early, and everyone else is left arguing about fair use in court while AI systems built on their content generate billions in value.

Why Trade Publishers Can’t Afford to Ignore This

The silence in the trade press is not a sign that book publishers are thinking deeply about their AI strategy. It’s a sign that they haven’t recognised the stakes.

PCM represents a structural shift in how digital content gets valued and compensated in the age of generative AI. News publishers are racing to participate because they understand that AI-mediated discovery will replace search traffic as the primary driver of audience attention. Book publishers face the same dynamic, just on a delayed fuse.

If AI becomes the dominant interface for research, learning, and knowledge work – and all signs suggest it will – then books that exist only as unstructured text on servers or in print warehouses will become invisible. Books that are structured, tagged, and available through licensing systems will become the authoritative sources AI systems prefer. This is not about protecting yesterday’s business model. It’s about ensuring relevance in tomorrow’s information architecture.

The deeper question is whether Microsoft, or any platform operator, will design a marketplace that values what book publishers produce. If PCM remains optimised for high-velocity news, books will be structurally disadvantaged. If no viable alternative emerges, publishers may face a choice between accepting unfavorable terms or remaining outside the AI economy entirely – neither of which is sustainable.

The Fiction Question: Why Narrative May Be the Most Valuable Content of All

The discussion thus far has inevitably centered on non-fiction – news, reference material, academic content – because that’s what PCM’s pilot partners produce. But fiction represents a profoundly different proposition, and in fact a far more valuable one for AI developers.

When AI companies train language models, they’re not merely teaching systems to retrieve facts. They’re teaching them to generate language that feels human: to construct narratives, develop character voices, build dramatic tension, deploy metaphor, manage pacing, and navigate the infinite subtle choices that make prose compelling rather than mechanical. Fiction is the training ground for linguistic sophistication.

This is why AI companies have been particularly aggressive in ingesting fiction without permission, and why authors’ groups have been particularly vocal in opposition.

A model trained on Reuters content can answer factual queries. A model trained on Alice Munro, Kazuo Ishiguro, and Colson Whitehead can write in ways that move people. The commercial applications – marketing copy, screenplay drafts, interactive narratives, personalized storytelling – are vast and lucrative. Fiction isn’t supplementary to AI training; it’s central.

Yet fiction has been conspicuously absent from the structured licensing conversation. HarperCollins’ infamous Microsoft deal covered “select nonfiction backlist titles” specifically, avoiding the thornier questions around creative works. This suggests AI companies recognise that fiction licensing is both more valuable and more legally fraught – authors have stronger moral rights claims, fan communities are more protective, and the “originality” that makes fiction valuable is precisely what, to some, makes unauthorised use feel like theft rather than indexing.

Fiction’s Valuation Problem in a Grounding-First Marketplace

PCM’s current architecture creates a strange inversion for fiction. The platform rewards content that grounds specific queries – answering questions, providing citations, delivering factual precision. Fiction rarely functions this way. Users don’t ask an AI to cite The Remains of the Day when discussing memory and regret, even though the novel is arguably more illuminating than any psychology paper on the subject. Fiction’s value to AI systems is almost entirely in training, where it shapes the model’s generative capabilities across millions of future outputs.

This means fiction operates in the least transparent part of the value chain.

When a news article grounds an AI response, usage can be tracked and compensated. When a novel improves an AI’s ability to write dialogue, the value accrues invisibly across every conversation the model has forever after. Under a grounding-based compensation model, fiction would be systematically undervalued despite being essential to what makes AI systems commercially viable.

A functional fiction marketplace would need to flip the economic model: high upfront training fees based on literary quality and influence, rather than per-use grounding fees based on citation frequency. This is closer to how music licensing works – blanket licenses for catalogues, with royalties distributed based on usage proxies – than how programmatic advertising works.
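By way of illustration only – the pool size, works, and corpus shares below are invented – here is what distributing a blanket catalogue fee by a usage proxy, rather than by citation counts, might look like:

```python
# A toy blanket-license settlement: one upfront catalogue fee, distributed to
# works by an agreed usage proxy (here, each work's estimated share of the
# training corpus) rather than by per-query citations. All figures are invented.

catalogue_fee = 2_000_000.0   # hypothetical upfront training license for a catalogue

# Usage proxy: each work's share of the training corpus (hypothetical values).
corpus_share = {
    "novel_a": 0.00012,
    "novel_b": 0.00007,
    "story_collection_c": 0.00003,
}

total_share = sum(corpus_share.values())
royalties = {work: catalogue_fee * share / total_share
             for work, share in corpus_share.items()}

for work, amount in royalties.items():
    print(f"{work}: ${amount:,.2f}")
```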

The Minor-Language Opportunity: Where Fiction Becomes Strategic Infrastructure

Here’s where the fiction question intersects with a broader strategic opportunity that Microsoft – or a competitor – could pursue: minor-language content licensing.

Current frontier AI models are overwhelmingly trained on English-language content, with a secondary focus on major languages like Mandarin, Spanish, and French. This creates significant performance gaps: models struggle with languages that have limited digital corpora, producing lower-quality translations, weaker cultural understanding, and less sophisticated generation in those languages. For AI companies expanding globally, this is a critical bottleneck.

Fiction publishers sitting on extensive catalogues in Dutch, Swedish, Polish, Korean, Turkish, Arabic, and dozens of other languages possess exactly what AI developers need most: high-quality, linguistically rich narrative content that can teach models how these languages actually work in creative contexts. Not technical manuals or news articles, but novels and short stories where the full expressive range of the language is deployed.

The Global Fiction Marketplace That Doesn’t Exist Yet

Imagine a marketplace designed not for real-time grounding but for comprehensive training access, with particular emphasis on fiction and minor-language content:

Publishers and authors upload complete works – novels, story collections, poetry, drama – with clear metadata about genre, language, cultural context, and literary significance.

AI developers license catalogues rather than individual works, paying tiered fees based on language scarcity (content in Icelandic, for example, commands a premium over English), literary quality (prize-winners and canonical works cost more), and exclusivity (first-mover access to a publisher’s entire backlist).

Compensation reflects training value rather than grounding frequency, acknowledging that a single novel might improve model performance across billions of outputs without ever being directly cited.

Authors receive transparent reporting on which works were included in which training runs, with ongoing royalties as models are updated and redeployed.

This would serve multiple strategic purposes simultaneously. For AI companies, it solves the minor-language bottleneck and provides legal clarity around fiction training. For publishers in smaller markets – Norwegian houses, Czech imprints, Thai publishers – it transforms their catalogues from regional assets into global AI infrastructure. For authors, it creates a new revenue stream from work that’s currently being scraped without compensation.
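To make the tiered-fee idea concrete, here is a toy pricing sketch. The base rate, multipliers, and language table are pure assumptions, chosen only to show how scarcity, quality, and exclusivity could stack:

```python
# A toy pricing function for the tiered training fees described above.
# Base rate and multipliers are entirely hypothetical.

BASE_FEE = 5_000.0   # per-title baseline, echoing the HarperCollins benchmark

LANGUAGE_SCARCITY = {   # less digitised languages command a premium
    "english": 1.0,
    "swedish": 1.8,
    "icelandic": 3.0,
}

def training_fee(language: str, prize_winner: bool, exclusive: bool) -> float:
    """Per-title training fee under a scarcity/quality/exclusivity tier model."""
    fee = BASE_FEE * LANGUAGE_SCARCITY.get(language, 2.0)  # default premium if unlisted
    if prize_winner:
        fee *= 1.5        # canonical or prize-winning works cost more
    if exclusive:
        fee *= 2.0        # first-mover, exclusive access doubles the rate
    return fee

print(training_fee("icelandic", prize_winner=True, exclusive=False))  # 22500.0
```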

Why Fiction Publishers Might Resist – And Why They Shouldn’t


The obvious concern is that licensing fiction for training will enable AI systems to generate competing content. If a model is trained on John Grisham, won’t it produce Grisham-style thrillers that cannibalise his market?

This fear is not irrational, but it misunderstands both how AI generation works and where the real threat lies.

AI systems trained on fiction don’t produce exact stylistic clones; they learn general patterns of narrative construction that inform all their outputs. A model that has read Grisham generates better legal dialogue in any context, not just when writing courtroom dramas.

The competitive threat from AI-generated fiction exists regardless of whether publishers license their catalogues – models are already being trained on scraped content. The question is whether authors get paid.

More importantly, the alternative to structured licensing is not that AI companies abandon fiction training. It’s that they continue scraping without permission, arguing fair use, and fighting class-action lawsuits for years while the training continues. That is likely happening right now: two judges, in quick succession, have already ruled that training on copyrighted content can be permissible so long as the material is acquired legally.

Sure, authors may ultimately prevail in court, but by the time they do, the models will already be deployed and the economic value extracted. Licensing offers something litigation cannot: immediate compensation and ongoing participation in the value chain.

Fiction as the Test Case for AI Content Economics

If Microsoft or a competitor successfully builds a fiction and minor-language marketplace, it would establish a crucial precedent: that training rights for creative works have distinct economic value and require explicit licensing. This would ripple across the entire content landscape, forcing AI companies to negotiate with publishers, literary agencies, and authors’ groups rather than simply scraping and defending later.

It would also reveal whether the AI industry is genuinely willing to pay for the content that makes their models valuable. News licensing is relatively cheap – articles have short commercial lifespans and abundant alternatives. Fiction licensing, especially for established literary works, would cost real money. If AI companies balk at those terms, it suggests the economics of “fair compensation” were never serious, just a public relations strategy to forestall regulation.

For trade fiction publishers, the strategic imperative is clear: engage now, while leverage still exists, before training on scraped fiction becomes so entrenched that licensing feels unnecessary to AI developers. The window in which publishers can negotiate from strength rather than desperation is limited.

And for publishers in minor-language markets, this may be the single best opportunity to monetise their catalogues in a generation. The global English-language market is saturated and competitive. The global AI training market for linguistic diversity is wide open, and publishers who move first will set the terms.

The Case for Strategic Engagement

Book publishers should treat PCM not as an immediate opportunity but as a signal. It demonstrates that structured, licensed content marketplaces can exist, that tech companies are willing to pay for quality IP under the right conditions, and that the era of free-for-all scraping is ending – at least in some sectors.

The strategic move is not to demand entry into PCM as it exists today but to begin building the infrastructure that would make book licensing viable: clear author consent mechanisms, standardized metadata, usage tracking capabilities, and industry-wide pricing frameworks. Publishers who invest in these systems now will be positioned to negotiate from strength when marketplace opportunities scale. Publishers who wait will be presented with take-it-or-leave-it terms designed for someone else’s content.

The trade press silence is not a minor editorial gap. It’s a symptom of strategic inertia at a moment when the industry can least afford it. News publishers have mobilised because they’ve already felt the pain. Book publishers still believe they have time. They may be right. But the margin for error is narrowing fast.


This post first appeared in the TNPS LinkedIn newsletter.