News Corp and Meta have signed a multi-year content licensing agreement that pays the publisher up to $150 million over three years, structured as up to $50 million per year, in exchange for the rights to use News Corp journalism and books inside Meta's AI products. The deal, reported first by The Media Copilot and confirmed by Dataconomy, The Decoder, and Engadget, covers what is arguably the most commercially valuable English-language news bundle on the planet: the Wall Street Journal, the New York Post, The Times of London, the Sunday Times, Dow Jones newswire copy, and the backlist of HarperCollins. Meta gets two things in return. It gets the right to train its AI models on the corpus, and it gets the right to surface that content inside chatbot responses across its consumer surfaces. News Corp gets a predictable annual check and a seat at a table that, until recently, nobody was entirely sure existed.
That table now exists, and the seat costs roughly $50 million a year. What is remarkable about the News Corp deal is not the number, which is large but not industry-shaking, but the way News Corp CEO Robert Thomson has chosen to describe what his company just did. In public remarks after the signing, Thomson repeatedly characterized News Corp's journalism, its backlist, and its business newswire as "AI inputs." He did not hedge. He did not frame the licensing agreement as a defensive move forced on an unwilling seller. He described the transaction as the natural consequence of the product News Corp now makes, which is, on his telling, training data and grounding material for large language models. That framing is the actual news, and it is the framing that every publisher in the English-speaking world is going to have to respond to over the next twelve months.
The Math on $50 Million a Year
Begin with the price, because the price is where the argument lives. Fifty million dollars a year is a large line item for a single licensing customer and a small one compared to the economics of the underlying publications. The Wall Street Journal alone generates subscription revenue that comfortably exceeds a billion dollars a year. The New York Post's digital operation, Dow Jones newswire, The Times, and HarperCollins book sales layer on top of that. Against that base, $50 million is somewhere between a single percentage point and a rounding error, depending on which parts of News Corp you are counting and how you account for shared infrastructure. The check from Meta will not save the WSJ newsroom. It was never going to.
The reason the price still matters is that it establishes a comparable. Every publisher who sits down with a lab from this point forward has a reference transaction for one of the most prestigious Anglophone news bundles in existence. If News Corp extracted $50 million a year for the WSJ, the Post, The Times, Sunday Times, HarperCollins, and Dow Jones combined, then a thoughtful agent representing a regional daily or a mid-sized magazine group now knows the rough ceiling on what their own catalog can fetch. The math is cold. The Meta deal values the entire News Corp content universe at somewhere in the neighborhood of $50 million annually on the highest tier of its tiered payment structure, which implies a per-article price that is small even by digital advertising standards. A wire story worth pennies on a CPM basis is worth fractions of a penny as training data. The aggregate is only interesting because training runs ingest the whole corpus.
The comparable that matters more is the OpenAI deal News Corp signed in May 2024. That agreement was worth up to $250 million over five years, or roughly $50 million a year, and it covered substantially the same content pool. The Meta arrangement is therefore not a step up in price. It is a second check from a second buyer at approximately the same rate. News Corp has effectively doubled its run rate on licensing revenue without changing what it sells or what it charges, by persuading a second major buyer that the first buyer had set a reasonable price. That is a meaningful precedent. The training data market is now a market where the same inventory can be sold repeatedly to non-overlapping buyers, and the second sale tells you that the first sale was not a fluke.
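The run-rate arithmetic behind that comparison can be sketched in a few lines of Python. This is a minimal illustration using only the headline figures reported above; the dictionary layout and the function name are assumptions made for the example, and because both deals are described as "up to" amounts, the results are ceilings rather than guaranteed payments.

```python
# Headline figures as reported in this article, in millions of US dollars.
# Both totals are "up to" amounts, so every result here is a ceiling.
deals = {
    "Meta": {"total_musd": 150, "years": 3},
    "OpenAI": {"total_musd": 250, "years": 5},
}


def annual_rate(total_musd: float, years: int) -> float:
    """Spread a deal's headline total evenly across its term."""
    return total_musd / years


# Per-buyer annual ceiling, then the combined licensing run rate.
rates = {buyer: annual_rate(**terms) for buyer, terms in deals.items()}
combined = sum(rates.values())

for buyer, rate in rates.items():
    print(f"{buyer}: up to ${rate:.0f}M per year")
print(f"Combined ceiling: up to ${combined:.0f}M per year")
```

Both deals come out at the same annual ceiling, which is the sense in which the second sale doubles the run rate without raising the price.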
Robert Thomson and the AI Input Framing
The more consequential shift is rhetorical. Robert Thomson has been News Corp's chief executive since 2013, and he has spent most of that time describing the company's mission in the familiar language of journalism: trusted reporting, quality brands, premium audiences, subscription relationships, and so on. In the post-signing commentary around the Meta deal, that vocabulary has shifted. Thomson is now describing News Corp as a company that produces inputs for AI systems, in addition to everything else it produces. He has used the phrase publicly, repeatedly, and without qualifying it as metaphor. The framing is deliberate, and it is worth taking seriously as a statement about how a legacy media CEO now understands the product his company sells.
There are two ways to read this. The cynical reading is that Thomson is performing for his shareholders and for Wall Street analysts who want to see a growth story that is not subscription growth, because subscription growth at the WSJ has been flat for most of the past three years. The AI input framing is a way of reclassifying News Corp from a declining media conglomerate into a provider of scarce training data for an industry that is visibly scaling. That repositioning is valuable to a stock price even if the underlying cash flows do not change dramatically. The cynical reading is probably partially correct. It is also incomplete.
The more interesting reading is that Thomson is describing what has actually happened to News Corp's revenue mix. The company now has two large annual checks from two of the most aggressive frontier labs, plus an internal AI tool called NewsGPT, plus a growing line item in its financial disclosures that did not exist three years ago. The "AI input" language is a way of acknowledging that a non-trivial slice of News Corp's top line now comes from selling its archive by the pound to training pipelines rather than by the article to human readers. When a CEO changes his vocabulary about what his own company does, it is usually because something underneath the vocabulary has already changed. The money always precedes the language. The language is the lagging indicator.
The Dual Strategy and NewsGPT
What makes the News Corp strategy interesting, as opposed to simply profitable, is that Thomson is pursuing two tracks at once. Track one is licensing the corpus to other companies for training and retrieval, which is the strategy on display in the Meta and OpenAI deals. Track two is building internal AI tooling on top of that same corpus, under the NewsGPT banner, which gives News Corp reporters and editors retrieval-augmented access to their own archive and to external context. Each track individually is a rational response to the arrival of large language models. The combination is something more ambitious.
The combination says that News Corp intends to occupy both sides of the training data trade. It will sell its archive to anyone willing to pay the license fee, and it will simultaneously use the same archive to compete on the output side, with an AI-assisted newsroom and, eventually, with AI-assisted products that sit somewhere between a newsletter and a chatbot. The strategic logic is that the licensing revenue underwrites the internal tooling, and the internal tooling keeps the archive fresh and valuable as an input. Every article a WSJ reporter publishes with NewsGPT assistance is simultaneously a piece of journalism, a training example, and a justification for the next licensing renewal. The three uses compound.
This is the move that sets News Corp apart from most other publishers in 2026. The New York Times, for example, is pursuing one track (litigation) and gesturing vaguely at the other (internal tooling). The Associated Press has licensed to OpenAI but has not built a visible internal AI product. Most regional chains are doing neither. News Corp is doing both, publicly, and doing them at the same time. The dual strategy reduces the company's dependence on any single buyer and gives it optionality if the licensing market softens or if the frontier labs decide that synthetic data has replaced the need for premium news. In a market where nobody is sure how long the training data gold rush will last, holding two different claims is better than holding one.
Two Flavors of Response: Sue or Sign
The broader media industry in 2026 has effectively sorted itself into two camps on the question of how to respond to the arrival of frontier models that consume the entire open web as a feedstock. The first camp is the lawsuit camp, anchored by the New York Times's litigation against OpenAI and Microsoft, and joined by a growing roster of smaller publishers and authors who are pursuing copyright and unfair competition claims. The theory of the lawsuit camp is that the labs cannot be allowed to train on premium journalism without permission, and that the courts will eventually set a price for that permission that is higher than anything a voluntary negotiation would produce. It is a high-variance strategy. If the publishers win at trial or force a large settlement, they set a precedent that reshapes the entire training data market. If they lose on fair use grounds, they have handed the labs a permanent green light.
The second camp is the license camp. News Corp is its most visible member, but it is not alone. The Associated Press, Axel Springer, Le Monde, the Financial Times, Vox Media, Dotdash Meredith, and Condé Nast have all signed some version of a licensing agreement with one or more frontier labs over the past twenty-four months. The theory of the license camp is simpler: the content is already being used, the labs have more cash than the courts can extract in the near term, and a bird in hand at $50 million a year is worth more than a hypothetical billion-dollar judgment several years from now. The license camp is playing for cash flow, not precedent.
The two camps are not entirely opposed. A publisher can sue one lab while licensing to another, and several have quietly done exactly that. But the two strategies reflect genuinely different readings of how the legal and commercial landscape will evolve. The license camp is betting that settlement norms will drift toward a standard rate card and that early signers will be treated as preferred partners. The lawsuit camp is betting that the norms have not hardened yet and that a successful case will rewrite the rate card from above. Both bets could theoretically pay off. Only one of them pays off in 2026 cash flows. News Corp has chosen the one that pays off now, and it has paired that choice with a public theory of the case, delivered in Thomson's AI input language, that is designed to make the choice look inevitable rather than merely convenient.
The Training Data Market Has a Size Now
Until about eighteen months ago, it was possible to argue in good faith that there was no real market for news content as training data. Labs either scraped the open web and hoped fair use covered them, or they paid small sums to secondary aggregators, or they bought bulk access to academic and book corpora where the rights picture was clearer. News publishers were largely outside the frame. The Reuters Institute's 2025 digital news report noted that essentially no mainstream publisher had received material AI licensing revenue before the OpenAI-News Corp deal closed.
That is no longer true. A training data market for premium journalism exists, it has observable prices, and it has a rough total addressable size that analysts are now willing to estimate in public. The OpenAI-News Corp deal was $250 million over five years. The OpenAI-Axel Springer deal was reportedly in the tens of millions per year. The Reddit data licensing arrangement with Google was worth around $60 million a year. Meta's new News Corp deal lands in the same neighborhood. Stacking the visible public agreements together produces a current run rate for licensed training content that is approaching the low single-digit billions of dollars annually, and that is before including the substantially larger sums the labs are paying to hyperscalers for synthetic data pipelines, to wire services, and to scientific publishers. It is a real market now, with real comparables, and News Corp has made itself one of the top two or three suppliers in it.
The more interesting question is who is not selling. The New York Times is in active litigation and is therefore not a seller. Bloomberg has so far declined to license its terminal content for obvious reasons, since the terminal is the crown jewel and the value of its real-time financial data is entirely incompatible with a training rights grant. Reuters has licensed some historical archive but has been careful about its breaking news feed. Several major European public broadcasters have stayed out entirely because of public service mandates that make commercial licensing politically complicated. The holdouts form a meaningful second layer of the market, and the question of whether any of them break in the next twelve months is the single most important variable in how the training data economy looks by the end of 2026.
What Meta Actually Gets That Scraping Could Not Provide
A legitimate objection to deals like this one runs as follows: Meta could already scrape most of the News Corp corpus from the public web. The WSJ has a paywall, but it has also had Google's First Click Free program and various syndication arrangements, and large chunks of its archive have been indexed by search engines for years. The New York Post is largely open. HarperCollins books are not on the open web, but they are in the libraries and shadow libraries that have been used to train most existing models. If Meta wanted the underlying text, it could get the underlying text. What does $150 million buy that scraping does not?
Three things, and they are worth enumerating because they explain why the labs are paying at all. The first is legal cover. A signed license agreement transforms an ambiguous copyright question into an unambiguous contract, and the value of that transformation is a function of how much the lab stands to lose in a worst-case adverse ruling. For a company of Meta's size, the downside of losing a high-profile copyright trial to a News Corp-caliber plaintiff is catastrophic both in damages and in precedent. A $150 million license, viewed as insurance, is obviously cheap against that downside.
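The insurance framing can be made concrete with a break-even calculation. The sketch below is illustrative only: the $150 million fee is the deal's reported ceiling, but the damages figures are hypothetical placeholders rather than estimates of any real exposure, and the model ignores precedent effects, legal costs, and the other benefits of the license. It simply asks how likely an adverse ruling must be before the fee is cheaper than the expected loss.

```python
# Reported ceiling on the Meta license, in millions of US dollars.
LICENSE_FEE_MUSD = 150

# HYPOTHETICAL damages scenarios in millions of US dollars.
# These are illustrative placeholders, not figures from any case or filing.
HYPOTHETICAL_DAMAGES_MUSD = (500, 1_000, 5_000)


def break_even_probability(license_fee_musd: float, damages_musd: float) -> float:
    """Probability of an adverse ruling above which paying the fee beats
    carrying the risk, under a simple expected-loss model:
    license is worth it when probability * damages > fee."""
    if damages_musd <= 0:
        raise ValueError("damages must be positive")
    return min(1.0, license_fee_musd / damages_musd)


thresholds = {
    damages: break_even_probability(LICENSE_FEE_MUSD, damages)
    for damages in HYPOTHETICAL_DAMAGES_MUSD
}

for damages, p in thresholds.items():
    print(f"Damages of ${damages:,}M: license pays off if P(loss) > {p:.0%}")
```

Under these assumed scenarios, the larger the potential judgment, the lower the probability of losing needs to be before the license looks like the cheaper option.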
The second is clean data. Scraped content is messy. It contains navigation cruft, advertising markup, paywall fragments, duplicated syndication, and a hundred other artifacts that have to be cleaned out before the text is usable for training. A direct license typically comes with a feed, meaning News Corp hands Meta a structured stream of articles with metadata attached. The cost of cleaning scraped data at scale is not small, and in some cases it exceeds the licensing fee that would have been paid for clean data in the first place. Lab engineers have been making this argument internally for years. The Meta deal is, in part, a vindication of their spreadsheet.
The third is freshness. Scraping the historical archive is one problem. Getting tomorrow's article ninety seconds after it publishes is a different problem. A direct feed from Dow Jones or the WSJ gives Meta's chatbot surfaces grounded, current, and citable answers to questions about events in the world. For a consumer product that is trying to compete with ChatGPT on the question of whether the user can ask "what happened today" and get a useful answer, freshness is the product. Meta has been visibly behind OpenAI on that dimension, and the News Corp agreement closes most of the gap at one stroke. It is not a coincidence that the structured rights in the deal explicitly cover chatbot responses as well as training.
Copyright, Fair Use, and the Slow Drift of Settlement Norms
Underneath all of this is a legal question that has not actually been answered yet: is training a large language model on copyrighted text a fair use? The US courts have been circling the question for two years without resolving it. The Andersen v. Stability AI case produced a set of narrow rulings. The NYT v. OpenAI case is still in discovery. The authors' class actions against Meta and OpenAI have survived motions to dismiss but have not reached summary judgment. No appellate court has issued a decision that settles the core question of whether a training run is a transformative use of the source material.
In the absence of a legal answer, the industry is producing a commercial one. Every licensing deal that gets signed, and every deal that gets publicly disclosed, contributes to a settlement norm that does not require a court to enforce it. If enough labs pay enough publishers enough money for long enough, a future court that finally reaches the fair use question will be writing its opinion into a landscape where licensing is already the default commercial practice. That changes the legal calculus. A judge who might have been willing to declare training fair use in 2023, when no meaningful license market existed, is less likely to do so in 2027 after three years of $50 million deals have established that a market is both possible and active. The existence of the market becomes its own evidence that the market should exist.
That drift is what makes the News Corp strategy structurally clever, and what makes the New York Times strategy structurally risky. The Times is litigating in an environment where the settlement norms are hardening against its position every quarter. Every time another publisher signs, the Times's argument that training on news is an uncompensated taking becomes slightly less persuasive, because the counterfactual where publishers are in fact being compensated is visibly unfolding in the trade press. News Corp is not just extracting cash from Meta. It is also, incidentally, eroding the factual premise of the Times's lawsuit. Whether Thomson intended that effect or not, it is one of the things his deal is accomplishing.
What to Watch Next
The next ninety days are likely to produce three data points that will clarify whether the News Corp deal is the ceiling of the training data market or merely another step on a longer staircase. The first is whether any wire service signs a Meta agreement of its own. Reuters and the Associated Press are the obvious candidates. Both have unique value as real-time news sources, and both have been in quiet conversations with multiple labs for over a year. A wire service deal would signal that the licensing market has moved past feature archives and into live feeds, which would materially change what chatbots can honestly answer.
The second is whether any of the litigation-camp publishers defect. The most interesting question is not whether the New York Times settles, since the Times has staked too much of its public posture on the lawsuit to walk it back cheaply. The more interesting question is whether a mid-tier publisher currently riding the NYT's legal coattails decides that a signed check is worth more than a theoretical judgment share. If even one publisher in that posture flips, the litigation alliance loses some of its collective weight.
The third is whether News Corp's "AI input" framing starts showing up on other CEOs' earnings calls. Language like that spreads slowly and then all at once, and it is the single clearest indicator of how the publishing industry has internalized its relationship with the frontier labs. If other chief executives begin describing their own archives in similar terms, the market will have moved from a handful of opportunistic deals to an emergent consensus about what a modern media company actually sells. At that point, the question will no longer be whether to license. It will be whose license is worth more, and how many labs are bidding.
For now, News Corp has two checks, a public theory, an internal tool, and a rhetoric that reframes its entire business around its usefulness to AI systems. That is a lot of weight to put on a single deal. It is also the most articulate statement any major publisher has made about where the center of gravity in the media business is moving. Whether Robert Thomson is right about his own company is almost beside the point. He is saying something out loud that other CEOs have only been thinking, and in a market where settlement norms are drifting toward licensing as the default, saying it out loud first is a form of pricing power. The rest of the industry now has to decide whether it agrees, and how much its own agreement is worth.