A renaissance for structured journalism

Photo by Glen Carrie / Unsplash

For all the anxiety over how AI will upend journalism as we know it, I continue to believe its best immediate uses in our profession remain exceedingly mundane.

Among them, extracting structured data from unstructured text: identifying people, places, and events buried in articles; applying metadata and complex taxonomies that reporters and editors don’t have the patience to maintain; normalizing things like locations, spellings, names, and entities so they line up cleanly with external databases.
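To make the normalization piece concrete, here is a minimal sketch in Python of lining up place mentions with an external database. Everything in it is an assumption for illustration: the alias table, the record IDs, and the function names are hypothetical, and a real newsroom pipeline would likely use a language model or entity-recognition tool to find the mentions before a lookup step like this one.

```python
import re
import unicodedata

# Hypothetical gazetteer: normalized alias -> canonical record.
# In practice this would come from an external database, not a hard-coded dict.
NYC = {"id": "place:nyc", "name": "New York City"}
STL = {"id": "place:stl", "name": "St. Louis"}
CANONICAL_PLACES = {
    "new york city": NYC,
    "nyc": NYC,
    "st louis": STL,
    "saint louis": STL,
}


def normalize_key(text):
    """Reduce a mention to a comparable key: no accents, punctuation, or case."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    without_punct = re.sub(r"[^\w\s]", "", without_accents.lower())
    return re.sub(r"\s+", " ", without_punct).strip()


def resolve_place(mention):
    """Return the canonical record for a mention, or None if it is unknown."""
    return CANONICAL_PLACES.get(normalize_key(mention))


if __name__ == "__main__":
    for mention in ["St. Louis", "Saint  Louis", "N.Y.C.", "Gotham"]:
        print(mention, "->", resolve_place(mention))
```

The point of the sketch is only that variant spellings collapse to one stable identifier; unknown mentions return `None` so a person or a downstream step can review them rather than having a match guessed.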

It’s tedious and unglamorous work. It also happens to be the foundation of some of the most impactful product, storytelling, and business model innovations our industry has launched over the last 20 years.

The New York Times turned archived recipe articles into a product that now forms a core pillar of its bundle strategy. PolitiFact won a Pulitzer Prize and completely rebooted the concept of the fact check. On the business side, the Washington Post and others use taxonomies to target ads contextually and ensure brand safety. And on and on.

Doing this used to be really hard. Now, with the help of large language models and their supporting technologies, it is, if still not quite “easy,” at least achievable for most news organizations at a reasonable cost.

In light of that, I want to turn some attention back to articles written by friends, colleagues, and people I admire — some of which were published as far back as the early 2000s. Given the culture and technology of the time, some of these ideas felt like science fiction. They’re worth a fresh look as we roll into 2026:

  • A fundamental way newspaper sites need to change, by Adrian Holovaty, September 2006: “The problem here is that, for many types of news and information, newspaper stories don’t cut it anymore. So much of what local journalists collect day-to-day is structured information: the type of information that can be sliced-and-diced, in an automated fashion, by computers. Yet the information gets distilled into a big blob of text — a newspaper story — that has no chance of being repurposed.”
  • The annotated archive, by Derek Willis, May 2005: “I love archives. But you know what I’d really love? An annotated archive. An archive that doesn’t just display vertical depth going back years but can show relationships between archived items and the individuals and institutions named within them. An archive that can help find connections easier and can help new or unfamiliar users get up to speed quickly.”
  • The end of the story — as we know it, by Jeff Jarvis, October 2008: “Online, we have so many more means to present and explain news. News becomes a process more than a product. We can stretch the timetable so that news need not expire into chip paper after a day. News can be updated, corrected, expanded, discussed, linked. So what is that essential unit of news, post-article?”
  • Finding stories in the structure of data, by Matt Waite, May 2013: “If you think about it long enough, nearly every routine kind of story that journalists cover has a structure. Fires, car accidents, government meetings and crimes, sure. But what about sports, wedding announcements, obituaries, local festivals and business openings? If you could harness that structure, you could show people things that would be considered stories, aggregated by being near their house, or, slicing it differently, by type of event (like thefts), or by demographics, instead of by local proximity.”

There are many, many more examples of this. Remember Circa? Homicide Watch? The BBC’s pioneering efforts around knowledge graphs? All of them were based on the premise that if we can structure information, we can pull it apart and remix it, surface things that people might have missed, or even synthesize new knowledge. They’re also mostly gone now. Not because they were bad ideas, but because they were ahead of their time.

It might be the data journalist in me, but whether you want to build chatbots, launch innovative story forms, craft coverage strategies, target ads, build reliable propensity models, create hyperlocal newsletters, personalize your homepage, find value in your archives, or even sell your data to AI companies, reliable structured data is the rare-earth raw material required to make your vision real.

The cost of creating it, both in terms of money and time, is sprinting toward zero. Startups are raising millions of dollars to do this for other industries. I suspect that in 2026, more news organizations will turn a fresh eye toward exploring what it could mean for the mission and business of journalism.

Originally published in Nieman Lab's Predictions for Journalism 2026.
