Science & technology | Organising the web

The science of science

How to use the web to understand the way ideas evolve


COMPUTER scientists have long tried to foist order on the explosion of data that is the internet. One obvious way is to group information by topic, but tagging it all comprehensively by hand is impossible. David Blei, of Princeton University, has therefore been trying to teach machines to do the job.

He starts by defining topics as sets of words that tend to crop up in the same document. For example, “Big Bang” and “black hole” will often co-occur, but not as often as each does with “galaxy”. None of the three, however, would be expected to pop up next to “genome”. This captures the intuition that the first three terms, but not the fourth, are part of a single topic. Of course, much depends on how narrow you want a topic to be. But Dr Blei's model, which he developed with John Lafferty, of Carnegie Mellon University, allows for that.
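
The co-occurrence intuition can be made concrete with a small sketch. It is not Dr Blei's model, which is probabilistic and far richer; it merely counts how often pairs of terms share a document. The six toy documents and the helper name `cooccurrence_counts` are invented here for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy documents, invented purely for illustration (not from any real corpus).
DOCS = [
    "the big bang seeded every galaxy",
    "a black hole sits at the centre of the galaxy",
    "the big bang predates the first black hole",
    "the galaxy formed long after the big bang",
    "a black hole can swallow stars in a galaxy",
    "the genome encodes every protein",
]

TERMS = ["big bang", "black hole", "galaxy", "genome"]


def cooccurrence_counts(docs, terms):
    """Count, for each pair of terms, the documents in which both appear."""
    counts = Counter()
    for doc in docs:
        # Simple substring matching; sorting gives each pair one canonical order.
        present = sorted(t for t in terms if t in doc)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


counts = cooccurrence_counts(DOCS, TERMS)
for pair in combinations(sorted(TERMS), 2):
    print(pair, counts[pair])
```

In this toy corpus “Big Bang” and “black hole” share one document, each shares two with “galaxy”, and none ever meets “genome”, which is the pattern the paragraph above describes. A real topic model would infer such groupings statistically across millions of documents rather than from raw pair counts.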

This article appeared in the Science & technology section of the print edition under the headline "The science of science"


From the April 30th 2011 edition
