clipped from: www.spectrum.ieee.org   

Google News tackles this problem by using a technique called hierarchical agglomerative clustering. Basically, it puts news articles with similar phrasing together into distinct piles. It starts by analyzing the content of articles to find those that share keywords or key phrases; articles that have enough language in common are assumed to be covering similar topics. The articles in each pile are connected based on the strength of their similarity. To visualize these connections, imagine a treelike structure where the articles are the leaves. If we grab a branch from the tree, the many leaves on that branch are all similar articles—that is, articles about the same general event. Thus a group of leaves near one another on a branch of the tree constitutes a cluster.
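To make the idea concrete, here is a minimal sketch of bottom-up (agglomerative) clustering on a handful of made-up headlines. The article does not say which features, similarity measure, or merge rule Google News uses, so everything here is an assumption for illustration: keyword sets with a tiny stopword list, Jaccard overlap as the similarity, average linkage between piles, and a cutoff of 0.2. The names `keywords`, `jaccard`, `agglomerate`, and `HEADLINES` are hypothetical.

```python
from itertools import combinations

# Hypothetical sample data, not taken from any real news feed.
HEADLINES = [
    "earthquake strikes coastal city overnight",
    "coastal city earthquake damages hundreds of homes",
    "central bank raises interest rates again",
    "interest rates rise as central bank fights inflation",
    "local team wins championship final",
]

# Assumed stopword list; a real system would use something far richer.
STOPWORDS = frozenset({"the", "a", "an", "of", "in", "on", "to", "and", "for", "as"})


def keywords(text):
    """Return the set of lowercased words in text, minus stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def jaccard(a, b):
    """Fraction of keywords two articles share (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_similarity(c1, c2, feats):
    """Average linkage: mean pairwise similarity between two clusters."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(jaccard(feats[i], feats[j]) for i, j in pairs) / len(pairs)


def agglomerate(texts, threshold=0.2):
    """Repeatedly merge the two most similar clusters until none pass the cutoff.

    Returns (clusters, merges): clusters is a sorted list of lists of article
    indices; merges records each (left, right, similarity) join in order,
    which is the information needed to draw the tree.
    """
    feats = [keywords(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]  # every article starts as a leaf
    merges = []
    while len(clusters) > 1:
        best_sim, x, y = max(
            (cluster_similarity(clusters[x], clusters[y], feats), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if best_sim < threshold:  # remaining piles are too dissimilar to join
            break
        merges.append((clusters[x], clusters[y], best_sim))
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(clusters), merges
```

Calling `agglomerate(HEADLINES)` on this sample groups the two earthquake headlines together, groups the two interest-rate headlines together, and leaves the sports headline in a pile of its own. Each entry in `merges` corresponds to one branch point in the tree described above; lowering `threshold` lets the merging continue further up the tree, producing fewer and broader clusters.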