virus: Newsblaster

From: Jonathan Davis (jonathan.davis@lineone.net)
Date: Wed Mar 06 2002 - 11:18:34 MST


Via: http://searchenginewatch.com/searchday/

Newsblaster: An Automatic Weblogger

Newsblaster is a great new tool for news junkies, and also points the way
toward some seriously cool automated web harvesting technologies that will
be a boon to searchers.

Online news has gone through several iterations since the first media web
sites appeared in the mid-1990s. During the "portalization" craze, news
was one of the first features bolted on to the major search engines and
directories. Later, headline aggregators became popular, providing
tailored newsfeeds to anyone with a web site.

And of course, the whole weblog phenomenon started as a source of
"alternate" news on the web, with the first bloggers "editing" the web
with links to news with a distinct point of view and often annotated with
opinionated commentary.

Newsblaster, a project developed by the Columbia NLP (natural language
processing) Group, represents the next level of evolution for news on the
web. The service monitors seventeen major web news services, and groups
related stories together for easy access.

What's so special about that? Isn't that what other news aggregators do?

Yes and no. What makes Newsblaster different is that it "reads" the news,
using natural language and artificial intelligence techniques, and then
actually writes short summaries of each major news event based on what it
has "understood." And it's remarkably good at what it does.

Here's how Newsblaster summarized a recent U.S. Supreme Court review of
copyright:

     'Limitless' Copyright Case Faces High Court Review

     "The U.S. Supreme Court agreed Tuesday to hear a case
     that could determine when hundreds of thousands of
     books, songs and movies will become freely available
     over the Internet or in digital libraries. A nonprofit
     Internet publisher and other plaintiffs argue that
     Congress sided too heavily with writers and other
     creators when it passed a law in 1998 that
     retroactively extended copyright protection by 20
     years. On Tuesday, the U.S. Supreme Court announced it
     would hear a challenge to the 1998 Copyright Term
     Extension Act, in which Congress extended the term of
     existing and future copyrights by 20 years. Billions of
     dollars and the future earning power of some of the
     nation's most cherished cultural icons are at stake
     as the U.S. Supreme Court considers a constitutional
     challenge to a 1998 copyright extension law, legal
     experts said Wednesday."

Beneath this summary, Newsblaster includes links to the news stories it
has read to generate the summary.
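
To make the "group related stories, then summarize" idea concrete, here is a
minimal Python sketch. It is not Newsblaster's actual method, which the
article does not describe in detail. It assumes stories are plain strings,
groups them by simple word overlap (Jaccard similarity), and builds a
summary from the first sentence of each story in a group. The function
names, the stopword list and the 0.2 threshold are all illustrative
assumptions.

```python
import re

# Small illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "or", "that",
             "it", "is", "on", "for", "by", "will", "with", "as", "at"}


def tokens(text):
    """Return the set of lowercase words in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two word sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_stories(stories, threshold=0.2):
    """Greedy one-pass grouping: a story joins the first group whose
    first story is similar enough, otherwise it starts a new group."""
    groups = []
    for story in stories:
        words = tokens(story)
        for group in groups:
            if jaccard(words, tokens(group[0])) >= threshold:
                group.append(story)
                break
        else:
            groups.append([story])
    return groups


def summarize(group):
    """Extractive summary: the lead sentence of each story, in order.
    The naive split on '.', '!' or '?' would mis-handle abbreviations
    such as 'U.S.', so real text needs a proper sentence splitter."""
    leads = []
    for story in group:
        lead = re.split(r"(?<=[.!?])\s+", story.strip())[0]
        if lead not in leads:
            leads.append(lead)
    return " ".join(leads)
```

Calling group_stories() on a list of article texts and then summarize() on
each resulting group yields one short paragraph per news event. Stringing
lead sentences together is only a rough stand-in for the language
processing the article attributes to Newsblaster.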

While Newsblaster is an excellent tool for gleaning a quick summary of the
most important news stories of the day, it won't replace journalists or
editors any time soon. As good as the NLP techniques are at extracting
and synthesizing information from news, the program lacks the perspective
and critical mindset of a professional journalist -- at least for now.

And though Newsblaster uses credible news sources, it can't yet account
for bias or inaccurate reporting.

But these are just quibbles, given the time-saving utility Newsblaster
offers. As the underlying technology improves and is extended, it's easy
to see how this sort of approach could be used to develop customized web
crawlers that you tailor to recognize your own interests and send out on
autonomous search missions.

If such a system were combined with a URL monitoring service, and seeded
with a taxonomy of subjects personally interesting to you, it could
effectively create your own web "advisory" service, automatically building
directories of promising sites annotated with high-level summaries that
would spare you the time of manual searching.
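
As a rough illustration of the idea in the paragraph above, the Python
sketch below files pages under subjects from a personal taxonomy. It is
purely hypothetical: the taxonomy is assumed to be a dict mapping each
subject to a list of lowercase keywords, pages are assumed to be already
fetched as a dict of URL to text, and the first 80 characters of a page
stand in for a real summary. No crawling or URL monitoring is shown.

```python
import re


def score_page(text, taxonomy):
    """For each subject, count how many of its keywords occur in text.
    Keywords are assumed to be lowercase single words."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {subject: len(words & set(keywords))
            for subject, keywords in taxonomy.items()}


def build_directory(pages, taxonomy, min_hits=2):
    """Map each subject to a list of (url, blurb) pairs for pages that
    match at least min_hits of the subject's keywords. The blurb is a
    crude placeholder for a generated summary."""
    directory = {subject: [] for subject in taxonomy}
    for url, text in pages.items():
        for subject, hits in score_page(text, taxonomy).items():
            if hits >= min_hits:
                directory[subject].append((url, text[:80]))
    return directory
```

The min_hits threshold is an arbitrary choice that keeps a single stray
keyword from filing a page under a subject.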

Just as Newsblaster won't replace journalists, this type of hybrid
crawler-agent wouldn't replace information professionals. But it would
make a powerful addition to our arsenal of web search tools.

Newsblaster
http://www.cs.columbia.edu/nlp/newsblaster/
Columbia NLP's "automatic system for event tracking and summarization."

Columbia Natural Language Processing Group Projects
http://www.cs.columbia.edu/nlp/projects.html
Descriptions and links to other projects under development at the Columbia
NLP Group.

Search method melds results
TRN News, January 9, 2002
http://www.trnmag.com/Stories/2002/010902/Search_method_melds_results_010902.html
Description of a system that uses Newsblaster-like techniques to summarize
a set of results generated by a search engine -- another Columbia NLP
Group project.


