Achievements of the Recent (5-10 years) IDM Projects & DB/IR Fields

Jianwen Su and Susan Gauch


Mission

This working group was charged with collecting information that can convey the impact of Information Retrieval/Database/Knowledge Management research to Congress using terms and outcomes accessible to an informed public. One of our goals was to show the roots of commercial success in earlier, fundamental research.

Deliverables

We produced three deliverables:
  • an updated chart of technologies
  • a diagram showing current commercial successes and their research roots
  • a collection of success stories

Discussion Summary

    New Research Topics of the Late 1990s

    We discussed research topics broken down into five categories: Database, Information Retrieval, Multimedia, Knowledge Bases and Digital Libraries. One of the main themes of the discussion was the increasing overlap between the traditional fields. To give just one example of this overlap, there was considerable debate about where "Multimedia" belonged, with both Information Retrieval and Database researchers claiming it as an integral, hot research area in "their" field. This led to the unsurprising conclusion that Multimedia is a research topic that spans the IDM fields and, of course, extends into Image Processing, Computer Vision, Mathematics and Robotics. Similarly, Digital Libraries projects span many fields of technology and bring in researchers in library science as well as providers and users of the digital library's content. Finally, the World Wide Web, particularly the markup, indexing and retrieval of semi-structured information, brings the fields of database and information retrieval together.

    The following is a list of the new research topics discussed by the group, presented by area:

  • Database: semi-structured data, data integration, OLAP, object-relational data model, results ranking, index structures
  • Information Retrieval: distributed search, search agents, personalization, information fusion
  • Multimedia: content-based indexing, compression, presentation, video servers, access methods, segmentation
  • Knowledge Management: data warehouses, data mining, domain-specific knowledge management
  • Digital Libraries: geographic, scientific, and educational libraries, XML interchange language

    Technology Deployments of the Late 1990s

    This list was put together primarily from the knowledge of the individuals in the discussion group. Clearly, there have been many more technology deployments in the Information Technology field whose origins can be traced back to earlier fundamental research. The most obvious technology deployments whose origins can be traced back to NSF are those deployed on the World Wide Web. It is hard to overestimate the impact the Web is having on how business is conducted and how individuals interact in society. Many of the Web technologies have their origins in IR/DB work of the last two decades. Potentially even more important, the Human Genome project is an example of a huge, collaborative digital library that has the potential to radically change the quality of life and health on the planet. Again, we broke the technology deployments into the same five categories as the research topics.

    It is interesting to consider the current boom in e-commerce. Most, if not all, e-commerce sites are based on databases that track products, orders, and customers. They rely on network protocols to exchange data. They are trusted because of fundamental work in authentication and encryption. They are accessible to users due to the development of HTTP and HTML, an outgrowth of research in hypertext. None of this would be possible without decades of research in seemingly disparate fields coming together into an effective, reliable, efficient and user-friendly environment for online business.

  • Database: Object-Relational Databases (ORDB), financial tracking, Global Positioning System (GPS) applications, E-commerce applications
  • Information Retrieval: Web search, information filtering, meta-search, electronic publishing, natural language search, popularity search
  • Multimedia: video databases, map databases
  • Knowledge Management: EDGAR database, fraud detection
  • Digital Libraries: genome database

    Success Stories

    We were able to identify several successful, well-known commercial products that can clearly trace their origins back to NSF-sponsored research. These were:
  • Lycos - Web spider and search
  • Virage - Multimedia database
  • Google - Popularity-based ranking
  • Oracle Spatial Database Option (SDO) - geographical database
  • FBI Bastille System
  • CNN Multimedia News Database

    Diagrams Showing the Research Roots of Commercially Successful Products