Information Capture and Access
The information capture and access research group works on
ways that computers can locate information in the ever-increasing
volume of online data, determine its structure, and extract the
information for human users. The group was founded by John Hopcroft
and Jim Davis in 1992.
Current areas of research
- Extracting structured material from online documents when the
structure is not explicit in the document, e.g. extracting
information presented in tabular form into a relational
database.
- Constructing summaries and overviews of collections of
texts.
- Constructing a nationwide library of computer science
technical reports. We have begun digitizing the Cornell Computer
Science technical report collection to make the work more
accessible on the Internet. The collection is available through a
WWW server. In addition to serving the general CS research
community, the collection provides test material for our research
in information access.
The group consists of Cornell researcher Dean Krafft and visiting
scientist Jim Davis, as well as a number of graduate and
undergraduate students.
Fall 95: The project is no longer active. - JRD
Publications
James Allan et al. Information Agents for Building
Hyperlinks, Proceedings of the 2nd Conference on Information
and Knowledge Management, 1993.