Resources:

This section provides the data and supporting files needed to reproduce the experimental results presented in Sigletos et al. (2004, 2005).

Datasets:

Laptop products. A set of 50 annotated pages describing laptop products.
The following three datasets were kindly provided by Dayne Freitag (Freitag, 2000).
CS Courses. A set of 101 annotated pages describing Computer Science courses.
Research Projects. A set of 96 annotated pages describing research projects.
Seminars. A set of 485 annotated pages describing seminar announcements from CMU.
See Annotation.pdf for a description of the annotation file format.

Meta-level data:

The following files contain the output of the base-level IE systems for each of the four domains, from which the meta-level feature vectors can easily be constructed.
(Note: no OPD constraint was assumed in the files for Courses, Projects and Seminars.)

Laptops_meta.zip For the domain of laptop products.
CS_Courses_meta.zip For the domain of Computer Science courses.
Projects_meta.zip For the domain of research projects.
Seminars_meta.zip For the domain of seminar announcements.

Preprocessed files for Amilcare:

The following archives contain the preprocessed files used for training and testing Amilcare.
greek_laptops_amilcare_full.tar.gz For laptop products (29MB).
webkb_cs_courses_amilcare_full.tar.gz For WebKB computer science courses (12MB).
webkb_projects_amilcare_full.tar.gz For WebKB research projects (8.5MB).
Further explanation is given in the accompanying text files.

Evaluation:

The following text files describe how each corpus was split into training and test sets during cross-validation.
Laptops_splits.txt For the domain of laptop products.
CS_Courses_splits.txt For the domain of CS courses.
Projects_splits.txt For the domain of research projects.

Related bibliography (incomplete list):