Resources:
This section mostly contains information on how to reproduce the experimental results presented in (Sigletos et al. 2004, 2005).
Datasets:
Laptop products. A set of 50 annotated pages describing laptop products.
The following three datasets were graciously provided by Dayne Freitag (Freitag, 2000).
CS Courses. A set of 101 annotated pages describing Computer Science courses.
Research Projects. A set of 96 annotated pages describing research projects.
Seminars. A set of 485 annotated pages describing seminar announcements from CMU.
Read Annotation.pdf, which describes the format of the annotation files.
Meta-level data:
The following files contain the output of the base-level IE systems for each of the three domains. The feature vectors can then be easily constructed.
(Note: No OPD constraint has been assumed in the following files for Courses, Projects and Seminars.)
Laptops_meta.zip For the domain of laptop products.
CS_Courses_meta.zip For the domain of Computer Science courses.
Projects_meta.zip For the domain of research projects.
Seminars_meta.zip For the domain of seminar announcements.
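As an illustration of how feature vectors might be built from base-level output, here is a minimal sketch. It does not reflect the actual layout of the files in the archives above: the system names, field names, and the (field, confidence) record format are all hypothetical. It assumes only that each base-level system either predicts one field with a confidence for a given text fragment or predicts nothing.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical names, for illustration only; they are not taken from the archives.
SYSTEMS = ["system_a", "system_b"]
FIELDS = ["course_number", "course_title"]

# Assumed record format: (predicted field, confidence), or None for no prediction.
Prediction = Optional[Tuple[str, float]]


def build_feature_vector(predictions: Dict[str, Prediction]) -> List[float]:
    """Return one slot per (system, field) pair.

    A slot holds the system's confidence if it predicted that field
    for the fragment, and 0.0 otherwise.
    """
    vector: List[float] = []
    for system in SYSTEMS:
        prediction = predictions.get(system)
        for field in FIELDS:
            if prediction is not None and prediction[0] == field:
                vector.append(prediction[1])
            else:
                vector.append(0.0)
    return vector


# One text fragment: system_a labels it as a course title, system_b abstains.
example = {"system_a": ("course_title", 0.9), "system_b": None}
print(build_feature_vector(example))  # [0.0, 0.9, 0.0, 0.0]
```

The fixed slot order means vectors from different fragments are directly comparable, which is what a meta-level learner would need. Other encodings (for example, nominal features per system) are equally plausible.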
Preprocessed files for Amilcare:
The following files contain the preprocessed files that were used for training and testing Amilcare.
greek_laptops_amilcare_full.tar.gz For laptop products (29MB).
webkb_cs_courses_amilcare_full.tar.gz For webkb computer science courses (12MB).
webkb_projects_amilcare_full.tar.gz For webkb research projects (8.5MB).
Read the following text files for further explanation.
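Before unpacking one of the larger archives, it can help to look at what it contains. The sketch below uses Python's standard tarfile module to list the first few member names. The helper name list_archive and the limit parameter are illustrative, and the sketch assumes nothing about the internal layout of the Amilcare archives.

```python
import tarfile
from typing import List


def list_archive(path: str, limit: int = 10) -> List[str]:
    """Return the names of up to `limit` members of a gzip-compressed tar archive."""
    names: List[str] = []
    with tarfile.open(path, "r:gz") as archive:
        for member in archive:  # iterating a TarFile yields TarInfo objects
            names.append(member.name)
            if len(names) >= limit:
                break
    return names


# Example call, assuming the archive has been downloaded to the working directory:
# print(list_archive("webkb_projects_amilcare_full.tar.gz"))
```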
Evaluation:
The following text files contain information on how each corpus was split into training and testing parts during cross-validation.
Laptops_splits.txt For the domain of laptop products.
CS_Courses_splits.txt For the domain of CS courses.
Projects_splits.txt For the domain of research projects.
Related bibliography (incomplete list):
- G. Sigletos, G. Paliouras, C.D. Spyropoulos, Michalis Hatzopoulos, Combining Information Extraction Systems Using Voting and Stacked Generalization, Journal of Machine Learning Research (JMLR), 2005.
- G. Sigletos, G. Paliouras, C.D. Spyropoulos, Takis Stamatopoulos, Stacked Generalization for Information Extraction, 16th European Conference on Artificial Intelligence (ECAI), Valencia, Spain, August 2004, IOS Press.
- G. Sigletos, G. Paliouras, C.D. Spyropoulos, Takis Stamatopoulos, Meta-learning beyond classification: A framework for information extraction from the Web, Workshop on Adaptive Text Extraction and Mining, Cavtat-Dubrovnik (Croatia), September 2003.
- Wolpert, D., Stacked Generalization, Neural Networks, 5(2): 241-260, 1992.
- Vilalta, R., Drissi, Y., A perspective view and survey of meta-learning, Artificial Intelligence Review, 18(2), pp. 77-95, 2002.
- Dietterich, T.G., Machine Learning research: Four current directions. AI Magazine, 18(4): 97-136, 1997.
- Freitag, D., Machine Learning for Information Extraction in Informal Domains, Machine Learning, 39, 169-202, 2000.
- Halteren, H., Zavrel J., Daelemans, W., Improving Accuracy in Word Class Tagging through Combination of Machine Learning Systems, Computational Linguistics, 27(2), 199-230, 2001.