Machine Learning for Cancer Diagnosis and Prognosis

This page describes various linear-programming-based machine
learning approaches which have been applied to the diagnosis and
prognosis of breast cancer. This work is the result of a
collaboration at the University of Wisconsin-Madison between Prof. Olvi L.
Mangasarian of the Computer Sciences Department and
Dr. William H. Wolberg of the departments of Surgery and Human
Oncology.
A press release distributed at the American Cancer
Society Science Writers seminar in March of 1994 provides a
good overview of this research.
Diagnosis
This work grew out of Dr. Wolberg's desire to accurately
diagnose breast masses based solely on a Fine Needle Aspiration
(FNA). He identified nine visually assessed characteristics of an
FNA sample that he considered relevant to diagnosis. In
collaboration with Prof. Mangasarian and two of his graduate
students, Rudy Setiono and Kristin Bennett, a classifier was
constructed using the multisurface method (MSM) of pattern
separation on these nine features; it correctly diagnosed 97%
of new cases. The resulting data set is widely known as the
Wisconsin Breast Cancer Data.
The image analysis work began in 1990 with the addition of Nick Street
to the research team. The goal was to diagnose the sample based on
a digital image of a small section of the FNA slide. The results of
this research have been consolidated into a software system known
as Xcyt, which is currently used by Dr. Wolberg in his
clinical practice. The diagnosis process is now performed as
follows:
- An FNA is taken from the breast mass. This material is then
mounted on a microscope slide and stained to highlight the cellular
nuclei. A portion of the slide in which the cells are
well-differentiated is then scanned using a digital camera and a
frame-grabber board.
- The user then isolates the individual nuclei using Xcyt.
Using a mouse pointer, the user draws the approximate boundary of
each nucleus. Using a computer vision approach known as "snakes",
these approximations then converge to the exact nuclear boundaries.
This interactive process takes between two and five minutes per
slide. Here
is an image showing Xcyt in use.
- Once all (or most) of the nuclei have been isolated in this
fashion, the program computes values for each of ten characteristics
of each nucleus, measuring size, shape and texture. The mean,
standard error and extreme values of these features are computed,
resulting in a total of 30 nuclear features for each sample.
- Based on a training set of 569 cases, a linear classifier was
constructed to differentiate benign from malignant samples. This
classifier consists of a single separating plane in the space of
three of the features: Extreme Value of Area, Extreme Value of
Smoothness, and Mean Value of Texture. By projecting all the cases
onto the normal of this separating plane, approximate probability
densities of the benign and malignant points were constructed.
These allow a simple Bayesian computation of probability of
malignancy for new patients. These densities are shown to the
patient, allowing her to judge the "confidence" of her diagnosis by
comparison to hundreds of previous samples.
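The feature computation described in the steps above (ten per-nucleus measurements summarized into 30 per-sample features) can be sketched as follows. This is only an illustrative sketch, not the Xcyt code. It assumes that the "extreme value" of a feature is the mean of its three largest per-nucleus values, which the text above does not specify; the function name summarize_sample, the parameter n_extreme, and the numbers in the example are all made up. Only three features are shown, so the example yields 9 values rather than 30.

```python
import math
from statistics import mean, stdev


def summarize_sample(nuclei, n_extreme=3):
    """Summarize per-nucleus measurements into per-sample features.

    nuclei: list of dicts mapping a feature name to its value for one nucleus
            (at least two nuclei are needed for a standard error).
    n_extreme: how many of the largest values are averaged to form the
               "extreme" value (an assumption, not taken from the text).
    """
    summary = {}
    for name in sorted(nuclei[0]):
        values = [nucleus[name] for nucleus in nuclei]
        summary["mean_" + name] = mean(values)
        # Standard error of the mean: sample standard deviation / sqrt(n).
        summary["se_" + name] = stdev(values) / math.sqrt(len(values))
        # Assumed definition of "extreme": mean of the n_extreme largest values.
        largest = sorted(values, reverse=True)[:n_extreme]
        summary["extreme_" + name] = mean(largest)
    return summary


# Made-up measurements for four nuclei from one hypothetical sample.
nuclei = [
    {"area": 100.0, "smoothness": 0.10, "texture": 15.0},
    {"area": 120.0, "smoothness": 0.12, "texture": 18.0},
    {"area": 140.0, "smoothness": 0.09, "texture": 21.0},
    {"area": 160.0, "smoothness": 0.11, "texture": 24.0},
]

summary = summarize_sample(nuclei)
for key in sorted(summary):
    print(key, round(summary[key], 4))
```

With all ten per-nucleus characteristics supplied, the same function would return the 30 features per sample mentioned above.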
To date, this system has correctly diagnosed 176 consecutive new
patients (119 benign, 57 malignant). In only eight of those cases
did Xcyt return a "suspicious" diagnosis (that is, an
estimated probability of malignancy between 0.3 and 0.7).
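The probability computation and the "suspicious" band can be illustrated with a short sketch. It is not the Xcyt implementation: the text does not say how the densities were approximated, so a Gaussian kernel estimate is assumed here, with class priors taken from the training proportions. The plane coefficients w and gamma, the bandwidth, and all projected values are hypothetical, and treating the endpoints 0.3 and 0.7 as "suspicious" is also an assumption.

```python
import math


def project(x, w, gamma):
    """Signed position of feature vector x relative to the plane w.x = gamma."""
    return sum(wi * xi for wi, xi in zip(w, x)) - gamma


def kernel_density(points, bandwidth):
    """Return a Gaussian kernel density estimate built from 1-D points."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(t):
        return sum(math.exp(-0.5 * ((t - p) / bandwidth) ** 2) for p in points) / norm

    return density


def prob_malignant(t, benign_proj, malignant_proj, bandwidth=0.5):
    """Bayes' rule on the projected value t, using estimated class densities."""
    prior_m = len(malignant_proj) / (len(benign_proj) + len(malignant_proj))
    weighted_m = kernel_density(malignant_proj, bandwidth)(t) * prior_m
    weighted_b = kernel_density(benign_proj, bandwidth)(t) * (1.0 - prior_m)
    total = weighted_m + weighted_b
    if total == 0.0:
        # Both densities underflowed (t far from all training points):
        # fall back to the prior rather than divide by zero.
        return prior_m
    return weighted_m / total


def label(p, low=0.3, high=0.7):
    """Map a probability of malignancy to a diagnosis label."""
    if p < low:
        return "benign"
    if p > high:
        return "malignant"
    return "suspicious"


# Hypothetical projections of training cases onto the plane normal.
benign_proj = [-2.0, -1.5, -1.0, -0.5]
malignant_proj = [0.5, 1.0, 1.5, 2.0]

# Hypothetical plane and new sample (three features).
t_new = project([1.0, 2.0, 0.5], w=[0.5, -0.25, 1.0], gamma=0.5)
p_new = prob_malignant(t_new, benign_proj, malignant_proj)
print(t_new, round(p_new, 3), label(p_new))
```

A sample that projects close to the separating plane, as in this example, lands where the two densities overlap and is therefore reported as "suspicious" rather than forced into either class.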
A small subset of the source images used in this research can be
found here.
These are very good test cases for image segmentation or object
recognition algorithms. If your pet segmentation algorithm can
automatically identify all of the nuclei in these images, please
email me (street@cs.wisc.edu) and let's work together.
Prognosis
The second problem considered in this research is that of
prognosis, the prediction of the long-term behavior of the disease.
We have approached prognosis as a function-approximation problem,
using input features -- including those computed by Xcyt --
to predict a time of recurrence in malignant patients, using
right-censored data. Our solution is termed the Recurrence Surface
Approximation method (RSA), and utilizes a linear program to
construct a surface which predicts time of recurrence for new
patients. By examining the actual recurrence of those training
cases with similar predicted recurrence times, we can plot the
probability of disease-free survival for various times (out to 10
years) for an individual patient. This capability has been
incorporated into Xcyt and an example is shown here. These survival
curves plot the probability of disease-free survival versus time
(in years). The black disease-free survival curve represents all
patients in our original study; the red curve represents the
probability of disease-free survival for the sample case. This
particular case therefore has an above-average prognosis, with a
probability of being disease-free after 10 years equal to about
80%.
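The step from predicted recurrence times to an individual survival curve can be sketched as follows. This is a simplified illustration under stated assumptions, not the RSA method itself: the linear program that produces the predicted times is not shown, "similar" cases are taken to be the k training cases with the nearest predicted time, and the curve is estimated with the standard Kaplan-Meier product-limit estimator, which the text above does not name. All function names and data values are made up.

```python
def similar_cases(training, predicted, k):
    """Pick the k training cases whose predicted time is closest to `predicted`.

    training: list of (predicted_time, observed_time, recurred) tuples, where
              recurred=False marks a right-censored case (no recurrence seen
              up to observed_time).
    Returns a list of (observed_time, recurred) pairs.
    """
    ranked = sorted(training, key=lambda case: abs(case[0] - predicted))
    return [(observed, recurred) for _, observed, recurred in ranked[:k]]


def kaplan_meier(cases):
    """Product-limit estimate of disease-free survival from censored data.

    Returns a list of (time, survival_probability) steps, one per distinct
    recurrence time, in increasing order of time.
    """
    curve = []
    survival = 1.0
    event_times = sorted({t for t, recurred in cases if recurred})
    for t in event_times:
        at_risk = sum(1 for u, _ in cases if u >= t)
        events = sum(1 for u, recurred in cases if recurred and u == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def survival_at(curve, t):
    """Estimated probability of being disease-free at time t (in years)."""
    prob = 1.0
    for time, survival in curve:
        if time > t:
            break
        prob = survival
    return prob


# Made-up training cases: (predicted years, observed years, recurred?).
training = [
    (2.0, 1.5, True),
    (2.5, 3.0, True),
    (3.0, 4.0, False),
    (8.0, 9.0, False),
    (8.5, 6.0, True),
    (9.0, 10.0, False),
    (9.5, 10.0, False),
    (10.0, 7.0, False),
]

neighbours = similar_cases(training, predicted=9.0, k=4)
curve = kaplan_meier(neighbours)
print(curve, survival_at(curve, 10.0))
```

Censored cases still count as "at risk" up to their last follow-up time, which is how right-censored data contributes to the curve without being treated as recurrences.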
The RSA procedure can also be used to compare the predictive
power of various prognostic factors. Our results indicate that
precise, detailed cytological information of the type provided by
Xcyt gives better prognostic accuracy than the traditional
factors Tumor Size and Lymph Node Status. If corroborated by other
researchers, this result could remove the need for the often
painful axillary lymph node surgery.
Chronological Bibliography
Linked papers are provided in PostScript format; if you don't have
a PostScript viewer, you can download the file (e.g., shift-click
in Netscape) and print it. Abstracts are ASCII text. To obtain
papers which are not linked, please contact the first author.
- O.L. Mangasarian, R. Setiono and W.H. Wolberg.
- Pattern Recognition via Linear Programming: Theory and
Application to Medical Diagnosis. In Proceedings of the
Workshop on Large-Scale Numerical Optimization, 1989, pages
22-31, Philadelphia, PA. SIAM.
- O.L. Mangasarian and W. H. Wolberg.
- Cancer Diagnosis via Linear Programming. SIAM News,
Vol. 23, 1990, pages 1 & 18.
- W.H. Wolberg and O.L. Mangasarian.
- Multisurface Method of Pattern Separation for Medical Diagnosis
Applied to Breast Cytology. Proceedings of the National Academy
of Sciences, U.S.A., Vol. 87, 1990, pages 9193-9196.
- W.N. Street.
- Toward Automated Cancer Diagnosis: An Interactive System for
Cell Feature Extraction. Technical Report 1052, Computer Sciences
Department, University of Wisconsin, October 1991.
- W.H. Wolberg, K.P. Bennett and O.L. Mangasarian.
- Breast Cancer Diagnosis and Prognostic Determination from Cell
Analysis. Manuscript, 1992, Departments of Surgery and Human
Oncology and Computer Sciences, University of Wisconsin, Madison,
WI 53706.
- W.H. Wolberg, W.N. Street, and O.L. Mangasarian.
- Breast
Cytology Diagnosis via Digital Image Analysis. Analytical
and Quantitative Cytology and Histology, Vol. 15 No. 6, pages
396-404, December 1993. (abstract)
- W.N. Street, W.H. Wolberg and O.L. Mangasarian.
- Nuclear
Feature Extraction For Breast Tumor Diagnosis. In IS&T/SPIE
1993 International Symposium on Electronic Imaging: Science and
Technology, volume 1905, pages 861-870, San Jose, CA, 1993.
(abstract)
- W.H. Wolberg, W.N. Street, and O.L. Mangasarian.
- Machine learning
techniques to diagnose breast cancer from fine-needle
aspirates. Cancer Letters Vol. 77, pages 163-171,
1994. (abstract)
- W. N. Street.
- Cancer
Diagnosis and Prognosis via Linear-Programming-Based Machine
Learning. Ph.D. Dissertation, University of Wisconsin-Madison,
August 1994. Available as UW Mathematical Programming Technical
Report 94-14. (abstract)
- W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L.
Mangasarian.
- Computerized breast
cancer diagnosis and prognosis from fine needle aspirates.
Archives of Surgery 1995; 130:511-516. (abstract)
- W.H. Wolberg, W.N. Street, and O.L. Mangasarian.
- Image
analysis and machine learning applied to breast cancer diagnosis
and prognosis. Analytical and Quantitative Cytology and
Histology, Vol. 17 No. 2, pages 77-87, April 1995. (abstract)
- W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L.
Mangasarian.
- Computer-derived
Nuclear Features Distinguish Malignant from Benign Breast
Cytology. Human Pathology, Vol. 26, pages 792-796,
1995. (abstract)
- W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L.
Mangasarian.
- Computer-derived
Nuclear "Grade" and Breast Cancer Prognosis. Analytical
and Quantitative Cytology and Histology, Vol. 17 No. 4, pages
257-264, August 1995. (abstract)
- O.L. Mangasarian, W.N. Street and W.H. Wolberg.
- Breast
cancer diagnosis and prognosis via linear programming.
Operations Research, 43(4), pages 570-577, July-August
1995. Available as UW Mathematical Programming Technical Report
94-10. (abstract)
- W. N. Street, O. L. Mangasarian, and W.H. Wolberg.
- An inductive
learning approach to prognostic prediction. Proceedings of
the Twelfth International Conference on Machine Learning, A.
Prieditis and S. Russell, eds., pages 522-530, Morgan Kaufmann,
1995. (abstract)
- M. W. Teague, W. H. Wolberg, W. N. Street, O. L.
Mangasarian, S. C. Call and D. L. Page.
- Indeterminate Fine Needle Aspiration of the Breast: Image
Analysis Aided Diagnosis. Cancer, submitted. (abstract)
- W. N. Street, O. L. Mangasarian, and W. H. Wolberg.
- Individual
and collective prognostic prediction.
Technical Report 96-01, Computer Sciences Department, University
of Wisconsin, Madison, WI, January 1996. Submitted to ICML and AAAI
conferences. (abstract)
Citations in the Medical and Popular Press
- News from Medicine segment, CNN Prime News, March 12,
1994.
- Breast Biopsy Without Surgery. Tim Friend, USA Today,
March 24, 1994.
- Cancer Detection Imitates Oil Prospecting. Joe Manning,
Milwaukee Sentinel, March 24, 1994.
- Analyzing Breast Cancer. Detroit News, March 28,
1994.
- A High-tech Cancer Hunt. Marilynn Marchione, Milwaukee
Journal, March 28, 1994.
- Computerized Interpretation of Breast FNA Biopsies: Progress
Reported, Oncology Times, April 1994.
- Computer Program Hunts Breast Cancer, Ruth SoRelle, Houston
Chronicle, April 22, 1994.
- Computer Program May Improve Interpretation of Aspirate,
Oncology News International, May 1994.
- New Data Suggest Needle Biopsies Could Replace Surgical Biopsy
for Diagnosing Breast Cancer. Journal of the American Medical
Association, Medical News & Perspectives column, June 9, 1994,
Vol. 271, No. 22.
- Diagnosis Via Image Analysis and Machine Learning,
Cope, September/October 1994.
- Computer Seeks Out Breast Cancer, Madison Capital
Times, January 17, 1995.
- Computer-Aided Cancer Prediction, Los Angeles Times,
January 25, 1995.
paulb@cs.wisc.edu