TABLE OF CONTENTS

THIRD EDITION

 

Chapter 1     Introduction

1.1  Is Pattern Recognition Important?
1.2  Features, Feature Vectors and Classifiers
1.3  Supervised versus Unsupervised Pattern Recognition
1.4  Outline of the Book
 
Chapter 2  Classifiers Based on Bayes Decision Theory

2.1   Introduction
2.2   Bayes Decision Theory
2.3   Discriminant Functions and Decision Surfaces
2.4   Bayesian Classification for Normal Distributions
2.5   Estimation of Unknown Probability Density Functions
      2.5.1   Maximum Likelihood Parameter Estimation
      2.5.2   Maximum a Posteriori Probability Estimation
      2.5.3   Bayesian Inference
      2.5.4   Maximum Entropy Estimation
      2.5.5   Mixture Models
      2.5.6   Nonparametric Estimation
      2.5.7   The Naive-Bayes Classifier
2.6   The Nearest Neighbor Rule
2.7   Bayesian Networks
 

Chapter 3   Linear Classifiers

3.1   Introduction
3.2   Linear Discriminant Functions and Decision Hyperplanes
3.3   The Perceptron Algorithm
3.4   Least Squares Methods
      3.4.1   Mean Square Error Estimation
      3.4.2   Stochastic Approximation and the LMS Algorithm
      3.4.3   Sum of Error Squares Estimation
3.5   Mean Square Estimation Revisited
      3.5.1   Mean Square Regression
      3.5.2   MSE Estimates Posterior Class Probabilities
      3.5.3   The Bias-Variance Dilemma
3.6   Logistic Discrimination
3.7   Support Vector Machines
      3.7.1   Separable Classes
      3.7.2   Nonseparable Classes
      3.7.3   ν-SVM
      3.7.4   Support Vector Machines: A Geometric Viewpoint
      3.7.5   Reduced Convex Hulls

Chapter 4   Nonlinear Classifiers

4.1   Introduction
4.2   The XOR Problem
4.3   The Two-Layer Perceptron
      4.3.1   Classification Capabilities of the Two-Layer Perceptron
4.4   Three-Layer Perceptrons
4.5   Algorithms Based on Exact Classification of the Training Set
4.6   The Backpropagation Algorithm
4.7   Variations on the Backpropagation Theme
4.8   The Cost Function Choice
4.9   Choice of the Network Size
4.10  A Simulation Example
4.11  Networks with Weight Sharing
4.12  Generalized Linear Classifiers
4.13  Capacity of the l-Dimensional Space in Linear Dichotomies
4.14  Polynomial Classifiers
4.15  Radial Basis Function Networks
4.16  Universal Approximators
4.17  Support Vector Machines: The Nonlinear Case
4.18  Decision Trees
      4.18.1  Set of Questions
      4.18.2  Splitting Criterion
      4.18.3  Stop-Splitting Rule
      4.18.4  Class Assignment Rule
4.19  Combining Classifiers
      4.19.1  Geometric Average Rule
      4.19.2  Arithmetic Average Rule
      4.19.3  Majority Voting Rule
      4.19.4  A Bayesian Viewpoint
4.20  The Boosting Approach to Combine Classifiers
4.21  Discussion

Chapter 5   Feature Selection

5.1   Introduction
5.2   Preprocessing
      5.2.1   Outlier Removal
      5.2.2   Data Normalization
      5.2.3   Missing Data
5.3   Feature Selection Based on Statistical Hypothesis Testing
      5.3.1   Hypothesis Testing Basics
      5.3.2   Application of the t-Test in Feature Selection
5.4   The Receiver Operating Characteristics (ROC) Curve
5.5   Class Separability Measures
      5.5.1   Divergence
      5.5.2   Chernoff Bound and Bhattacharyya Distance
      5.5.3   Scatter Matrices
5.6   Feature Subset Selection
      5.6.1   Scalar Feature Selection
      5.6.2   Feature Vector Selection
5.7   Optimal Feature Generation
5.8   Neural Networks and Feature Generation / Selection
5.9   A Hint on the Vapnik-Chervonenkis Learning Theory
5.10   The Bayesian Information Criterion
 
Chapter 6    Feature Generation I: Linear Transforms

6.1   Introduction
6.2   Basis Vectors and Images
6.3   The Karhunen-Loève Transform
6.4   The Singular Value Decomposition
6.5   Independent Component Analysis
      6.5.1   ICA Based on Second- and Fourth-Order Cumulants
      6.5.2   ICA Based on Mutual Information
      6.5.3   An ICA Simulation Example
6.6   The Discrete Fourier Transform
      6.6.1   One-Dimensional DFT
      6.6.2   Two-Dimensional DFT
6.7   The Discrete Cosine and Sine Transforms
6.8   The Hadamard Transform
6.9   The Haar Transform
6.10   The Haar Expansion Revisited
6.11  Discrete Time Wavelet Transform (DTWT)
6.12  The Multiresolution Interpretation
6.13  Wavelet Packets
6.14  A Look at Two-Dimensional Generalizations
6.15  Applications
 

Chapter 7    Feature Generation II

7.1   Introduction
7.2   Regional Features
      7.2.1   Features for Texture Characterization
      7.2.2   Local Linear Transforms for Texture Feature Extraction
      7.2.3   Moments
      7.2.4   Parametric Models
7.3   Features for Shape and Size Characterization
      7.3.1   Fourier Features
      7.3.2   Chain Codes
      7.3.3   Moment-Based Features
      7.3.4   Geometric Features
7.4   A Glimpse at Fractals
      7.4.1   Self-Similarity and Fractal Dimension
      7.4.2   Fractional Brownian Motion
7.5   Typical Features for Speech and Audio Classification
      7.5.1   Short Time Processing of Signals
      7.5.2   Cepstrum
      7.5.3   The Mel-Cepstrum
      7.5.4   Spectral Features
      7.5.5   Time Domain Features
      7.5.6   An Example
 
Chapter 8   Template Matching

8.1   Introduction
8.2   Similarity Measures Based on Optimal Path Searching Techniques
      8.2.1   Bellman's Optimality Principle and Dynamic Programming
      8.2.2   The Edit Distance
      8.2.3   Dynamic Time Warping in Speech Recognition
8.3   Measures Based on Correlations
8.4   Deformable Template Models
 

Chapter 9   Context Dependent Classification

9.1   Introduction
9.2   The Bayes Classifier
9.3   Markov Chain Models
9.4   The Viterbi Algorithm
9.5   Channel Equalization
9.6   Hidden Markov Models
9.7   HMM with State Duration Modeling
9.8   Training Markov Models via Neural Networks
9.9   A Discussion on Markov Random Fields
 

Chapter 10   System Evaluation

10.1   Introduction
10.2   Error Counting Approach
10.3   Exploiting the Finite Size of the Data Set
10.4   A Case Study from Medical Imaging
 

Chapter 11   Clustering: Basic Concepts

11.1   Introduction
       11.1.1   Applications of Cluster Analysis
       11.1.2   Types of Features
       11.1.3   Definitions of Clustering
11.2   Proximity Measures
       11.2.1   Definitions
       11.2.2   Proximity Measures Between Two Points
       11.2.3   Proximity Functions Between a Point and a Set
       11.2.4   Proximity Functions Between Two Sets
 

Chapter 12   Clustering Algorithms I: Sequential Algorithms

12.1   Introduction
       12.1.1   Number of Possible Clusterings
12.2   Categories of Clustering Algorithms
12.3   Sequential Clustering Algorithms
       12.3.1   Estimation of the Number of Clusters
12.4   A Modification of BSAS
12.5   A Two-Threshold Sequential Scheme
12.6   Refinement Stages
12.7   Neural Network Implementation
       12.7.1   Description of the Architecture
       12.7.2   Implementation of the BSAS Algorithm
 

Chapter 13   Clustering Algorithms II: Hierarchical Algorithms

13.1   Introduction
13.2   Agglomerative Algorithms
       13.2.1   Definition of Some Useful Quantities
       13.2.2   Agglomerative Algorithms Based on Matrix Theory
       13.2.3   Monotonicity and Crossover
       13.2.4   Implementational Issues
       13.2.5   Agglomerative Algorithms Based on Graph Theory
       13.2.6   Ties in the Proximity Matrix
13.3   The Cophenetic Matrix
13.4   Divisive Algorithms
13.5   Hierarchical Algorithms for Large Data Sets (the CURE, ROCK, Chameleon Algorithms)
13.6   Choice of the Best Number of Clusters
 

Chapter 14   Clustering Algorithms III: Schemes Based on Function Optimization

14.1   Introduction
14.2   Mixture Decomposition Schemes
       14.2.1   Compact and Hyperellipsoidal Clusters
       14.2.2   Geometrical Interpretation
14.3   Fuzzy Clustering Algorithms
       14.3.1   Point Representatives
       14.3.2   Quadric Surfaces as Representatives
       14.3.3   Hyperplane Representatives
       14.3.4   Combining Quadric and Hyperplane Representatives
       14.3.5   A Geometrical Interpretation
       14.3.6   Convergence Aspects of the Fuzzy Clustering Algorithms
       14.3.7   Alternating Cluster Estimation
14.4   Possibilistic Clustering
       14.4.1   The Mode-Seeking Property
       14.4.2   An Alternative Possibilistic Scheme
14.5   Hard Clustering Algorithms
       14.5.1   The Isodata or k-means or c-means Algorithm
       14.5.2   k-Medoids Algorithms (The PAM, CLARA, CLARANS Algorithms)
14.6   Vector Quantization
 

Chapter 15   Clustering Algorithms IV

15.1   Introduction
15.2   Clustering Algorithms Based on Graph Theory
       15.2.1   Minimum Spanning Tree Algorithms
       15.2.2   Algorithms Based on Regions of Influence
       15.2.3   Algorithms Based on Directed Trees
15.3   Competitive Learning Algorithms
       15.3.1   Basic Competitive Learning Algorithms
       15.3.2   Leaky Learning Algorithm
       15.3.3   Conscientious Competitive Learning Algorithms
       15.3.4   Competitive Learning-Like Algorithms Associated with Cost Functions
       15.3.5   Self-Organizing Maps
       15.3.6   Supervised Learning Vector Quantization
15.4   Binary Morphology Clustering Algorithms (BMCAs)
       15.4.1   Discretization
       15.4.2   Morphological Operations
       15.4.3   Determination of the Clusters in a Discrete Binary Set
       15.4.4   Assignment of Feature Vectors to Clusters
       15.4.5   The Algorithmic Scheme
15.5   Boundary Detection Algorithms
15.6   Valley-Seeking Clustering Algorithms
15.7   Clustering via Cost Optimization (Revisited)
       15.7.1   Branch and Bound Clustering Algorithms
       15.7.2   Simulated Annealing
       15.7.3   Deterministic Annealing
       15.7.4   Clustering Using Genetic Algorithms
15.8   Kernel Clustering Methods
15.9   Density-Based Algorithms for Large Data Sets
       15.9.1   The DBSCAN Algorithm
       15.9.2   The DBCLASD Algorithm
       15.9.3   The DENCLUE Algorithm
15.10   Clustering Algorithms for High-Dimensional Data Sets
       15.10.1   Dimensionality Reduction Clustering Approach
       15.10.2   Subspace Clustering Approach
15.11   Other Clustering Algorithms
 

Chapter 16   Cluster Validity

16.1   Introduction
16.2   Hypothesis Testing Revisited
16.3   Hypothesis Testing in Cluster Validity
       16.3.1   External Criteria
       16.3.2   Internal Criteria
16.4   Relative Criteria
       16.4.1   Hard Clustering
       16.4.2   Fuzzy Clustering
16.5   Validity of Individual Clusters
       16.5.1   External Criteria
       16.5.2   Internal Criteria
16.6   Clustering Tendency
       16.6.1   Tests for Spatial Randomness
 

Appendix A
Hints from Probability and Statistics

Appendix B
Linear Algebra Basics

Appendix C
Cost Function Optimization

Appendix D
Basic Definitions from Linear Systems Theory