Speech - Music Discrimination Demo V.1.0 – HELP


Signal and Image Processing Lab

Dept. of Informatics and Telecommunications

University of Athens, Greece


Demo implemented by:

Theodoros Giannakopoulos, Dr. Aggelos Pikrakis and Prof. Sergios Theodoridis.

For more information on the algorithm, please visit our website at www.di.uoa.gr/~sp_mu


Algorithm Description

The algorithm consists of 3 stages:

Stage 1: A simple segmentation algorithm is applied to the original signal in order to detect speech and music segments with a high class probability. The algorithm uses chroma entropy as a feature, and its parameters are set so that the precision of the segmentation process is maximized. Part of the audio stream (usually between 30% and 60%) is left unclassified, but more than 98% of the classified segments are classified correctly.
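
The sketch below illustrates the general idea of this high-precision stage: only frames whose (smoothed) chroma-entropy value falls in a clearly "music-like" or "speech-like" range are labelled, and everything in between is left unclassified for Stage 2. This is a minimal Python sketch assuming a precomputed per-frame feature sequence; the threshold values, the smoothing window length and the mapping of low entropy to music are illustrative assumptions, not the demo's actual settings.

import numpy as np

def stage1_segmentation(feature, low_thr=0.2, high_thr=0.8, win=50):
    """Label frames as 'music', 'speech', or leave them 'unknown'.

    `feature` is a 1-D array of per-frame chroma-entropy values
    (assumed precomputed).  Thresholds and the smoothing window are
    illustrative; the demo tunes its parameters to maximize precision.
    """
    # Smooth the feature with a moving average to suppress frame-level noise.
    kernel = np.ones(win) / win
    smoothed = np.convolve(feature, kernel, mode="same")

    labels = np.full(len(smoothed), "unknown", dtype=object)
    labels[smoothed <= low_thr] = "music"    # low entropy -> likely music (assumption)
    labels[smoothed >= high_thr] = "speech"  # high entropy -> likely speech (assumption)
    return labels                            # frames in between stay unclassified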

Stage 2: The unclassified segments are fed as input to a more sophisticated (and therefore more time-consuming) segmentation algorithm, which is based on a hybrid Hidden Markov Model and Bayesian Network architecture.
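
As a rough illustration of this stage, the sketch below decodes a speech/music label sequence with a two-state Viterbi pass over per-frame log-likelihoods. In the demo those likelihoods would come from the Bayesian Network component, and the actual hybrid architecture is more elaborate than this simplification; the two-state topology and all parameter names here are assumptions.

import numpy as np

def viterbi_two_state(log_lik, log_trans, log_prior):
    """Decode the most likely speech/music state sequence.

    `log_lik`   : (T, 2) per-frame log-likelihoods, e.g. produced by a
                  frame-level classifier (standing in for the Bayesian
                  Network of the demo).
    `log_trans` : (2, 2) log transition probabilities.
    `log_prior` : (2,)   log initial state probabilities.
    """
    T, S = log_lik.shape
    delta = np.zeros((T, S))           # best path score ending in each state
    psi = np.zeros((T, S), dtype=int)  # back-pointers

    delta[0] = log_prior + log_lik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans       # rows: from-state, cols: to-state
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(S)] + log_lik[t]

    # Backtrack the best path.
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path  # 0 = speech, 1 = music (arbitrary convention in this sketch)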

Stage 3: A number of post-processing procedures are executed in order to improve the final result of the classification process.
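
A typical post-processing step of this kind is label smoothing, e.g. median-filtering the frame labels and absorbing very short segments into their neighbours. The sketch below illustrates this idea; the filter length and minimum segment duration are illustrative values, not those used by the demo.

import numpy as np
from scipy.signal import medfilt

def postprocess(labels, min_len=100, kernel=9):
    """Smooth a 0/1 frame-label sequence and absorb very short segments.

    `kernel` and `min_len` (in frames) are illustrative values only.
    """
    # Median filtering removes isolated label flips.
    smoothed = medfilt(labels.astype(float), kernel_size=kernel).astype(int)

    # Merge segments shorter than min_len frames into their left neighbour.
    out = smoothed.copy()
    start = 0
    for t in range(1, len(out) + 1):
        if t == len(out) or out[t] != out[start]:
            if t - start < min_len and start > 0:
                out[start:t] = out[start - 1]
            start = t
    return out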