Description: |
Speaker clustering is a key component in many speech processing applications. We focus on Broadcast News metadata annotation and speaker adaptation. In this setting, speaker clustering means identifying who spoke, and when, in a long news broadcast. The clusterer is given a set of short audio segments; ideally, it discovers how many people speak in the broadcast and when each of them speaks. The same problem can be transposed to other domains. In this paper, we present two techniques that require no a priori training: the clustering relies solely on information collected from the test data encountered, so the techniques aim to be portable across domains. The first method is based on the Bayesian information criterion (BIC) with single full-covariance Gaussians. It is fairly primitive but effective. The second method, called speaker triangulation, constructs a coordinate system based on conditional likelihoods of the audio segments and locates clusters in this coordinate system. We achieve state-of-the-art performance on NIST evaluations across different data sets. |
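
The abstract does not spell out how the BIC is applied, so the following is only a minimal sketch of one widely used formulation, not necessarily the authors' procedure. Each segment is modelled by a single full-covariance Gaussian, and two segments are judged to come from the same speaker when modelling them jointly costs less than modelling them separately. The names `log_det_cov`, `delta_bic`, `penalty_weight` and `synth`, and the synthetic two-dimensional "features", are assumptions made for illustration. The sketch uses only the standard library; in practice a numerical library would normally handle the covariance and determinant.

```python
import math
import random


def log_det_cov(frames):
    """Log-determinant of the maximum-likelihood full covariance of `frames`.

    `frames` is a list of equal-length feature vectors. The covariance must be
    positive definite, so there must be clearly more frames than dimensions.
    """
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    cov = [[sum((f[a] - mean[a]) * (f[b] - mean[b]) for f in frames) / n
            for b in range(d)] for a in range(d)]
    # Cholesky factorisation cov = L L^T, so log|cov| = 2 * sum(log L[i][i]).
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(d))


def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """BIC of the two-Gaussian model minus BIC of the one-Gaussian model.

    A positive value favours two different speakers; a negative value favours
    merging. `penalty_weight` is a hypothetical tuning knob (1.0 is plain BIC).
    """
    n_a, n_b = len(seg_a), len(seg_b)
    n, d = n_a + n_b, len(seg_a[0])
    # Log-likelihood gain from using two Gaussians instead of one.
    gain = 0.5 * (n * log_det_cov(seg_a + seg_b)
                  - n_a * log_det_cov(seg_a)
                  - n_b * log_det_cov(seg_b))
    # Extra free parameters of the second Gaussian: mean plus covariance.
    n_params = d + d * (d + 1) / 2
    return gain - 0.5 * penalty_weight * n_params * math.log(n)


# Synthetic stand-ins for acoustic feature vectors (illustration only).
random.seed(0)


def synth(mean, n=200):
    return [[random.gauss(m, 1.0) for m in mean] for _ in range(n)]


same_speaker = delta_bic(synth([0.0, 0.0]), synth([0.0, 0.0]))
different_speakers = delta_bic(synth([0.0, 0.0]), synth([5.0, 5.0]))
print(same_speaker < 0, different_speakers > 0)
```

Under these assumptions, a clustering pass could repeatedly merge the pair of clusters with the most negative `delta_bic` and stop when no pair is negative, which also yields an estimate of the number of speakers without any prior training.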