Abstract:
In blind speech separation methods based on the W-disjoint orthogonality (W-DO) assumption, musical noise is inevitable in the separated signals, because the assumption does not allow several source signals to be active in the same time-frequency cell. This paper proposes a blind speech separation method based on channel estimation that relies only on a partial, approximate W-disjoint orthogonality assumption. Time-frequency cells in which only one source is active are detected and normalized so that they no longer depend on frequency. This relaxes the strict W-DO requirement and also resolves the frequency permutation problem. The channels are then estimated by K-means clustering, and finally a signal subspace method is used to reconstruct the sources. Simulation results show that the proposed method effectively reduces musical noise in the separated speech signals and outperforms the typical time-frequency binary masking method: the average signal-to-distortion ratio (SDR) improves by 3.02 dB and the average signal-to-interference ratio (SIR) improves by 4.61 dB.
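The abstract does not specify the detector or the normalization, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes two microphones and an anechoic mixture in which each source reaches microphone 2 with a gain and a delay relative to microphone 1. It also uses a hypothetical single-source test: the ratio X2/X1 must stay nearly constant across two adjacent frames. Each accepted cell is mapped to a level ratio and a delay, both frequency-independent, and all frequencies are then clustered together with K-means. Because all bins are clustered jointly, the result needs no per-bin permutation alignment. Every name here (`GAINS`, `DELAYS`, `is_single_source`, `normalise`, `estimate_channel`, and so on) is illustrative. The final signal subspace reconstruction step is not shown.

```python
import cmath
import math
import random

# Hypothetical anechoic 2-mic / 2-source scene (assumed, not from the paper):
# source n reaches mic 2 with gain GAINS[n] and delay DELAYS[n] (s) relative to mic 1.
GAINS = [0.8, 1.3]
DELAYS = [5e-5, -8e-5]                      # |2*pi*f*delay| < pi, so no phase wrapping
FREQS = [250.0 * k for k in range(1, 17)]   # STFT bin centres, 250 Hz .. 4 kHz
DELAY_SCALE = 1e4                           # puts delays on the same scale as gains


def channel(n, f):
    """True mixing column a_n(f) = [1, g_n * exp(-j*2*pi*f*d_n)]."""
    return (1.0 + 0j, GAINS[n] * cmath.exp(-2j * math.pi * f * DELAYS[n]))


def mix(rng, f, active):
    """One observed STFT cell (X1, X2): sum of a_n(f) * S_n over active sources."""
    x1 = x2 = 0j
    for n in active:
        s = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        a1, a2 = channel(n, f)
        x1 += a1 * s
        x2 += a2 * s
    return x1, x2


def is_single_source(cell, neighbour, tol=1e-3):
    """Heuristic detector (assumption): X2/X1 barely changes between two
    adjacent frames when the same single source dominates both of them."""
    r0, r1 = cell[1] / cell[0], neighbour[1] / neighbour[0]
    return abs(r0 - r1) <= tol * abs(r0)


def normalise(cell, f):
    """Frequency-independent feature: level ratio |X2/X1| and the delay
    -arg(X2/X1) / (2*pi*f), which removes the frequency dependence of the phase."""
    r = cell[1] / cell[0]
    return (abs(r), -cmath.phase(r) / (2 * math.pi * f) * DELAY_SCALE)


def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            best = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[best].append(p)
        centroids = [tuple(sum(v) / len(g) for v in zip(*g)) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids


def estimate_channel(centroid, f):
    """Rebuild a_n(f) at any frequency from a frequency-independent centroid."""
    g, d = centroid[0], centroid[1] / DELAY_SCALE
    return (1.0 + 0j, g * cmath.exp(-2j * math.pi * f * d))


rng = random.Random(0)
features = []
for f in FREQS:
    for _ in range(30):
        active = rng.choice([[0], [1], [0, 1]])          # about 1/3 of pairs are mixed
        cell, neighbour = mix(rng, f, active), mix(rng, f, active)
        if is_single_source(cell, neighbour):
            features.append(normalise(cell, f))

# All frequencies are clustered together, so no per-bin permutation alignment is needed.
centroids = sorted(kmeans(features, k=2))
print(centroids)   # close to [(0.8, 0.5), (1.3, -0.8)]
```

Mixed cells are rejected by the detector, so each centroid recovers one source's (gain, scaled delay) pair. `estimate_channel` then rebuilds that source's mixing column at any frequency. In a real recording, reverberation and noise would blur the clusters, and the tolerance `tol` would need tuning.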