
Publication Information


Title
Japanese: 
English: MSDET: Multitask Speaker Separation and Direction-of-Arrival Estimation Training 
Authors
Japanese: Hartanto Roland, Sakriani Sakti, 篠田 浩一.  
English: Roland Hartanto, Sakriani Sakti, Koichi Shinoda.  
Language English 
Journal/Book Title
Japanese: 
English: Proc. Interspeech 2024 
Volume, Issue, Pages         pp. 2170-2174
Publication date September 1, 2024 
Publisher
Japanese: 
English: International Speech Communication Association (ISCA) 
Conference Name
Japanese: 
English: Interspeech 2024 
Venue
Japanese: 
English: Kos Island 
Official link https://interspeech2024.org/
 
DOI https://doi.org/10.21437/Interspeech.2024-2537
Abstract Information about the spatial location of speakers can be used effectively for multi-channel speaker separation. For example, Location-Based Training (LBT) uses the order of the speakers' azimuth angles and distances to resolve the permutation ambiguity problem. This location information can be exploited to improve separation performance further. This paper proposes a multitask learning approach, Multitask Speaker Separation and Direction-of-Arrival Estimation Training (MSDET), which jointly optimizes speaker separation and Direction-of-Arrival (DoA) estimation. In our evaluation using the SMS-WSJ dataset, it outperforms LBT by 0.13 points in SI-SDR and 0.35 points in ESTOI.
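
The abstract does not give the loss formulation, so the following is only an illustrative sketch of the general idea and not the authors' implementation. It assumes that outputs are assigned to speakers in ascending azimuth order (a simplified, azimuth-only stand-in for LBT, which also uses distance), that separation quality is scored with negative SI-SDR, and that DoA error is a wrapped angular difference in degrees. The function names and the `doa_weight` value are hypothetical.

```python
import math


def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB for two equal-length signals."""
    dot = sum(e * t for e, t in zip(estimate, target))
    energy = sum(t * t for t in target)
    alpha = dot / (energy + eps)  # optimal scaling of the target
    projection = [alpha * t for t in target]
    noise = [e - p for e, p in zip(estimate, projection)]
    num = sum(p * p for p in projection)
    den = sum(n * n for n in noise)
    return 10.0 * math.log10((num + eps) / (den + eps))


def order_by_azimuth(sources, azimuths_deg):
    """Sort reference sources by ascending azimuth (assumed location-based ordering)."""
    order = sorted(range(len(azimuths_deg)), key=lambda i: azimuths_deg[i])
    return [sources[i] for i in order], [azimuths_deg[i] for i in order]


def angular_error_deg(pred, true):
    """Absolute angular difference in degrees, wrapped to the range [0, 180]."""
    d = abs(pred - true) % 360.0
    return min(d, 360.0 - d)


def multitask_loss(est_sources, est_azimuths, ref_sources, ref_azimuths, doa_weight=0.1):
    """Hypothetical joint loss: negative mean SI-SDR plus weighted mean DoA error."""
    refs, ref_az = order_by_azimuth(ref_sources, ref_azimuths)
    n = len(refs)
    # Output i is paired with the i-th reference in azimuth order, so no
    # permutation search is needed.
    sep_loss = -sum(si_sdr(e, r) for e, r in zip(est_sources, refs)) / n
    doa_loss = sum(angular_error_deg(p, t) for p, t in zip(est_azimuths, ref_az)) / n
    return sep_loss + doa_weight * doa_loss
```

Because the references are sorted by location before pairing, the separation term is computed against a fixed output order, and the DoA term simply adds a second training signal on top of it; how MSDET actually parameterizes and weights the DoA task is described in the paper itself.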
