Xuanjun Chen

Ph.D. Student, EECS
National Taiwan University

About me

I am a Ph.D. student at National Taiwan University (NTU), where I work with Prof. Hung-yi Lee and Prof. Jyh-Shing Roger Jang. I received my M.S. degree from NTU in 2023 and my B.S. degree from Taiwan Tech in 2020. My research interests include Speech Processing, Audio/Text LLMs, and Audio Deepfakes. I was honored to receive the Google Student Travel Grant in 2024.


Selected Publications (* equal contribution)

💡 I am a newcomer to the field of Audio/Text Large Language Models. Prior to this, I spent over 5 years working on Audio Deepfake problems.

🎙️ Audio Large Language Models

[5] Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, Chao-Han Huck Yang, Sung-Feng Huang, Chih-Kai Yang, Chee-En Yu, Chun-Wei Chen, Wei-Chih Chen, Chien-yu Huang, Yi-Cheng Lin, Yu-Xiang Lin, Chi-An Fu, Chun-Yi Kuan, Wenze Ren, Xuanjun Chen, et al., "DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment," in arXiv 2025.
arXiv / Code

[4] Chien-yu Huang, Wei-Chih Chen, Shu-wen Yang, Andy T. Liu, Chen-An Li, Yu-Xiang Lin, Wei-Cheng Tseng, Anuj Diwan, Yi-Jen Shih, Jiatong Shi, William Chen, Xuanjun Chen, et al., "Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks," in ICLR 2025.
arXiv / Code

[3] Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-wei Chang, Ho-Lam Chung, Alexander H. Liu, and Hung-yi Lee. "Towards audio language modeling - an overview," Overview Report, Feb. 2024.
arXiv / Awesome List

[2] Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, and Hung-yi Lee. "Codec-SUPERB: An In-Depth Analysis of Sound Codec Models," in Findings of ACL 2024.
ACL / arXiv / Leaderboard / Code / Hugging Face

[1] Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Jiawei Du, Kai-Wei Chang, Ke-Han Lu, Alexander Liu, Ho-Lam Chung, Yuan-Kuei Wu, Dongchao Yang, Songxiang Liu, Yi-Chiao Wu, Xu Tan, James Glass, Shinji Watanabe, and Hung-yi Lee. "Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural codec models," in IEEE SLT 2024.
IEEE / arXiv

🔍 Retrieval Augmented Generation

[2] Wei-Chieh Chou*, Xuanjun Chen*, Jian-Ren Lin, Claire Lin, Hung-yi Lee, Jyh-Shing Roger Jang, "Efficient Retrieval-Augmented Generation via Grounded Planning," Work in Progress.

[1] Claire Lin*, Bo-Han Feng*, Xuanjun Chen*, Te-Lun Yang, Hung-yi Lee, Jyh-Shing Roger Jang, "A Preliminary Study of RAG for Taiwanese Historical Archives," in ROCLING 2025 (🏆 Best Paper Award).
arXiv

🛡️ Audio Deepfake Detection, Localization, Attribution, and Reliability

[10] Xuanjun Chen*, Jiawei Du*, Haibin Wu, Lin Zhang, I-Ming Lin, I-Hsiang Chiu, Wenze Ren, Yuan Tseng, Yu Tsao, Jyh-Shing Roger Jang, Hung-yi Lee, "CodecFake+: A Large-Scale Neural Audio Codec-Based Deepfake Speech Dataset," Preprint, 2025.
arXiv / Project Page / Hugging Face / Code

[9] Xuanjun Chen*, Shih-Peng Cheng*, Jiawei Du, Lin Zhang, Xiaoxiao Miao, Chung-Che Wang, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang. "Localizing Audio-Visual Deepfakes via Hierarchical Boundary Modeling," in arXiv 2025.
arXiv

[8] Xuanjun Chen, Chia-Yu Hu, I-Ming Lin, Yi-Cheng Lin, I-Hsiang Chiu, You Zhang, Sung-Feng Huang, Yi-Hsuan Yang, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang, "How Does Instrumental Music Help SingFake Detection?" in arXiv 2025.
arXiv

[7] Xuanjun Chen*, I-Ming Lin*, Lin Zhang, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang. "Towards Generalized Source Tracing for Codec-Based Deepfake Speech," in IEEE ASRU 2025 (🏆 Best Student Paper nominee).
arXiv / Code

[6] Xuanjun Chen*, I-Ming Lin*, Lin Zhang, Jiawei Du, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang. "Codec-Based Deepfake Source Tracing via Neural Audio Codec Taxonomy," in INTERSPEECH 2025.
arXiv / Code

[5] Xuanjun Chen, Haibin Wu, Jyh-Shing Roger Jang, and Hung-yi Lee. "Singing Voice Graph Modeling for SingFake Detection," in INTERSPEECH 2024 (Oral).
ISCA / arXiv / Code / Lightning Talk

[4] Xuanjun Chen*, Jiawei Du*, Haibin Wu, Jyh-Shing Roger Jang, and Hung-yi Lee. "Neural Codec-based Adversarial Sample Detection for Speaker Verification," in INTERSPEECH 2024.
ISCA / arXiv / Code / Poster

[3] Jiawei Du, I-Ming Lin, I-Hsiang Chiu, Xuanjun Chen, Haibin Wu, Wenze Ren, Yu Tsao, Hung-yi Lee, Jyh-Shing Roger Jang. "DFADD: The Diffusion and Flow-Matching based Audio Deepfake Dataset," in IEEE SLT 2024.
IEEE / arXiv / Code / Hugging Face

[2] Xuanjun Chen, Haibin Wu, Chung-Che Wang, Hung-yi Lee, and Jyh-Shing Roger Jang, "Multimodal Transformer Distillation for Audio-Visual Synchronization," in ICASSP 2024.
IEEE / arXiv / Code / Poster

[1] Xuanjun Chen*, Haibin Wu*, Helen Meng, Hung-yi Lee, and Jyh-Shing Roger Jang, "Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection," in IEEE SLT 2022, Jan 2023.
IEEE / arXiv / Demos / Poster / Video


Selected Services

2025: Co-Organizer, Responsible Speech & Audio Generative AI, Special Session at IEEE ASRU 2025

2024: Technical Committee, Codec-SUPERB Challenge, Special Session at IEEE SLT 2024

2023-Now: Reviewer: AAAI, ACL, EMNLP, ICASSP, INTERSPEECH, ASRU, COLING, MLSP, IJPRAI, IALP