Publications

X. Mei, X. Liu, M. D. Plumbley, and W. Wang, “Automated Audio Captioning: an Overview of Recent Progress and New Challenges,” EURASIP Journal on Audio, Speech, and Music Processing, vol. 2022, article 26, 2022. (pdf)

X. Mei, X. Liu, J. Sun, M. D. Plumbley, and W. Wang, “On Metric Learning for Audio-Text Cross-Modal Retrieval,” in INTERSPEECH, ISCA, 2022. (arXiv), (pdf)
X. Mei, X. Liu, J. Sun, M. D. Plumbley, and W. Wang, “Diverse Audio Captioning via Adversarial Training,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022. (arXiv), (pdf)
X. Mei, X. Liu, Q. Huang, M. D. Plumbley, and W. Wang, “Audio Captioning Transformer,” in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021), Barcelona, Spain, November 2021, pp. 211–215. (arXiv), (pdf)
X. Mei, Q. Huang, X. Liu, G. Chen, J. Wu, Y. Wu, J. Zhao, S. Li, T. Ko, H. Tang, X. Shao, M. D. Plumbley, and W. Wang, “An Encoder-Decoder based Audio Captioning System with Transfer and Reinforcement Learning,” in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021), Barcelona, Spain, November 2021, pp. 206–210. (arXiv), (pdf)
X. Liu, Q. Huang, X. Mei, T. Ko, H. Tang, M. D. Plumbley, and W. Wang, “CL4AC: A Contrastive Loss for Audio Captioning,” in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021), Barcelona, Spain, November 2021, pp. 196–200. (arXiv), (pdf)

X. Liu, X. Mei, Q. Huang, J. Sun, J. Zhao, H. Liu, M. D. Plumbley, V. Kılıç, and W. Wang, “Leveraging Pre-trained BERT for Audio Captioning,” arXiv preprint arXiv:2203.02838, 2022. (arXiv), (pdf)

X. Mei, X. Liu, H. Liu, J. Sun, M. D. Plumbley, and W. Wang, “Automated Audio Captioning with Keywords Guidance,” DCASE2022 Challenge, July 2022. (pdf)
X. Mei, X. Liu, H. Liu, J. Sun, M. D. Plumbley, and W. Wang, “Language-Based Audio Retrieval with Pre-trained Models,” DCASE2022 Challenge, July 2022. (pdf)
X. Mei, Q. Huang, X. Liu, G. Chen, J. Wu, Y. Wu, J. Zhao, S. Li, T. Ko, H. L. Tang, X. Shao, M. D. Plumbley, and W. Wang, “An Encoder-Decoder based Audio Captioning System with Transfer and Reinforcement Learning for DCASE Challenge 2021 Task 6,” DCASE2021 Challenge, July 2021. (pdf)