Jianbo Ma

This is the home page of Jianbo Ma, a researcher in machine learning and deep learning. The world is changing fast, and we ought to share and exchange ideas more often. That is what motivated me to create this page: to share ideas with those of you who have the same interests.

news

No news so far...

latest projects

Jan 3, 2026	Transcript service with Whisper
Jan 2, 2026	A conversational bot with unmute
Oct 12, 2025	Equilibrium Matching Generative Modeling with Implicit Energy-Based Models
Aug 25, 2025	WAN Video Generative Models
Aug 23, 2025	Diffusion models

latest posts

Dec 12, 2023	Quick Paper Post

selected publications

Gotta hear them all: Sound source aware vision to audio generation

Wei Guo, Heng Wang, Jianbo Ma, and Weidong Cai

arXiv preprint arXiv:2411.15447, 2024

@article{guo2024gotta,
  title = {Gotta hear them all: Sound source aware vision to audio generation},
  author = {Guo, Wei and Wang, Heng and Ma, Jianbo and Cai, Weidong},
  journal = {arXiv preprint arXiv:2411.15447},
  year = {2024},
}

Diff-SAGe: End-to-End Spatial Audio Generation Using Diffusion Models

Saksham Singh Kushwaha, Jianbo Ma, Mark R. P. Thomas, Yapeng Tian, and Avery Bruni

In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025

@inproceedings{10888882,
  author = {Kushwaha, Saksham Singh and Ma, Jianbo and Thomas, Mark R. P. and Tian, Yapeng and Bruni, Avery},
  booktitle = {ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title = {Diff-SAGe: End-to-End Spatial Audio Generation Using Diffusion Models},
  year = {2025},
  pages = {1--5},
  keywords = {Measurement;Scalability;Spatial audio;Semantics;Phase estimation;Noise;Diffusion models;MONOS devices;Speech processing;Spectrogram;Spatial audio generation;Ambisonics},
  doi = {10.1109/ICASSP49660.2025.10888882},
}

A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model

Dongdi Zhao, Jianbo Ma, Lu Lu, Jinke Li, Xuan Ji, Lei Zhu, Fuming Fang, Ming Liu, and Feijun Jiang

arXiv preprint arXiv:2401.02673, 2024

@article{zhao2024unified,
  title = {A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model},
  author = {Zhao, Dongdi and Ma, Jianbo and Lu, Lu and Li, Jinke and Ji, Xuan and Zhu, Lei and Fang, Fuming and Liu, Ming and Jiang, Feijun},
  journal = {arXiv preprint arXiv:2401.02673},
  url = {https://arxiv.org/pdf/2401.02673},
  year = {2024},
}

V2a-mapper: A lightweight solution for vision-to-audio generation by connecting foundation models

Heng Wang, Jianbo Ma, Santiago Pascual, Richard Cartwright, and Weidong Cai

In Proceedings of the AAAI Conference on Artificial Intelligence, 2024

@inproceedings{wang2024v2a,
  title = {V2a-mapper: A lightweight solution for vision-to-audio generation by connecting foundation models},
  author = {Wang, Heng and Ma, Jianbo and Pascual, Santiago and Cartwright, Richard and Cai, Weidong},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume = {38},
  number = {14},
  pages = {15492--15501},
  year = {2024},
}

A low latency attention module for streaming self-supervised speech representation learning (second version of ’low latency attention’)

Jianbo Ma, Siqi Pan, Deepak Chandran, Andrea Fanelli, and Richard Cartwright

arXiv preprint arXiv:2302.13451, 2024

@article{ma2023low,
  title = {A low latency attention module for streaming self-supervised speech representation learning (second version of 'low latency attention')},
  author = {Ma, Jianbo and Pan, Siqi and Chandran, Deepak and Fanelli, Andrea and Cartwright, Richard},
  journal = {arXiv preprint arXiv:2302.13451},
  year = {2024},
  url = {https://arxiv.org/abs/2302.13451},
}

Low latency transformers for speech processing

Jianbo Ma, Siqi Pan, Deepak Chandran, Andrea Fanelli, and Richard Cartwright

arXiv preprint arXiv:2302.13451, 2023

@article{ma2023lox,
  title = {Low latency transformers for speech processing},
  author = {Ma, Jianbo and Pan, Siqi and Chandran, Deepak and Fanelli, Andrea and Cartwright, Richard},
  journal = {arXiv preprint arXiv:2302.13451},
  year = {2023},
  url = {https://arxiv.org/abs/2302.13451},
}

Hidden Markov Models and Connectionist Temporal Classification

Jianbo Ma

2022
