Efficient infusion of self-supervised representations in Automatic Speech Recognition Paper • 2404.12628 • Published Apr 19, 2024 • 9
Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion Paper • 2506.01365 • Published Jun 2, 2025 • 8
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing Paper • 2406.08802 • Published Jun 13, 2024 • 8
Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning Paper • 2403.15469 • Published Mar 20, 2024 • 9
In-Domain African Languages Translation Using LLMs and Multi-armed Bandits Paper • 2505.15069 • Published May 21, 2025 • 7
Graph-Assisted Culturally Adaptable Idiomatic Translation for Indic Languages Paper • 2505.21937 • Published May 28, 2025 • 8
Faster Machine Translation Ensembling with Reinforcement Learning and Competitive Correction Paper • 2501.15219 • Published Jan 25, 2025 • 7
Enhancing Entertainment Translation for Indian Languages using Adaptive Context, Style and LLMs Paper • 2412.20440 • Published Dec 29, 2024 • 10
Cross-Modal Fusion and Attention Mechanism for Weakly Supervised Video Anomaly Detection Paper • 2412.20455 • Published Dec 29, 2024 • 9
Fiducial Focus Augmentation for Facial Landmark Detection Paper • 2402.15044 • Published Feb 23, 2024 • 8
Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance Paper • 2503.00147 • Published Feb 28, 2025 • 8
DuET: Dual Incremental Object Detection via Exemplar-Free Task Arithmetic Paper • 2506.21260 • Published Jun 26, 2025 • 8
M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation Paper • 2206.02187 • Published Jun 5, 2022 • 9