Language Interaction

Conversational AI to enhance/augment machine and human capabilities

Speech Technology group

Design conversational AI to enhance/augment user capabilities

Infrastructure maintenance

Online Education

Training & Support

Remote meetings

Collaborative Robots

Health Care
The Speech Technology Group (STG) works at the heart of modern artificial intelligence, designing novel algorithms for automatic speech recognition and data-driven dialogue systems that enable advanced, natural, speech-enabled human-machine interfaces. STG was established in 2002 and has since worked on a wide range of speech technologies, including text-to-speech synthesis, speech intelligibility, automatic speech recognition (ASR) and dialogue modelling. Our focus is to develop advanced, natural spoken human-machine interfaces, together with products and services that give easy access to information, thereby improving productivity and quality of life.

STG has made significant contributions to the next generation of Toshiba’s speech recognition, HMM-based speech synthesis and statistical dialogue modelling. We work in collaboration with the speech R&D groups at the Knowledge Media Lab, Toshiba RDC, Kawasaki, Japan, and the Toshiba China R&D Centre, Beijing, China, as well as with business divisions of Toshiba Group, Japan. Working with these groups within Toshiba keeps our R&D efforts tightly coupled to current and future product development.

STG has a long history of collaborating with academia and constantly strives to forge new relationships. We fund research and have academic collaborations with groups at various universities and research centres in the UK and the European Union. Combining the strengths of our group with these collaborations, we address a range of research topics in speech technology for the future.

Automatic Speech Recognition

Automatic transcription of speech to text plays a critical role in human-machine interaction. Background noise, reverberation, competing speakers and natural variability in speech across speakers make the task challenging. Toshiba aims to improve the state of the art in automatic speech recognition by combining signal processing and machine learning approaches. Our research covers both the front end (signal enhancement) and the back end (acoustic modelling for end-to-end streaming ASR and adaptation of end-to-end models).

Dialogue Modelling

The Vision & Learning Group (VLG) focuses on learning from interaction in physical environments. Complex and safe manipulation and navigation technologies leverage precise 3D geometry and scene understanding in conjunction with strong world-aware action selection frameworks. Learned concepts are effectively transferred to new domains.

Language & Interaction Group Latest Publications

Information contained in news and other announcements is current on the date of posting, but subject to change without notice.

2024

DiaLoc: An Iterative Approach to Embodied Dialog Localization

Chao Zhang, Mohan Li, Ignas Budvytis, Stephan Liwicki
Toshiba Europe Ltd

2024

ReCoRe: Regularized Contrastive Representation Learning of World Model

Rudra P.K. Poudel, Harit Pandya, Stephan Liwicki, Roberto Cipolla
Cambridge Research Laboratory Toshiba Europe Ltd, UK

2023

Cumulative Attention based streaming transformer ASR with internal language model joint training and rescoring

M. Li, C-T Do and R. Doddipatla
Accepted at the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes Island, Greece, June 2023



