Caption on Glass

Real-Time Captioning on Google Glass to Assist the Communication of the Hearing-impaired

Motivation

The overarching goal of the project is to help deaf and hard-of-hearing people communicate using Google Glass. The aim is to use the visual display of Glass to show real-time captions of the conversation, and also to explore whether the auditory capabilities of Glass can enhance or amplify audio for the hard of hearing in addition to the visual assistance. Our target users are deaf people and people with hearing loss. Most of them are already accustomed to reading captions, so instant captioning would make their lives considerably easier in many situations.

Features

Transcription Accuracy: The user can select an alternate transcription from the five best context-based possibilities, or manually edit the text.
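
As an illustrative sketch, the five alternatives map naturally onto Android's stock recognizer intent, which can return a ranked list of hypotheses; the showAlternatives helper standing in for our selection menu is hypothetical:

import android.app.Activity;
import android.content.Intent;
import android.speech.RecognizerIntent;
import java.util.ArrayList;

public class TranscribeActivity extends Activity {
    private static final int REQ_SPEECH = 1;

    private void startRecognition() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        // Ask the recognizer for up to five ranked hypotheses.
        intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 5);
        startActivityForResult(intent, REQ_SPEECH);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == REQ_SPEECH && resultCode == RESULT_OK) {
            // Ranked best-first: index 0 is displayed, the rest back the
            // "pick an alternate transcription" menu.
            ArrayList<String> hypotheses =
                    data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
            showAlternatives(hypotheses);
        }
    }

    private void showAlternatives(ArrayList<String> hypotheses) {
        // Hypothetical: populate the alternate-transcription picker.
    }
}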

Real-Time Transcription: The conversation is transcribed as close to real time as possible, with transcriptions sent on the fly.
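
One way to get transcriptions on the fly is Android's SpeechRecognizer with partial results enabled; a sketch of that approach, where sendToGlass is a hypothetical hook into our Bluetooth layer:

import android.content.Context;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognitionListener;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import java.util.ArrayList;

public class StreamingTranscriber implements RecognitionListener {
    private final SpeechRecognizer recognizer;

    public StreamingTranscriber(Context context) {
        recognizer = SpeechRecognizer.createSpeechRecognizer(context);
        recognizer.setRecognitionListener(this);
    }

    public void start() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        // Deliver interim hypotheses while the speaker is still talking.
        intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
        recognizer.startListening(intent);
    }

    @Override
    public void onPartialResults(Bundle partialResults) {
        ArrayList<String> texts = partialResults
                .getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        if (texts != null && !texts.isEmpty()) {
            sendToGlass(texts.get(0)); // interim caption, may still change
        }
    }

    @Override
    public void onResults(Bundle results) {
        ArrayList<String> texts = results
                .getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        if (texts != null && !texts.isEmpty()) {
            sendToGlass(texts.get(0)); // final caption for this utterance
        }
    }

    private void sendToGlass(String caption) {
        // Hypothetical: push the caption over the Bluetooth link.
    }

    // Remaining callbacks are not needed for this sketch.
    @Override public void onReadyForSpeech(Bundle params) {}
    @Override public void onBeginningOfSpeech() {}
    @Override public void onRmsChanged(float rmsdB) {}
    @Override public void onBufferReceived(byte[] buffer) {}
    @Override public void onEndOfSpeech() {}
    @Override public void onError(int error) {}
    @Override public void onEvent(int eventType, Bundle params) {}
}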

Seamless Connection: The system remembers the last device it was paired to and reconnects automatically whenever that device is in its vicinity.
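
A sketch of how this can work, assuming an RFCOMM link keyed by the standard Serial Port Profile UUID and a stored MAC address; the class, preference, and method names are illustrative:

import android.bluetooth.BluetoothAdapter;
import android.bluetooth.BluetoothDevice;
import android.bluetooth.BluetoothSocket;
import android.content.Context;
import android.content.SharedPreferences;
import java.io.IOException;
import java.util.UUID;

public class GlassLink {
    // Standard Serial Port Profile UUID; both sides must agree on it.
    private static final UUID SPP_UUID =
            UUID.fromString("00001101-0000-1000-8000-00805F9B34FB");
    private static final String PREFS = "glass_link";
    private static final String KEY_LAST_MAC = "last_mac";

    // Reconnect to the device we captioned with last time, if it is around.
    public BluetoothSocket reconnect(Context context) throws IOException {
        SharedPreferences prefs =
                context.getSharedPreferences(PREFS, Context.MODE_PRIVATE);
        String mac = prefs.getString(KEY_LAST_MAC, null);
        if (mac == null) return null; // never paired before

        BluetoothAdapter adapter = BluetoothAdapter.getDefaultAdapter();
        BluetoothDevice device = adapter.getRemoteDevice(mac);
        BluetoothSocket socket = device.createRfcommSocketToServiceRecord(SPP_UUID);
        socket.connect(); // throws IOException if the device is out of range
        return socket;
    }

    // Call after a successful manual pairing so the next launch is seamless.
    public void remember(Context context, BluetoothDevice device) {
        context.getSharedPreferences(PREFS, Context.MODE_PRIVATE)
                .edit().putString(KEY_LAST_MAC, device.getAddress()).apply();
    }
}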

Accessibility: The font size can be adjusted to accommodate Glass users with vision issues.
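
A minimal sketch of applying a stored caption size; the preference key and the 40 sp default are illustrative, not values from our study:

import android.content.Context;
import android.content.SharedPreferences;
import android.util.TypedValue;
import android.widget.TextView;

public final class CaptionStyle {
    // Apply the user's stored caption size (in scale-independent pixels).
    public static void applyFontSize(Context context, TextView captionView) {
        SharedPreferences prefs =
                context.getSharedPreferences("caption_prefs", Context.MODE_PRIVATE);
        float sizeSp = prefs.getFloat("caption_size_sp", 40f);
        captionView.setTextSize(TypedValue.COMPLEX_UNIT_SP, sizeSp);
    }
}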

Prototype

[Figure: system architecture diagram]

The system requires the hearing-impaired user to wear Google Glass and hand the paired Android phone to their conversation partner. Using the phone's microphone and speech-processing capabilities, an Android application transcribes the partner's speech with the help of Google's Speech-to-Text API and sends the text to the Glass over a secured Bluetooth connection, so the captions appear on the Glass almost instantly.
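
For illustration, the Glass-side receive loop could look like the sketch below; it assumes captions are framed as newline-delimited UTF-8 strings, a convention of ours rather than anything the Bluetooth API prescribes:

import android.app.Activity;
import android.bluetooth.BluetoothSocket;
import android.widget.TextView;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class CaptionReceiver extends Thread {
    private final Activity activity;
    private final TextView captionView;
    private final BluetoothSocket socket;

    public CaptionReceiver(Activity activity, TextView captionView,
                           BluetoothSocket socket) {
        this.activity = activity;
        this.captionView = captionView;
        this.socket = socket;
    }

    @Override
    public void run() {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), "UTF-8"))) {
            String caption;
            while ((caption = in.readLine()) != null) {
                final String text = caption;
                // Views may only be touched on the UI thread.
                activity.runOnUiThread(new Runnable() {
                    @Override
                    public void run() {
                        captionView.setText(text);
                    }
                });
            }
        } catch (IOException e) {
            // Link dropped; the seamless-reconnect logic above takes over.
        }
    }
}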

Research

We conducted a series of semi-structured interviews and diary studies with deaf and hard-of-hearing people to identify usability problems and seek design insights.

Design Iterations

We decided to keep the Glass interface simple and clean, to ensure a seamless connection, and to make starting the Glassware quick and easy. We conducted research and testing on the font size, the background color, the idle time before the screen goes to sleep, and the viewing of previous transcriptions.
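
One of those knobs, the idle time, interacts with the platform's normal screen timeout; a sketch of holding the display awake while captions stream, using the standard keep-screen-on window flag:

import android.app.Activity;
import android.view.WindowManager;

public final class ScreenWake {
    // Hold the Glass display awake while a transcription session is active.
    public static void hold(Activity activity) {
        activity.getWindow().addFlags(
                WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON);
    }

    // Release the override so the normal idle timeout applies again.
    public static void release(Activity activity) {
        activity.getWindow().clearFlags(
                WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON);
    }
}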

[Figures: Glass voice-command screen; Android UI mockup; Android UI on device]

During our testing of the initial prototype, we discovered a usability issue with the "Tap to Talk" button. Participants complained that they had to tap every time they started a sentence, which interrupted conversation flow, and the transcription sometimes got cut off prematurely when there was a longer pause in their speech. This led us to compare three transcription modes (a touch-handling sketch follows the list):

  1. Tap to Talk: Tap to start speech-to-text; transcription ends automatically when no speech is detected.

  2. Walkie-Talkie: Push to talk. Hold to keep speech-to-text active; release to turn it off.

  3. Continuous Speech: Tap once to start, keep on talking (or not talking) for as long as desired, then tap again to turn off transcription.
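
A sketch of how the three modes can map onto touch events; startListening and stopListening are hypothetical wrappers around the recognizer shown earlier:

import android.view.MotionEvent;
import android.view.View;

public class ModeController implements View.OnTouchListener {
    public enum Mode { TAP_TO_TALK, WALKIE_TALKIE, CONTINUOUS }

    private Mode mode = Mode.WALKIE_TALKIE;
    private boolean transcribing = false;

    @Override
    public boolean onTouch(View v, MotionEvent event) {
        switch (mode) {
            case WALKIE_TALKIE:
                // Hold to transcribe, release to stop.
                if (event.getAction() == MotionEvent.ACTION_DOWN) {
                    startListening();
                } else if (event.getAction() == MotionEvent.ACTION_UP) {
                    stopListening();
                }
                return true;
            case TAP_TO_TALK:
            case CONTINUOUS:
                // A single tap toggles. In Tap to Talk the recognizer also
                // stops itself at end-of-speech; in Continuous it is
                // restarted until the user taps again.
                if (event.getAction() == MotionEvent.ACTION_UP) {
                    if (transcribing) stopListening(); else startListening();
                }
                return true;
        }
        return false;
    }

    private void startListening() { transcribing = true;  /* recognizer hook */ }
    private void stopListening()  { transcribing = false; /* recognizer hook */ }
}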

Through our research, we found that the most common use case is fast-paced, back-and-forth conversation between two people. With the "Walkie-Talkie" button, people control both when the transcription starts and when it ends with a single action, which also avoids the app ending the transcription prematurely because someone talks slowly or pauses mid-sentence. There is also the use case where people want continuous transcription, such as during a lecture, where the transcription is one-directional rather than a conversation. In this case, the "Continuous Speech" mode is more appropriate.

We encountered a design problem when trying to incorporate both the "Walkie-Talkie" and "Continuous Speech" modes into the design. After several iterations, we came up with the idea of combining both transcription modes into one UI element that supports a smooth transition between the two.
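
A sketch of that combined control: a quick tap toggles Continuous Speech, while press-and-hold behaves as Walkie-Talkie. The 300 ms threshold is an assumption for illustration, not a value measured in our studies:

import android.os.SystemClock;
import android.view.MotionEvent;
import android.view.View;

public class CombinedTalkButton implements View.OnTouchListener {
    private static final long HOLD_THRESHOLD_MS = 300; // illustrative cutoff

    private long downTime;
    private boolean continuous = false;

    @Override
    public boolean onTouch(View v, MotionEvent event) {
        switch (event.getAction()) {
            case MotionEvent.ACTION_DOWN:
                downTime = SystemClock.uptimeMillis();
                if (!continuous) startListening(); // begin as Walkie-Talkie
                return true;
            case MotionEvent.ACTION_UP:
                long held = SystemClock.uptimeMillis() - downTime;
                if (held < HOLD_THRESHOLD_MS) {
                    // Quick tap: toggle Continuous Speech.
                    continuous = !continuous;
                    if (!continuous) stopListening();
                    // If now continuous, the recognizer simply keeps running.
                } else if (!continuous) {
                    // Long hold released: end the Walkie-Talkie utterance.
                    stopListening();
                }
                return true;
        }
        return false;
    }

    private void startListening() { /* hypothetical recognizer hook */ }
    private void stopListening()  { /* hypothetical recognizer hook */ }
}

The appeal of a single element is that neither mode needs its own screen or menu: the same surface reads a quick tap one way and a sustained hold the other.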