FCC rules require TV stations to provide closed captions that convey speech, sound effects, and audience reactions such as laughter to deaf and hard-of-hearing viewers. YouTube isn’t subject to those rules, but thanks to Google’s machine-learning technology, it now offers similar assistance.
YouTube has used speech-to-text software to automatically caption speech in videos since 2009, and those automatic captions are now used 15 million times a day. Today it rolled out algorithms that indicate applause, laughter, and music in captions. More sounds could follow, since the underlying software can also identify noises such as sighs, barks, and knocks.
The company says user tests indicate that the feature significantly improves the experience for deaf and hard-of-hearing viewers (and anyone who needs to keep the volume down). “Machine learning is giving people like me that need accommodation in some situations the same independence as others,” says Liat Kaver, a product manager at YouTube who is deaf.
Indeed, YouTube’s project is one of several efforts building new accessibility tools on the growing power and practicality of machine learning. The computing industry’s push to develop software that can interpret images, text, or sound has been driven primarily by the prospect of profits in areas such as ads, search, and cloud computing. But software with some ability to understand the world has many other uses.
For example, last year Facebook launched a feature that draws on the company’s image-recognition research to generate text descriptions of images posted by a person’s friends.
Researchers at IBM are using language-processing software developed under the company’s Watson project to build a tool called Content Clarifier, designed to help people with cognitive or intellectual disabilities such as autism or dementia. It can replace figures of speech such as “raining cats and dogs” with plainer terms, and trim or break up long sentences that contain multiple clauses and indirect language.