Friday, 31 May 2019

Automating Lip-reading.



It's taken them a long time to understand what lip-readers (and sign users) already know: lip-readers use the entire visual image to follow, the same as sign users do. We don't just look at the face; we try to take in the whole picture.

If you observe a deaf signer, concentration is not on the hands most of the time, as opposed to lip-readers, for whom the face is everything. The assumption with lip-reading is that unless total concentration is on the face it is hard to follow, and there are fewer other visual cues to add to it. The issue we have is that people do not enunciate clearly, the ideal conditions for effective lip-reading rarely exist, and classes don't approach tuition in a way that accommodates this. Fewer than 5% of deaf or hard of hearing people attend a lip-reading class.

Among deaf signers the proportion attending classes is even lower. The signer doesn't use many assistive aids to follow, as the lip-reader tends to do, but often such aids add to misunderstanding rather than improve it, because we don't really know what we can hear; then guesswork gets involved, some of it educated, some of it totally 'Half past two, how are you..'. There are also issues with body language across different cultures and people, as well as their etiquette.

The study investigates a model that uses hybrid visual features to improve lip-reading. Lip-reading, also known as speech-reading, is a technique for understanding speech by visually interpreting the movements of the lips, face and tongue when normal sound is not available.

Experiments over many years have shown that speech intelligibility increases when visual facial information is available. The research was carried out by Fatemeh Vakhshiteh under the supervision of professors Farshad Almasgan and Ahmad Nickabadi. In an interview with ISNA, Vakhshiteh said that using a variety of sources for extracting information substantially helps the lip-reading process. According to Vakhshiteh, the model was inspired by the function of the brain, because the human brain also processes several sources of information in the production and reception of speech.

In this model, deep neural networks are used to make lip-reading recognition as well as phone recognition easier, she said. "The neural networks were specifically used for situations where audio and visual features must be processed simultaneously." "This is especially helpful in noisy environments, where the audio data produced by speakers may become less clear or incomprehensible." "This would also help people with speech difficulties, because they can use their visual data to compensate for interruptions in the speech signal they receive," she added. The results demonstrated that the proposed method outperforms the conventional Hidden Markov Model (HMM) and competes well with state-of-the-art visual speech recognition systems.
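For readers wondering what "processing audio and visual features simultaneously" can look like in practice, here is a minimal sketch of a fusion network written in PyTorch. It is an illustration only, not the model from the study: the feature dimensions (39-dimensional audio, 50-dimensional visual), the layer sizes, the phone count and the simple concatenation-based fusion are all assumptions chosen for the example.

```python
# Illustrative sketch only (not the authors' architecture): a small network that
# fuses an audio feature vector and a visual (lip-region) feature vector and
# predicts a phone class. All dimensions below are assumed for the example.
import torch
import torch.nn as nn

class AudioVisualFusionNet(nn.Module):
    def __init__(self, audio_dim=39, visual_dim=50, hidden_dim=128, num_phones=40):
        super().__init__()
        # Separate encoders, one per modality.
        self.audio_encoder = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
        self.visual_encoder = nn.Sequential(nn.Linear(visual_dim, hidden_dim), nn.ReLU())
        # Fusion: concatenated modality embeddings -> phone scores.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_phones),
        )

    def forward(self, audio_feats, visual_feats):
        # audio_feats: (batch, audio_dim), e.g. MFCCs for one frame
        # visual_feats: (batch, visual_dim), e.g. lip-shape features for one frame
        fused = torch.cat([self.audio_encoder(audio_feats),
                           self.visual_encoder(visual_feats)], dim=-1)
        return self.classifier(fused)  # unnormalised phone scores

# Example: a batch of 4 frames with random numbers standing in for real features.
model = AudioVisualFusionNet()
audio = torch.randn(4, 39)
visual = torch.randn(4, 50)
print(model(audio, visual).shape)  # torch.Size([4, 40])
```

The point of the sketch is the shape of the idea: when the audio branch carries little information (a noisy room, or an interrupted signal), the visual branch can still contribute to the phone prediction, which is what the quoted remarks about noisy environments describe.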
