We thank Mr. Atsuyuki Takahashi, the announcers at the NHK Communications Training Institute, and the English teachers at our university who cooperated in the creation of the utterance database. This work was supported by JSPS KAKENHI Grant Number 25330418.
The figures above show the movement histories of the characteristic points of the lips for an announcer and an utterance learner, respectively, when uttering ‘Attara aisoyoku aisatsushinasai’ (‘If you meet a friend, greet him or her with sufficient sociability’). The vertical broken lines indicate the pauses between the words of the sentence.
In the case of ‘Attara aisoyoku aisatsushinasai,’ the announcer spoke more slowly than the utterance learner, taking 1.2 seconds to pronounce ‘Attara’ where the learner took only 0.8 seconds. In addition, when the utterance of ‘A’ at the head of each phrase is compared, the announcer’s lower-lip displacement increased gradually as the sentence progressed, whereas no such tendency was observed in the learner.
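The phrase-by-phrase comparison above can be sketched in code. The following is a minimal illustration, with made-up displacement samples and phrase boundaries (the actual measurements come from the tracked lip feature points, not from this data):

```python
# Hypothetical sketch: comparing per-phrase lower-lip displacement peaks,
# as in the announcer-vs-learner analysis above. All sample values and
# phrase boundaries here are invented for illustration.

def peak_displacements(trajectory, boundaries):
    """Return the peak displacement within each phrase.

    trajectory: lower-lip displacement samples (one value per frame)
    boundaries: (start, end) sample indices, one pair per phrase
    """
    return [max(trajectory[s:e]) for s, e in boundaries]

# Made-up displacement traces for the three phrases of one sentence.
announcer = [2, 5, 9, 4, 1, 3, 7, 11, 5, 2, 4, 8, 13, 6, 2]
learner   = [2, 6, 10, 5, 1, 2, 6, 10, 4, 1, 3, 6, 10, 5, 2]
phrases = [(0, 5), (5, 10), (10, 15)]

print(peak_displacements(announcer, phrases))  # peaks rise phrase by phrase
print(peak_displacements(learner, phrases))    # peaks stay roughly flat
```

With these sample values the announcer’s peaks come out as [9, 11, 13] and the learner’s as [10, 10, 10], mirroring the gradual-increase tendency described above.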

The figures above show line graphs of the English teacher’s and Participant A’s lip movements when they pronounced the word “cat”. The figure on the left shows the English teacher’s lip movements, the middle figure shows Participant A’s lip movements before starting the training, and the figure on the right shows Participant A’s lip movements after 10 repetitions of the training. The English teacher opened her mouth widely after she pronounced /kæ/, while Participant A did not open her mouth widely and did not speak clearly. After the 10 repetitions of the training, however, her lip movements improved somewhat and became more like the English teacher’s.
The video does not play in Edge; please use Internet Explorer.

Figure 1 shows the display for acquisition of lip feature points, and Figure 2 shows the display for lip movement training. When the lip-feature-point collection application recognizes your face, it displays your facial feature points, as shown in Figure 1. This application was developed in our laboratory using Seeing Machines Inc.’s Face API. To operate the application: first, select a file for training from the “Set up” button and type a file name for saving the data. Second, start training by clicking the “Start” button. When you close your mouth, the application recognizes your mouth and you can begin pronouncing. If the application does not recognize you, click the “Re-recognition” button. Third, click the “Stop” button to stop recording. Fourth, the lip movement training display appears after recording stops. The red line diagram in Figure 2 is the teacher’s lip movement, which is used as the model data; the black line diagram is the participant’s lip movement. When you click the “Start” button in Figure 2, you can compare your lip movements with the teacher’s.
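The red/black line comparison in Figure 2 can be sketched as follows. This is our own minimal illustration, not the application’s actual code: the teacher’s trace and the participant’s trace rarely have the same length, so one simple approach is to resample the participant’s trace to the teacher’s length and score the mean difference. The function names, traces, and scoring method are all assumptions made for illustration.

```python
# Hypothetical sketch of comparing a participant's lip-movement trace
# (black line) against the teacher's model trace (red line).
# Not the application's actual API; names and data are illustrative.

def resample(trace, n):
    """Linearly resample a 1-D trace to n samples."""
    if n == 1:
        return [trace[0]]
    out = []
    step = (len(trace) - 1) / (n - 1)
    for i in range(n):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(trace) - 1)
        frac = pos - lo
        out.append(trace[lo] * (1 - frac) + trace[hi] * frac)
    return out

def mean_abs_diff(teacher, participant):
    """Mean absolute difference after resampling to the teacher's length."""
    p = resample(participant, len(teacher))
    return sum(abs(a - b) for a, b in zip(teacher, p)) / len(teacher)

teacher = [0.0, 4.0, 8.0, 4.0, 0.0]                 # model trace (red line)
participant = [0.0, 2.0, 4.0, 6.0, 4.0, 2.0, 0.0]   # learner trace (black line)
print(round(mean_abs_diff(teacher, participant), 2))
```

A smaller score means the participant’s movement is closer to the model; a more robust system might use dynamic time warping instead of linear resampling, since learners also differ from the teacher in timing.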

Development of a speech training system by lip movements

Feb. 2, 2016

Speech recognition technology is spreading through personal digital assistants such as smartphones. However, the recognition rate declines in places with multiple voices and considerable noise. We have therefore been studying lip reading, which recognizes the content of an utterance from images of lip movements. Based on this research, we created a database of utterances by Japanese television announcers and English teachers for utterance training in Japanese and English. Furthermore, applying the technology we developed, we propose a method of utterance training using this equipment.

References
1) Tomoki Yamamura, Miyuki Suganuma, Eiki Wakamatsu, Yuko Hoshino and Mitsuho Yamada, “Development of a speech training system by lip movements,” 2015 International Conference on Computer Application Technologies (ICCAT2015), Matsue, Sep. 2015.
2) Miyuki Suganuma, Tomoki Yamamura, Yuko Hoshino and Mitsuho Yamada, “How to evaluate English pronunciation learning by lip movements,” IMQA2016, Nagoya, Mar. 2016.
