Chapter 69: The Temperature of Speech
Wu Hao shook his head with a smile and said, "No, it's still unfinished, and there are many problems we need to solve.
"For example, in the dialogue just now, it had a hard time understanding and handling ambiguous context."
"Ambiguous context?"
Zou Xiaodong was stunned for a moment, then quickly understood: "Even we real people find that hard to understand, let alone a machine program.
"Boss, there's something I don't quite understand. Most technology companies are working on speech recognition and voice dialogue, and their results are good.
"Their voice software recognizes normal speech very accurately, generally above 99%.
"However, their response speed is far slower than our system's recognition speed, their comprehension is weaker, and their associative processing can't compare.
"Also, in voice conversation, how did you make the machine's speech sound so close to a human voice?
"You know, human hearing is very sensitive. It's still easy to tell a human voice from a machine program's."
Wu Hao listened to Zou Xiaodong's string of questions and answered with one of his own: "What do you think is the biggest difference between a real voice and an AI voice?"
Zou Xiaodong thought for a moment and replied, "It lacks the rise and fall of intonation?"
Wu Hao shook his head. "That's not the most critical thing. In fact, some voice software on the market can already produce simple intonation."
"Then what is it..."
Wu Hao looked at Zou Xiaodong's puzzled expression and said with a smile, "Feelings. All the voice programs on the market lack feelings."
"Feelings? What kind of joke is that? How can a program have feelings? Only people have them." Zou Xiaodong shook his head, baffled.
Wu Hao smiled, had the computer display a structural schematic on the big screen, and said, "Rather than feelings, call it the temperature of language.
"When we speak, the listener can clearly sense our emotional shifts. That emotion is the temperature of language.
"Language programs, on the other hand, respond according to fixed formulas. They can't grasp the temperature of each sentence, so naturally the speech they generate has no temperature either.
"What we need to do is add an understanding of linguistic context to the speech recognition process, and analyze the temperature of the speech and the speaker's emotional changes from differences in tone."
"I still can't understand how a program can capture the ever-changing emotions people show when they speak. Sometimes a slight change in wording or tone can express two completely different meanings and emotions. How can a machine tell the difference?" Zou Xiaodong asked doubtfully.
Wu Hao smiled, walked him through the content on the screen, and replied, "This is where AI technology comes in. Everyone's speech and tone are different, and the way emotions are expressed is endlessly varied. The traditional approach would be to capture, collect, and analyze all these shifting intonations and contexts in order to define them. Done that way, the workload would be enormous.
"The learning and evolving ability of AI gave me an idea: we can train a basic AI voice program by harvesting the massive amount of voice data on the internet.
"Of course, that only gives us a baseline program, and it has to be adapted to each user's habits. The program learns to adapt to the user, and the longer the user uses it, the more accurately it recognizes and understands them."
Speaking of this, Wu Hao smiled: "It's actually very similar to how real people get along in society. When two strangers spend time together, they gradually get to know each other.
"The more time passes, the more familiar they become, until a single word, gesture, or glance from one can be accurately received and understood by the other. That's called tacit understanding.
"What we need to do is cultivate that tacit understanding between the program and its user. But users are hard to change and can only be influenced imperceptibly. So we have to start with the software: adapt it to the user, and subtly shape the user in return.
"Only then will human-computer interaction become truly in sync.
"That's why, when I was talking to 10 earlier, it couldn't understand my ambiguous context. It hadn't adapted to my speaking habits, so it didn't grasp what I meant by the vague words I used.
"Vague words like 'what', 'several', 'how many', 'then', 'where', and 'whatever' are hard for programs to understand and handle. We have to give these words basic definitions, but those definitions can't be rigid; they must be adjusted to fit each user's context."
After saying this, Wu Hao looked at Zou Xiaodong: "Only once the program understands the emotional temperature of our real speech can it produce a voice that sounds like a real person's."
"In any case, this is a major breakthrough in AI voice technology. I think it will shock the world once it's released, and it marks the true arrival of the age of intelligent voice.
"To be honest, I can't wait." Zou Xiaodong licked his somewhat dry lips and spoke excitedly.
Wu Hao waved his hand. "It's not as dramatic as you make it sound, but it is indeed a major technical breakthrough."
"Boss, do you plan to take this technology straight to the mass consumer market, or partner with enterprise customers, either selling them the technology and related patents or serving them by open-sourcing it?" Zou Xiaodong asked curiously. It was a heavyweight technology that would shake up the industry no matter who they worked with.
"What do you think?" Wu Hao didn't answer directly, but asked in return.
Zou Xiaodong thought for a while, then said seriously, "If a company wants to grow bigger and stronger, it can't confine itself to one field. Partnering with other businesses saves a lot of trouble, but it's very risky. Once a partner acquires more advanced technology, we risk being abandoned.
"So I think we should develop the mass market and use this technology to build our brand with the public and expand our reach. Only then can we avoid unnecessary trouble and resistance as we grow."
"Good analysis. But this market has huge potential, and we can't monopolize it alone; we still need to cooperate with those companies. Of course, we can't afford to fall behind in the mass market either.
"So I'm going to do both. This smart voice assistant is something I built specifically for the mass market. What if we released the video of the demonstration I just gave? How do you think society and the industry would react?" Wu Hao asked with a smile.
"You mean... Haha, I'm looking forward to it!"