Speech Recognition by Machines

Hello guys, it has been a long time since I created this blog, and I was in a dilemma about which topic to write about. After a lot of research on what to write and how to write it, I decided to write about speech recognition by machines. Of everything in Natural Language Processing, this is the topic that fascinates me the most, and I have also written a review paper on it.

You can refer to the paper at https://www.ijert.org/a-review-on-speech-recognition-by-machines.

The very first thing we need to understand is that Natural Language Processing (NLP) is a subfield of Artificial Intelligence that deals with the interaction between humans and computer systems through natural language. The main goal of NLP is for computers to understand and interpret human language. Speech recognition fits into this picture: it is mainly about taking spoken human language and converting it into words or sentences with the help of computer programs or algorithms. When we hear about speech recognition, the first things that come to mind are Google Assistant, Siri, Cortana, or Alexa, and rightly so: these are all examples of speech recognition.

Speech recognition types:

Speech recognition systems can be classified by the type of speech they accept. Here, speech means an utterance of the speaker: a single word, a few words, or a sentence. The main types are:

1) Isolated words: Isolated word systems require each utterance to have quiet (a lack of audio signal) on both sides of the sample window, and they accept only one word or utterance at a time. They mainly work with two states, listen and not-listen: while the speaker is uttering a word, the system is in the listen state, and when the speaker is silent, it is in the not-listen state. During the not-listen state, the system converts the uttered word, gives the output, and gets ready to process the next utterance.

2) Connected words: Here the speaker gives several utterances, and the system is allowed to process these separate utterances together, with only a minimal pause between them, to produce the output. Such systems are called connected word systems.

Isolated word systems, in contrast, are also known as discrete speech systems. Apart from this, the two types are quite similar.

3) Spontaneous words: Systems of this kind are very complex, and automatic speech recognition of spontaneous speech is the most difficult to build. Such a system must even understand the sounds a speaker makes between utterances, such as "uh", "hmm", and "umm".

4) Natural language: This type is most like a conversation between two parties: the system understands the speaker's utterances and also replies to the speaker.
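The listen / not-listen behaviour of isolated word systems can be sketched as a small energy-based state machine. This is a minimal sketch, assuming the audio has already been cut into short frames of samples and that a fixed energy threshold separates speech from silence; the names `segment_utterances` and `frame_energy` and the `threshold` and `min_silence` values are hypothetical illustrations, not taken from any particular recognizer.

```python
from typing import List, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def segment_utterances(
    frames: Sequence[Sequence[float]],
    threshold: float = 0.01,  # hypothetical energy level treated as "speech"
    min_silence: int = 2,     # hypothetical: quiet frames needed to end a word
) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges of each isolated utterance.

    While frames are loud the system is in the LISTEN state; after
    `min_silence` consecutive quiet frames it returns to NOT-LISTEN, which
    is where a recognizer would convert the finished word.
    """
    segments: List[Tuple[int, int]] = []
    listening = False
    start = 0
    quiet_run = 0
    for i, frame in enumerate(frames):
        loud = frame_energy(frame) >= threshold
        if not listening:
            if loud:  # speech starts: NOT-LISTEN -> LISTEN
                listening, start, quiet_run = True, i, 0
        elif loud:
            quiet_run = 0  # still speaking; reset the silence counter
        else:
            quiet_run += 1
            if quiet_run >= min_silence:  # word finished: LISTEN -> NOT-LISTEN
                segments.append((start, i - quiet_run + 1))
                listening = False
    if listening:  # input ended while still in the LISTEN state
        segments.append((start, len(frames) - quiet_run))
    return segments


silence = [0.0] * 4
speech = [0.5, -0.5, 0.5, -0.5]
frames = [silence, speech, speech, silence, silence, speech, silence, silence]
print(segment_utterances(frames))  # → [(1, 3), (5, 6)]
```

Requiring several quiet frames (`min_silence`) before leaving the listen state keeps a brief dip in loudness inside a word from splitting it in two. A real system would more likely adapt the threshold to the background noise than fix it in advance.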

Systems can also be classified by how they depend on the speaker. Some systems must be trained on a particular speaker's voice before they can understand that speaker's utterances; these are called speaker-dependent systems. Systems that do not need any such training are called speaker-independent systems.

Simple Model of speech recognition technique:

When the system receives speech, the speech signal is first analyzed. This step is called feature analysis, and it mainly captures how the speech signal varies over time. The extracted features are then matched using two kinds of models: acoustic models, which contain the unit models and the lexicon, and language models, which contain the syntax and semantics of sentences. With the help of these models, the system forms candidate words and sentences for the utterance and searches among them, and the best-matching sentence is recognized and given as the output. This is the basic working of speech recognition.
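The feature analysis step can be illustrated by cutting the signal into short overlapping frames and computing a few time-varying features for each frame. This is a minimal sketch, assuming a mono signal given as a list of float samples at an assumed 16 kHz rate; the frame length, hop size, and the choice of log energy and zero-crossing rate as features are illustrative assumptions. Practical recognizers typically use richer features such as MFCCs, which this sketch does not compute.

```python
import math
from typing import List, Sequence, Tuple


def frames_of(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split the signal into overlapping frames (a trailing partial frame is dropped)."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def frame_features(frame: Sequence[float]) -> Tuple[float, float]:
    """Return (log energy, zero-crossing rate) for one frame."""
    energy = sum(x * x for x in frame)
    log_energy = math.log(energy + 1e-10)  # small offset avoids log(0) on silence
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)  # fraction of adjacent pairs that change sign
    return log_energy, zcr


def feature_analysis(
    signal: Sequence[float], frame_len: int = 400, hop: int = 160
) -> List[Tuple[float, float]]:
    """Turn a raw signal into a sequence of per-frame feature vectors.

    400 / 160 samples correspond to 25 ms frames every 10 ms at 16 kHz.
    """
    return [frame_features(f) for f in frames_of(signal, frame_len, hop)]


sr = 16000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s, 440 Hz
feats = feature_analysis(tone)
print(len(feats))  # → 98
```

Each element of `feats` describes one 25 ms slice of the signal, so the list as a whole traces how the signal changes over time. This sequence is what the acoustic and language models would then be matched against.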

For large-vocabulary recognition we assume a simple probabilistic model: let W denote a word sequence and A the observed acoustic evidence (the features of the speech signal), with joint probability P(W, A). Our aim is to find, from the observed acoustics A, the word string Ŵ that has the maximum a posteriori (MAP) probability:

$$\hat{W} = \arg\max_{W} P(W \mid A) \qquad (1)$$ [1]

Using Bayes' rule, the posterior probability in equation (1) can be written as

$$P(W \mid A) = \frac{P(A \mid W)\,P(W)}{P(A)} \qquad (2)$$ [1]

Since P(A) does not depend on W, it does not change which W attains the maximum, so the maximum a posteriori decision rule becomes

$$\hat{W} = \arg\max_{W} P(A \mid W)\,P(W) \qquad (3)$$ [1]

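In equation (3), P(A | W) is called the acoustic model: it is the probability of observing the acoustic evidence A when the word string W is spoken, and the recognizer calculates it for each candidate W. P(W) is called the language model: it is the prior probability of the word string W itself.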
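Equation (3) can be illustrated with a toy decoder that scores a handful of candidate word strings. This is a minimal sketch under stated assumptions: the candidate sentences, their acoustic likelihoods P(A | W), the bigram probabilities used to build P(W), and the 1e-6 floor for unseen word pairs are all made-up illustrative numbers. A real recognizer searches a far larger space of word strings rather than enumerating a fixed list. Scores are added as logarithms to avoid numerical underflow.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical acoustic-model scores P(A | W) for one observed utterance A.
ACOUSTIC: Dict[str, float] = {
    "recognize speech": 0.30,
    "wreck a nice beach": 0.40,
    "recognise peach": 0.30,
}

# Hypothetical bigram language model P(w_i | w_{i-1}); "<s>" marks sentence start.
BIGRAM: Dict[Tuple[str, str], float] = {
    ("<s>", "recognize"): 0.2, ("recognize", "speech"): 0.5,
    ("<s>", "wreck"): 0.01, ("wreck", "a"): 0.1,
    ("a", "nice"): 0.05, ("nice", "beach"): 0.2,
    ("<s>", "recognise"): 0.05, ("recognise", "peach"): 0.01,
}
UNSEEN = 1e-6  # hypothetical floor probability for word pairs not in BIGRAM


def log_language_model(words: List[str]) -> float:
    """log P(W), computed as a sum of bigram log-probabilities."""
    history = ["<s>"] + words[:-1]
    return sum(math.log(BIGRAM.get((h, w), UNSEEN)) for h, w in zip(history, words))


def map_decode(acoustic: Dict[str, float]) -> str:
    """Return argmax_W P(A | W) P(W), i.e. equation (3), in the log domain."""
    def score(sentence: str) -> float:
        return math.log(acoustic[sentence]) + log_language_model(sentence.split())
    return max(acoustic, key=score)


print(map_decode(ACOUSTIC))  # → recognize speech
```

With these numbers the acoustic score alone would favour "wreck a nice beach" (0.40). The language model term P(W) is what tips the decision toward "recognize speech" (0.30 × 0.1 = 0.03 against 0.40 × 0.00001 = 0.000004), which is exactly the role equation (3) gives it.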

Applications of speech recognition:

There are many applications of speech recognition, and speech recognition systems have made our lives much easier. Some of the areas where speech recognition is used are:

Healthcare sector

Military

Education sector

Communication sector

Translation

Banking

Marketing etc.


Techdummies