Definition: A vocoder is an audio processor used to transmit speech or voice signals in the form of digital data. The name is short for voice coder. Vocoders are mainly used for digital coding of speech and for voice simulation. Available narrowband vocoders operate at bit rates from 1.2 to 64 kbps.
A vocoder operates on the principle of formants. Formants are the resonant frequency bands of the vocal tract, and they carry most of the meaningful content of human speech.
When a speech signal is transmitted, it is not necessary to transmit the precise waveform. It is enough to transmit the information from which that waveform can be reconstructed. The waveform reconstructed at the receiver needs to be similar, but not identical, to the original waveform.
A vocoder first captures the characteristic elements of the speech signal. These characteristics are then used to shape other audio signals.
Vocoders are also used for voice synthesis. Here the vocoder takes two signals and creates a third signal using the spectral information of the two inputs. It aims to imprint the amplitude and frequency characteristics of the speech signal onto the synthesis signal, while maintaining the pitch of the speech signal.
Let’s have a look at the voice model shown below:
A voice model is used to simulate voice. Speech consists of a sequence of voiced and unvoiced sounds, and this is the basis for the operation of a voice model.
Before proceeding further, it helps to understand what voiced and unvoiced sounds are.
Voiced sounds are the sounds generated by vibrations of the vocal cords.
In contrast, the sounds produced when pronouncing letters such as ‘s’, ‘p’ or ‘f’ are known as unvoiced sounds. Unvoiced sounds are generated by expelling air through the lips and teeth.
The figure above shows the speech model used in a vocoder. Here, voiced sounds are simulated by the impulse generator, whose frequency is equal to the fundamental frequency of the vocal cords. The noise source in the circuit simulates the unvoiced sounds.
The position of the switch selects whether a voiced or an unvoiced sound is produced.
The selected signal is then passed through a filter that simulates the effect of the speaker's mouth, throat and nasal passage. The filter shapes its input so that the required letter is pronounced. The result is a synthesised, approximate speech waveform.
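The voice model described above can be sketched in a few lines of Python. This is only an illustration under assumed values: the 8 kHz sampling rate, the 100 Hz pitch, the resonance frequencies and the function names `excitation` and `resonator` are hypothetical and do not come from the figure. A single two-pole resonator stands in for the vocal-tract filter.

```python
import math
import random

def excitation(voiced, n_samples, sample_rate=8000, pitch_hz=100.0, seed=0):
    """Source signal: an impulse train if voiced, white noise otherwise."""
    if voiced:
        period = round(sample_rate / pitch_hz)  # samples between impulses
        return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def resonator(signal, centre_hz, bandwidth_hz, sample_rate=8000):
    """Two-pole resonator standing in for one vocal-tract resonance (assumed)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * centre_hz / sample_rate)
    a2 = -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

# The "switch" is the boolean argument: True selects the impulse generator,
# False selects the noise source.
voiced_frame = resonator(excitation(True, 800), centre_hz=700.0, bandwidth_hz=100.0)
unvoiced_frame = resonator(excitation(False, 800), centre_hz=2500.0, bandwidth_hz=400.0)
```

A real vocal-tract filter would need several resonances at once; one resonator per sound is used here only to keep the sketch short.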
Linear Predictive Coding (LPC) is used extensively in speech and music applications. It is a technique for estimating future values of a signal. In simple words, it predicts the next sample as a weighted sum of a few previous samples.
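As a small illustration of linear prediction, the sketch below predicts each sample of a pure tone from the two samples before it. The helper name `predict_next`, the 8 kHz sampling rate and the 200 Hz tone are assumptions made for the example. A pure sinusoid is a special case where two coefficients predict the signal exactly; real speech needs more coefficients, and they must be estimated from the data.

```python
import math

def predict_next(history, coeffs):
    """Predict the next sample as a weighted sum of the most recent samples.

    coeffs[0] weights the newest sample, coeffs[1] the one before it, and so on.
    """
    recent = history[-len(coeffs):][::-1]
    return sum(a * x for a, x in zip(coeffs, recent))

sample_rate = 8000   # assumed sampling rate in Hz
tone_hz = 200.0      # assumed test tone
omega = 2.0 * math.pi * tone_hz / sample_rate
tone = [math.sin(omega * n) for n in range(50)]

# For a sinusoid, x[n] = 2*cos(omega)*x[n-1] - x[n-2] holds exactly.
coeffs = [2.0 * math.cos(omega), -1.0]

errors = [abs(predict_next(tone[:n], coeffs) - tone[n]) for n in range(2, 50)]
max_error = max(errors)  # limited only by floating-point rounding
```

Because the prediction is so accurate, only the coefficients and a small error signal would need to be transmitted instead of the waveform itself.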
A vocoder comprises a voice encoder and a voice decoder. Let us now discuss the operation of each in detail.
The figure below shows the block diagram of the voice encoder:
The frequency spectrum of the speech signal (200 Hz to 3200 Hz) is divided into 15 frequency ranges by 15 bandpass filters (BPFs), each with a bandwidth of 200 Hz. The output of each BPF is the input to a rectifier unit.
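The band split is simple arithmetic: (3200 - 200) / 15 = 200 Hz per band. The short sketch below lists the resulting band edges; the function name `band_edges` is hypothetical and used only for illustration.

```python
def band_edges(low_hz=200, high_hz=3200, n_bands=15):
    """Return (lower, upper) edges in Hz for equal-width analysis bands."""
    width = (high_hz - low_hz) / n_bands
    return [(low_hz + i * width, low_hz + (i + 1) * width) for i in range(n_bands)]

bands = band_edges()
for lower, upper in bands:
    print(f"{lower:6.0f} Hz to {upper:6.0f} Hz")
```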
Here, the signal is rectified and filtered to produce a dc voltage. This dc voltage is proportional to the amplitude of the ac signal present at the output of the bandpass filter.
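A rectifier followed by a smoothing filter acts as an envelope detector. The sketch below assumes full-wave rectification, a one-pole low-pass filter with a 20 Hz cutoff and an 8 kHz sampling rate; none of these values is given for this stage in the text, and the names `envelope` and `tone` are hypothetical.

```python
import math

def envelope(signal, sample_rate=8000, cutoff_hz=20.0):
    """Full-wave rectify, then smooth with a one-pole low-pass filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    level, out = 0.0, []
    for x in signal:
        level += alpha * (abs(x) - level)  # abs() is the rectifier
        out.append(level)
    return out

def tone(amplitude, freq_hz=500.0, n_samples=8000, sample_rate=8000):
    """Test signal standing in for the output of one bandpass filter."""
    step = 2.0 * math.pi * freq_hz / sample_rate
    return [amplitude * math.sin(step * n) for n in range(n_samples)]

loud = envelope(tone(1.0))[-1]   # settled level for a full-amplitude tone
quiet = envelope(tone(0.5))[-1]  # settled level for a half-amplitude tone
```

Halving the input amplitude halves the settled output level, which is the proportionality the encoder relies on.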
The speech signal is also applied to a frequency discriminator, which is followed by a 20 Hz low-pass filter (LPF). This LPF generates a dc voltage proportional to the voice frequency, which is simply the pitch of the voice.
This dc voltage also indicates whether the speech is voiced or unvoiced.
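One simple software stand-in for the frequency discriminator is to count zero crossings, since a signal of frequency f crosses zero about 2f times per second. The sketch below uses that idea. The zero-crossing method, the 1000 Hz decision threshold and the function names are assumptions for illustration and are not the circuit shown in the block diagram.

```python
import math
import random

def zero_crossing_frequency(signal, sample_rate=8000):
    """Estimate the dominant frequency in Hz from the zero-crossing count."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0.0) != (b < 0.0))
    duration_s = len(signal) / sample_rate
    return crossings / (2.0 * duration_s)

def is_voiced(signal, sample_rate=8000, threshold_hz=1000.0):
    """Treat a low estimate as voiced and a high, noise-like one as unvoiced."""
    return zero_crossing_frequency(signal, sample_rate) < threshold_hz

# A 120 Hz tone stands in for a voiced sound, white noise for an unvoiced one.
step = 2.0 * math.pi * 120.0 / 8000
voiced_like = [math.sin(step * n + 0.3) for n in range(8000)]
rng = random.Random(1)
unvoiced_like = [rng.uniform(-1.0, 1.0) for _ in range(8000)]

pitch_estimate = zero_crossing_frequency(voiced_like)
```

Real speech is far less tidy than a pure tone, so practical pitch detectors use more robust methods; the point here is only that a single number can carry both the pitch and the voiced/unvoiced decision.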
The outputs of all the LPFs are dc voltages, which are sampled, multiplexed and A/D converted. The output of the encoder is therefore a digital equivalent of the speech signal. This encoded voice signal carries the frequency components from 200 Hz to 3200 Hz, the pitch of the speech, and whether it is voiced or unvoiced.
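The sampling, multiplexing and A/D conversion step can be sketched as follows. The 3-bit resolution, the rate of 50 frames per second and the names `quantise`, `multiplex` and `bit_rate` are illustrative assumptions, not values from the text. With 15 band channels plus one pitch channel, these assumed values give a rate inside the 1.2 to 64 kbps range quoted for narrowband vocoders.

```python
def quantise(voltage, full_scale=1.0, bits=3):
    """Map a dc voltage in [0, full_scale] to an integer code (A/D conversion)."""
    levels = (1 << bits) - 1
    clipped = min(max(voltage, 0.0), full_scale)
    return round(clipped / full_scale * levels)

def multiplex(band_levels, pitch_level, bits=3):
    """Combine the band voltages and the pitch voltage into one frame of codes."""
    frame = [quantise(v, bits=bits) for v in band_levels]
    frame.append(quantise(pitch_level, bits=bits))
    return frame

def bit_rate(n_channels=16, bits=3, frames_per_second=50):
    """Bits per second needed to send every channel once per frame."""
    return n_channels * bits * frames_per_second

frame = multiplex([0.0] * 14 + [1.0], pitch_level=0.0)
rate_bps = bit_rate()  # 16 channels * 3 bits * 50 frames = 2400 bit/s
```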
The digital voice signal generated by the voice encoder is first decoded. The voice decoder then uses a speech synthesizer to produce a voice signal at its output. This output is an approximation of the original voice signal.
The block diagram of the voice decoder section is shown below:
The demultiplexer and DAC section converts the received encoded signal back to its analog form. Here, a balanced modulator (BM) and filter combination is used in place of each rectifier and filter combination at the encoder. The carrier for each BM is the output of either the noise generator or the pulse generator, depending on the position of the switch.
The switch position is decided by the decoder. When a voiced signal is received, the switch connects the pulse generator output to the inputs of all the BMs.
Similarly, when an unvoiced signal is received, the switch connects the noise generator output to the inputs of all the BMs.
If the received signal is voiced, only certain BMs provide an output, depending on the frequency components of the received signal. If the received signal is unvoiced, all the BMs can provide an output. The adder sums all the analog signals to produce the voice or speech output.
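The decoder path can be sketched in the same spirit. In the code below each balanced modulator is modelled as a multiplication of the carrier by the decoded band level, each band filter as a simple two-pole resonator, and the adder as a sample-by-sample sum. The 8 kHz sampling rate, 100 Hz pulse rate, band centres and all function names are assumptions made for illustration and are not taken from the block diagram.

```python
import math
import random

SAMPLE_RATE = 8000  # assumed sampling rate in Hz

def carrier(voiced, n_samples, pitch_hz=100.0, seed=0):
    """Pulse generator output if voiced, noise generator output otherwise."""
    if voiced:
        period = round(SAMPLE_RATE / pitch_hz)
        return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def band_pass(signal, centre_hz, bandwidth_hz=200.0):
    """Two-pole resonator used as a rough band filter (gain only roughly scaled)."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * centre_hz / SAMPLE_RATE)
    a2 = -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = (1.0 - r) * x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def decode_frame(band_levels, voiced, n_samples=400):
    """Rebuild one frame of speech from the decoded band levels."""
    source = carrier(voiced, n_samples)            # selected by the switch
    output = [0.0] * n_samples
    for i, level in enumerate(band_levels):
        centre_hz = 300.0 + 200.0 * i              # centre of the i-th 200 Hz band
        modulated = [level * s for s in source]    # balanced modulator
        filtered = band_pass(modulated, centre_hz)
        output = [o + f for o, f in zip(output, filtered)]  # adder
    return output

# Voiced frame with energy in the three lowest bands only.
voiced_out = decode_frame([1.0, 0.6, 0.3] + [0.0] * 12, voiced=True)
# Unvoiced frame with energy in every band.
unvoiced_out = decode_frame([0.2] * 15, voiced=False)
```

Bands whose level is zero contribute nothing to the sum, which mirrors the statement that only certain BMs produce an output for a voiced signal.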
Speech transmission using a vocoder is helpful, but the technique has a drawback: it degrades the quality of the speech.