Automatic speaker recognition
Active In SP
Joined: Oct 2009
29-10-2009, 02:41 PM
Automatic speaker recognition.pdf (Size: 377.59 KB / Downloads: 243)
Active In SP
Joined: Sep 2010
12-01-2011, 03:21 PM
final.doc (Size: 780 KB / Downloads: 131)
Speaker authentication is the process of automatically recognizing who is speaking on the basis of individual information contained in speech waves. Many principles are used in the area of voice recognition. This paper provides a method of storing the voiceprints of individuals uniquely, based on the Hidden Markov Model (HMM). HMMs have long been used for speech recognition, but the VoizLock project explores a way of using them for voice authentication, which is a different task from speech recognition. The voiceprint is then used for voice authentication through text-independent speaker recognition, in which the system relies not on a specific text being spoken but solely on the voice of the speaker. This paper also addresses certain common misconceptions about voice authentication.
A person's voice contains various parameters that convey information such as emotion, gender, attitude, health and identity. This thesis talks about speaker recognition which deals with the subject of identifying a person based on their unique voiceprint present in their speech data. Pre-processing of the speech signal is performed before voice feature extraction. This process ensures the voice feature extraction contains accurate information that conveys the identity of the speaker.
This paper explains the user training phase in more detail, describing how the voiceprint of an individual is stored in the system by extracting certain values of the waveform using an HMM. It also analyses the results of tests covering different voice authentication scenarios.
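To make the idea of an HMM-based voiceprint more concrete, here is a minimal sketch of how an utterance could be scored against per-speaker HMMs with the forward algorithm. It is not the VoizLock implementation: the two-state models, their probabilities, and the idea of quantising voice features into two discrete symbols are all assumptions made for illustration, and practical toolkits typically model continuous feature vectors instead.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM, using the scaled forward algorithm."""
    n_states = len(start)
    # Initialise with the start distribution weighted by the first emission.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        # Rescale to avoid underflow and accumulate the log of the scale factors.
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik

# Hypothetical "voiceprints": same dynamics, different emission behaviour.
shared_start = [0.6, 0.4]
shared_trans = [[0.7, 0.3], [0.4, 0.6]]
speaker_models = {
    "speaker_a": {"start": shared_start, "trans": shared_trans,
                  "emit": [[0.9, 0.1], [0.8, 0.2]]},
    "speaker_b": {"start": shared_start, "trans": shared_trans,
                  "emit": [[0.1, 0.9], [0.2, 0.8]]},
}

def best_speaker(obs, models):
    """Score the observation sequence under every model and pick the most likely one."""
    scores = {
        name: forward_log_likelihood(obs, m["start"], m["trans"], m["emit"])
        for name, m in models.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical quantised feature sequence for one utterance.
best, scores = best_speaker([0, 0, 1, 0, 0], speaker_models)
print(best, scores)
```

The point of the sketch is only that the stored model, not the raw audio, is what an utterance is compared against.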
In everyday life, there is a need for controlled access to certain information and places for security purposes. A typical secure identification system requires a person to use a cardkey (something that the user has) or to enter a PIN (something that the user knows) in order to gain access. However, both methods have shortcomings, because the credential can be stolen, misused or forgotten.
The desire for a more secure identification system (whereby the physical human self is the key to access the system) led to the development of biometric recognition systems. Biometric recognition systems make use of features that are unique to each individual and are not duplicable or transferable. Biometric features fall into two classes. Behavioral characteristics such as voice and signature are the result of body part movements. In the case of voice, it merely reflects the physical properties of the voice production organs. The articulatory process and the subsequent speech produced are never exactly identical, even when the same person utters the same words. Physiological characteristics refer to the actual physical properties of a person, such as fingerprint, iris and hand geometry measurements. Possible applications of biometric recognition systems include user-interface customisation and access control such as airport check-in, building access control, telephone banking and remote credit card purchases.
A conversation between people contains a lot of information besides just the communication of ideas. Speech also conveys information such as gender, emotion, attitude, health situation and identity of a speaker. The topic of this thesis deals with speaker recognition that refers to the task of recognizing people by their voices.
The system was built on the premise that, for people to place more trust in advancing technology, IT systems must have proper security implementations. Sustainable development is also a security imperative, so the VoizLock – Human Voice Authentication System can be used to provide that level of security alongside one or more other security mechanisms.
The Problem Statement :
Speaker identity is correlated with the physiological and behavioural characteristics of the speaker. The same speaker may speak fast or slowly, at varying speed, loudly or in a whisper. In addition, speech:
• Changes depending on the sequence of phonemes.
• Is recorded against widely varying types of environmental noise.
• Varies in its signal characteristics from trial to trial (intersession variability and variability over time); speakers cannot repeat an utterance in precisely the same way from trial to trial.
• Does not have distinct boundaries between units (phonemes).
• Has an unlimited number of words.
The main reason voice is difficult to authenticate is the variability of its properties: one voice sample cannot simply be compared one-to-one with another. Our system is therefore designed to recognize this variability and adapt to it. The basic security vulnerability of a speaker authentication system is that it may be deceived by someone who plays back the recorded voice of a registered speaker.
The system overcomes this problem by generating a random vocal sound that the user is asked to utter. It then verifies two things: that the voice in the utterance matches the voice profile of the user kept in the system, and that the sound produced by the speaker is the one that was requested. Because the vocabulary is unlimited, impostors cannot know in advance what sound will be requested.
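The two checks described above can be sketched as a simple challenge-response routine. This is an illustration only: the syllable inventory, the function names, and the assumption that a recogniser supplies `recognised_units` and a speaker model supplies `speaker_score` are all hypothetical. The threshold of 70 is the match score quoted in the results of this report.

```python
import random

# Hypothetical inventory of sound units used to build a random prompt.
SYLLABLES = ["ba", "do", "ki", "lu", "me", "na", "po", "ru", "si", "ta"]

def make_challenge(rng, length=4):
    """Build a random prompt that the user must utter."""
    return [rng.choice(SYLLABLES) for _ in range(length)]

def authenticate(challenge, recognised_units, speaker_score, threshold=70.0):
    """Accept only if BOTH the spoken content and the voice match.

    recognised_units: what a recogniser heard (assumed to be given).
    speaker_score:    match score against the stored voice profile (assumed to be given).
    """
    text_ok = recognised_units == challenge   # the requested sound was produced
    voice_ok = speaker_score >= threshold     # the voice matches the stored profile
    return text_ok and voice_ok

# A seeded generator keeps the example reproducible.
rng = random.Random(0)
challenge = make_challenge(rng)
print(challenge)
print(authenticate(challenge, list(challenge), 75.0))  # right words, right voice
print(authenticate(challenge, list(challenge), 60.0))  # right words, wrong voice
```

A replayed recording fails the first check unless it happens to contain the freshly generated prompt. In a real deployment the prompt should come from a cryptographically strong source (for example Python's `secrets` module) rather than a seeded generator.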
Proposed System Architecture
1 Project Objectives
To study the concepts of speaker recognition and understand its uses in identification and verification systems.
To conduct research on different types of voiceprint in the field of speaker recognition and understand the details of the feature extraction methods.
To evaluate the recognition capability of different voice features and parameters to find out the method that is suitable for Automatic Speaker Recognition (ASR) systems in terms of reliability and computational efficiency.
2 Project Scope
Although a lot of work has been done in the field of speaker recognition, there are many practical issues to be resolved before it can be implemented in the real world.
The scope of this thesis is to make a general overview of the available techniques and to analyse the reliability of the various voiceprint features for use in ASR.
In this project, an open-set, text-independent speaker identification system prototype will be developed to carry out the above.
This section gives background knowledge about the human voice by discussing the basic attributes and elements that form it.
4) Literature Review:-
The fascination with employing speech for many purposes in daily life has driven engineers and scientists to conduct a vast amount of research and development in this field. Automatic speaker recognition (ASR) aims to build a machine that can identify a person by recognising voice characteristics or features that are unique to each person.
The performance of modern recognition systems has improved significantly due to the various improvements of the algorithm and techniques involved in this field. As of this moment, ASR is still a subject of great interest to researchers and engineers worldwide and the efficiency level of ASR is still improving.
This chapter highlights some of the important techniques, algorithms and research that are relevant to this report. Typical pre-processing, feature extraction and speaker modelling techniques are covered, together with an overview of their advantages and typical applications in speaker recognition systems.
5) Concepts of speaker recognition
The typical classification of automatic speaker recognition is divided into two tasks: Speaker Identification (SI) and Speaker Verification (SV). Figure 2.1 shows the taxonomy of speech technologies. Speaker recognition is one of the three sub-classes of speech technology which is further subdivided into SI and SV tasks.
6 Phases of Speaker Identification
An automatic speaker recognition system identifies the person speaking based on a database of known speakers in the system. Figure 2.2 shows the overview of an automatic speaker recognition system. In the training or enrolment phase, a new speaker with known identity is enrolled in the database of the system. In the identification phase, voice features from the unknown speaker are extracted and modelled. The speaker model is then compared with the speaker models from the enrolment phase to determine the identity of the speaker. Both the enrolment and identification phases use the same modelling algorithms.
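The two phases can be sketched in a few lines. This is a toy illustration, not the modelling used in the report: the speaker "model" is simply the mean of the feature vectors, matching uses Euclidean distance, and the feature values, names and `max_distance` cut-off are all made up. What it does show is that enrolment and identification share the same modelling step (`train_model`), and that an open-set system may reject a speaker who is not in the database.

```python
import math

def train_model(feature_vectors):
    """Average one speaker's feature vectors into a single reference vector."""
    dims = len(feature_vectors[0])
    count = len(feature_vectors)
    return [sum(v[d] for v in feature_vectors) / count for d in range(dims)]

def enrol(database, name, feature_vectors):
    """Enrolment phase: store a model for a speaker of known identity."""
    database[name] = train_model(feature_vectors)

def identify(database, feature_vectors, max_distance=None):
    """Identification phase: model the unknown speaker, then find the closest entry.

    If max_distance is given (open-set operation), a best match that is still
    too far away is rejected and None is returned as the identity.
    """
    probe = train_model(feature_vectors)
    best_name, best_dist = None, float("inf")
    for name, reference in database.items():
        dist = math.dist(probe, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if max_distance is not None and best_dist > max_distance:
        return None, best_dist
    return best_name, best_dist

# Hypothetical two-dimensional features for two enrolled speakers.
db = {}
enrol(db, "alice", [[1.0, 2.0], [1.2, 1.8]])
enrol(db, "bob", [[4.0, 0.5], [3.8, 0.7]])

print(identify(db, [[1.0, 2.1], [1.1, 1.9]]))             # closest to alice
print(identify(db, [[10.0, 10.0]], max_distance=1.0))      # unknown speaker rejected
```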
7 Results and Evidence:-
Testing was done in the form of experiments. Even some of the settings of the system were decided by the result of these test cases.
For example the acceptable range of probability that the input sound should match with the voice profile already stored in the system was decided by doing experiments.
These experimental test cases can be of 3 main types.
• Normal Case:
The performance of the system is analyzed when privileged users attempt to get authorized, so that the accuracy of the system can be observed in practice. The results are shown graphically in Fig. 8 and the underlying data are in Table 1. The average match score is 73.07, and the threshold match score of the system is set to 70. There were instances where legitimate users could not gain access, but the system has an accuracy of 86.25%, which is commendable for a biometric system.
• Imposter Case:
The performance of the system is analyzed when imposters attempt to get authorized. The imposter tries to mimic the voice of the genuine user as closely as possible or tries to perform a tape attack on the system. The VoizLock system is designed to resist such scenarios because it is text-independent (and thus language-independent) and shows the user a randomly generated phrase with a timeout.
This makes it hard for an imposter or an impressionist to gain access to the system. The results of the testing are shown in Fig. 9 and the data in Table 2. Out of the 150 test cases considered, only 2 allowed an imposter to gain access, and the scores obtained in both instances were very close to 70, the threshold match score set in the system.
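The figures quoted for the two test cases can be turned into the usual biometric error rates. The sketch below makes two assumptions that the report does not state: that the 86.25% accuracy is the fraction of genuine attempts that were accepted, and that a score exactly equal to the threshold is accepted.

```python
THRESHOLD = 70.0  # match score set in the system, as quoted above

def decide(match_score, threshold=THRESHOLD):
    """Accept the attempt when the match score reaches the threshold (assumed rule)."""
    return match_score >= threshold

# Imposter case: 2 of 150 attempts were wrongly accepted.
impostor_trials = 150
impostor_accepts = 2
false_acceptance_rate = impostor_accepts / impostor_trials

# Normal case: assuming "accuracy" means the share of genuine attempts accepted.
genuine_accept_rate = 0.8625
false_rejection_rate = 1.0 - genuine_accept_rate

print(f"false acceptance rate: {false_acceptance_rate:.2%}")
print(f"false rejection rate:  {false_rejection_rate:.2%}")
print(decide(73.07))  # the average genuine score clears the threshold
```

Under those assumptions the false acceptance rate is about 1.33% and the false rejection rate about 13.75%. Raising the threshold would lower the first at the cost of the second.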
The system is designed to work in an indoor environment with an average amount of noise. Heavy background noise might disturb either the training or the authentication process, so noise reduction is a concern. Three noise reduction algorithms are described below.
1 BRIL Noise Reduction Algorithm
BRIL is a new proprietary algorithm recently developed for adaptively estimating and removing background noise from speech signals. Unlike other noise reduction and speech enhancement algorithms currently on the market, it is described as capable of cleaning noisy speech even in severely noisy environments without distorting the speech signal. The algorithm has been used successfully in many communication systems with high levels of noise, and it is suitable for applications such as car kits, mobile telephony, conferencing, speech recognition and Internet phones. A Windows file-processing demo is available to test the algorithm's performance for a given application; the package includes the application, speech samples and documentation. The application takes an input audio file (noisy speech), allows the algorithm parameters to be adjusted, and writes the clean speech to an output audio file.
2 SANR135 Harmonic Noise Reduction Algorithm
The SANR135 is based on an adaptive line enhancer algorithm (harmonic noise reduction) and is therefore very effective in reducing time-varying additive harmonic noise. Such noise is usually superimposed on speech and audio signals by the recording hardware and/or the surrounding environment; examples are hum, hiss and computer fan noise. The algorithm is also suitable for the multiple-tone siren noise that usually impairs radio communication in ambulances and police vehicles. The adaptive technique used by the SANR135 not only accurately estimates the noise but can also track changes in the noise frequency and amplitude, so it handles stationary as well as time-varying harmonic noise.
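The internals of the SANR135 are not given here, but the general adaptive line enhancer idea can be sketched with a textbook normalised LMS predictor. An adaptive filter tries to predict the current sample from delayed past samples. Periodic (harmonic) components are predictable and end up in the prediction, while broadband components such as speech are largely left in the residual, provided the delay is long enough to decorrelate them. The tap count, delay, step size and the 50 Hz test tone below are illustrative assumptions.

```python
import math

def adaptive_line_enhancer(signal, taps=16, delay=8, mu=0.5, eps=1e-8):
    """Split a signal into (predicted harmonic part, residual) with a normalised LMS filter."""
    weights = [0.0] * taps
    predicted, residual = [], []
    for n in range(len(signal)):
        # Delayed input window x[n-delay], x[n-delay-1], ... (zeros before the start).
        window = [
            signal[n - delay - k] if n - delay - k >= 0 else 0.0
            for k in range(taps)
        ]
        y = sum(w * x for w, x in zip(weights, window))  # predictable (harmonic) part
        e = signal[n] - y                                # what the predictor cannot explain
        norm = eps + sum(x * x for x in window)
        # Normalised LMS update: adapt the weights to reduce the prediction error.
        weights = [w + mu * e * x / norm for w, x in zip(weights, window)]
        predicted.append(y)
        residual.append(e)
    return predicted, residual

# Pure 50 Hz hum sampled at 1000 Hz: after adaptation it should be almost fully predicted.
sample_rate = 1000
hum = [math.sin(2 * math.pi * 50 * n / sample_rate) for n in range(1000)]
predicted, residual = adaptive_line_enhancer(hum)

hum_energy = sum(x * x for x in hum[-200:])
residual_energy = sum(e * e for e in residual[-200:])
print(residual_energy / hum_energy)  # fraction of the hum left in the residual
```

For harmonic noise reduction the residual is taken as the cleaned output. Because the weights keep adapting, the filter can follow slow changes in the frequency and amplitude of the hum.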
3 SANR145 Wide-band Noise Reduction Algorithm
Although the SANR145 is based on the spectral subtraction principle, it is described as free from musical noise and other problems usually found in products based on this technique. The SANR145 is capable of reducing noise of arbitrary spectrum, including wide-band and harmonic noise. For harmonic noise reduction, however, the tracking properties of the SANR135 are superior to those of the SANR145. The SANR145 is recommended for reducing noise of arbitrary (stationary or slowly time-varying) spectrum such as background noise, wind noise, quantization and coding noise, and noise from the communication channel.
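For readers unfamiliar with the spectral subtraction principle, here is a minimal textbook sketch, not the SANR145 itself. An average noise magnitude spectrum is estimated from noise-only frames, subtracted from the magnitude spectrum of each noisy frame, and the result is recombined with the noisy phase. The naive DFT, the frame length of 32 and the deterministic test tones are assumptions chosen to keep the example dependency-free. With real, random noise this basic form leaves the "musical noise" residue mentioned above.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform (fine for short illustrative frames)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Inverse transform, keeping the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]

def noise_magnitude(noise_frames):
    """Average magnitude spectrum over frames that contain noise only."""
    spectra = [[abs(c) for c in dft(f)] for f in noise_frames]
    return [sum(s[k] for s in spectra) / len(spectra) for k in range(len(spectra[0]))]

def spectral_subtract(frame, noise_mag, floor=0.0):
    """Subtract the noise magnitude per bin, keep the noisy phase, return the time signal."""
    cleaned = []
    for coeff, noise in zip(dft(frame), noise_mag):
        # Never let the magnitude drop below a fraction of the original (0 by default).
        magnitude = max(abs(coeff) - noise, floor * abs(coeff))
        cleaned.append(cmath.rect(magnitude, cmath.phase(coeff)))
    return idft(cleaned)

# Toy example: a "speech" tone in bin 3 plus a stationary "noise" tone in bin 8.
N = 32
speech = [math.cos(2 * math.pi * 3 * t / N) for t in range(N)]
noise = [0.5 * math.cos(2 * math.pi * 8 * t / N) for t in range(N)]
noisy = [s + v for s, v in zip(speech, noise)]

noise_mag = noise_magnitude([noise])
cleaned = spectral_subtract(noisy, noise_mag)
print(max(abs(c - s) for c, s in zip(cleaned, speech)))  # reconstruction error
```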
The system is designed to work in an indoor environment with an average amount of noise. Heavy background noise might disturb either the training or the authentication process. This also depends on the microphone that is used: if it does not capture the background noise, neither phase is hindered. Even so, background noise should be minimal for the system to work properly. Since HTK is used as the foundation of the voice recognition engine, the design is restricted to the architecture of HTK, but it is done in a manner that should remain compliant with future versions of HTK.
The system cannot work with all available phonemes, because that would produce a large number of utterances for users to speak, which would be tedious. Since the number of phonemes is limited, it is difficult to generate meaningful sample utterances for the users. What makes the VoizLock – Human Voice Authentication System unique is that it uses text-independent voice authentication. It stores only the voiceprint of the user, not the raw voice, so the voice itself cannot be stolen from the system. It resists an imposter or impressionist attempting to gain access by performing a tape attack, as the test results show. VoizLock has explored a new avenue for the HMM: HMMs have been used for many decades in speech recognition and voice recognition systems, but the VoizLock system uses the HMM for voice authentication, which is clearly different from both.
Active In SP
Joined: Feb 2011
25-02-2011, 12:06 PM
Automatic+Speaker+Recognition+System.doc (Size: 92 KB / Downloads: 77)
Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.
The goal of this project is to build a simple, yet complete and representative automatic speaker recognition system. Due to the limited space, we will only test our system on a very small (but already non-trivial) speech database. There were 8 female speakers, labeled S1 to S8. All speakers uttered the same single digit "zero" once in a training session and once in a testing session later on. The sessions are at least 6 months apart to simulate voice variation over time. Digit vocabularies are used very often in testing speaker recognition because of their applicability to many security applications. For example, users have to speak a PIN (Personal Identification Number) in order to gain access to the laboratory door, or users have to speak their credit card number over the telephone line. By checking the voice characteristics of the input utterance, using an automatic speaker recognition system similar to the one that we will develop, the system is able to add an extra level of security.
1. Principles of Speaker Recognition
Speaker recognition can be classified into identification and verification. Speaker identification is the process of determining which registered speaker provides a given utterance. Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker. Figure shows the basic structures of speaker identification and verification systems.
Speaker recognition methods can also be divided into text-independent and text-dependent methods. In a text-independent system, speaker models capture characteristics of somebody’s speech which show up irrespective of what one is saying. In a text-dependent system, on the other hand, the recognition of the speaker’s identity is based on his or her speaking one or more specific phrases, like passwords, card numbers, PIN codes, etc.
Each of these technologies, identification and verification, text-independent and text-dependent, has its own advantages and disadvantages and may require different treatments and techniques. The choice of which technology to use is application-specific. The system that we will develop is classified as a text-independent speaker identification system, since its task is to identify the person who speaks regardless of what is being said.
At the highest level, all speaker recognition systems contain two main modules (refer to Figure ): feature extraction and feature matching. Feature extraction is the process that extracts a small amount of data from the voice signal that can later be used to represent each speaker. Feature matching involves the actual procedure to identify the unknown speaker by comparing extracted features from his/her voice input with the ones from a set of known speakers. We will discuss each module in detail in later sections.
All speaker recognition systems have to serve two distinct phases. The first is referred to as the enrollment or training phase, and the second as the operational or testing phase. In the training phase, each registered speaker has to provide samples of their speech so that the system can build or train a reference model for that speaker. In the case of speaker verification systems, a speaker-specific threshold is also computed from the training samples. During the testing (operational) phase (see Figure), the input speech is matched with the stored reference model(s) and a recognition decision is made.
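The text does not say how the speaker-specific threshold is derived from the training samples. One simple possibility, shown here purely as an assumption, is to score each training sample against the speaker's own model and place the threshold a fixed number of standard deviations below the mean of those scores. The scores and the `margin` value are invented for the example.

```python
import statistics

def speaker_threshold(training_scores, margin=2.0):
    """Hypothetical rule: mean of the speaker's own training scores minus margin * std dev."""
    mean = statistics.mean(training_scores)
    spread = statistics.pstdev(training_scores)
    return mean - margin * spread

def verify(score, threshold):
    """Testing phase for verification: accept the identity claim if the score is high enough."""
    return score >= threshold

# Made-up match scores of one speaker's training samples against their own model.
threshold = speaker_threshold([72.0, 75.0, 78.0])
print(round(threshold, 2))
print(verify(74.0, threshold))  # claim accepted
print(verify(65.0, threshold))  # claim rejected
```

A speaker whose training scores vary widely gets a more lenient threshold under this rule, which is the motivation for making the threshold speaker-specific rather than global.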