voice morphing full report
computer science technology
22-01-2010, 05:46 PM



.doc   Voice Morphing final report.doc (Size: 281 KB / Downloads: 1,242)

.doc   Voice Morphing SLIDES.doc (Size: 116 KB / Downloads: 1,050)

ABSTRACT
Voice morphing means the transition of one speech signal into another. The new morphed signal will have the same information content as the two input speech signals but a different pitch, which is determined by the morphing algorithm. To do this, each signal's information has to be converted into another representation, which enables the pitch and spectral envelope to be encoded on orthogonal axes. Individual components of the speech signal are then matched and the signal's amplitudes are interpolated to produce a new speech signal. This new signal's representation then has to be converted back to an acoustic waveform. This project describes the representations of the signals required to effect the morph, and the techniques required to match the signal components, interpolate the amplitudes and invert the new signal's representation back to an acoustic waveform.

1. Introduction
Voice morphing means the transition of one speech signal into another. Like image morphing, speech morphing aims to preserve the shared characteristics of the starting and final signals, while generating a smooth transition between them. In image morphing the in-between images all show one face smoothly changing its shape and texture until it turns into the target face. It is this feature that a speech morph should possess. One speech signal should smoothly change into another, keeping the shared characteristics of the starting and ending signals but smoothly changing the other properties. The major properties of concern in a speech signal are its pitch and envelope information. These two reside in a convolved form in a speech signal, so an efficient method for extracting each of them is necessary. We have adopted a simple approach, cepstral analysis, to do this. Pitch and formant information in each signal is extracted using the cepstral approach. The processing needed to obtain the morphed speech signal includes cross-fading of envelope information, Dynamic Time Warping to match the major signal features (pitch) and signal re-estimation to convert the morphed speech signal back into an acoustic waveform.
This report has been subdivided into seven chapters. The second chapter gives a concise overview of the processes involved in this project. The third chapter presents the procedure used to accomplish morphing and the necessary theory. Processes like preprocessing, cepstral analysis, dynamic time warping and signal re-estimation are described with the necessary diagrams. The fourth chapter covers the actual morphing process. The conversion of the morphed signal into an acoustic waveform is dealt with in detail in the fifth chapter. Chapter six summarizes the whole morphing process with the help of a block diagram. Chapter seven lists the conclusions that have been drawn from this project.
2. An Introspection of the Morphing Process
We undertook this work because it sounded challenging and interesting. We were eager to know whether speech morphing would be feasible using the cepstral approach. Processes like cepstral analysis and the re-estimation of the morphed speech signal into an acoustic waveform are intricate and challenging. This project also digs deep into the basics of digital signal processing, or rather speech processing, and covers a lot of ground in that field.
Speech morphing can be achieved by transforming the signal's representation from the acoustic waveform obtained by sampling the analog signal, with which many people are familiar, to another representation. To prepare the signal for the transformation, it is split into a number of 'frames' - sections of the waveform. The transformation is then applied to each frame of the signal. This provides another way of viewing the signal information. The new representation (said to be in the frequency domain) describes the average energy present at each frequency band.
Further analysis enables two pieces of information to be obtained: pitch information and the overall envelope of the sound. A key element in the morphing is the manipulation of the pitch information. If two signals with different pitches were simply cross-faded it is highly likely that two separate sounds will be heard. This occurs because the signal will have two distinct pitches causing the auditory system to perceive two different objects. A successful morph must exhibit a smoothly changing pitch throughout. The pitch information of each sound is compared to provide the best match between the two signals' pitches. To do this match, the signals are stretched and compressed so that important sections of each signal match in time. The interpolation of the two sounds can then be performed which creates the intermediate sounds in the morph. The final stage is then to convert the frames back into a normal waveform.
However, after the morphing has been performed, the legacy of the earlier analysis becomes apparent. The conversion of the sound to a representation in which the pitch and spectral envelope can be separated loses some information. Therefore, this information has to be re-estimated for the morphed sound. This process obtains an acoustic waveform, which can then be stored or listened to.
Figure 2.1 Schematic block diagram of the speech morphing process
3. Morphing Process: A Comprehensive Analysis
The algorithm to be used is shown in the simplified block diagram given below. The algorithm contains a number of fundamental signal processing methods including sampling, the discrete Fourier transform and its inverse, and cepstral analysis. However, the main processes can be categorized as follows.
I. Preprocessing or representation conversion: This involves processes like signal acquisition in discrete form and windowing.
II. Cepstral analysis or Pitch and Envelope analysis: This process will extract the pitch and formant information in the speech signal.
III. Morphing which includes Warping and interpolation.
IV. Signal re-estimation.

Fig 3.1: Block diagram of the simplified speech morphing algorithm.
3.1 Acoustics of speech production
Speech production can be viewed as a filtering operation in which a sound source excites a vocal tract filter. The source may be periodic, resulting in voiced speech, or noisy and aperiodic, causing unvoiced speech. As a periodic signal, voiced speech has a spectrum consisting of harmonics of the fundamental frequency of the vocal cord vibration; this frequency, often abbreviated as F0, is the physical aspect of the speech signal corresponding to the perceived pitch. Thus pitch refers to the fundamental frequency of the vocal cord vibrations or the resulting periodicity in the speech signal. This F0 can be determined either from the periodicity in the time domain or from the regularly spaced harmonics in the frequency domain.
The vocal tract can be modeled as an acoustic tube with resonances, called formants, and anti-resonances. (The formants are abbreviated as Fi, where F1 is the formant with the lowest center frequency.) Moving certain structures in the vocal tract alters the shape of the acoustic tube, which in turn changes its frequency response. The filter amplifies energy at and near formant frequencies, while attenuating energy around anti-resonant frequencies between the formants.
The common method used to extract pitch and formant frequencies is spectral analysis. This method views speech as the output of a linear, time-varying system (the vocal tract) excited by either quasi-periodic pulses or random noise. Since the speech signal is the result of convolving the excitation with the vocal tract sample response, the two components can be recovered by separating, or deconvolving, them. In general, deconvolution of two signals is impossible, but it works for speech, because the two signals have quite different spectral characteristics. The deconvolution process transforms a product of two signals into a sum of two signals. If the resulting summed signals are sufficiently different spectrally, they may be separated by linear filtering. Now we present a comprehensive analysis of each of the processes involved in morphing with the aid of block diagrams wherever necessary.
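The separation of excitation and vocal tract described above can be illustrated with a real cepstrum, in which the logarithm turns the spectral product into a sum. The following is a minimal sketch, not the report's implementation: the function names, the 8 kHz sample rate, the synthetic impulse-train excitation and the decaying "vocal tract" response are all assumptions chosen for illustration.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    return np.fft.ifft(log_mag).real

def pitch_period_from_cepstrum(cep, min_q, max_q):
    """Quefrency (in samples) of the largest cepstral value in [min_q, max_q)."""
    return min_q + int(np.argmax(cep[min_q:max_q]))

# Synthetic "voiced" frame (illustrative assumption): an impulse train with a
# period of 80 samples convolved with a short decaying filter response.
fs = 8000          # assumed sample rate in Hz
period = 80        # assumed pitch period in samples (100 Hz at 8 kHz)
n = 1024
excitation = np.zeros(n)
excitation[::period] = 1.0
tract = 0.9 ** np.arange(30)
frame = np.convolve(excitation, tract)[:n] * np.hanning(n)

cep = real_cepstrum(frame)
# Low quefrencies hold the envelope; the pitch shows up as a peak further out.
est_period = pitch_period_from_cepstrum(cep, min_q=40, max_q=400)
est_f0 = fs / est_period
```

The search range for the peak (here 40 to 400 samples) is a design choice that bounds the pitch values the sketch can report.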
3.2 Preprocessing
This section introduces the major concepts associated with processing a speech signal and transforming it to the new representation required to effect the morph. This process takes place for each of the signals involved in the morph.
3.2.1 Signal Acquisition
Before any processing can begin, the sound signal that is created by some real-world process has to be ported to the computer by some method. This is called sampling. A fundamental aspect of a digital signal (in this case sound) is that it is based on processing sequences of samples. When a natural process, such as a musical instrument, produces sound the signal produced is analog (continuous-time) because it is defined along a continuum of times. A discrete-time signal is represented by a sequence of numbers - the signal is only defined at discrete times. A digital signal is a special instance of a discrete-time signal - both time and amplitude are discrete. Each discrete representation of the signal is termed a sample.

Fig 3.2: Signal acquisition
The input speech signals are taken using MIC and CODEC. The analog speech signal is converted into the discrete form by the inbuilt CODEC TLC320AD535 present onboard and stored in the processor memory. This completes the signal acquisition phase.
3.2.2 Windowing
A DFT (Discrete Fourier Transform) can only deal with a finite amount of information. Therefore, a long signal must be split up into a number of segments. These are called frames. Generally, speech signals are constantly changing and so the aim is to make the frame short enough to make the segment almost stationary and yet long enough to resolve consecutive pitch harmonics. Therefore, the length of such frames tends to be in the region of 25 to 75 milliseconds. There are a number of possible windows. A selection is:
The Hanning window
$$w(n) = \begin{cases} 0.5 - 0.5\cos(2\pi n / N), & 0 \le n \le N \\ 0, & \text{otherwise} \end{cases} \qquad (3.1)$$

Fig 3.3: Windowing
The frequency-domain spectrum of the Hanning window is much smoother than that of the rectangular window, and the window is commonly used in spectral analysis. The windowing function splits the signal into time-weighted frames.
However, it is not enough to merely process contiguous frames. When the frames are put back together, modulation in the signal becomes evident due to the windowing function. As the weighting of the window is required, another means of overcoming the modulation must be found. A simple method is to use overlapping windows. To obtain a number of overlapping spectra, the window is shifted along the signal by a number of samples (no more than the window length) and the process is repeated. Simply put, it means that as one frame fades out, its successor fades in. It has the advantage that any discontinuities are smoothed out. However, it does increase the amount of processing required due to the increase in the number of frames produced.
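The framing and overlap described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the periodic form of the Hanning window (N equal to the frame length) with a shift of half a frame is an assumption, chosen because the overlapping windows then sum to a constant and the modulation cancels.

```python
import numpy as np

def hann_window(length):
    """w(n) = 0.5 - 0.5*cos(2*pi*n/N), periodic form with N = length (assumption)."""
    n = np.arange(length)
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * n / length)

def split_into_frames(signal, frame_len, shift):
    """Return overlapping, Hanning-weighted frames, one frame per row."""
    window = hann_window(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // shift
    frames = np.empty((n_frames, frame_len))
    for m in range(n_frames):
        start = m * shift
        frames[m] = signal[start:start + frame_len] * window
    return frames

def overlap_add(frames, shift):
    """Put the frames back together by summing them at their original offsets."""
    frame_len = frames.shape[1]
    out = np.zeros(shift * (len(frames) - 1) + frame_len)
    for m, frame in enumerate(frames):
        out[m * shift:m * shift + frame_len] += frame
    return out

# Example: 50 ms frames at an assumed 8 kHz sample rate, shifted by half a frame.
signal = np.ones(8000)
frames = split_into_frames(signal, frame_len=400, shift=200)
rebuilt = overlap_add(frames, shift=200)
```

Away from the first and last half frame, `rebuilt` equals the input, which is the "one frame fades out as its successor fades in" behaviour described in the text.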
3.3 Morphing
3.3.1 Matching and Warping: Background theory
Both signals will have a number of 'time-varying properties'. To create an effective morph, it is necessary to match one or more of these properties of each signal to those of the other signal in some way. The property of concern is the pitch of the signal - although other properties such as the amplitude could be used - and it will have a number of features. It is almost certain that matching features do not occur at exactly the same point in each signal. Therefore, the feature must be moved to some point in between its position in the first sound and its position in the second sound. In other words, to smoothly morph the pitch information, the pitch present in each signal needs to be matched and then the amplitude at each frequency cross-faded. To perform the pitch matching, a pitch contour for the entire signal is required. This is obtained by using the pitch peak location in each cepstral pitch slice.
Consider the simple case of two signals, each with two features occurring in different positions as shown in the figure below.

Figure 3.4: The match path between two signals with differently located features
The match path shows the amount of movement (or warping) required in order to align corresponding features in time. Such a match path is obtained by Dynamic Time Warping (DTW).
3.3.2 Dynamic Time Warping
Speaker recognition and speech recognition are two important applications of speech processing. These applications are essentially pattern recognition problems, which is a large field in itself. Some Automatic Speech Recognition (ASR) systems employ time normalization. This is the process by which time-varying features within the words are brought into line. The current method is time-warping, in which the time axis of the unknown word is non-uniformly distorted to match its features to those of the pattern word. The degree of discrepancy between the unknown word and the pattern - the amount of warping required to match the two words - can be used directly as a distance measure. Such a time-warping algorithm is usually implemented by dynamic programming and is known as Dynamic Time Warping. Dynamic Time Warping (DTW) is used to find the best match between the features of the two sounds - in this case, their pitch. To create a successful morph, major features, which occur at generally the same time in each signal, ought to remain fixed and intermediate features should be moved or interpolated. DTW enables a match path to be created. This shows how each element in one signal corresponds to each element in the second signal.
In order to understand DTW, two concepts need to be dealt with:
Features: The information in each signal has to be represented in some manner.
Distances: some form of metric has to be used in order to obtain a match path. There are two types:
1. Local: a computational difference between a feature of one signal and a feature of the other.
2. Global: the overall computational difference between an entire signal and another signal of possibly different length.
Feature vectors are the means by which the signal is represented and are created at regular intervals throughout the signal. In this use of DTW, a path between two pitch contours is required. Therefore, each feature vector will be a single value. In other uses of DTW, however, such feature vectors could be large arrays of values. Since the feature vectors could possibly have multiple elements, a means of calculating the local distance is required. The distance measure between two feature vectors is calculated using the Euclidean distance metric. Therefore the local distance between feature vector x of signal 1 and feature vector y of signal 2 is given by
$$d(x, y) = \sqrt{\sum_{i} (x_i - y_i)^2}$$
As the pitch contours are single value feature vectors, this simplifies to
$$d(x, y) = |x - y|$$

The global distance is the overall difference between the two signals. Audio is a time-dependent process. For example, two audio sequences may have different durations, and two sequences of the sound with the same duration are likely to differ in the middle due to differences in sound production rate. Therefore, to produce a global distance measure, time alignment must be performed - the matching of similar features and the stretching and compressing, in time, of others. Instead of considering every possible match path, which would be very inefficient, a number of constraints are imposed upon the matching process.
3.3.3 The DTW Algorithm
The basic DTW algorithm is symmetrical - in other words, every frame in both signals must be used. The constraints placed upon the matching process are:
• Matching paths cannot go backwards in time;
• Every frame in each signal must be used in a matching path;
• Local distance scores are combined by adding to give a global distance.
If D(i,j) is the global distance up to (i,j) and the local distance at (i,j) is given by d(i,j), then
$$D(i,j) = \min\left[ D(i-1,j-1),\; D(i-1,j),\; D(i,j-1) \right] + d(i,j)$$
Computationally, the above equation is already in a form that could be recursively programmed. However, unless the language is optimized for recursion, this method can be slow even for relatively small pattern sizes. Another method, which is both quicker and requires less memory storage, uses two nested for loops. This method only needs two arrays that hold adjacent columns of the time-time matrix. In the following explanation, it is assumed that the array notation is of the form 0 ... N-1 for an array of length N.
The only directions in which the match path can move when at (i, j) in the time-time matrix are given in Figure 3.5 below.

Figure 3.5: Time-time matrix
The three possible directions in which the best match path may move from cell (i, j) in symmetric DTW.

Figure 3.6: Minimum cost path
The cells at (i,j) and (i,0) have different possible originator cells. The path to (i, 0) can only originate from (i-1, 0). However, the path to (i,j) can originate from the three standard locations as shown in Figure 3.6 above.
The algorithm to find the least global cost is:
I. Calculate column 0 starting at the bottom most cell. The global cost to this cell is just its local cost. Then, the global cost for each successive cell is the local cost for that cell plus the global cost to the cell below it. This column is called the predCol (predecessor column).
II. Calculate the global cost to the first cell of the next column (the curCol). This is the local cost for the cell plus the global cost to the bottom most cell of the previous column.
III. Calculate the global cost of the rest of the cells of curCol. For example, at (i,j) this is the local distance at (i,j) plus the minimum global cost at either (i-1,j), (i-1,j-1) or (i,j-1).
IV. Assign curCol to predCol and repeat from step II until all columns have been calculated.
V. The global cost is the value stored in the top most cell of the last column.
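Steps I to V can be sketched with two column arrays as follows. This is a minimal sketch assuming single-valued pitch features, so that the local distance is an absolute difference; the function name and the use of plain Python lists are illustrative choices, not part of the report.

```python
def dtw_global_cost(contour1, contour2):
    """Least global cost between two pitch contours, keeping only two columns.

    Column index i runs over contour1, row index j runs over contour2.
    """
    n1, n2 = len(contour1), len(contour2)

    def local(i, j):
        # Local distance for single-valued feature vectors.
        return abs(contour1[i] - contour2[j])

    # Step I: column 0, from the bottom most cell upwards.
    pred_col = [0.0] * n2
    pred_col[0] = local(0, 0)
    for j in range(1, n2):
        pred_col[j] = local(0, j) + pred_col[j - 1]

    for i in range(1, n1):
        cur_col = [0.0] * n2
        # Step II: the first cell can only be reached from the cell to its left.
        cur_col[0] = local(i, 0) + pred_col[0]
        # Step III: remaining cells take the cheapest of the three predecessors.
        for j in range(1, n2):
            cur_col[j] = local(i, j) + min(pred_col[j],      # (i-1, j)
                                           pred_col[j - 1],  # (i-1, j-1)
                                           cur_col[j - 1])   # (i, j-1)
        # Step IV: the current column becomes the predecessor column.
        pred_col = cur_col

    # Step V: top most cell of the last column.
    return pred_col[-1]
```

For example, `dtw_global_cost([1, 2, 3], [1, 2, 2, 3])` is zero, because the repeated value in the second contour is absorbed by warping.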
However, in the case of audio morphing, it is not the minimum global distance itself that is of interest but the path that achieves it. In other words, a back trace array must be kept, with entries in the array pointing to the preceding point in the path. Therefore, a second algorithm is required to extract the path.
The path has three different types of direction changes:
• Vertical
• Horizontal
• Diagonal
The back trace array will be of equal size to that of the time-time matrix. When the global distance to each cell, say (i,j), in the time-time matrix is calculated, its predecessor cell is known - it's the cell out of (i-1,j), (i-1,j-1) or (i,j-1) with the lowest global cost. Therefore, it is possible to record in the backtrace array the predecessor cell using the following notation (for the cell (i,j) ):
1) (i-1, j-1) -- Diagonal
2) (i-1, j) -- Horizontal
3) (i, j-1) -- Vertical

Fig 3.7: A sample back trace array with each cell containing a number, which represents the location of the predecessor cell in the lowest global path distance to that cell.
The path is calculated from the last position; in Figure 3.7 above this would be (4, 4). The first cell in the path is denoted by a zero in the back trace array and is always the cell (0, 0). A final 2D array is required which gives a pair (signal1 vector, signal2 vector) for each step in the match path, given a back trace array similar to that of Figure 3.7 above.
The pseudo code is:
Store the back trace indices for the top right cell.
Obtain the value in that cell - currentVal.
While currentVal is not 0
    If currentVal is 1 then reduce both indices by 1
    If currentVal is 2 then reduce the signal 1 index by 1
    If currentVal is 3 then reduce the signal 2 index by 1
    Store the new indices at the beginning of the 2D array
    Obtain the value in that cell - currentVal
End.
Therefore, for the example in Figure 3.7 above, the 2D array would be


Figure 3.8: The sample back trace array with the calculated path overlaid
At this stage, we now have the match path between the pitches of the two signals and each signal in the appropriate form for manipulation. The next stage is to then produce the final morphed signal.
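The cost calculation and the back trace can be combined in one routine. The sketch below is illustrative rather than the report's code: the function name is hypothetical, the full time-time matrix is kept for clarity instead of two columns, and ties between predecessors are broken in favour of the diagonal, which is an assumption.

```python
import numpy as np

def dtw_match_path(contour1, contour2):
    """Return (path, global_cost); path is a list of (signal1 index, signal2 index)."""
    n1, n2 = len(contour1), len(contour2)
    cost = np.zeros((n1, n2))
    # Back trace codes: 0 = start, 1 = diagonal, 2 = horizontal, 3 = vertical.
    back = np.zeros((n1, n2), dtype=int)

    for i in range(n1):
        for j in range(n2):
            d = abs(contour1[i] - contour2[j])
            if i == 0 and j == 0:
                cost[i, j] = d
            elif i == 0:
                cost[i, j], back[i, j] = d + cost[i, j - 1], 3
            elif j == 0:
                cost[i, j], back[i, j] = d + cost[i - 1, j], 2
            else:
                best, code = min((cost[i - 1, j - 1], 1),
                                 (cost[i - 1, j], 2),
                                 (cost[i, j - 1], 3))
                cost[i, j], back[i, j] = d + best, code

    # Walk back from the top right cell until the start cell (code 0) is reached.
    i, j = n1 - 1, n2 - 1
    path = [(i, j)]
    while back[i, j] != 0:
        code = back[i, j]
        if code == 1:
            i, j = i - 1, j - 1
        elif code == 2:
            i -= 1
        else:
            j -= 1
        path.insert(0, (i, j))  # store the new indices at the beginning
    return path, float(cost[-1, -1])

path, total = dtw_match_path([1, 2, 3], [1, 2, 2, 3])
```

Each pair in `path` names one frame of signal 1 and the frame of signal 2 it is matched with, which is the form needed by the morphing stage.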
4. Morphing Stage
Now we shall give a detailed account of how the morphing process is carried out. The overall aim in this section is to make the smooth transition from signal 1 to signal 2. This is partially accomplished by the 2D array of the match path provided by the DTW. At this stage, it was decided exactly what form the morph would take. The implementation chosen was to perform the morph in the duration of the longest signal. In other words, the final morphed speech signal would have the duration of the longest signal. In order to accomplish this, the 2D array is interpolated to provide the desired duration.
However, one problem still remains: the interpolated pitch of each morph slice. If no interpolation were to occur then this would be equivalent to the warped cross-fade, which would still be likely to result in a sound with two pitches. Therefore, a pitch in-between those of the first and second signals must be created. The precise properties of this manufactured pitch peak are governed by how far through the morph the process is. At the beginning of the morph, the pitch peak will take on more characteristics of the signal 1 pitch peak - peak value and peak location - than of the signal 2 peak. Towards the end of the morph, the peak will bear more resemblance to that of the signal 2 peak. The variable l is used to control the balance between signal 1 and signal 2. At the beginning of the morph, l has the value 0 and upon completion, l has the value 1. Consider the example in Figure 4.6. This diagram shows a sample cepstral slice with the pitch peak area highlighted. Figure 4.7 shows another sample cepstral slice, again with the same information highlighted. To illustrate the morph process, these two cepstral slices shall be used.
There are three stages:
1. Combination of the envelope information;
2. Combination of the pitch information residual - the pitch information excluding the pitch peak;
3. Combination of the pitch peak information.
Figure 4.1: A sample cepstral slice with the three main areas of interest in the morphing process highlighted.
4.1 Combination of the envelope information

Figure 4.2: Cross fading of the formants.
We can say that the best morphs are obtained when the envelope information is merely cross-faded, as opposed to employing any pre-warping of features, and so this approach is adopted here. Care has to be taken when cross-fading any information in the cepstral domain. Due to the properties of the logarithm employed in the cepstral analysis stage, multiplication is transformed into addition. Therefore, if a cross-fade between the two envelopes were attempted directly in the cepstral domain, the weighted addition would in fact correspond to a multiplication of the spectra. Consequently, each envelope must be transformed back into the frequency domain (involving an inverse logarithm) before the cross-fade is performed. Once the envelopes have been successfully cross-faded according to the weighting determined by l, the morphed envelope is once again transformed back into the cepstral domain. This new cepstral slice forms the basis of the completed morph slice.
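The round trip from the cepstral domain to the frequency domain and back can be sketched as follows. This is a hedged illustration: the function names and the liftering cutoff are assumptions, and the cepstral slices are assumed to be real and symmetric so that their DFT is a real log magnitude spectrum.

```python
import numpy as np

def lifter_envelope(cep, cutoff):
    """Keep the low-quefrency (envelope) coefficients and their mirror image."""
    out = np.zeros_like(cep)
    out[:cutoff] = cep[:cutoff]
    if cutoff > 1:
        out[-(cutoff - 1):] = cep[-(cutoff - 1):]
    return out

def cepstrum_to_magnitude(cep):
    """DFT gives the log magnitude; the inverse logarithm gives the magnitude."""
    return np.exp(np.fft.fft(cep).real)

def magnitude_to_cepstrum(mag):
    """Logarithm followed by the inverse DFT returns to the cepstral domain."""
    return np.fft.ifft(np.log(mag)).real

def crossfade_envelopes(env_cep1, env_cep2, l):
    """Cross-fade two envelopes in the frequency domain; l runs from 0 to 1."""
    mag1 = cepstrum_to_magnitude(env_cep1)
    mag2 = cepstrum_to_magnitude(env_cep2)
    morphed = (1.0 - l) * mag1 + l * mag2
    return magnitude_to_cepstrum(morphed)
```

With l = 0 the result is the envelope of signal 1 and with l = 1 that of signal 2; in between, the magnitude spectra (not their logarithms) are averaged.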
4.2 Combination of the pitch information residual

Figure 4.3: Cross fading of the Pitch information.
The pitch information residual is the pitch information section of the cepstral slice with the pitch peak also removed by liftering. To produce the morphed residual, it is combined in a similar way to that of the envelope information: no further matching is performed. It is simply transformed back into the frequency domain and cross-faded with respect to l. Once the cross-fade has been performed, it is again transformed into the cepstral domain. The information is now combined with the new morph cepstral slice (currently containing envelope information). The only remaining part to be morphed is the pitch peak area.
4.3 Combination of the Pitch peak information
As stated above, in order to produce a satisfying morph, it must have just one pitch. This means that the morph slice must have a pitch peak which has characteristics of both signal 1 and signal 2. Therefore, an 'artificial' peak needs to be generated to satisfy this requirement. The positions of the signal 1 and signal 2 pitch peaks are stored in an array (created during the pre-processing, above), which means that the desired pitch peak location can easily be calculated.
In order to manufacture the peak, the following process is performed,
I. Each pitch peak area is liftered from its respective slice. Although the alignment of the pitch peaks will not match with respect to the cepstral slices, the pitch peak areas are liftered in such a way as to align the peaks with respect to the liftered area (see Figure 4.8).
II. The two liftered cepstral slices are then transformed back into the frequency domain where they can be cross-faded with respect to l. The cross-fade is then transformed back into the cepstral domain.
III. The morphed pitch peak area is now placed at the appropriate point in the morph cepstral slice to complete the process.
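Steps I to III can be sketched as below. Several details are assumptions made only for illustration and are not stated in the report: the peak location is interpolated linearly with l, the peak area is a fixed number of samples either side of the peak, and the two areas are aligned at an arbitrary common position before the cross-fade. The function name is hypothetical.

```python
import numpy as np

def morph_pitch_peak(cep1, cep2, peak1, peak2, l, half_width=3):
    """Return (slice containing only the morphed pitch peak, its location)."""
    n = len(cep1)
    centre = n // 4  # arbitrary common position used to align the two peaks

    def aligned_peak_area(cep, peak):
        # Step I: lifter the peak area and move it so the peak sits at `centre`.
        out = np.zeros(n)
        for offset in range(-half_width, half_width + 1):
            out[centre + offset] = cep[peak + offset]
            out[(n - (centre + offset)) % n] = cep[peak + offset]  # mirror image
        return out

    # Step II: cross-fade in the frequency domain, then return to the cepstrum.
    mag1 = np.exp(np.fft.fft(aligned_peak_area(cep1, peak1)).real)
    mag2 = np.exp(np.fft.fft(aligned_peak_area(cep2, peak2)).real)
    faded = np.fft.ifft(np.log((1.0 - l) * mag1 + l * mag2)).real

    # Step III: place the morphed peak area at the interpolated location
    # (linear interpolation of the location is an assumption).
    target = int(round((1.0 - l) * peak1 + l * peak2))
    morph = np.zeros(n)
    for offset in range(-half_width, half_width + 1):
        morph[target + offset] = faded[centre + offset]
        morph[(n - (target + offset)) % n] = faded[centre + offset]
    return morph, target
```

The sketch requires both peaks, and the interpolated location, to lie at least `half_width` samples inside the first half of the slice.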
The morphing process is now complete. The final series of morphed cepstral slices is transformed back into the frequency domain. All that remains to be done is to re-estimate the waveform.

5. Signal re-estimation
This is a vital part of the system and the time expended on it was well spent. As is described above, due to the signals being transformed into the cepstral domain, a magnitude function is used. This results in a loss of phase information in the representation of the data. Therefore, an algorithm to estimate a signal whose magnitude DFT is close to that of the processed magnitude DFT is required. The solution to this problem is explained below.
Let the windowing function used in the DFT be w (n) which is L points long and non-zero for 0 <= n<= L-1. Therefore, the windowed signal can be represented by

Where m is the window index, l is the signal sample index and S is the window shift.
Hence from the definition of the DFT, the windowed signal's N-point DFT is given by

As stated above, the morphing process shall produce a magnitude DFT. Let Yw (m,k) represent this magnitude DFT. Before investigating the signal estimation of the magnitude DFT, let us consider estimating a signal from Yw (m,k). The time-domain signal of Yw (m,k) is given by

However, because Yw (m,k) has been manufactured by some process, it is generally not a 'valid' DFT; that is, there is no specific signal whose DFT is given by Yw (m,k). We therefore estimate a signal x (n) whose DFT Xw (m,k) is as close as possible to the manufactured DFT Yw (m,k). The closeness of the two is measured by the squared error between the estimated signal x (n) and the signal of the manufactured DFT. This error measurement can be represented as

This is the sum of all the errors in a windowed section between the estimated signal and the manufactured DFT's signal. These are then summed for all windows. The error measurement equation can be solved to find x (n) because the equation is in quadratic form. This gives:

This equation forms the basis of the algorithm to estimate a signal from the magnitude DFT Yw (m,k). In this iterative algorithm, the error between magnitude DFT of the estimated signal Xw (m,k) and the magnitude DFT Yw (m,k) produced by the morphing sequence is decreased by each iteration. Let xi (n) be the estimated x (n) after i iterations. xi+1 (n) is found by finding the DFT of xi (n) and replacing the magnitude of Xw i (m,k) with the magnitude DFT of the morph, Yw (m,k) and then using the above equation to calculate the signal.
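The description in this section matches the widely used least-squares estimate of a signal from a modified short-time Fourier transform magnitude, usually attributed to Griffin and Lim. Assuming that method and its window convention (an assumption, since the report's own typeset equations are not reproduced here), the quantities named above can be written in the notation of this section roughly as follows, with the sums over l taken over the support of the shifted window:

```latex
% Windowed signal (m: window index, l: sample index, S: window shift)
x_w(m, l) = w(mS - l)\, x(l)

% N-point DFT of the windowed signal
X_w(m, k) = \sum_{l} x_w(m, l)\, e^{-j 2\pi k l / N}

% Time-domain signal corresponding to the manufactured DFT
y_w(m, l) = \frac{1}{N} \sum_{k=0}^{N-1} Y_w(m, k)\, e^{j 2\pi k l / N}

% Squared error, summed over each window and then over all windows
D = \sum_{m} \sum_{l} \left[ x_w(m, l) - y_w(m, l) \right]^2

% Minimising the quadratic error with respect to x(n)
x(n) = \frac{\sum_{m} w(mS - n)\, y_w(m, n)}{\sum_{m} w^2(mS - n)}
```

In the iterative form, y_w is obtained at each pass from the morph magnitude combined with the phase of the current estimate's DFT.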
The sequence of the algorithm is shown in the figure 5.1 below.

Figure 5.1: Signal re-estimation.
As can be seen from the algorithm, the re-estimation process requires a DFT of the previous iteration's signal estimate in order to obtain the phase information for the current iteration. However, the first iteration has no previous estimate from which to obtain phase information, and so random noise is used as the starting signal for the first iteration.
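The iteration can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the report's implementation: the function names are hypothetical, the DFT length equals the window length, the window is indexed forward from the start of each frame, and a small floor on the denominator guards samples that no window covers.

```python
import numpy as np

def stft(x, window, shift):
    """DFT of each windowed frame; one frame per row."""
    L = len(window)
    n_frames = 1 + (len(x) - L) // shift
    frames = [x[m * shift:m * shift + L] * window for m in range(n_frames)]
    return np.fft.fft(np.array(frames), axis=1)

def signal_from_stft(spec, window, shift, length):
    """Least-squares signal estimate: windowed overlap-add over summed w^2."""
    L = len(window)
    num = np.zeros(length)
    den = np.zeros(length)
    frames = np.fft.ifft(spec, axis=1).real
    for m, frame in enumerate(frames):
        start = m * shift
        num[start:start + L] += window * frame
        den[start:start + L] += window ** 2
    return num / np.maximum(den, 1e-8)

def reestimate_signal(target_mag, window, shift, length, n_iter=50, seed=0):
    """Iteratively estimate a signal whose magnitude DFT approaches target_mag."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(length)  # random noise start: no earlier estimate
    errors = []
    for _ in range(n_iter):
        spec = stft(x, window, shift)
        mag = np.abs(spec)
        errors.append(float(np.sum((mag - target_mag) ** 2)))
        phase = spec / np.maximum(mag, 1e-12)  # keep phase, discard magnitude
        x = signal_from_stft(target_mag * phase, window, shift, length)
    return x, errors
```

A quick check is to take the magnitude DFT of a known signal as `target_mag` and watch the values in `errors` fall; the number of iterations trades quality against run time, as discussed in the conclusions.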
6. Summarized Block Diagram
The whole morphing process is summarized using the detailed block diagram shown below (figure 6.1).
7. Conclusions and Future scope
The approach we have adopted separates the sounds into two forms: spectral envelope information and pitch and voicing information. These can then be independently modified. The morph is generated by splitting each sound into two forms: a pitch representation and an envelope representation. The pitch peaks are then obtained from the pitch spectrograms to create a pitch contour for each sound. Dynamic Time Warping of these contours aligns the sounds with respect to their pitches. At each corresponding frame, the pitch, voicing and envelope information are separately morphed to produce a final morphed frame. These frames are then converted back into a time domain waveform using the signal re-estimation algorithm.
In order to reduce the number of cepstral slices to be processed, the window size and window shift were increased. However, the size of the window was still within the range needed to achieve the desired balance between frequency and time resolution. The quality of the morph is heavily influenced by the number of iterations used to re-estimate the sound. In the re-estimation section, the algorithm was tested by re-estimating a sound from an unprocessed magnitude DFT. In other words, no information was removed - intentionally or not - by further manipulation. This meant that if a large number of iterations were used then an almost perfect signal could be obtained. In speech morphing, a large amount of manipulation of the signal takes place and some loss of quality is inevitable. Therefore, fewer iterations were required before the sound began to converge to a point at which further iterations made negligible difference.
The pitch contour extraction process is performed in a rather naive manner. In order to smoothly morph the pitch information, the pitches of each signal need to be matched. To facilitate this, a pitch estimate for the entire signal is found - a pitch contour. Dynamic Time Warping is then used to find the best match between the two pitch contours. In this work, the pitch contour is found from the cepstral domain. The position of the peak in each slice is found and these build up a pitch contour. Although the results are satisfactory, this method does not take into account two possibilities: the pitch may be absent or difficult to find in both frames; or one frame may have a pitch but the other may not. Unlike visual morphing, speech morphing can separate different aspects of the sound into independent dimensions. Those dimensions are time, pitch and voicing, and spectral envelope.
There are a number of areas in which further work should be carried out in order to improve the technique described here and extend the field of speech morphing in general. The time required to generate a morph is dominated by the signal re-estimation process. Even a small number of iterations (for example, 2) takes a significant amount of time, even to re-estimate signals of approximately one second duration. Although the inevitable loss of quality due to manipulation in speech morphing means that fewer iterations are required, an improved re-estimation algorithm is still needed.

A number of the processes, such as the matching and signal re-estimation, are very unrefined and inefficient but do produce satisfactory morphs. Concentration on the issues described above for further work and extensions to the speech morphing principle ought to produce systems which create extremely convincing and satisfying speech morphs.
In this project, only one type of morphing has been discussed: that in which the final morph has the same duration as the longest signal. Only speech morphing is discussed here, but the work can be extended to include audio sounds as well. However, according to the eventual use of the morph, a number of other types could be produced. The longest signal could be compressed so that the morph has the same duration as the shortest signal (the reverse of the approach described here). If one signal is significantly longer than the other, two possibilities arise:
1. If the longer signal is the 'target' - the sound one wishes to morph to - then the morph would be performed between the start signal and the target's corresponding section (of equal duration) with the remainder of the target's signal unaffected.
2. If the longer signal is the start signal then the morph would be performed over the duration of the shorter signal and the remainder of the start signal would be removed.
Further extension of this work to provide the above functionality would create a powerful and flexible morphing tool. Such a tool would allow the user to specify the points at which a morph starts and finishes, the properties of the morph, and the matching function. With the increased user interaction in the process, a Graphical User Interface could be designed and integrated to make the package more 'user-friendly'. Such an improvement would provide immediate visual feedback (which is lacking in the current implementation) and possibly step-by-step guidance. Finally, this work has used spectrograms as the pitch and voicing and spectral envelope representations. Although effective, further work ought to concentrate on new representations which enable further separation of information. For example, a new representation might allow the separation of the pitch and voicing.
Pitch is not the only time-varying property which can be used to morph between two sounds. If the underlying rhythm of a sound is important then this ought to be used as the matching function between the two sounds. A better approach still may be to combine two or more matching functions in order to achieve a more pleasing morph. The algorithm presented in this project is prone to excessive stretching of the time axis in order to achieve a match between the two pitch contours. The use of a combined rhythm and pitch matching function could limit this unwanted warping.
Further, the weighting of each component in the matching function could be determined according to requirements allowing heavily rhythm-biased matches or heavily pitch-biased matches.
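A weighted matching function of this kind could be prototyped as below. This is only a sketch under stated assumptions: the rhythm feature (for example a per-frame onset strength) is hypothetical and is not defined in the report, the names `combined_cost` and `dtw_path` are illustrative, and the DTW uses the basic three-direction step pattern with no slope constraints.

```python
import numpy as np


def combined_cost(pitch_a, pitch_b, rhythm_a, rhythm_b, w_pitch=0.5, w_rhythm=0.5):
    """Local cost matrix mixing pitch and (hypothetical) rhythm distances."""
    d_pitch = np.abs(np.subtract.outer(pitch_a, pitch_b))
    d_rhythm = np.abs(np.subtract.outer(rhythm_a, rhythm_b))
    return w_pitch * d_pitch + w_rhythm * d_rhythm


def dtw_path(cost):
    """Return (match path, total cost) for a local cost matrix."""
    n, m = cost.shape
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = cost[i, j] + best
    # Backtrack from the end of both signals to the start.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((acc[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    return path[::-1], float(acc[-1, -1])
```

Setting `w_rhythm` close to 1 gives a heavily rhythm-biased match and `w_pitch` close to 1 a heavily pitch-biased one.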
The Speech morphing concept can be extended to include audio sounds in general. This area offers many possible applications including sound synthesis. For example, there are two major methods for synthesizing musical notes. One is to digitally model the sound's physical source and provide a number of parameters in order to produce a synthetic note of the desired pitch. Another is to take two notes which bound the desired note and use the principles used in speech morphing to manufacture a note which contains the shared characteristics of the bounding notes but whose other properties have been altered to form a new note. The use of pitch manipulation within the algorithm also has an interesting potential use. In the interests of security, it is sometimes necessary for people to disguise the identity of their voice. An interesting way of doing this is to alter the pitch of the sound in real-time using sophisticated methods.

8. References
• audio4fun.com
• speechtechmag.com
• macmusic.org
• nillymoser.com
CONTENTS
1. INTRODUCTION
2. AN INTROSPECTION OF THE MORPHING PROCESS
3. MORPHING PROCESS: A COMPREHENSIVE ANALYSIS
3.1 Acoustics of speech production
3.2 Preprocessing
3.2.1 Signal Acquisition
3.2.2 Windowing
3.3 Morphing
3.3.1 Matching and Warping: Background theory
3.3.2 Dynamic Time Warping
3.3.3 The DTW Algorithm
4. MORPHING STAGE
4.1 Combination of the envelope information
4.2 Combination of the pitch information residual
4.3 Combination of the Pitch peak information
5. SIGNAL RE-ESTIMATION
6. SUMMARIZED BLOCK DIAGRAM
7. CONCLUSIONS AND FUTURE SCOPE
8. REFERENCES

ACKNOWLEDGEMENT
I extend my sincere thanks to Prof. P.V.Abdul Hameed, Head of the Department for providing me with the guidance and facilities for the Seminar.
I express my sincere gratitude to the Seminar coordinator, Mr. Berly C.J, Staff in charge, for his cooperation and guidance in preparing and presenting this seminar.
I also extend my sincere thanks to all other faculty members of Electronics and Communication Department and my friends for their support and encouragement.
MARTIN K. GEORGE
Reply
project report tiger
Active In SP
**

Posts: 1,062
Joined: Feb 2010
#2
01-03-2010, 12:14 AM


.doc   VOICE MORPHING.doc (Size: 38 KB / Downloads: 235)

VOICE MORPHING
ABSTRACT
This paper addresses the capability needed in a telecommunication system to support mobile access to real-time sights and sounds of a complex environment, defined as a virtual reality service (VRS) episode. The constant development of terminal and networking equipment is paving the way for the provision of a VRS and the creation of VRS episodes. This paper describes a mobile VRS environment in general and the core architecture, and describes the various entities employed to perform a VRS episode setup task. The proposed VRS architecture is in full harmony with the preceding generation of all-IP multimedia networks currently under study in the third generation partnership project.
INTRODUCTION
Voice morphing is a technology developed at the Los Alamos National Laboratory in New Mexico, USA by George Papcun and publicly demonstrated in 1999. Voice morphing enables speech patterns to be cloned and an accurate copy of a person's voice to be made, which can then say anything the operator wishes it to say, appearing in the voice of someone else. Voice morphing has tremendous possibilities in military psychological warfare and subversion, particularly in conjunction with the use of recorded telephone conversations as evidence in courts of law. An agency can use voice morphing to produce a fake confession or incriminating evidence that appears to be spoken by a suspect. Voice morphing is a powerful battlefield weapon which can be used to provide fake orders to the enemy's troops, appearing to come from their own commanders. In 1990, the US Department of Defense considered using voice morphing to produce a propaganda recording of Iraqi president Saddam Hussein, which could then be distributed throughout the Arab world and Iraq to discredit the Iraqi leader.
Definition
Voice morphing means the transition of one speech signal into another. Like image morphing, speech morphing aims to preserve the shared characteristics of the starting and final signals, while generating a smooth transition between them. Speech morphing is analogous to image morphing. In image morphing the in-between images all show one face smoothly changing its shape and texture until it turns into the target face. It is this feature that a speech morph should possess. One speech signal should smoothly change into another, keeping the shared characteristics of the starting and ending signals but smoothly changing the other properties.
The major properties of concern in a speech signal are its pitch and envelope information. These two reside in a convolved form in a speech signal. Hence some efficient method for extracting each of these is necessary. We have adopted an uncomplicated approach, namely cepstral analysis, to do this. Pitch and formant information in each signal is extracted using the cepstral approach. The processing needed to obtain the morphed speech signal includes cross-fading of envelope information, Dynamic Time Warping to match the major signal features (pitch), and signal re-estimation to convert the morphed speech signal back into the acoustic waveform.
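Because convolution in time becomes addition in the log-magnitude spectrum, the two components can be separated by keeping different parts of the cepstrum. The sketch below illustrates this with a simple rectangular lifter. The function name and the cut-off of 30 cepstral bins are assumptions chosen for illustration, not values taken from the report.

```python
import numpy as np


def split_cepstrum(frame, n_lifter=30):
    """Split a frame's log spectrum into envelope and excitation parts.

    Returns (log_mag, envelope, excitation), all in the log-magnitude domain,
    with envelope + excitation equal to log_mag.
    """
    n = len(frame)
    windowed = frame * np.hamming(n)
    log_mag = np.log(np.abs(np.fft.fft(windowed)) + 1e-10)
    ceps = np.fft.ifft(log_mag).real          # real cepstrum
    lifter = np.zeros(n)
    lifter[:n_lifter] = 1.0                   # low quefrencies: slow spectral shape
    lifter[-(n_lifter - 1):] = 1.0            # mirrored half, keeps the lifter symmetric
    envelope = np.fft.fft(ceps * lifter).real
    excitation = np.fft.fft(ceps * (1.0 - lifter)).real  # pitch and fine structure
    return log_mag, envelope, excitation
```

The low-quefrency part carries the formant (envelope) information, while the remainder carries the pitch peak and residual.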
INTROSPECTION OF THE MORPHING PROCESS
Speech morphing can be achieved by transforming the signal's representation from the acoustic waveform obtained by sampling the analog signal, with which many people are familiar, to another representation. To prepare the signal for the transformation, it is split into a number of 'frames' - sections of the waveform. The transformation is then applied to each frame of the signal. This provides another way of viewing the signal information. The new representation (said to be in the frequency domain) describes the average energy present at each frequency band.
Further analysis enables two pieces of information to be obtained: pitch information and the overall envelope of the sound. A key element in the morphing is the manipulation of the pitch information. If two signals with different pitches were simply cross-faded it is highly likely that two separate sounds will be heard. This occurs because the signal will have two distinct pitches causing the auditory system to perceive two different objects. A successful morph must exhibit a smoothly changing pitch throughout.
The pitch information of each sound is compared to provide the best match between the two signals' pitches. To do this match, the signals are stretched and compressed so that important sections of each signal match in time. The interpolation of the two sounds can then be performed which creates the intermediate sounds in the morph. The final stage is then to convert the frames back into a normal waveform.
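The framing and interpolation steps described above can be sketched as follows. This is a minimal illustration under assumptions: the frame length, hop and function names are hypothetical, and the cross-fade is meant for envelope frames that have already been aligned in time, not for raw waveforms, since directly cross-fading two different pitches tends to be heard as two sounds.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def magnitude_frames(x, frame_len=256, hop=128):
    """Frequency-domain view: magnitude spectrum of each windowed frame."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames, axis=1))


def crossfade(env_a, env_b):
    """Linearly interpolate from env_a (first frame) to env_b (last frame).

    Both inputs must have the same shape (frames x bins), i.e. already be
    time-aligned.
    """
    lam = np.linspace(0.0, 1.0, env_a.shape[0])[:, None]
    return (1.0 - lam) * env_a + lam * env_b
```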
Reply
computer science topics
Active In SP
**

Posts: 610
Joined: Jun 2010
#4
29-06-2010, 12:42 AM

VOICE MORPHING
Abstract
Voice morphing means the transition of one speech signal into another. Voice morphing, which is also referred to as voice transformation and voice conversion, is a technique for modifying a source speaker's speech utterance to sound as if it was spoken by a designated target speaker. The core process in a voice morphing system is the transformation of the spectral envelope of the source speaker to match that of the target speaker, and linear transformations estimated from time-aligned parallel training data are commonly used to achieve this. Speech morphing is analogous to image morphing. In image morphing the in-between images all show one face smoothly changing its shape and texture until it turns into the target face. It is this feature that a speech morph should possess. One speech signal should smoothly change into another, keeping the shared characteristics of the starting and ending signals but smoothly changing the other properties. The major properties of concern in a speech signal are its pitch and envelope information. These two reside in a convolved form in a speech signal. Hence some efficient method for extracting each of these is necessary. We have adopted an uncomplicated approach, namely cepstral analysis, to do this. Pitch and formant information in each signal is extracted using the cepstral approach.
Reply
seminar surveyer
Active In SP
**

Posts: 3,541
Joined: Sep 2010
#5
01-01-2011, 04:15 PM


.pdf   VOICE_MORPHING.pdf (Size: 189.72 KB / Downloads: 228)

Reply
seminar class
Active In SP
**

Posts: 5,361
Joined: Feb 2011
#6
26-02-2011, 02:35 PM

presented by:
Yogesh Kumar


.ppt   VOICE MORPHING 2003.ppt (Size: 2.1 MB / Downloads: 185)
Voice Morphing
INTRODUCTION

• Voice morphing is a technique for modifying a (source) speaker's speech so that it sounds like a different (target) speaker.
• Voice morphing means the transition of one speech signal into another.
EXAMPLE
Online games now let players join as one of a limited number of prepared characters. Many such games provide a voice function with which users talk to each other using the character’s voice instead of their own.
Morphing process: A Comprehensive Analysis
The main process can be categorized as follows
1) Representation conversion
2) Cepstral analysis
3) Morphing
4) Signal re-estimation
Morphing process : MORPHING
To create an effective morph, it is necessary to match one or more of these properties of each signal to those of the other signal in some way.
The match path shows the amount of movement required in order to align corresponding features in time.
Research Goals
To develop algorithms which can morph speech from one speaker to another with the following properties.
• High quality
• Cross language voice conversion
• The ability to operate with target voice training data ranging from a few seconds to tens of minutes.
APPLICATION
• Entertainment
  - In Computer Gaming
  - In Film Industry

Reply
seminar class
Active In SP
**

Posts: 5,361
Joined: Feb 2011
#7
22-03-2011, 12:11 PM


.doc   CONTENTS.doc (Size: 56 KB / Downloads: 58)
INTRODUCTION
Voice morphing, which is also referred to as voice transformation and
voice conversion, is a technique for modifying a source speaker's
speech to sound as if it was spoken by some designated target
speaker. There are many applications of voice morphing including
customizing voices for text to speech (TTS) systems, transforming
voice-overs in adverts and films to sound like that of a well-known
celebrity, and enhancing the speech of impaired speakers such as
laryngectomees. Two key requirements of many of these applications
are that firstly they should not rely on large amounts of parallel
training data where both speakers recite identical texts, and
secondly, the high audio quality of the source should be preserved in
the transformed speech. The core process in a voice morphing system
is the transformation of the spectral envelope of the source speaker
to match that of the target speaker and various approaches have been
proposed for doing this such as codebook mapping, formant mapping,
and linear transformations. Codebook mapping, however, typically
leads to discontinuities in the transformed speech. Although some
discontinuities can be resolved by some form of interpolation
technique, the conversion approach can still suffer from a lack of
robustness as well as degraded quality. On the other hand, formant
mapping is prone to formant tracking errors. Hence, transformation-
based approaches are now the most popular. In particular, the
continuous probabilistic transformation approach introduced by
Stylianou provides the baseline for modern systems. In this approach,
a Gaussian mixture model (GMM) is used to classify each incoming
speech frame, and a set of linear transformations weighted by the
continuous GMM probabilities are applied to give a smoothly varying
target output. The linear transformations are typically estimated
from time-aligned parallel training data using least mean squares.
More recently, Kain has proposed a variant of this method in which
the GMM classification is based on a joint density model. However,
like the original Stylianou approach, it still relies on parallel
training data. Although the requirement for parallel training data is
often acceptable, there are applications which require voice
transformation for nonparallel training data. Examples can be found
in the entertainment and media industries where recordings of unknown
speakers need to be transformed to sound like well-known
personalities. Further uses are envisaged in applications where the
provision of parallel data is impossible such as when the source and
target speaker speak different languages. Although interpolated
linear transforms are effective in transforming speaker identity, the
direct transformation of successive source speech frames to yield the
required target speech will result in a number of artifacts. The reasons
for this are as follows. First, the reduced dimensionality of the
spectral vector used to represent the spectral envelope and the
averaging effect of the linear transformation result in formant
broadening and a loss of spectral detail. Second, unnatural phase
dispersion in the target speech can lead to audible artifacts and
this effect is aggravated when pitch and duration are modified.
Third, unvoiced sounds have very high variance and are typically not
transformed. However, in that case, residual voicing from the source
is carried over to the target speech resulting in a disconcerting
background whispering effect. To achieve high-quality voice
conversion, the system includes a spectral refinement approach to compensate for the
spectral distortion, a phase prediction method for natural phase
coupling and an unvoiced sounds transformation scheme. Each of these
techniques is assessed individually and the overall performance of
the complete solution evaluated using listening tests. Overall it is
found that the enhancements significantly improve
TRANSFORM BASED VOICE MORPHING SYSTEM
2.1 Overall Framework

Transform-based voice morphing technology converts the speaker
identity by modifying the parameters of an acoustic representation of
the speech signal. It normally includes two parts, the training
procedure and the transformation procedure. The training procedure
operates on examples of speech from the source and the target
speakers. The input speech examples are first analyzed to extract the
spectral parameters that represent the speaker identity. Usually
these parameters encode the short-term acoustic features, such as the
spectrum shape and the formant structure. After the feature
extraction, a conversion function is trained to capture the
relationship between the source parameters and the corresponding
target parameters. In the transformation procedure, the new spectral
parameters are obtained by applying the trained conversion functions
to the source parameters. Finally, the morphed speech is synthesized
from the converted parameters. There are three interdependent issues
that must be decided before building a voice morphing system. First,
a mathematical model must be chosen which allows the speech signal to
be manipulated and regenerated with minimum distortion. Previous
research suggests that the sinusoidal model is a good candidate
since, in principle at least, this model can support modifications to
both the prosody and the spectral characteristics of the source
signal without inducing significant artifacts. However, in practice,
conversion quality is always compromised by phase incoherency in the
regenerated signal, and to minimize this problem, a pitch synchronous
sinusoidal model is used in our system. Second, the acoustic features
which enable humans to identify speakers must be extracted and coded.
These features should be independent of the message and the
environment so that whatever and wherever the source speaker speaks,
his/her voice characteristics can be successfully transformed to
sound like the target speaker. Clearly the changes applied to these
features must be capable of straightforward realization by the speech
model. Third, the type of conversion function and the method of
training and applying the conversion function must be decided.
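The transformation procedure with GMM-weighted linear transforms, as outlined in the introduction, can be sketched as below. This is an illustration only: it assumes diagonal covariances and that the mixture parameters and the per-component transforms (`A`, `b`) have already been trained, for example by least squares on time-aligned data. The function names and array layouts are hypothetical.

```python
import numpy as np


def gmm_posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component for one frame x.

    weights: (M,), means: (M, D), variances: (M, D) diagonal covariances.
    """
    diff = x[None, :] - means
    log_like = -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances), axis=1)
    log_post = np.log(weights) + log_like
    log_post -= log_post.max()            # subtract max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()


def convert_frame(x, weights, means, variances, A, b):
    """Apply the posterior-weighted sum of linear transforms to one frame.

    A: (M, D, D) matrices, b: (M, D) offsets, one pair per mixture component.
    """
    post = gmm_posteriors(x, weights, means, variances)
    per_component = np.einsum("mij,j->mi", A, x) + b   # A_m x + b_m for each m
    return post @ per_component
```

Because the posteriors vary continuously from frame to frame, the output changes smoothly even when the most likely component switches.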
2.2 Spectral Parameters
As indicated above, the overall shape of the spectral envelope
provides an effective representation of the vocal tract
characteristics of the speaker and the formant structure of voiced
sounds. Generally, there are several ways to estimate the spectral
envelope, such as using linear predictive coding (LPC), cepstral
coefficients, and line spectral frequencies (LSF). The main steps in
estimating the LSF envelope for each speech frame are as follows.
1. Use the amplitudes of the K harmonics determined by the pitch
synchronous sinusoidal model to represent the magnitude spectrum. K is
determined by the fundamental frequency; its value can typically
range from 50 to 200.
2. Resample the magnitude spectrum nonuniformly according to the
Bark scale frequency warping using cubic spline interpolation.
3. Compute the LPC coefficients by applying the Levinson-Durbin
algorithm to the autocorrelation sequence of the warped power
spectrum.
4. Convert the LPC coefficients to LSF.
5. In order to maintain adequate encoding of the formant
structure, LSF spectral vectors with an order of p=15 were used
throughout our voice conversion experiments.
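Steps 3 and 4 above can be sketched in a few lines. The code assumes a power spectrum is already available (the harmonic sampling and Bark-scale resampling of steps 1 and 2 are not shown), and the function names are illustrative. The LSFs are taken as the angles of the roots of the usual symmetric and antisymmetric polynomials built from the LPC coefficients.

```python
import numpy as np


def autocorr_from_power(power, order):
    """Autocorrelation lags 0..order from a one-sided power spectrum."""
    return np.fft.irfft(power)[:order + 1]


def levinson_durbin(r, order):
    """Solve for LPC coefficients a (with a[0] = 1) from autocorrelation r."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error
    return a, err


def lpc_to_lsf(a):
    """Convert LPC coefficients to line spectral frequencies in radians."""
    a_ext = np.concatenate([a, [0.0]])
    p_poly = a_ext + a_ext[::-1]            # symmetric polynomial
    q_poly = a_ext - a_ext[::-1]            # antisymmetric polynomial
    angles = np.concatenate([np.angle(np.roots(p_poly)), np.angle(np.roots(q_poly))])
    eps = 1e-6                              # drops the trivial roots at 0 and pi
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])
```

For an order of p=15, `lpc_to_lsf` would be expected to return 15 values between 0 and pi for a stable LPC filter.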
Reply
seminar class
Active In SP
**

Posts: 5,361
Joined: Feb 2011
#8
20-04-2011, 09:39 AM

Presented By
SNEHA G GAJBHIYE


.doc   New_Microsoft_Word_Document (1).doc (Size: 95 KB / Downloads: 62)
ABSTRACT
A supply chain is a network of facilities and distribution options that performs the functions of procurement of materials, transformation of these materials into intermediate and finished products, and the distribution of these finished products to customers. Supply chains exist in both service and manufacturing organizations, although the complexity of the chain may vary greatly from industry to industry and firm to firm. Supply chain management (SCM) is the term used to describe the management of the flow of materials, information, and funds across the entire supply chain, from suppliers to component producers to final assemblers to distribution (warehouses and retailers), and ultimately to the consumer. SCM basically consists of three types of flows, as given below:
Product Flow: It includes the movements of goods from manufacturer to customer and raw material from supplier to manufacturer.
Information Flow: The Information flow involves transmitting order and updating the status of delivery. Financial Flow: The financial flow consists of credit terms, payments, schedules & consignment & title ownership arrangements. In fact, it often includes after-sales service and returns or recycling. SCM typically involves coordination of information and materials among multiple firms. Firms are increasingly thinking in terms of competing as part of a supply chain against other supply chains, rather than as a single firm against other individual firms. Also, as firms successfully streamline their own operations, the next opportunity for improvement is through better coordination with the suppliers and customers. The costs of poor coordination can be extremely high. With the recent explosion of inexpensive information technology, it seems only natural that business would become more supply chain focused. However, while technology is clearly an enabler of integration, it alone can not explain the radical organizational changes in both individual firms and whole industries. Changes in both technology and management theory set the
INTRODUCTION
Voice morphing which is also referred to as voice transformation and voice conversion is a technique for modifying a source speaker’s speech to sound as if it was spoken by some designated target speaker. There are many applications of voice morphing including customising voices for TTS systems, transforming voice-overs in adverts and films to sound like that of a well-known celebrity, and enhancing the speech of impaired speakers such as laryngectomees. Two key requirements of many of these applications are that firstly they should not rely on large amounts of parallel training data where both speakers recite identical texts, and secondly the high audio quality of the source should be preserved in the transformed speech.
The core process in a voice morphing system is the transformation of the spectral envelope of the source speaker to match that of the target speaker, and various approaches have been proposed for doing this, such as codebook mapping [1], [2], formant mapping [3] and linear transformations [4], [5], [6]. Codebook mapping, however, typically leads to discontinuities in the transformed speech. Although some discontinuities can be resolved by some form of interpolation technique [2], the conversion approach can still suffer from a lack of robustness as well as degraded quality. On the other hand, formant mapping is prone to formant tracking errors. Hence, transformation-based approaches are now the most popular.
In particular, the continuous probabilistic transformation approach introduced by Stylianou et al. [4] provides the baseline for modern systems. In this approach, a Gaussian mixture model (GMM) is used to classify each incoming speech frame, and a set of linear transformations weighted by the continuous GMM probabilities is applied to give a smoothly varying target output. The linear transformations are typically estimated from time-aligned parallel training data using least mean squares. More recently, Kain has proposed a variant of this method in which the GMM classification is based on a joint density model [5]. However, like the original Stylianou approach, it still relies on parallel training data. Although the requirement for parallel training data is often acceptable, there are applications which require voice transformation for nonparallel training data. Examples can be found in the entertainment and media industries, where recordings of unknown speakers need to be transformed to sound like well-known personalities. Further uses are envisaged in applications where the provision of parallel data is impossible, such as when the source and target speaker speak different languages.
This paper begins by expressing the continuous probabilistic transform of Stylianou as a simple interpolated linear transform. Expressed in a compact form, this representation then leads straightforwardly to the realisation of the conventional training and conversion algorithms. In analogy to the transform-based adaptation methods used in recognition [7], [8], the estimation of the interpolated transform is then extended to a maximum likelihood formulation which does not require that the source and training data be parallel.
Although interpolated linear transforms are effective in transforming speaker identity, the direct transformation of successive source speech frames to yield the required target speech will result in a number of artifacts. The reasons for this are as follows. Firstly, the reduced dimensionality of the spectral vector used to represent the spectral envelope and the averaging effect of the linear transformation result in formant broadening and a loss of spectral detail. Secondly, unnatural phase dispersion in the target speech can lead to audible artifacts, and this effect is aggravated when pitch and duration are modified. Thirdly, unvoiced sounds have very high variance and are typically not transformed. However, in that case, residual voicing from the source is carried over to the target speech, resulting in a disconcerting background whispering effect.
To achieve high-quality voice conversion, all these issues have to be taken into account, and in this paper we identify and present solutions for each of them. These include a spectral refinement approach to compensate for the spectral distortion, a phase prediction method for natural phase coupling and an unvoiced sounds transformation scheme. Each of these techniques is assessed individually and the overall performance of the complete solution is evaluated using listening tests. Overall it is found that the enhancements significantly improve speaker identification scores and perceived audio quality.
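The GMM-weighted (interpolated) linear transform described in this introduction can be illustrated with a short Python sketch. It is a simplified reading of the idea, not the method of [4] or [5] as published: diagonal covariances are assumed, the per-component matrices `A` and offsets `b` are hypothetical parameters that would come from training, and the function names are invented for the example.

```python
import numpy as np

def gmm_posteriors(x, weights, means, variances):
    """Posterior probability of each diagonal-covariance Gaussian given frame x."""
    diff = x - means                                    # shape (M, D)
    log_p = (np.log(weights)
             - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
             - 0.5 * np.sum(diff ** 2 / variances, axis=1))
    log_p -= log_p.max()                                # numerical stability
    p = np.exp(log_p)
    return p / p.sum()

def convert_frame(x, weights, means, variances, A, b):
    """Apply every component's linear transform and blend by the GMM posteriors."""
    post = gmm_posteriors(x, weights, means, variances)  # shape (M,)
    per_component = A @ x + b                            # (M, D, D) @ (D,) -> (M, D)
    return post @ per_component                          # weighted sum -> (D,)
```

Because the posteriors vary continuously with the input frame, the output moves smoothly between the component transforms instead of jumping as it would with a hard codebook decision.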
LITERATURE SURVEY
2.1. WHAT IS MORPHING
We hear the word morphing in day-to-day life. The word stands for alteration or change, that is, changing a source into a desired target. We have heard of video morphing, which means altering video to suit our requirements. Audio morphing, or voice morphing, is a technique for modifying a source speaker's speech to sound as if it was spoken by some designated target speaker. Most of the recent approaches to voice morphing apply a linear transformation to the spectral envelope and pitch scaling to modify the prosody. These techniques have revolutionized the entertainment world, the business world, security systems and, sorry to say, the criminal world.
2.2. WHAT IS VOICE MORPHING
Voice morphing, which is also referred to as voice transformation and voice conversion, is a technique to modify a source speaker's speech utterance to sound as if it was spoken by a target speaker. There are many applications which may benefit from this sort of technology. For example, a TTS system with integrated voice morphing technology can produce many different voices. In cases where the speaker identity plays a key role, such as dubbing movies and TV shows, the availability of high-quality voice morphing technology will be very valuable, allowing the appropriate voice to be generated (maybe in different languages) without the original actors being present.
There are basically three inter-dependent issues that must be solved before building a voice morphing system. Firstly, it is important to develop a mathematical model to represent the speech signal so that the synthetic speech can be regenerated and prosody can be manipulated without artifacts. Secondly, the various acoustic cues which enable humans to identify speakers must be identified and extracted. Thirdly, the type of conversion function and the method of training and applying the conversion function must be decided.
2.3. A DEMONSTRATION TABLE
The table below shows some examples of voice morphing technology. The "Source Speech" column indicates the utterances of the source speaker, and the "Target Speech" column contains the target speaker's utterances. The utterances in these two columns are not included in the training data for the estimation of the conversion function. The next two columns, "Converted Speech 1" and "Converted Speech 2", are the results regenerated using the voice morphing technology. The difference between these two columns is that "Converted Speech 1" applies the target prosody extracted from the target utterance, whereas "Converted Speech 2" keeps the original prosody of the source utterances. The reason for converting with different prosody is to evaluate the influence of prosody on speaker identification.
Reply
smart paper boy
Active In SP
**

Posts: 2,053
Joined: Jun 2011
#9
18-07-2011, 11:14 AM


.pdf   voice_morphing.pdf (Size: 284.5 KB / Downloads: 91)
1. INTRODUCTION
Voice morphing means the transition of one speech signal
into another. Like image morphing, speech morphing aims to preserve
the shared characteristics of the starting and final signals, while
generating a smooth transition between them. Speech morphing is
analogous to image morphing. In image morphing the in-between
images all show one face smoothly changing its shape and texture until
it turns into the target face. It is this feature that a speech morph should
possess. One speech signal should smoothly change into another,
keeping the shared characteristics of the starting and ending signals but
smoothly changing the other properties. The major properties of
concern in a speech signal are its pitch and
envelope information. These two reside in a convolved form in a
speech signal. Hence some efficient method for extracting each of
these is necessary. We have adopted a simple approach,
namely cepstral analysis, to do this. Pitch and formant information
in each signal is extracted using the cepstral approach. Necessary
processing to obtain the morphed speech signal includes methods like
Cross fading of envelope information, Dynamic Time Warping to
match the major signal features (pitch) and Signal Re-estimation to
convert the morphed speech signal back into the acoustic waveform.
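As a rough illustration of how cepstral analysis separates the two convolved components, the following Python sketch computes a real cepstrum, splits it into a low-quefrency (envelope) part and a high-quefrency (excitation) part, and reads a pitch estimate from the excitation peak. NumPy is assumed; the function names, the lifter cutoff and the 60 to 400 Hz search range are assumptions made for the example, not values from the report.

```python
import numpy as np

def real_cepstrum(frame):
    """Inverse FFT of the log magnitude spectrum of a Hann-windowed frame."""
    spectrum = np.fft.fft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small offset avoids log(0)
    return np.fft.ifft(log_mag).real

def split_cepstrum(cepstrum, cutoff):
    """Low quefrencies give the log envelope; the remainder is the excitation."""
    lifter = np.zeros(len(cepstrum))
    lifter[:cutoff] = 1.0
    if cutoff > 1:
        lifter[-(cutoff - 1):] = 1.0             # keep the symmetric half too
    envelope_cep = cepstrum * lifter
    excitation_cep = cepstrum - envelope_cep
    log_envelope = np.fft.fft(envelope_cep).real
    return log_envelope, excitation_cep

def pitch_from_cepstrum(excitation_cep, sample_rate, fmin=60.0, fmax=400.0):
    """Pitch in Hz from the strongest excitation peak in the allowed lag range."""
    lo = int(sample_rate / fmax)
    hi = int(sample_rate / fmin)
    peak = lo + int(np.argmax(excitation_cep[lo:hi]))
    return sample_rate / peak
```

Convolution in time becomes addition in the log spectrum, which is why a simple cut along the quefrency axis is enough to pull the slowly varying envelope apart from the periodic pitch component.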
2. AN INTROSPECTION OF THE MORPHING PROCESS

Speech morphing can be achieved by transforming the
signal’s representation from the acoustic waveform obtained by
sampling of the analog signal, with which many people are
familiar, to another representation. To prepare the signal for the
transformation, it is split into a number of 'frames' - sections of the
waveform. The transformation is then applied to each frame of the
signal. This provides another way of viewing the signal information.
The new representation (said to be in the frequency domain) describes
the average energy present at each frequency band.
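The framing and frequency-domain step described above might look like the following in Python. This is a minimal sketch assuming NumPy, a Hann window and a fixed hop size; the function names and parameters are hypothetical, and the signal is assumed to be at least one frame long.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of frame_len samples."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]                            # shape (n_frames, frame_len)

def magnitude_spectra(frames):
    """Magnitude spectrum of each Hann-windowed frame (one row per frame)."""
    window = np.hanning(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, axis=1))
```

Each row of the result describes the energy in one frame across frequency bands, which is the representation the later pitch and envelope analysis works on.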
Further analysis enables two pieces of information to be
obtained: pitch information and the overall envelope of the sound. A
key element in the morphing is the manipulation of the pitch
information. If two signals with different pitches were simply cross-faded,
it is highly likely that two separate sounds would be heard. This
occurs because the signal will have two distinct pitches causing the
auditory system to perceive two different objects. A successful morph
must exhibit a smoothly changing pitch throughout. The pitch
information of each sound is compared to provide the best match
between the two signals' pitches. To do this match, the signals are
stretched and compressed so that important sections of each signal
match in time. The interpolation of the two sounds can then be
performed which creates the intermediate sounds in the morph. The
final stage is then to convert the frames back into a normal waveform.
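The matching and interpolation stages can be sketched with a basic dynamic time warping (DTW) routine over two per-frame pitch contours. This is an illustrative Python sketch only: NumPy is assumed, the absolute pitch difference is used as the local cost, the function names are hypothetical, and a real morph would interpolate envelopes as well as pitch.

```python
import numpy as np

def dtw_path(a, b):
    """Align sequences a and b; return the warping path and its total cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from the end to recover which frames were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = (cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
        move = int(np.argmin(steps))
        if move == 0:
            i, j = i - 1, j - 1
        elif move == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path, cost[n, m]

def morph_contours(a, b, alpha):
    """Interpolate time-aligned values: alpha=0 gives a, alpha=1 gives b."""
    path, _ = dtw_path(a, b)
    return np.array([(1.0 - alpha) * a[i] + alpha * b[j] for i, j in path])
```

Sweeping `alpha` from 0 to 1 over the aligned frames gives a single, smoothly changing pitch rather than two competing pitches.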
Reply
seminar flower
Super Moderator
******

Posts: 10,120
Joined: Apr 2012
#10
17-09-2012, 03:07 PM

VOICE MORPHING


.doc   VOICE MORPHING.doc (Size: 38 KB / Downloads: 14)

ABSTRACT

This paper addresses the capability needed in a telecommunication system to support mobile access to the real-time sights and sounds of a complex environment, defined as a virtual reality service (VRS) episode. The constant development of terminal and networking equipment is paving the way for the provision of a VRS and the creation of VRS episodes. This paper describes a mobile VRS environment in general, its core architecture, and the various entities employed to perform a VRS episode setup task. The proposed VRS architecture is in full harmony with the preceding generation of all-IP multimedia networks currently under study in the Third Generation Partnership Project.

INTRODUCTION

Voice morphing is a technology developed at the Los Alamos National Laboratory in New Mexico, USA, by George Papcun and publicly demonstrated in 1999. Voice morphing enables speech patterns to be cloned and an accurate copy of a person's voice to be made, which can then be made to say anything the operator wishes, in the voice of someone else. Voice morphing has tremendous possibilities in military psychological warfare and subversion, particularly in conjunction with the use of recorded telephone conversations as evidence in courts of law. An agency could use voice morphing to produce a fake confession or incriminating evidence that appears to be spoken by a suspect. Voice morphing is a powerful battlefield weapon which can be used to issue fake orders to the enemy's troops, appearing to come from their own commanders. In 1990, the US Department of Defense considered using voice morphing to produce a propaganda recording of Iraqi president Saddam Hussein, which could then be distributed throughout the Arab world and Iraq to discredit the Iraqi leader.
Reply
Guest
Thinking To Register

 
#11
23-09-2012, 12:35 PM

where are the figures??? please reply....its urgent