The complete structure of a modem specified by the IEEE 802.11a standard is shown in . It consists of a data scrambler, modulator, convolutional encoder, interleaver, 64-point IFFT/FFT, demodulator, deinterleaver, Viterbi decoder, and descrambler. The standard specifies data rates ranging from 6 to 54 Mb/s. Depending on the desired data rate, the modulation scheme can be binary phase shift keying (BPSK), quaternary phase shift keying (QPSK), or quadrature amplitude modulation (16-QAM or 64-QAM).
Fourth-generation wireless and mobile systems are currently the focus of research and development. Broadband wireless systems based on orthogonal frequency division multiplexing (OFDM) will allow packet-based, high-data-rate communication suitable for video transmission and mobile Internet applications. With this in mind, we propose a data-path architecture that uses dedicated hardware for the baseband processor. The most computationally intensive parts of such a high-data-rate system are the 64-point inverse FFT in the transmit direction and the Viterbi decoder in the receive direction. Accordingly, an appropriate design methodology for constructing them has to be chosen, taking into account a) how much silicon area is needed; b) how easily the particular architecture can be made flat for VLSI implementation; c) how many wire crossings, and how many long wires carrying signals to remote parts of the design, are necessary in the actual implementation; and d) how low the power consumption can be. This paper describes a novel 64-point FFT/IFFT processor that has been developed as part of a large research project to design and implement a single-chip wireless modem.
The encoding rates supported in the standard are 1/2, 2/3, and 3/4. The bandwidth of the transmitted signal is 20 MHz, and the OFDM symbol duration is 4 µs, including 0.8 µs for the guard interval [1]. Thus, in effect, the FFT/IFFT has to be computed within 4 µs.
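To make the timing and rate figures above concrete, the following Python sketch derives the 4 µs symbol period from the 20 MHz sample rate and reproduces the 6 to 54 Mb/s range from the modulation and coding options. It assumes two standard 802.11a parameters that the text does not state: 48 data subcarriers per OFDM symbol and a 16-sample guard interval (0.8 µs at 20 MHz). The helper names are illustrative only.

```python
# Sketch: IEEE 802.11a data rates and the per-symbol FFT time budget.
# Assumed parameters (standard 802.11a values, not given in the text):
# 48 data subcarriers per OFDM symbol, 16-sample guard interval at 20 MHz.
from fractions import Fraction

SAMPLE_RATE_HZ = 20_000_000   # 20 MHz baseband sample rate
FFT_SIZE = 64                 # 64-point IFFT/FFT (useful part: 3.2 us)
GUARD_SAMPLES = 16            # 0.8 us guard interval at 20 MHz (assumption)
DATA_SUBCARRIERS = 48         # data-bearing subcarriers (assumption)

def symbol_duration_s():
    """OFDM symbol duration in seconds: useful samples plus guard samples."""
    return Fraction(FFT_SIZE + GUARD_SAMPLES, SAMPLE_RATE_HZ)

def data_rate_mbps(bits_per_subcarrier, code_rate):
    """Data rate = data bits carried per symbol / symbol duration."""
    bits_per_symbol = DATA_SUBCARRIERS * bits_per_subcarrier * code_rate
    return bits_per_symbol / symbol_duration_s() / 1_000_000

# (modulation, coded bits per subcarrier, convolutional code rate)
MODES = [
    ("BPSK", 1, Fraction(1, 2)),
    ("BPSK", 1, Fraction(3, 4)),
    ("QPSK", 2, Fraction(1, 2)),
    ("QPSK", 2, Fraction(3, 4)),
    ("16-QAM", 4, Fraction(1, 2)),
    ("16-QAM", 4, Fraction(3, 4)),
    ("64-QAM", 6, Fraction(2, 3)),
    ("64-QAM", 6, Fraction(3, 4)),
]

if __name__ == "__main__":
    print(f"symbol duration: {float(symbol_duration_s()) * 1e6:.1f} us")
    for name, bits, rate in MODES:
        print(f"{name:7s} rate {rate}: {float(data_rate_mbps(bits, rate)):g} Mb/s")
```

Exact fractions are used so that the computed rates come out as whole numbers rather than floating-point approximations; 64 useful samples plus 16 guard samples at 20 MHz give the 80 samples (4 µs) per symbol within which each FFT/IFFT must finish.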
In general, FFT implementations fall into one of two categories: 1) methods based on the direct Fourier transform [ ] and 2) methods based on direct hardware implementations of established FFT signal flow graphs [8]–[14]. A problem with these solutions is that the approach adopted at the algorithmic level typically takes little account of its implications at the architecture, data-flow, or chip-design levels. Thus, many of these designs [8]–[10] may be irregular, dominated by wiring, and burdened with heavy data-storage overheads [15]. In a complex system, deploying such strategies may result in severe disadvantages because of the tight timing constraints and the implicit requirement of low power consumption. The conventional Cooley-Tukey radix-2 FFT algorithm requires 192 complex butterfly operations for a 64-point FFT (32 butterflies in each of log2 64 = 6 stages). Considering that one FFT has to be computed within 4 µs, one butterfly operation has to be completed within 20.8 ns (4 µs/192), which results in a 48 MHz clock frequency for a single-butterfly architecture. The synthesis result for a radix-2 butterfly unit (one complex multiplication and two complex additions) in IHP 0.25-µm technology shows that it occupies 0.18 mm² of area and dissipates 17 mW at that frequency. On top of this butterfly unit, one needs memory to store the complex twiddle factors and the complex intermediate data, serial-to-parallel and parallel-to-serial converters at the inputs and outputs, respectively, and complicated addressing logic and control circuitry. Combining all these circuit modules, the power dissipation of the entire processor is expected to be quite high. Moreover, the input data arrive at a 20 MHz clock frequency, so it is more appropriate to operate the FFT module at that frequency. Since only 80 clock cycles are then available per 4 µs symbol, satisfying the time constraint at this frequency requires multiple butterfly units in parallel, which in turn increases the area and power dissipation.
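The clock-frequency argument above reduces to simple arithmetic, sketched below in Python. The sketch assumes that each butterfly unit completes exactly one radix-2 butterfly per clock cycle and ignores pipeline latency, memory access, and I/O conversion overheads; the function names are hypothetical.

```python
import math

def radix2_butterflies(n):
    """Butterflies in an n-point radix-2 Cooley-Tukey FFT: (n/2) * log2(n)."""
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("n must be a power of two >= 2")
    return (n // 2) * stages

def required_clock_mhz(n, deadline_ns, units=1):
    """Clock (MHz) needed if each unit finishes one butterfly per cycle."""
    return radix2_butterflies(n) * 1000 / (units * deadline_ns)

def units_needed(n, deadline_ns, clock_mhz):
    """Minimum number of parallel butterfly units at a given clock (MHz)."""
    cycles = deadline_ns * clock_mhz // 1000   # clock cycles per FFT deadline
    return math.ceil(radix2_butterflies(n) / cycles)

if __name__ == "__main__":
    N, DEADLINE_NS = 64, 4000   # one 64-point FFT per 4 us OFDM symbol
    print("butterflies:", radix2_butterflies(N))
    print(f"time per butterfly: {DEADLINE_NS / radix2_butterflies(N):.1f} ns")
    print(f"single-unit clock: {required_clock_mhz(N, DEADLINE_NS):.0f} MHz")
    for f in (20, 40, 80):
        print(f"{f} MHz -> {units_needed(N, DEADLINE_NS, f)} butterfly unit(s)")
```

Under these assumptions a single unit needs 48 MHz, a 20 MHz design needs three parallel units (192 butterflies in 80 cycles), and of the oversampling clocks only 80 MHz lets one unit meet the deadline.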
Alternatively, since most implementations of the IEEE 802.11a standard oversample the incoming data at 40 or 80 MHz, an FFT module based on a single radix-2 butterfly would have to be operated at an 80 MHz clock frequency (the next available frequency above the required 48 MHz). This approach satisfies the timing constraint, but at the cost of high power consumption. To speed up the FFT computation, more advanced solutions using higher radices have been proposed [15], [16]. These approaches increase the arithmetic complexity within the butterfly itself. The radix-4 FFT algorithm is the most popular of these and has the potential to satisfy the current need. However, a single radix-4 butterfly requires three complex multiplications and eight complex additions.
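The operation count quoted for the radix-4 butterfly can be seen in the minimal Python sketch below: a recursive radix-4 decimation-in-time FFT whose butterfly performs three twiddle multiplications and eight complex additions, with multiplication by j done as a swap and negation rather than a true multiplication. This is a generic textbook formulation chosen for illustration, not the processor architecture proposed in this paper, and the function names are hypothetical.

```python
import cmath

def radix4_butterfly(x0, x1, x2, x3, w1, w2, w3):
    """Radix-4 DIT butterfly: 3 complex multiplications, 8 complex additions."""
    b, c, d = x1 * w1, x2 * w2, x3 * w3          # the three twiddle multiplications
    t0, t1 = x0 + c, x0 - c                      # additions 1-2
    t2, t3 = b + d, b - d                        # additions 3-4
    jt3 = complex(-t3.imag, t3.real)             # j*t3: swap/negate, no multiplier
    return t0 + t2, t1 - jt3, t0 - t2, t1 + jt3  # additions 5-8

def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 4:
        raise ValueError("length must be a power of 4")
    q = n // 4
    # DFTs of the four decimated subsequences x[r], x[r+4], x[r+8], ...
    s = [fft_radix4(x[r::4]) for r in range(4)]
    out = [0j] * n
    for k in range(q):
        w = [cmath.exp(-2j * cmath.pi * r * k / n) for r in (1, 2, 3)]
        y = radix4_butterfly(s[0][k], s[1][k], s[2][k], s[3][k], *w)
        for m in range(4):
            out[k + m * q] = y[m]
    return out

def dft(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

if __name__ == "__main__":
    x = [complex(i % 7, (3 * i) % 5) for i in range(64)]
    err = max(abs(a - b) for a, b in zip(fft_radix4(x), dft(x)))
    print(f"max |radix-4 FFT - DFT| = {err:.2e}")
```

For 64 points (64 = 4³) the recursion has three stages of 16 butterflies, i.e., 48 radix-4 butterflies in place of the 192 radix-2 butterflies, at the price of the heavier butterfly described above.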