About the Inventor

Our Dedication, Efforts and Patience were the Secrets and the Tools of our Impressive Success


JFFT Engine

Digital Signal Processing

The development of digital signal processing dates from the 1960s, when mainframe digital computers were used for number-crunching applications such as the Fast Fourier Transform (FFT), which allows the frequency spectrum of a signal to be computed rapidly. These techniques were not widely used at that time because suitable computing equipment was available only in universities and other scientific research institutions.

DSP technology is nowadays commonplace in devices such as mobile phones, multimedia computers, video recorders, CD players, hard disc drive controllers, modems and fax machines, and will soon replace analog circuitry in TV sets and telephones. Its applications include signal compression and decompression (as in CD systems), digital cellular phones (where it allows a greater number of calls to be handled simultaneously within each local "cell"), telecommunications, computers, consumer electronics, automotive systems, industrial controls, GPS, medical instrumentation, defence/aerospace, speech synthesizers, high-speed modem chip sets, TV set-top box chip sets and MPEG encoders/decoders.

Over the last 40 years, many algorithms have been proposed to compute the FFT, which lies at the core of many DSP operations on IC chips. To improve the capacity of processors to handle large flows of data in real time, all of these algorithms aim to reduce the computational load (chiefly the number of serial multiplications) and the communication load, and hence the total time it takes to generate the results.
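To make the reduction in multiplications concrete, the Python sketch below counts complex multiplications in a direct DFT and in a textbook recursive radix-2 FFT (length a power of 2). It is a generic illustration, not the Jaber radix-r method; the function names are hypothetical, and every twiddle product is counted, including trivial ones.

```python
import cmath


def dft(x):
    """Direct DFT: N * N complex multiplications."""
    n = len(x)
    mults, out = 0, []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(-2j * cmath.pi * k * m / n)
            mults += 1
        out.append(acc)
    return out, mults


def fft_radix2(x):
    """Recursive decimation-in-time radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x), 0
    even, m_even = fft_radix2(x[0::2])
    odd, m_odd = fft_radix2(x[1::2])
    out = [0j] * n
    mults = m_even + m_odd
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # one twiddle product
        mults += 1
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out, mults


if __name__ == "__main__":
    x = [complex(i % 3, -i) for i in range(8)]
    _, m_dft = dft(x)
    _, m_fft = fft_radix2(x)
    print(m_dft, m_fft)  # 64 12
```

The counts follow N^2 for the DFT and (N/2)log2(N) for the radix-2 FFT, so the gap widens quickly as N grows.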

FFT Invention

The conceptual key to this invention is the formulation of the radix-r FFT as composed of butterflies with identical structures, together with a systematic means of accessing the corresponding multiplier coefficients. This enables the design of a processing element (PE) that uses r complex multipliers in parallel to implement each of the butterfly computations. There is a simple mapping from the three indices (FFT stage, butterfly, element) to the addresses of the multiplier coefficients needed. In a single-processor environment, this type of PE would decrease the time delay of the complete FFT by a factor of O(r). Trivial multiplications encountered during the execution of particular butterflies may be avoided by simple checks on the coefficient addresses. Avoiding trivial multiplications reduces the computational load of those butterflies, but it would not be advantageous (in terms of decreasing the time delay of the complete FFT) in situations where multiple PEs are executed in parallel on different processors.
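A minimal Python sketch of the idea, assuming a standard in-place radix-4 decimation-in-time FFT with digit-reversed input (N = 4^S). The mapping from (stage, butterfly, element) to a coefficient address used here is the textbook Cooley-Tukey one and is an assumption, not necessarily the patented mapping; all names are hypothetical. Every butterfly runs the same routine: up to r = 4 independent coefficient products (which a hardware PE would compute in parallel), followed by a 4-point DFT that needs no true multiplications, with the product skipped when the coefficient address is 0 (W^0 = 1).

```python
import cmath

R = 4  # radix: each butterfly has R inputs and R coefficient multipliers


def coeff_address(stage, butterfly, element, n_stages):
    """Hypothetical mapping (stage, butterfly, element) -> exponent k of W_N^k."""
    k = butterfly % R ** stage  # position of the butterfly inside its group
    return element * k * R ** (n_stages - 1 - stage)


def radix4_pe(inputs, addrs, twiddles, stats):
    """One butterfly: up to R coefficient products, then a 4-point DFT."""
    y = []
    for x, a in zip(inputs, addrs):
        if a == 0:  # W^0 = 1: trivial multiplication skipped
            stats["skipped"] += 1
            y.append(x)
        else:
            stats["mults"] += 1
            y.append(x * twiddles[a])
    y0, y1, y2, y3 = y
    # 4-point DFT; in hardware, multiplying by -j is only a swap of real and
    # imaginary parts plus a sign change
    t0, t1 = y0 + y2, y0 - y2
    t2, t3 = y1 + y3, (y1 - y3) * -1j
    return [t0 + t2, t1 + t3, t0 - t2, t1 - t3]


def digit_reverse(i, n_stages):
    """Reverse the base-R digits of i (input reordering for in-place DIT)."""
    rev = 0
    for _ in range(n_stages):
        rev, i = rev * R + i % R, i // R
    return rev


def fft_radix4(x):
    """In-place radix-4 DIT FFT built only from identical butterflies."""
    n, n_stages = len(x), 0
    while R ** n_stages < n:
        n_stages += 1
    if R ** n_stages != n:
        raise ValueError("length must be a power of 4")
    twiddles = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]
    a = [x[digit_reverse(i, n_stages)] for i in range(n)]
    stats = {"mults": 0, "skipped": 0}
    for s in range(n_stages):
        span = R ** s  # distance between the inputs of one butterfly
        for b in range(n // R):
            base = (b // span) * span * R + b % span
            idx = [base + e * span for e in range(R)]
            addrs = [coeff_address(s, b, e, n_stages) for e in range(R)]
            out = radix4_pe([a[i] for i in idx], addrs, twiddles, stats)
            for i, v in zip(idx, out):
                a[i] = v
    return a, stats
```

For N = 16, 23 of the 32 coefficient fetches address W^0 and are skipped. The skip shortens an individual butterfly on a single PE, but r PEs running in lockstep still wait for the slowest one, which matches the caveat in the paragraph above.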

Radix-4 Engine

"Butterfly-processing element for efficient Fast Fourier Transform method and apparatus", US patent no. US-6751643.

The complex multipliers and complex adders in the JDSPE (Jaber DSP Engine) can be implemented in parallel. The complex multiplier/mixer megafunction can multiply two complex numbers or mix two signals, and its input width, output width and processing latency can be customized.

Complex Multiplier/Mixer With Four Parallel Multipliers

Complex Mixer
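The four-multiplier form follows directly from (a + jb)(c + jd) = (ac - bd) + j(ad + bc). The sketch below uses hypothetical function names and floating-point arithmetic in place of the customizable fixed-point widths. It evaluates the four real products independently, as four parallel hardware multipliers would. It then reuses the same routine as a mixer: multiplying each sample by a local-oscillator value e^(-j2*pi*f*n/fs) shifts the signal down in frequency by f.

```python
import math


def cmul4(a, b, c, d):
    """(a + jb)(c + jd) using four independent real products."""
    p1, p2, p3, p4 = a * c, b * d, a * d, b * c  # four multipliers in parallel
    return p1 - p2, p3 + p4                      # two adders


def mix(samples, f, fs):
    """Shift a complex signal, given as (re, im) pairs, down in frequency by f."""
    out = []
    for n, (re, im) in enumerate(samples):
        phase = 2 * math.pi * f * n / fs
        out.append(cmul4(re, im, math.cos(phase), -math.sin(phase)))
    return out


if __name__ == "__main__":
    print(cmul4(1, 2, 3, 4))  # (-5, 10)
    fs, f = 64, 5
    tone = [(math.cos(2 * math.pi * f * n / fs), math.sin(2 * math.pi * f * n / fs))
            for n in range(fs)]
    baseband = mix(tone, f, fs)  # every sample is (numerically) close to (1, 0)
```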

Since multipliers are generally more costly than adders in hardware, an alternative model of the complex multiplier, which uses only three multipliers, is illustrated below.

Complex Multiplier/Mixer With Three Parallel Multipliers
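One well-known three-multiplier factorization is k1 = c(a + b), k2 = a(d - c), k3 = b(c + d), giving real part k1 - k3 and imaginary part k1 + k2. It is shown here as an assumption, since the original figure is not available and may use a different arrangement. It trades one multiplier for extra adders. When (c, d) is a stored coefficient, one common choice is to precompute d - c and c + d, leaving three multiplications and three additions per product. Function names are hypothetical.

```python
def cmul3(a, b, c, d):
    """(a + jb)(c + jd) with three real multiplications."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2


def cmul3_pre(a, b, c, d_minus_c, c_plus_d):
    """Same product when d - c and c + d were precomputed for a stored coefficient."""
    k1 = c * (a + b)
    return k1 - b * c_plus_d, k1 + a * d_minus_c


print(cmul3(1, 2, 3, 4))  # (-5, 10)
```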


JFFT Module


JABER (Wireless/Data/Image/Video Encryption-Compression) is a powerful new high-tech digital information management software platform that provides dramatic capabilities for protecting digital information (text/data, images and video) and for reducing its storage and transmission requirements, across a wide range of platforms, applications and media.

JABER integrates into a single package the world's highest compression and encryption technologies, secure dual database architecture capabilities, and a multitude of other features all of which enable it to greatly surpass competitive technologies in performance, function, and versatility.

Based on the unique integration of artificial intelligence, neural networks and various proprietary technologies, JABER offers the telecommunications, computer, broadcast and numerous other industries solutions never before available. Because of its versatility and capabilities, this technology can be employed in virtually every application that handles, stores, transmits or uses digital information. Most importantly, JaberTech is positioned to take this new technology to market immediately, both to general business customers as a solution for data security and transmission, and to specialized customers such as healthcare companies as a solution for data management and privacy.

The FFT Module

The second aspect of the FFT invention is that Jaber PEs are also useful in parallel multiprocessing environments. In essence, the precedence relations between the butterflies of a radix-r FFT are such that r butterflies can be executed in parallel during each FFT stage. If each butterfly is executed on a Jaber PE, each of the r parallel processors always executes the same instruction simultaneously, which is very desirable for SIMD implementations on some of the latest DSP cards.
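A small simulation of this property, assuming textbook in-place radix-4 DIT indexing (names and the round-robin schedule are hypothetical, not the patented scheduling). In each stage the N/r butterflies are dealt out r at a time to r = 4 simulated processors. Every processor runs the identical butterfly routine, and the r butterflies of one lockstep step touch disjoint memory locations. That disjointness is what makes simultaneous execution equivalent to the sequential loop used here.

```python
import cmath

R = 4  # radix, and also the number of parallel processors


def butterfly_addresses(stage, b):
    """Read/write addresses of butterfly b in the given stage (in-place DIT)."""
    span = R ** stage
    base = (b // span) * span * R + b % span
    return [base + e * span for e in range(R)]


def butterfly(a, stage, b):
    """The single routine every processor executes: twiddle products, then an R-point DFT."""
    span = R ** stage
    k = b % span
    idx = butterfly_addresses(stage, b)
    y = [a[i] * cmath.exp(-2j * cmath.pi * e * k / (span * R)) for e, i in enumerate(idx)]
    for m, i in enumerate(idx):
        a[i] = sum(y[e] * cmath.exp(-2j * cmath.pi * e * m / R) for e in range(R))


def digit_reverse(i, stages):
    """Reverse the base-R digits of i."""
    rev = 0
    for _ in range(stages):
        rev, i = rev * R + i % R, i // R
    return rev


def parallel_fft(x):
    """Simulate R processors executing the butterflies of each stage in lockstep."""
    n, stages = len(x), 0
    while R ** stages < n:
        stages += 1
    if R ** stages != n or n < R * R:
        raise ValueError("length must be a power of R and at least R*R")
    a = [x[digit_reverse(i, stages)] for i in range(n)]
    for s in range(stages):
        for step in range(n // (R * R)):              # lockstep steps in this stage
            group = [step * R + p for p in range(R)]  # butterfly handled by processor p
            touched = [i for b in group for i in butterfly_addresses(s, b)]
            # disjoint addresses: running the group simultaneously equals this loop
            assert len(set(touched)) == len(touched)
            for b in group:
                butterfly(a, s, b)
    return a
```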

Radix-4 Module

"Butterfly-processing element for efficient Fast Fourier Transform method and apparatus", US patent no. US-6751643.


Parallel Computing

The success of computational science in accurately describing and modeling the real world has helped to fuel an ever-increasing demand for cheap computing power. Scientists are continually looking for ways to test the limits of theories, using high-performance computing to simulate more realistic systems in greater detail. Parallel computing offers a cost-effective way to tackle these problems.

One reason for this is economic. By using "off the shelf" components, parallel computers can offer higher performance at lower prices than machines built around specially developed processors. In addition, the inherent scalability of parallel computers allows them to be upgraded as the need arises: whereas serial architectures are upgraded by making the previous processors obsolete, parallel architectures can, in theory, be upgraded simply by adding more processors.

There is, however, another reason: fundamental physical laws will ultimately limit the speed of single processors, irrespective of the economics. Computing is based on the movement of information, and the speed of this movement is ultimately limited by the speed of light. If the distance traveled by the information were reduced instead, the need to avoid the uncertainties introduced by quantum mechanics would eventually limit how closely the paths along which the information travels could be spaced.

These two reasons, economics and physics, coupled with the inherent scalability of parallel computers, point to a future of high-performance computing that is based in some way on the ideas of parallelism.


Parallel Multiprocessing for the Fast Fourier Transform

The computation of fast Fourier transforms (FFTs) is the cornerstone of many supercomputer applications. These include not only common ones such as digital signal processing, speech recognition, image processing and petroleum seismic analysis, but also less obvious applications such as computational fluid dynamics, medical technology, multiple-precision arithmetic and computational number theory. Computations worthy of a parallel computer generally fall into four categories:

1) one or a few very long 1-D FFTs.

2) many small or moderate-sized 1-D FFTs.

3) one or a few large 2-D FFTs.

4) one or a few large 3-D FFTs.

The most significant problem in spectral analysis lies in the parallel multiprocessing of its data. The difficulty is to find a feasible algorithm that meets the following objectives:

1) The algorithm should be easy to implement on DSP cards of the newest technology.

2) The r parallel processors should execute a single instruction simultaneously.

3) The number of NOPs (no-operation instructions) should be reduced to a minimum.

4) The communication load between the r processors should be reduced to a minimum.

5) The computational load should be reduced to a minimum.

6) There should be no pipeline breaks ("pipeline stalls"), that is, no delays caused in a pipelined processor when a transfer of control is taken.

7) The design should be simple.

Parallel Implementation of the FFT

The last component in the figure is triggered as soon as the first circuit part starts performing the last iteration.



Address Generator & Control Unit

Address Generator

DSP Features and Data Control

Digital Signal Processing (DSP) is an engineering field that continues to extend its theoretical foundations and practical applications in the modern world. From day-to-day needs such as personal communications to sophisticated biomedical and tactical systems, DSP plays a strong and ever-increasing role in the areas of work that are revolutionizing our society.

Typical DSP operations require many simple additions and multiplications, each of which requires us to:
- Fetch two operands.
- Perform the addition or multiplication (usually both).
- Store the result or hold it for a repetition.
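These three steps form the multiply-accumulate (MAC) loop at the heart of, for example, an FIR filter. In this Python sketch (the function name is hypothetical), each inner iteration fetches one coefficient and one sample, multiplies and adds, and holds the running sum in an accumulator until the output is stored.

```python
def fir(x, h):
    """Full convolution y[n] = sum_k h[k] * x[n - k], written as explicit MAC steps."""
    y = []
    for n in range(len(x) + len(h) - 1):
        acc = 0  # accumulator holds the partial result between repetitions
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                coeff, sample = h[k], x[n - k]  # fetch two operands
                acc += coeff * sample           # multiply and add
        y.append(acc)                           # store the result
    return y


print(fir([1, 2, 3], [1, 1]))  # [1, 3, 5, 3]
```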
To fetch the two operands in a single instruction cycle, we need to be able to make two memory accesses simultaneously. In fact, since we also need to store the result, and to read the instruction itself, we really need more than two memory accesses per instruction cycle. It is not possible to access two different memory addresses simultaneously over a single memory bus, so DSP processors usually support multiple memory accesses in the same instruction cycle. Understanding how different aspects of the kernel affect memory architecture and usage allows an application to be fine-tuned and customized. There are two common methods of achieving multiple memory accesses per instruction cycle:
- Harvard architecture.
- Modified von Neumann architecture.

DSPs are typically used to input large amounts of data, perform mathematical transformations on that data and then output the results, all at very high rates. In a real-time system, the data flow must be understood and controlled to achieve high performance, and analyzing the timing characteristics of data accesses and of switching between data requestors can maximize the bandwidth of the system. Since the CPU should be used only for sporadic (non-periodic) accesses to individual locations, the data flow is preferably controlled by an independent device; otherwise the system can suffer performance degradation. Such peripheral devices, which can control data transfers between an I/O subsystem and a memory subsystem in the same manner that a processor can, reduce CPU interrupt latencies and leave precious DSP cycles free for other tasks, increasing performance. Special channels, together with circuitry to control them, were created to transfer information without the processor controlling every aspect of the transfer (direct memory access, or DMA). This circuitry is normally part of the system chipset (a number of integrated circuits designed to perform one or more related functions) on the DSP board.

The Read/Write Address Generator:

The main objective of the Read/Write Address Generator, which is treated as part of the I/O system, is to provide a block of memory addresses from which the butterfly's input data are read, or into which the butterfly's processed output data are stored.

Read/write FFT Address Generator Structure.
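As an illustration only, the generator below produces, for each butterfly of each stage, the block of r read addresses of a standard in-place radix-r decimation-in-time FFT. Because the computation is in place, the same block also serves as the write addresses. The indexing formula is the textbook one and an assumption, not the patented circuit; the function name is hypothetical.

```python
def rw_address_blocks(n, r):
    """Yield (stage, butterfly, [r addresses]) for an in-place radix-r DIT FFT of length n."""
    stages = 0
    while r ** stages < n:
        stages += 1
    if r ** stages != n:
        raise ValueError("n must be a power of r")
    for s in range(stages):
        span = r ** s  # distance between the r inputs of one butterfly
        for b in range(n // r):
            base = (b // span) * span * r + b % span
            yield s, b, [base + e * span for e in range(r)]


blocks = list(rw_address_blocks(16, 4))
print(blocks[0][2], blocks[4][2])  # [0, 1, 2, 3] [0, 4, 8, 12]
```

Within each stage the blocks partition the whole memory, so every data word is read and written exactly once per stage.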

The Multiplier Coefficients Address Generator:

The main role of the coefficient address generator is to provide a block of memory addresses from which the multiplier coefficients are fetched and fed to the inputs of the butterfly's multipliers for processing.

The Multiplier Coefficients Address Generator Structure.
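A matching sketch for the coefficient side, under the same assumption of a standard radix-r DIT FFT with N = r^S. For stage s, butterfly b and element e, the coefficient address is the exponent e*(b mod r^s)*r^(S-1-s) of W_N. An address of 0 selects W_N^0 = 1, which is the simple check that allows trivial multiplications to be skipped. Function names are hypothetical.

```python
def coefficient_addresses(n, r):
    """Yield (stage, butterfly, [r exponents of W_n]) for a radix-r DIT FFT of length n."""
    stages = 0
    while r ** stages < n:
        stages += 1
    if r ** stages != n:
        raise ValueError("n must be a power of r")
    for s in range(stages):
        step = r ** (stages - 1 - s)  # stride through the twiddle table
        for b in range(n // r):
            k = b % r ** s
            yield s, b, [e * k * step for e in range(r)]


def count_trivial(n, r):
    """Count coefficient fetches that address W^0 = 1 and can be skipped."""
    trivial = total = 0
    for _, _, addrs in coefficient_addresses(n, r):
        total += len(addrs)
        trivial += sum(1 for a in addrs if a == 0)
    return trivial, total


print(count_trivial(16, 4))  # (23, 32)
```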

The Control Unit:

The control unit, whose flowchart is illustrated in the figure below, is responsible for providing certain parameters to the DIT RAD (Reading Address Generator), the DIT twiddle-factor address generator and the WAD (Writing Address Generator). As shown in this figure, this complex process is implemented by means of three simple resettable and programmable counters. These counters control the flow of the input data by providing the right parameters to the DIT reading/coefficient address generator. The generator can then either supply the specific word of length r (a series of r input-data or coefficient addresses) to the input of the butterfly PE, or provide a block of r addresses in which the butterfly's processed output data are stored.
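The three-counter structure can be modeled in software as follows. This is a hypothetical model, not the patented control unit. An element counter wraps at r and carries into a butterfly counter, which wraps at N/r and carries into a stage counter. The counters are resettable, and their terminal counts are programmable. At each tick the model emits the current (stage, butterfly, element) triple that would be handed to the address generators.

```python
class Counter:
    """Resettable counter with a programmable terminal count."""

    def __init__(self, limit):
        self.limit = limit
        self.value = 0

    def reset(self):
        self.value = 0

    def tick(self):
        """Advance by one; return True (carry out) when the counter wraps to 0."""
        self.value += 1
        if self.value == self.limit:
            self.value = 0
            return True
        return False


def control_unit(n, r, stages):
    """Yield (stage, butterfly, element) triples in the order the PE consumes them."""
    element, butterfly, stage = Counter(r), Counter(n // r), Counter(stages)
    for c in (element, butterfly, stage):
        c.reset()
    for _ in range(stages * n):  # one tick per address issued
        yield stage.value, butterfly.value, element.value
        # carries ripple only when the lower counter wraps (short-circuit `and`)
        if element.tick() and butterfly.tick():
            stage.tick()


seq = list(control_unit(16, 4, 2))
print(seq[:5])  # [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 1, 0)]
```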

"ADDRESS GENERATOR FOR FAST FOURIER TRANSFORM PROCESSOR", US patent application no. US-60-289302 and European patent application Serial no: PCT/US01/07602.




The Company That Offers Unique DSP System Solutions Through the Parallel Implementation of Its Innovative DSP Core Engines for Third-Millennium Ultra-High-Speed Applications