Parallel DFT Implementation and Benchmarking in Cluster Architecture
Date
Authors
Advisor
Publisher
Polytechnic University of Puerto Rico
Item Type
Article
Abstract
For very large DFTs (Discrete Fourier Transforms),
where the problem size is limited by the memory
available to a single processor, it is convenient
to use the larger aggregate memory afforded by
cluster architectures. In this work we developed several
parallel MATLAB implementations of a one-dimensional
DFT computed through a two-dimensional DFT,
which was coded using the row-column algorithm.
One version of the code has client-based pre- and
post-processing stages, with the cluster master node
used as the client computer. Since the pre- and
post-processing involve matrix transpositions,
which can pose memory limitations for large data
sets, we also developed an implementation that distributes
the data directly from disk to the cluster cores. This
second approach allowed us to quadruple the
largest signal length that we could handle on the
cluster. Larger core memory in the
nodes should allow even larger increases in signal
size. We benchmarked both implementations and
carried out scalability studies using up to 64 cores.
Key Terms - FFT, Large Signals, Parallel
Processing, Row-Column Algorithm
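
The idea of computing a one-dimensional DFT through a two-dimensional, row-column pass can be illustrated with a small serial sketch. It is not the author's MATLAB code. It is written in Python for compactness, uses the standard Cooley-Tukey index mapping with twiddle factors (an assumption, since the abstract does not state which factorization was used), and the names `dft` and `dft_row_column` are hypothetical. The reshaping on input and the reordering on output play the role of the transposition-like pre- and post-processing stages mentioned in the abstract.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT of a sequence, used here as the 1-D building block."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dft_row_column(x, n1, n2):
    """Length n1*n2 DFT computed as row DFTs, twiddles, then column DFTs.

    Assumed index mapping (standard Cooley-Tukey, not taken from the source):
    input  n = r + n1*c   with r in [0, n1), c in [0, n2)
    output k = n2*k1 + k2 with k1 in [0, n1), k2 in [0, n2)
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Pre-processing: arrange the signal as an n1-by-n2 matrix.
    a = [[x[r + n1 * c] for c in range(n2)] for r in range(n1)]

    # Row pass: each row is an independent length-n2 DFT.
    b = [dft(row) for row in a]

    # Twiddle factors couple the two passes.
    for r in range(n1):
        for c in range(n2):
            b[r][c] *= cmath.exp(-2j * cmath.pi * r * c / n)

    # Column pass: each column is an independent length-n1 DFT.
    cols = [dft([b[r][c] for r in range(n1)]) for c in range(n2)]

    # Post-processing: read the result out in row-major order.
    return [cols[k2][k1] for k1 in range(n1) for k2 in range(n2)]


if __name__ == "__main__":
    signal = [complex(i * 0.5, (i % 3) - 1) for i in range(12)]
    direct = dft(signal)
    two_pass = dft_row_column(signal, 3, 4)
    print(max(abs(p - q) for p, q in zip(direct, two_pass)))
```

Because the row DFTs are independent of one another, as are the column DFTs, each pass can be split across cores; only the rearrangement between passes requires moving data between them.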
Description
Design Project Article for the Graduate Programs at Polytechnic University of Puerto Rico
Keywords
Citation
Vélez Rodríguez, W. (2013). Parallel DFT implementation and benchmarking in cluster architecture [Unpublished manuscript]. Graduate School, Polytechnic University of Puerto Rico.