Abstract | This paper describes a parallel implementation developed to improve the time performance of the Iterative Closest Point (ICP) algorithm. Within each iteration, the correspondence calculations are distributed among the available processors. At the end of each iteration, the results of the correspondence determination are communicated back to a central processor, which calculates the current transformation. Several additional techniques were developed to improve upon this basic scheme. Calculating the partial sums within each processor made it unnecessary to transmit the correspondence values back to the central processor, which reduced communication overhead and improved time performance. Randomly distributing the points among the processors resulted in better load balancing, which further improved time performance. We also found that thinning the image by randomly removing a certain percentage of the points did not improve performance, measured as the progression of <em>mse</em> over time. The method was implemented and tested on a 22-node Beowulf-class cluster. For a large image, linear performance improvements were obtained for up to 16 processors; for a smaller image, they held for up to 8 processors. |
---|
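
The partial-sum scheme summarized in the abstract can be illustrated with a minimal single-process sketch. It is not the paper's implementation. The function names (`partial_sums`, `combine`, `icp_step`), the brute-force nearest-neighbour search, and the SVD-based transformation estimate are all assumptions made for illustration, and an ordinary loop stands in for the separate processors. Each worker returns only a point count, two 3-vectors, and one 3x3 matrix, so the individual correspondences never have to be transmitted.

```python
import numpy as np

def partial_sums(data_chunk, model):
    """Work done by one (hypothetical) worker on its share of the data points."""
    # Brute-force closest model point for every data point in this chunk.
    d2 = ((data_chunk[:, None, :] - model[None, :, :]) ** 2).sum(axis=2)
    matched = model[d2.argmin(axis=1)]
    # Only these small aggregates are sent back to the central processor.
    return (len(data_chunk),
            data_chunk.sum(axis=0),   # sum of p_i
            matched.sum(axis=0),      # sum of q_i
            data_chunk.T @ matched)   # sum of p_i q_i^T (3x3)

def combine(partials):
    """Central step: merge partial sums, then estimate R, t with q ~ R p + t."""
    n = sum(p[0] for p in partials)
    sum_p = sum(p[1] for p in partials)
    sum_q = sum(p[2] for p in partials)
    sum_pq = sum(p[3] for p in partials)
    mu_p, mu_q = sum_p / n, sum_q / n
    # Unnormalised cross-covariance: sum (p_i - mu_p)(q_i - mu_q)^T.
    H = sum_pq - n * np.outer(mu_p, mu_q)
    # SVD-based rotation estimate (an assumed choice for this sketch).
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_q - R @ mu_p
    return R, t

def icp_step(data, model, n_workers, rng):
    """One ICP iteration with points randomly distributed among workers."""
    shuffled = data[rng.permutation(len(data))]
    chunks = np.array_split(shuffled, n_workers)
    # A loop stands in for the separate processors of a cluster.
    return combine([partial_sums(c, model) for c in chunks])
```

Because the aggregates are plain sums, the estimated transformation does not depend on how the points are split, which is what allows a random distribution to be used purely for load balancing.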