Use GPU with ParticleShop

With the advent of low-energy buildings, the need for accurate building performance simulations has significantly increased. However, for the time being, thermo-aeraulic effects are often taken into account through simplified or even empirical models, which fail to provide the expected accuracy. Resorting to computational fluid dynamics therefore seems unavoidable, but the required computational effort is in general prohibitive. The joint use of innovative approaches such as the lattice Boltzmann method (LBM) and massively parallel computing devices such as graphics processing units (GPUs) could help to overcome these limits. The present research work is devoted to exploring the potential of such a strategy. The lattice Boltzmann method, which is based on a discretised version of the Boltzmann equation, is an explicit approach offering numerous attractive features: accuracy, stability, ability to handle complex geometries, etc. It is therefore an interesting alternative to the direct solving of the Navier-Stokes equations using classic numerical analysis. From an algorithmic standpoint, the LBM is well-suited for parallel implementations. The use of graphics processors to perform general purpose computations is increasingly widespread in high performance computing. These massively parallel circuits provide up to now unrivalled performance at a rather moderate cost. Yet, due to numerous hardware induced constraints, GPU programming is quite complex and the possible benefits in performance depend strongly on the algorithmic nature of the targeted application.

For LBM, GPU implementations currently provide performance two orders of magnitude higher than a weakly optimised sequential CPU implementation. The present thesis consists of a collection of nine articles published in international journals and proceedings of international conferences (the last one being under review). These contributions address the issues related to single-GPU implementations of the LBM and the optimisation of memory accesses, as well as multi-GPU implementations and the modelling of inter-GPU and internode communication.

Particle filter algorithms have been successfully used in various visual object tracking applications. They handle non-linear models and non-Gaussian noise, but are computationally demanding. In this paper, we propose a scalable implementation of the particle filter algorithm for visual object tracking, using a scalable interconnect such as a network-on-chip on an FPGA platform. Here, several processing elements execute in parallel to handle a large number of particles. We propose two designs and implementations, one optimized for speed and the other optimized for area. These implementations can easily support different image sizes, object sizes, and numbers of particles without modifying the complete architecture. Multi-target tracking is also demonstrated for four objects. We validated the particle filter-based visual tracking with a video feed from a Petalinux-based system. With an image size of \(320\times 240\), frame rates of 348 fps and 310 fps were achieved for single-object tracking of objects of size \(17\times 17\) and \(33\times 33\) pixels, respectively, with a reasonably low power consumption of 1.7 mW/fps on a Zynq XC7Z020 (Zedboard) at an operating frequency of 69 MHz. This makes our implementation a good candidate for low-power visual object tracking using FPGAs, especially in low-power smart camera applications.
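
To make the particle filter loop mentioned above more concrete, here is a minimal software sketch of one predict/weight/resample cycle for a 1-D position. It is not the FPGA or network-on-chip design described in the abstract. The random-walk motion model, the Gaussian likelihood, the function names `particle_filter_step` and `track`, and all parameter values are assumptions chosen for illustration only.

```python
import math
import random


def particle_filter_step(particles, measurement, motion_std, meas_std, rng):
    """One predict/weight/resample cycle of a bootstrap particle filter (1-D)."""
    # Predict: diffuse each particle with a random-walk motion model (assumed).
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]

    # Weight: Gaussian likelihood of the measurement given each particle (assumed).
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2) for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Estimate: weighted mean of the particle positions.
    estimate = sum(w * p for w, p in zip(weights, predicted))

    # Resample: draw a new particle set in proportion to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


def track(measurements, n_particles=500, motion_std=2.0, meas_std=2.0, seed=0):
    """Run the filter over a sequence of scalar position measurements."""
    rng = random.Random(seed)
    # Initialise particles uniformly over a hypothetical 320-pixel-wide image row.
    particles = [rng.uniform(0.0, 320.0) for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        particles, est = particle_filter_step(particles, z, motion_std, meas_std, rng)
        estimates.append(est)
    return estimates


if __name__ == "__main__":
    # Hypothetical target drifting to the right by 2 pixels per frame.
    observed = [100.0 + 2.0 * t for t in range(20)]
    for z, est in zip(observed, track(observed)):
        print(f"measured {z:6.1f}  estimated {est:6.1f}")
```

The predict and weight steps touch each particle independently, which is why this kind of workload can be spread across several processing elements. The resampling step needs the full set of weights and is typically the part that requires communication between them.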