Learn more about the things I am working on (or have worked on in the past).
Keywords: CNN Quantisation, Embedded FPGAs, Design Space Exploration, Performance Modelling, Input-Dependent Computation, Adaptive Inference
-> A novel two-stage FPGA-based accelerator for CNN classifiers, exploiting the fact that not all inputs require the same level of numerical precision in computation to yield a confident prediction, without the need for model re-training.
-> Comprises an excessively quantised low-precision unit (LPU) providing rapid classification predictions, followed by a confidence evaluation unit that determines which samples should be re-processed by a high-precision unit to restore accuracy, based on the LPU prediction confidence.
-> Analytical performance modelling and Design Space Exploration are employed to parametrise a configurable hardware architecture, yielding a tailored instance for the given CNN-FPGA pair, according to user-specified requirements in performance (throughput or latency) and accuracy.
Research Project at: iDSL, Imperial College London
Joint work with: S.Venieris, C.S.Bouganis
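As a rough software illustration of the cascade idea above (not the accelerator's actual implementation), the sketch below routes a sample to a high-precision model only when the low-precision prediction looks uncertain. The confidence metric (gap between the two largest softmax probabilities), the `threshold` value, and the function names are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_gap(probs):
    """Hypothetical confidence metric: gap between the two largest probabilities."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def cascade_predict(sample, low_precision_model, high_precision_model, threshold=0.5):
    """Return (predicted_class, unit_used).

    Both models are assumed to be callables mapping a sample to a list of logits.
    The high-precision model only runs when the low-precision result is not
    confident enough.
    """
    probs = softmax(low_precision_model(sample))
    if top2_gap(probs) >= threshold:
        return probs.index(max(probs)), "LPU"
    probs = softmax(high_precision_model(sample))
    return probs.index(max(probs)), "HPU"
```

With a toy "quantised" model such as `lambda s: [round(x) for x in s]` and an identity high-precision model, clearly separated inputs stay on the low-precision path, while ambiguous ones are re-processed.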
Keywords: SVD-based compression, pruning, anytime inference, Embedded FPGAs, Model-Hardware co-design, Self-driving Cars
-> A methodology combining iterative refinement (through low-rank approximation) and structured pruning, to conduct the most information-carrying calculations first in LSTM inference.
-> Provides a rapid approximation of the final output, which is iteratively refined as a function of the latency budget. Implemented on an autonomous navigation task for self-driving cars, enabling fast reaction.
-> Configurable FPGA-based hardware architecture, tailored to the target LSTM-FPGA pair, co-designed with the approximate LSTM model.
Research Project at: iDSL, Imperial College London
Joint work with: S.Venieris, C.S.Bouganis
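The "most informative calculations first" idea can be illustrated on a single matrix-vector product. The sketch below assumes the weight matrix has already been decomposed offline into rank-1 terms sorted by decreasing singular value, and yields a refined output after each term. It does not cover the LSTM gates, the pruning step, or the hardware mapping; `anytime_matvec` and its arguments are hypothetical names.

```python
def anytime_matvec(singular_values, left_vectors, right_vectors, x):
    """Yield successively refined approximations of W @ x.

    W is assumed to be given by its rank-1 terms,
        W = sum_k singular_values[k] * outer(left_vectors[k], right_vectors[k]),
    sorted by decreasing singular value, so early outputs capture the
    dominant part of the result.
    """
    y = [0.0] * len(left_vectors[0])
    for sigma, u, v in zip(singular_values, left_vectors, right_vectors):
        projection = sum(vi * xi for vi, xi in zip(v, x))  # v_k . x
        y = [yi + sigma * ui * projection for yi, ui in zip(y, u)]
        yield list(y)  # a usable approximation at this refinement level
```

A caller with a latency budget can stop consuming the generator at any point and use the last value it received.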
Keywords: Self-supervised Learning, Visual Navigation, Obstacle Avoidance, Spatio-temporal representations, Two-stream CNNs
-> Two-stream CNN, simultaneously processing the current and previous frames of a UAV camera to extract spatio-temporal representations, improving efficiency in autonomous navigation.
-> Predicting distance-to-collision towards multiple directions through regression, enabling a custom motion planner to make more informed action decisions, avoiding collision in real-world environments.
-> Trained in a self-supervised manner, with a UAV performing safe autonomous navigation through the use of mounted distance sensors, also used as ground-truth signals for training.
Research Project at: iDSL, Imperial College London
Joint work with: C.S.Bouganis
Keywords: Object Detection, Domain-specific models, Data-driven optimisation, Altitude-aware region proposals, Embedded GPUs
-> Exploiting application-specific information and prior domain knowledge, to optimise region-proposal-based object detectors for efficient altitude-aware vehicle detection in UAVs.
-> Considering UAV flight altitude captured by sensors at runtime, to eliminate false positive candidate detections of vehicles on the ground (e.g. based on size/density), effectively reducing inference workload.
-> Exploiting a light-weight road segmentation method for further performance optimisation, without sacrificing accuracy.
Research Project at: iDSL, Imperial College London and KIOS
Joint work with: C. Kyrkou and C.S. Bouganis
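One way to picture the size-based pruning of candidates described above is a simple pinhole-camera check: at a known altitude, a vehicle can only span a certain range of pixels. The sketch below assumes a downward-facing camera, and the focal length, the vehicle length range and the tolerance are illustrative values, not figures from the project.

```python
def expected_pixel_length(object_length_m, altitude_m, focal_length_px):
    """Pinhole projection of a ground object seen from directly above."""
    return focal_length_px * object_length_m / altitude_m


def filter_proposals(boxes, altitude_m, focal_length_px=1000.0,
                     vehicle_length_m=(3.0, 12.0), tolerance=0.25):
    """Keep only (x, y, w, h) boxes whose longest side is plausible for a vehicle.

    All default parameter values are assumptions for this example.
    """
    shortest, longest = vehicle_length_m
    low = expected_pixel_length(shortest, altitude_m, focal_length_px) * (1.0 - tolerance)
    high = expected_pixel_length(longest, altitude_m, focal_length_px) * (1.0 + tolerance)
    kept = []
    for box in boxes:
        _, _, w, h = box
        if low <= max(w, h) <= high:
            kept.append(box)  # size is consistent with a vehicle at this altitude
    return kept
```

Boxes rejected here never reach the later, more expensive detection stages, which is where the workload saving would come from.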
Keywords: Semantic Segmentation, Inpainting, Occlusion Handling, RGB-D perception, Simultaneous Localisation and Mapping (SLAM)
(work in progress) - more details soon...
Research Project at: iDSL and Dyson Robotics Lab at Imperial College
Joint work with: A.Nicastro, S.Leutenegger, C.S.Bouganis
Keywords: survey, Automated toolflows, DNNs, FPGAs
-> Survey on Automated toolflows for mapping DNNs to FPGAs, along with a set of promising research areas that can bridge the gap between DL and hardware design communities.
-> From a deep learning practitioner's perspective, we extensively study the supported DL models, achieved inference speed and applicability to visual scene understanding tasks, from latency-critical mobile systems to high-throughput cloud services.
-> From a hardware engineer's perspective, we present a detailed analysis of architectural design choices, design space exploration methods and supported optimisations on implementation (such as numerical precision).
Research Project at: iDSL, Imperial College London
Joint work with: S.Venieris and C.S.Bouganis
Keywords: Frequency domain filtering, collision detection, Admittance/Impedance Control, contact distinction
-> A novel contact distinction method is developed, monitoring externally applied forces/torques on a robot manipulator (arm) during physical human-robot interaction, in order to distinguish collisions from intended contacts.
-> The method is based on frequency component analysis, and adapts with respect to the desired dynamic behavior (inertia, damping and elasticity) of the admittance/impedance controller.
-> A detected collision triggers an appropriate reaction from the robot, improving the safety of the operator during the interaction with the robot.
Research Project at: Robotics Group, University of Patras
Joint work with: F.Dimeas and N.Aspragathos
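To give a feel for frequency-based contact distinction, the sketch below measures how much of a force signal's energy lies above a cutoff frequency and flags a collision when that share is large. This is a simplified stand-in under stated assumptions: the energy-ratio criterion, the `ratio_threshold` value and the function names are hypothetical, and the cutoff is passed in as a plain parameter rather than adapted to the controller's inertia, damping and elasticity as in the actual method.

```python
import cmath
import math


def high_frequency_ratio(force_samples, sample_rate_hz, cutoff_hz):
    """Fraction of (mean-removed) signal energy above cutoff_hz, via a plain DFT."""
    n = len(force_samples)
    mean = sum(force_samples) / n
    centred = [f - mean for f in force_samples]
    low = high = 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centred))
        energy = abs(coeff) ** 2
        if k * sample_rate_hz / n > cutoff_hz:
            high += energy
        else:
            low += energy
    total = low + high
    return high / total if total > 0.0 else 0.0


def is_collision(force_samples, sample_rate_hz, cutoff_hz, ratio_threshold=0.5):
    """Assumed decision rule: mostly high-frequency content means collision."""
    return high_frequency_ratio(force_samples, sample_rate_hz, cutoff_hz) > ratio_threshold
```

A slow, smooth push (low-frequency content) is treated as intended contact, while an abrupt impact (high-frequency content) is treated as a collision.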
Keywords: FSM-based state-estimation, Adaptive PD Control, Mobile Robots
-> Finite State Machine-based state estimation for line following robots, able to detect line irregularities or noisy sensor measurements on an array of IR reflectance sensors, enhancing the robot's situational awareness.
-> Based on the estimated state, the robot switches between a proposed variable-gain PD controller for line following and an open-loop controller handling special cases.
Research Project at: Robotics Group, University of Patras
Joint work with: A.Toumpa, F. Dimeas, N.Aspragathos (et al.)
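The interplay between state estimation and controller switching described above might look roughly like the following. The three states, the transition rule (one missed reading is treated as possible noise, two in a row as a lost line), the gain schedule and all numeric values are assumptions for illustration, not the project's actual design.

```python
from enum import Enum


class State(Enum):
    FOLLOWING = 1   # line is visible
    SUSPECT = 2     # line missed once, possibly sensor noise
    LINE_LOST = 3   # line missed repeatedly


def next_state(state, readings, threshold=0.5):
    """FSM transition based on whether any sensor currently sees the line."""
    if any(r > threshold for r in readings):
        return State.FOLLOWING
    if state is State.FOLLOWING:
        return State.SUSPECT
    return State.LINE_LOST


def line_error(readings):
    """Weighted centroid of the readings, zero when the line is centred."""
    centre = (len(readings) - 1) / 2.0
    total = sum(readings)
    return sum((i - centre) * r for i, r in enumerate(readings)) / total


def steering_command(state, error, previous_error, dt,
                     kp0=1.0, gain_slope=0.5, kd=0.1, open_loop_turn=0.3):
    """Variable-gain PD while following, fixed open-loop command otherwise."""
    if state is State.FOLLOWING:
        kp = kp0 * (1.0 + gain_slope * abs(error))  # gain grows with the error
        return kp * error + kd * (error - previous_error) / dt
    return open_loop_turn
```

Keeping the state between calls is what lets a single noisy frame be tolerated instead of immediately abandoning closed-loop control.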
Keywords: CUDA, Nvidia GPUs, QR-decomposition, Givens Rotations, gSpike, Parallel System Solvers
-> g-Spike: a parallel algorithm for solving general nonsymmetric tridiagonal systems for the GPU. The solver applies Givens rotations and QR factorization without pivoting. It also implements a low-rank modification strategy to compute the Spike DS decomposition even when the partitioning defines singular submatrices along the diagonal.
-> Numerical experiments with problems of high order indicate that g-Spike is competitive in runtime with existing GPU methods, and can provide acceptable results when other methods cannot be applied or fail.
Research Project at: HPC Lab, University of Patras
Joint work with: A.Sobczyk, E.Gallopoulos and A.Sameh
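As background for the Givens-rotation building block mentioned above, here is a small sequential sketch that solves a tridiagonal system by QR factorization with Givens rotations and no pivoting. It is not g-Spike: there is no partitioning, no Spike decomposition, no GPU parallelism, and no low-rank modification for singular blocks. The function name and argument layout are assumptions for this example.

```python
import math


def solve_tridiagonal_givens(sub, diag, sup, rhs):
    """Solve A x = rhs for tridiagonal A using Givens rotations.

    sub[i] = A[i+1][i], diag[i] = A[i][i], sup[i] = A[i][i+1].
    Assumes A is nonsingular.
    """
    n = len(diag)
    d = list(diag)             # main diagonal of R
    u1 = list(sup) + [0.0]     # first superdiagonal of R
    u2 = [0.0] * n             # second superdiagonal of R (fill-in)
    y = list(rhs)
    for i in range(n - 1):
        a, b = d[i], sub[i]
        r = math.hypot(a, b)
        if r == 0.0:
            continue           # nothing to eliminate in this column
        c, s = a / r, b / r
        # Rotate rows i and i+1 so that the subdiagonal entry becomes zero.
        d[i] = r
        top, bottom = u1[i], d[i + 1]
        u1[i] = c * top + s * bottom
        d[i + 1] = -s * top + c * bottom
        right = u1[i + 1]      # A[i+1][i+2], or the 0.0 padding in the last row
        u2[i] = s * right
        u1[i + 1] = c * right
        y[i], y[i + 1] = c * y[i] + s * y[i + 1], -s * y[i] + c * y[i + 1]
    # Back substitution on the upper-triangular R (bandwidth 2).
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = y[i]
        if i + 1 < n:
            acc -= u1[i] * x[i + 1]
        if i + 2 < n:
            acc -= u2[i] * x[i + 2]
        x[i] = acc / d[i]
    return x
```

Because rotations are orthogonal, a zero on the main diagonal (for example `diag=[0.0, 0.0]`, `sub=[1.0]`, `sup=[1.0]`) is handled without row exchanges, which is the appeal of QR over LU without pivoting.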