MulticoreWare

Smart Health, Smart Cities & Industry 4.0

Designing Ultra-Low-Power Vision Pipelines on Neuromorphic Hardware

Building Real-Time Elderly Assistance with Neuromorphic Hardware

December 10, 2025

 

Author: Reshi Krish is a software engineer in the Platforms and Compilers Technical Unit at MulticoreWare, focused on building ultra-efficient AI pipelines for resource-constrained platforms. She specializes in optimizing and deploying AI across diverse hardware environments, leveraging techniques like quantization, pruning, and runtime optimization. Her work spans optimizing linear algebra libraries, embedded systems, and edge AI applications.

Introduction: Driving Innovation Beyond Power Constraints

As AI continues to advance at an unprecedented pace, its growing complexity often demands powerful hardware and substantial energy. Deploying AI at the edge, however, calls for ultra-efficient hardware that runs on as little energy as possible, and this introduces its own engineering challenges. Arm Cortex-M microcontrollers (MCUs) and similar low-power processors have tight compute and memory limits, making optimizations like quantization, pruning, and lightweight runtimes critical for real-time performance. These challenges, in turn, are inspiring innovative solutions that make intelligence more accessible, efficient, and sustainable.

At MulticoreWare, we’ve been exploring multiple paths to push more intelligence onto these constrained devices. This exploration led us to neuromorphic AI architectures and specialized neuromorphic hardware, which provide ultra-low-power inference by mimicking the brain’s event-driven processing. We saw the novelty of this framework and set out to combine it with our deep MCU experience, opening new ways to deliver always-on AI across medical, smart home, and industrial segments.

Designing for Neuromorphic Hardware

The neuromorphic AI framework we identified uses a novel type of neural network: Temporal Event-based Neural Networks (TENNs). TENNs employ a state-space architecture that processes events dynamically rather than at fixed intervals, skipping idle periods to minimize energy and memory usage. This design enables real-time inference on milliwatts of power, making it ideal for edge deployments.
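To make the idea of skipping idle periods concrete, here is a minimal sketch of an event-driven update for a toy linear state-space layer. It is an illustration only and not the TENN formulation: the class name `EventDrivenSSM`, the diagonal decay model, and all parameter values are assumptions made for this example. The point it shows is that when no input arrives, the state only decays, so an idle gap can be skipped in one closed-form step instead of being computed tick by tick.

```python
from dataclasses import dataclass, field


@dataclass
class EventDrivenSSM:
    """Toy diagonal linear state-space layer, updated only when an event arrives.

    Assumed model (hypothetical, for illustration): h[t] = decay * h[t-1] + gain * u[t],
    where u[t] is zero on every time step without an event.
    """
    decay: list[float]      # per-channel decay factor in (0, 1)
    gain: list[float]       # per-channel input weight
    readout: list[float]    # per-channel output weight
    state: list[float] = field(default_factory=list)
    last_t: int = 0
    updates: int = 0        # number of state updates actually computed

    def __post_init__(self):
        if not self.state:
            self.state = [0.0] * len(self.decay)

    def on_event(self, t: int, value: float) -> float:
        """Process one event at integer time t and return the layer output."""
        gap = t - self.last_t
        if gap < 0:
            raise ValueError("events must arrive in time order")
        # Skip the idle gap in closed form: idle steps only multiply by decay.
        self.state = [
            (a ** gap) * h + b * value
            for a, b, h in zip(self.decay, self.gain, self.state)
        ]
        self.last_t = t
        self.updates += 1
        return sum(c * h for c, h in zip(self.readout, self.state))


def dense_reference(decay, gain, readout, inputs):
    """Same recurrence evaluated at every time step, including idle ones."""
    h = [0.0] * len(decay)
    for u in inputs:
        h = [a * x + b * u for a, b, x in zip(decay, gain, h)]
    return sum(c * x for c, x in zip(readout, h))
```

With two events over ten time steps, the event-driven version performs two state updates where the dense version performs ten, and both produce the same output up to floating-point rounding.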

Developing models for neuromorphic AI requires more than porting existing architectures. The framework we used mandates full INT8 quantization and adherence to strict architectural constraints. Only a limited set of layers is supported, and models must follow rigid layer sequences for compatibility. These restrictions often necessitate significant redesigns, including modifying the model architecture, replacing unsupported activations (e.g., LeakyReLU → ReLU), and simplifying branched topologies. Many deep learning features, such as multi-input/output models, are also not supported, requiring developers to implement workarounds or redesign models entirely.
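As a rough illustration of what INT8 quantization involves, the sketch below maps floating-point weights to 8-bit integers with a single scale factor. The framework's actual quantization scheme is not described here, so symmetric per-tensor quantization is an assumption, and the function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127].

    Assumption: one scale for the whole tensor, chosen from the largest magnitude.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map INT8 values back to approximate floats."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Each restored value differs from the original by at most half a quantization step.
```

The rounding error per weight is bounded by half the scale, which is why models with a few very large weights tend to lose more precision under a per-tensor scheme.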

In short, building for neuromorphic acceleration means starting from the ground up, balancing accuracy, efficiency, and strict design rules to unlock the promise of real-time, ultra-low-power AI at the edge.
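The design rules described in this section can be approximated by a simple pre-flight check over a model's layer list. This is a hypothetical sketch: the supported-layer set below is invented for illustration and does not reflect the framework's real layer list, and the only substitution included is the LeakyReLU → ReLU replacement mentioned above.

```python
# Hypothetical supported set, for illustration only.
SUPPORTED_LAYERS = {"Conv2D", "DepthwiseConv2D", "ReLU", "MaxPool2D", "Dense"}

# Drop-in replacements for unsupported layers (from the example in the text).
SUBSTITUTIONS = {"LeakyReLU": "ReLU"}


def adapt_layers(layers):
    """Apply known substitutions and report layers that still need a redesign.

    Returns (adapted_layers, unsupported_layers).
    """
    adapted = [SUBSTITUTIONS.get(name, name) for name in layers]
    unsupported = [name for name in adapted if name not in SUPPORTED_LAYERS]
    return adapted, unsupported


model = ["Conv2D", "LeakyReLU", "MaxPool2D", "Concatenate", "Dense"]
adapted, unsupported = adapt_layers(model)
# adapted     -> ["Conv2D", "ReLU", "MaxPool2D", "Concatenate", "Dense"]
# unsupported -> ["Concatenate"]
```

A swapped activation typically changes model behavior, so a substitution like this would normally be followed by retraining or fine-tuning before quantization.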

Engineering Real-Time Elderly Assistance on the Edge

To demonstrate the potential of neuromorphic AI, we developed a computer-vision-based elderly assistance system capable of detecting critical human activities such as sitting, walking, lying down, or falling, all in real time on extremely low-power hardware.

The goal was simple yet ambitious:

To deliver a fully on-device, low-power AI pipeline that continuously monitors and interprets human actions while maintaining user privacy and operational efficiency even in resource-limited environments.

However, due to the framework’s architectural constraints, certain models, such as pose estimation, could not be fully supported. To overcome this, we adopted a hybrid approach combining neuromorphic and conventional compute resources:

  • Neuromorphic Hardware: Executes object detection and activity classification using specialized models.
  • CPU (TensorFlow Lite): Handles pose estimation and intermediate feature extraction.

This design maintained functionality while ensuring power-efficient inference on the edge. Our modular vision pipeline leverages neuromorphic acceleration for detection and classification, with pose estimation running on the host device.
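The hybrid split described above can be pictured as a chain of stages, each tagged with the device it runs on. The sketch below is an assumption-laden outline, not the deployed code: the `Stage` class, the three stage functions, and the aspect-ratio rule are hypothetical placeholders standing in for the real detection, pose, and classification models.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Stage:
    name: str
    device: str                  # "neuromorphic" or "cpu"
    run: Callable[[Any], Any]    # placeholder for a real model invocation


def run_pipeline(stages, frame):
    """Pass a frame through each stage in order and record where each stage ran."""
    trace = []
    data = frame
    for stage in stages:
        data = stage.run(data)
        trace.append((stage.name, stage.device))
    return data, trace


# Hypothetical stand-ins for the real models.
def detect_person(frame):
    return {"box": (0, 0, 10, 20)}            # x0, y0, x1, y1


def estimate_pose(detection):
    x0, y0, x1, y1 = detection["box"]
    return {"aspect": (y1 - y0) / (x1 - x0)}  # height / width of the box


def classify_activity(features):
    # Placeholder rule, not a real classifier.
    return "upright" if features["aspect"] > 1.0 else "lying down"


PIPELINE = [
    Stage("object detection", "neuromorphic", detect_person),
    Stage("pose estimation", "cpu", estimate_pose),
    Stage("activity classification", "neuromorphic", classify_activity),
]

label, trace = run_pipeline(PIPELINE, frame=None)
```

Keeping each stage behind the same callable interface is one way to keep the pipeline modular, so a stage could later move between devices without changing the surrounding code.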

Results: Intelligent, Low-Power Assistance at the Edge

In our demo, we deployed a complete vision pipeline running on a Raspberry Pi with the neuromorphic accelerator attached via the PCIe slot, demonstrating portability and practical deployment and validating real-time, low-power AI at the edge. The system continuously identifies and classifies user activities in real time, instantly detecting events such as falls or help gestures and triggering immediate alerts. All processing happens entirely at the edge, ensuring privacy and responsiveness in safety-critical scenarios.
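One simple way to turn a stream of per-frame activity labels into alerts is to fire on the first frame where an alert condition appears, rather than on every frame while it persists. The sketch below assumes that approach; the label strings and the function name are hypothetical, and the actual alerting logic of the system is not described here.

```python
# Assumed label names for alert-worthy activities.
ALERT_LABELS = {"falling", "help gesture"}


def alerts_from_labels(labels):
    """Yield (frame_index, label) on the first frame each alert condition appears."""
    previous = None
    for index, label in enumerate(labels):
        if label in ALERT_LABELS and label != previous:
            yield index, label
        previous = label


stream = ["walking", "falling", "falling", "lying down", "help gesture"]
alerts = list(alerts_from_labels(stream))
# alerts -> [(1, "falling"), (4, "help gesture")]
```

The alert still fires on the very first matching frame, so responsiveness is unchanged while repeated notifications for the same event are avoided.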

The neuromorphic architecture consumes only a fraction of the power required by conventional deep learning pipelines, while maintaining consistent inference speeds and robust performance.

Application Snapshot:
  • Ultra-low power consumption
  • Portable Raspberry Pi + neuromorphic hardware setup
  • End-to-end application running on edge hardware

Our Playbook for Making Edge AI Truly Low-Power

MulticoreWare applies deep technical expertise across emerging low-power compute ecosystems, enabling AI to run efficiently on resource-constrained platforms. Our approach combines:

  • Application-Ready AI Workloads for Low-Power MCUs: Wake-word/KWS speech models, compact vision (person detection, classification), sensor-level anomaly detection, and TinyML NLP, all tuned for Arm Cortex-M and similar low-power embedded chips.
  • End-to-End SDK Enablement: Custom CMSIS-NN layers, a clean training-to-TFLite flow, and targeted quantization/pruning with memory profiling for smooth MCU deployment.
  • Compiler-Level & Runtime Optimization: Leveraging TFLite Micro and TVM-Micro to tune kernels, manage memory-tight tensor arenas, and build inference paths that remain stable within strict RAM, compute, and power budgets.
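To illustrate why memory-tight tensor arenas matter, the sketch below estimates the minimum arena size for a small model from its tensor lifetimes. It is a simplified assumption, not the planner used by any particular runtime: the function name and the example tensor sizes are hypothetical, and real planners must also handle alignment and fragmentation, so actual arenas can be larger.

```python
def peak_arena_bytes(tensors):
    """Lower bound on arena size: the largest total of simultaneously live tensors.

    tensors: list of (size_bytes, first_use_op, last_use_op), op indices inclusive.
    """
    if not tensors:
        return 0
    last_op = max(last for _, _, last in tensors)
    peak = 0
    for op in range(last_op + 1):
        live = sum(size for size, first, last in tensors if first <= op <= last)
        peak = max(peak, live)
    return peak


# Hypothetical INT8 activations for a three-op vision model (1 byte per element).
activations = [
    (96 * 96 * 1, 0, 0),    # input image, consumed by op 0
    (48 * 48 * 8, 0, 1),    # op 0 output, consumed by op 1
    (24 * 24 * 16, 1, 2),   # op 1 output, consumed by op 2
    (4, 2, 2),              # op 2 output (class scores)
]

peak = peak_arena_bytes(activations)              # 27648 bytes
naive = sum(size for size, _, _ in activations)   # 36868 bytes
```

Reusing buffers once a tensor is no longer needed brings this example from 36,868 bytes down to 27,648 bytes, which can decide whether a model fits in MCU RAM at all.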

Broader MCU AI Applications: Industrial, Smart Home & Smart City

With healthcare leading the shift toward embedded-first AI, smart homes, industrial systems, and smart cities are rapidly following. Applications like quality inspection, predictive maintenance, robotic assistance, home security, and occupancy sensing increasingly require AI that runs directly on MCU-class, low-power edge processors.

MulticoreWare’s real-time inference framework for Arm Cortex-M devices supports this transition through highly optimized pipelines, including quantization, pruning, CMSIS-NN kernel tuning, and memory-tight execution paths tailored for constrained MCUs. This enables OEMs to deploy workloads such as wake-word spotting, compact vision models, and sensor-level anomaly detection, allowing even the smallest devices to run intelligent features without relying on external compute.
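As a small illustration of the pruning step mentioned above, the sketch below zeroes out the smallest-magnitude weights of a layer. Magnitude pruning is assumed here as a common baseline; the specific pruning method used in these pipelines is not stated, and the function name is hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    sparsity: fraction of weights to remove, between 0.0 and 1.0.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0.0 and 1.0")
    count = int(len(weights) * sparsity)
    # Indices sorted from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:count])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


layer = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = magnitude_prune(layer, sparsity=0.5)
# pruned -> [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

Zeroed weights only save memory or compute when the storage format or kernels exploit the sparsity, so pruning is usually paired with a matching runtime strategy.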

Conclusion: Redefining Intelligence Beyond the Cloud

The convergence of AI and embedded computing marks a defining moment in how intelligence is designed, deployed, and scaled. By enabling lightweight, power-efficient AI directly at the edge, MulticoreWare empowers customers across healthcare, industrial, and smart city domains to achieve faster response times, higher reliability, and reduced energy footprints.

As the boundary between compute and intelligence continues to fade, MulticoreWare’s Edge AI enablement across MCU and embedded platforms ensures that our partners stay ahead, building the foundation for truly decentralized, real-time intelligence beyond the cloud.

To learn more about MulticoreWare’s edge AI initiatives, write to us at info@multicorewareinc.com.
