Bird’s-Eye-View (BEV) perception transforms multi-camera and sensor inputs into a unified top-down view, enabling autonomous systems to detect objects, track motion, and reason about their surroundings in real time. However, BEV models such as BEVDet and BEVFormer are challenging to deploy due to high compute requirements, complex operator patterns, and sensitivity to graph optimizations and quantization. In collaboration with the Autoware Foundation, MulticoreWare optimized these models for execution on edge devices, AI accelerators, and automotive SoCs, enabling efficient, scalable deployment across edge platforms.
BEVDet uses the Lift-Splat-Shoot approach, converting multi-camera images into a unified BEV representation and enabling accurate 3D detection. With MulticoreWare's optimizations, it reached ~3 FPS in FP32 and ~5 FPS in FP16 on an RTX 2080 Ti, establishing a strong baseline for deployable perception systems.
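To make the Lift-Splat idea concrete, here is a minimal numpy sketch of the two steps for a single camera: "lift" each pixel's feature into a frustum by weighting it with a softmax depth distribution, then "splat" the resulting 3D points into a top-down grid with a scatter-add. This is an illustration of the general technique, not BEVDet's or MulticoreWare's actual code; the function name, grid parameters, and shapes are ours.

```python
import numpy as np

def lift_splat(feats, depth_logits, pix_coords, K, bev_size=64, bev_range=32.0):
    """Illustrative Lift-Splat step for one camera.

    feats:        (N, C)  image features, one row per pixel
    depth_logits: (N, D)  unnormalized depth scores per pixel
    pix_coords:   (N, 2)  pixel (u, v) coordinates
    K:            (3, 3)  camera intrinsics
    """
    N, C = feats.shape
    D = depth_logits.shape[1]
    depth_bins = np.linspace(1.0, bev_range, D)           # candidate depths (m)

    # Lift: softmax over depth bins, then outer product with the pixel feature
    p = np.exp(depth_logits - depth_logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                     # (N, D)
    frustum = p[:, :, None] * feats[:, None, :]           # (N, D, C)

    # Back-project each (pixel, depth-bin) pair to a 3D point in camera frame
    uv1 = np.concatenate([pix_coords, np.ones((N, 1))], axis=1)   # (N, 3)
    rays = uv1 @ np.linalg.inv(K).T                       # unit-depth rays
    pts = rays[:, None, :] * depth_bins[None, :, None]    # (N, D, 3)

    # Splat: scatter-add features into a top-down x (right) / z (forward) grid
    bev = np.zeros((bev_size, bev_size, C))
    cell = bev_range / bev_size
    xi = ((pts[..., 0] + bev_range / 2) / cell).astype(int)
    zi = (pts[..., 2] / cell).astype(int)
    valid = (xi >= 0) & (xi < bev_size) & (zi >= 0) & (zi < bev_size)
    np.add.at(bev, (zi[valid], xi[valid]), frustum[valid])
    return bev
```

In the real model the depth distribution is predicted by a network head and the splat runs over all cameras at once; that dense scatter-add is also one of the operator patterns that makes BEV models sensitive to graph optimization.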
BEVFormer enhanced this by incorporating spatial and temporal information, improving detection of moving or occluded objects and enabling velocity estimation. A complete C++ pipeline with TensorRT optimization delivered strong performance, including a NuScenes Detection Score of 0.478, mAP of 0.370, and ~90 ms latency in FP16.
Advancing BEV Perception for Real-World Deployment
Optimized BEVDet and BEVFormer enable real-time, multi-camera 3D perception with temporal awareness, solving key challenges in scaling vision-based autonomy.
MulticoreWare optimizes and deploys complex AI models across edge and automotive platforms, helping teams reduce integration effort, shorten time to deployment, and deliver consistent, high-performance perception in production environments.

