MulticoreWare

Case Studies

Enabling PyTorch 2.0 Models on Next-Gen AI Accelerator

April 9, 2025

Client

The customer is a next-generation computing company specializing in AI hardware. Their mission is to provide cost-effective, scalable computing systems optimized for AI workloads. Their hardware ecosystem natively supports PyTorch and ONNX, enabling researchers and developers to deploy AI models with minimal friction.

Challenge

With the release of PyTorch 2.0, the customer sought to enable seamless execution of AI models on their custom AI accelerator while maintaining compatibility with standard PyTorch workflows. Key challenges included:

  1. Lack of Native Support for Custom Hardware:
    • PyTorch 2.0 introduced torch.compile, which requires a backend that efficiently maps computations to hardware.
    • The customer’s AI accelerator did not have a dedicated PyTorch backend, making model execution inefficient or infeasible.
  2. Bridging PyTorch’s Computational Graph with Custom Hardware:
    • PyTorch 2.0 models rely on ATen IR before lowering to hardware-specific execution.
    • The customer needed to implement ATen IR transformations to map PyTorch operations to their hardware’s operations.
  3. Minimizing Code Changes for Developers:
    • AI researchers and developers prefer out-of-the-box compatibility with PyTorch.
    • The goal was to allow execution on the AI accelerator without requiring developers to rewrite or refactor models significantly.

Solution

To enable seamless execution of PyTorch 2.0 models on the customer’s AI accelerator, a custom PyTorch backend was designed and implemented. This ensured that models could run efficiently on the hardware with minimal code modifications while maintaining PyTorch’s ease of use. The solution comprised the following key components:

1. Custom Backend
A dedicated PyTorch backend was developed to seamlessly integrate with torch.compile and translate PyTorch operations into hardware-specific operators. This backend enabled efficient model execution by leveraging the accelerator’s computational capabilities while maintaining compatibility with PyTorch’s standard workflows.
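The shape of such an integration can be sketched as below. The backend name `accelerator_backend` and its print-and-fall-back body are illustrative stand-ins; a real backend would lower the captured FX graph to the accelerator's operator library instead of returning eager execution.

```python
import torch
from typing import List

def accelerator_backend(gm: torch.fx.GraphModule,
                        example_inputs: List[torch.Tensor]):
    # In the real backend, this is where the captured FX graph would be
    # lowered to hardware-specific operators. Here we just inspect the
    # captured ops and fall back to eager execution of the graph.
    print("captured ops:",
          [n.target for n in gm.graph.nodes if n.op == "call_function"])
    return gm.forward  # return a callable that executes the graph

def f(x):
    return torch.relu(x) + 1.0

# Developers opt in with a single torch.compile call; the model code
# itself is unchanged.
compiled_f = torch.compile(f, backend=accelerator_backend)
x = torch.randn(4)
out = compiled_f(x)
```

Because torch.compile accepts any callable backend, this registration point is all that is needed for existing PyTorch code to route through a custom compiler stack.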

2. ATen IR Transformation Passes
Various Torch ATen ops were processed and lowered to ops in the customer’s operator library. Transformation passes handled support for missing ops across several categories, including unary, binary, and reduction ops. ATen ops such as LayerNorm, AvgPool, Softmax, expand, squeeze, and slice were lowered in these passes.
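A minimal sketch of such a lowering pass, using torch.fx: here `torch.sigmoid` stands in for an op missing from the hypothetical operator library, and the pass rewrites it into supported primitives (exp, neg, add, div). The "supported op set" is an assumption for illustration, not the customer's actual library.

```python
import torch
import torch.fx as fx

def lowered_sigmoid(x):
    # Decomposition into primitives assumed to exist in the target
    # operator library: neg, exp, add, div.
    return 1.0 / (1.0 + torch.exp(-x))

def lower_unsupported_ops(gm: fx.GraphModule) -> fx.GraphModule:
    # Walk the FX graph and swap unsupported call targets for their
    # decomposed equivalents, then regenerate the module's code.
    for node in gm.graph.nodes:
        if node.op == "call_function" and node.target is torch.sigmoid:
            node.target = lowered_sigmoid
    gm.recompile()
    return gm

class Net(torch.nn.Module):
    def forward(self, x):
        return torch.sigmoid(x) * 2.0

gm = lower_unsupported_ops(fx.symbolic_trace(Net()))
x = torch.randn(8)
```

The same node-rewriting pattern generalizes to the LayerNorm, AvgPool, and reduction lowerings mentioned above.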

3. Data Movement and Optimization Passes
Several optimization passes were implemented, including data movement to and from the device, constant folding, memory-usage analysis, and eviction strategies, each designed to enhance execution efficiency and resource utilization on the target hardware.
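One flavor of data-movement optimization can be illustrated on a toy list-of-ops IR: eliding a host-to-device copy that immediately undoes a device-to-host copy. The `"to_device"`/`"to_host"` op names and the flat-list IR are illustrative stand-ins, not the customer's real representation.

```python
def elide_redundant_copies(ops):
    """Remove back-to-back host<->device transfers that cancel out."""
    out = []
    for op in ops:
        # A copy immediately followed by its inverse is a no-op pair.
        if out and {out[-1], op} == {"to_device", "to_host"}:
            out.pop()
            continue
        out.append(op)
    return out

program = ["to_device", "matmul", "to_host", "to_device", "relu", "to_host"]
optimized = elide_redundant_copies(program)
# optimized -> ["to_device", "matmul", "relu", "to_host"]
```

Keeping intermediate tensors resident on the device between ops, as the pass above does, is typically one of the larger wins on accelerators with expensive host transfers.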

4. Validation & Accuracy Testing
To guarantee correctness, transformed operations were rigorously validated against PyTorch’s reference implementations. The Pearson correlation coefficient (PCC) was used to measure numerical accuracy, ensuring minimal deviation from expected results.
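The PCC check is straightforward to state in code. The reference and device-output values below are illustrative numbers, not measurements from the project.

```python
import math

def pcc(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

reference = [0.10, 0.25, 0.50, 0.90]       # PyTorch eager results
device_out = [0.1001, 0.2498, 0.5003, 0.8997]  # illustrative device results
score = pcc(reference, device_out)
```

A PCC very close to 1.0 indicates the hardware results track the reference implementation; op-level validation typically gates on a threshold such as 0.99.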

5. Runtime Testing with Real-World Models
To evaluate real-world performance, the backend was tested using:

  • Torchvision models such as ResNet, MobileNet, and YOLO for vision-based inference.
  • Large language models such as GPT and Bloom to assess execution efficiency on NLP tasks.
  • End-to-end inference benchmarks to validate performance and correctness.

This comprehensive testing approach ensured that the backend was robust, performant, and production-ready for deployment in AI applications.
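A per-model runtime check of this kind can be sketched as an eager-vs-compiled comparison. The tiny `Sequential` model and the `"eager"` backend below are placeholders; the actual tests ran full Torchvision and LLM models against the accelerator backend.

```python
import torch

# Stand-in model; the real harness loaded ResNet, BERT, GPT, etc.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 10),
).eval()

x = torch.randn(2, 16)
with torch.no_grad():
    expected = model(x)                                  # eager reference
    compiled = torch.compile(model, backend="eager")     # placeholder backend
    actual = compiled(x)

# Outputs from the compiled path must match the eager reference.
torch.testing.assert_close(actual, expected)
```

Running the same comparison across many architectures exercises a wide slice of the ATen op surface, which is what makes this style of testing effective at catching lowering bugs.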

Solution Overview

To enable seamless execution of PyTorch 2.0 models on the customer’s AI accelerator, key improvements were made in operator support, backend integration, and real-world validation.

Support for 50+ PyTorch Core ATen IR Ops

Implemented transformation and execution logic for 50+ commonly used PyTorch operations, listed below, ensuring broad model compatibility with minimal developer intervention.

  • Mathematical & Activation Functions (ReLU, Softmax, Log, Exp)
  • Normalization & Pooling (LayerNorm, BatchNorm, AvgPool)
  • Tensor Manipulation & Reduction Ops (Reshape, Transpose, Sum, Mean)
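As one example of what op support means in practice, an op like LayerNorm can be expressed from primitive ops (mean, variance, rsqrt, multiply, add) and checked against PyTorch's reference. This decomposition is a generic illustration, not the customer's actual lowering.

```python
import torch

def layer_norm_decomposed(x, weight, bias, eps=1e-5):
    # LayerNorm rebuilt from primitives, as a lowering pass might emit it.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)  # population variance
    return (x - mean) * torch.rsqrt(var + eps) * weight + bias

x = torch.randn(4, 8)
w = torch.ones(8)
b = torch.zeros(8)
out = layer_norm_decomposed(x, w, b)
ref = torch.nn.functional.layer_norm(x, (8,), w, b)
```

Validating each decomposition against its `torch.nn.functional` reference is the same accuracy discipline described in the validation section above.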

Seamless Integration with the torch.compile API and Model Benchmarking

Enabled developers to use PyTorch’s torch.compile() API for effortless deployment.

  • No manual modifications were required, allowing models to automatically optimize for the AI accelerator.
  • Validated the performance and correctness of the implementation across several categories of models, including ResNet, MobileNet, VGG, YOLO, Faster R-CNN, BERT, GPT, and Falcon.
  • Each model was tested for accuracy, latency, and stability, ensuring reliable execution.

Business Impact

  • Faster AI Model Deployment – Enables seamless execution of PyTorch models on the customer’s AI accelerator, significantly reducing time-to-market for AI applications.
  • Increased Developer Adoption – Eliminates complex integration efforts, making it easier for AI developers to leverage the hardware, driving ecosystem growth.
  • Stronger Competitive Positioning – Enhances the value proposition of the AI accelerator, offering effortless PyTorch support and positioning it as a strong alternative to existing AI computing platforms.

Conclusion

By integrating a custom PyTorch 2.0 backend with their AI accelerator, the customer has significantly improved the accessibility and usability of their hardware. This effort not only enhances execution efficiency but also positions their AI ecosystem as an attractive solution for AI researchers, enterprises, and developers looking for scalable, high-performance AI computing solutions. The seamless onboarding experience ensures that more AI practitioners can leverage the customer’s hardware without needing to modify their existing PyTorch workflows, ultimately driving greater adoption and ecosystem expansion.

MulticoreWare showcased expertise in compilers, Python, PyTorch, and AI accelerators, and our comprehensive approach ensured performance parity and stability. To learn more about our expertise or to discover how we can help your organization achieve innovative and high-performance results, please contact info@multicorewareinc.com.
