Run a BERT ONNX model using nGraph

nGraph:

Jul 20, 2020

nGraph is an acceleration framework for deep learning models, similar in purpose to TensorRT. While TensorRT runs only on NVIDIA GPUs, nGraph is designed to work across multiple frameworks and hardware backends.

Check out the git repo:
https://github.com/NervanaSystems/ngraph

This tutorial only covers model inference on CPU and NVIDIA GPUs. Inference on Intel GPUs is not covered here, but detailed tutorials for Intel GPUs are available on the official website.

Installation:

For the latest build options, check out https://www.ngraph.ai/tutorials/onnx-tutorial

Quickstart-Installation

# apt update
# apt install -y protobuf-compiler libprotobuf-dev
pip install ngraph-core
pip install ngraph-onnx
pip install plaidml

After a successful installation, load the ONNX model and run inference:

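import onnx
import torch
from torch import nn
import ngraph as ng
from ngraph_onnx.onnx_importer.importer import import_onnx_model

# Load the ONNX model and convert it to an nGraph function
onnx_protobuf = onnx.load("checkpoints/model.onnx")
ng_function = import_onnx_model(onnx_protobuf)
# Compile the function for the CPU backend
runtime = ng.runtime(backend_name='CPU')
ngraphmodel = runtime.computation(ng_function)
# input_ids, token_type_ids and attention_mask are assumed to be NumPy arrays from the tokenizer
pred = ngraphmodel(input_ids, token_type_ids, attention_mask)
pred = torch.tensor(pred[0])
pred_output_softmax = nn.Softmax(dim=-1)(pred)  # softmax over the class dimension, assuming (batch, classes) logits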
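The last step of the snippet turns the raw model outputs (logits) into probabilities. As a rough illustration of what that softmax computes, here is a minimal pure-Python sketch. The function name `softmax` and the example logits are made up for illustration; they are not real model output.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a two-class classifier (not real model output).
example_logits = [2.0, 0.0]
probs = softmax(example_logits)
print(probs)
```

The resulting values sum to 1, and the larger logit gets the larger probability.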

For better inference times and easier parsing, simplify the ONNX graph using https://github.com/daquexian/onnx-simplifier
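A minimal sketch of the simplification step, assuming the `onnx` and `onnx-simplifier` packages are installed. The helper name `simplify_onnx_file` and the output path in the commented call are hypothetical; `simplify` is the Python entry point that onnx-simplifier exposes as `onnxsim.simplify`.

```python
def simplify_onnx_file(src_path, dst_path):
    """Simplify the ONNX graph at src_path and write it to dst_path."""
    # Imported lazily so this helper can be defined even where the packages are absent.
    import onnx
    from onnxsim import simplify

    model = onnx.load(src_path)
    simplified, ok = simplify(model)
    if not ok:
        raise RuntimeError("simplified model failed the onnxsim validation check")
    onnx.save(simplified, dst_path)
    return dst_path

# Example call; the output path is a hypothetical name.
# simplify_onnx_file("checkpoints/model.onnx", "checkpoints/model_simplified.onnx")
```

The simplified file can then be passed to `onnx.load` in place of the original model.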

Comparison of inference times for the BERT model

onnxruntime (GPU): 0.67 sec
pytorch (GPU): 0.87 sec
pytorch (CPU): 2.71 sec
ngraph (CPU backend): 2.49 sec with simplified ONNX graph
TensorRT: 0.022 sec with simplified ONNX graph
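The post does not say how these timings were measured. One common approach, sketched here as an assumption rather than the method actually used, is to run a few warm-up calls and then average the wall-clock time over several runs. `average_latency` and `dummy_model` are hypothetical names; `dummy_model` stands in for a real inference call such as `ngraphmodel(...)`.

```python
import time

def average_latency(fn, *args, warmup=2, runs=10):
    """Return the mean wall-clock seconds per call of fn(*args)."""
    for _ in range(warmup):
        fn(*args)  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - start) / runs

# Stand-in workload so the sketch runs without any model installed.
def dummy_model(n):
    return sum(i * i for i in range(n))

latency = average_latency(dummy_model, 10_000)
print(f"{latency:.6f} sec per call")
```

Warm-up runs matter because the first call often includes one-time costs such as graph compilation or memory allocation.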

To run on NVIDIA GPUs, change backend_name to 'PlaidML'. As far as I can tell, the PlaidML backend currently supports only a limited set of layers, so I could not run the BERT model with it.
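Since a backend may fail on unsupported layers, one option is to try backends in order of preference and fall back to the next one on failure. The sketch below is a generic pattern, not part of the nGraph API. `first_working_backend` and `fake_build` are hypothetical names, and `fake_build` only simulates a backend that rejects the model.

```python
def first_working_backend(build, backend_names):
    """Return (name, result) for the first backend where build(name) succeeds."""
    errors = {}
    for name in backend_names:
        try:
            return name, build(name)
        except Exception as exc:  # e.g. backend missing or layer unsupported
            errors[name] = exc
    raise RuntimeError(f"no backend worked: {errors}")

# Stand-in builder that simulates every backend except CPU failing.
def fake_build(name):
    if name != "CPU":
        raise RuntimeError(f"{name}: unsupported op")
    return f"computation on {name}"

chosen, computation = first_working_backend(fake_build, ["PlaidML", "CPU"])
print(chosen)
```

With nGraph, the builder could wrap the calls from the snippet above, for example `lambda name: ng.runtime(backend_name=name).computation(ng_function)`, assuming the failure surfaces as an exception at that point.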

Hope it helps :)

For faster inference, you can refer to my TensorRT tutorial, "Convert Onnx BERT model to TensorRT", on medium.com.

I apologize if I have left out any references from which I may have taken code snippets.

References:

https://www.ngraph.ai/tutorials/onnx-tutorial
https://azure.microsoft.com/en-in/blog/onnx-runtime-integration-with-nvidia-tensorrt-in-preview/
https://github.com/NervanaSystems/ngraph
https://github.com/daquexian/onnx-simplifier