Deploying PP-OCR v4/v5 with DeploySharp and ONNX Runtime: A Tutorial

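This article explains how to deploy the PP-OCR v4/v5 models with the DeploySharp framework and the ONNX Runtime inference engine. It is a complete guide covering CPU, CUDA, DML (DirectML), and TensorRT deployment.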

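I. ONNX Runtime Overview

1.1 What Is ONNX Runtime

ONNX Runtime is a high-performance, cross-platform inference engine from Microsoft that runs models in the ONNX format. It is one of the most widely used inference engines and has the following characteristics:

• Cross-platform: runs on Windows, Linux, macOS, Android, iOS, and more
• Multiple backends: supports execution providers such as CPU, CUDA, TensorRT, OpenVINO, and DirectML
• High performance: heavily optimized for fast inference
• Ease of use: a simple API that is quick to integrate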

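1.2 Advantages of ONNX Runtime

Advantage	Description
Cross-platform	One codebase runs on multiple platforms
Broad hardware support	CPU, NVIDIA GPU, AMD GPU, Intel GPU, and more
Rich set of execution providers	CPU, CUDA, TensorRT, DML, OpenVINO, and more
Easy integration	Bindings for C#, C++, Python, and other languages
Active community	Maintained by Microsoft and updated regularly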

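II. Execution Provider Comparison

ONNX Runtime supports multiple execution providers (EPs). The table below compares them:

Execution provider	Supported devices	Performance	Typical use
CPU	Any CPU	Moderate performance, most portable	No GPU available; cross-platform deployment
CUDA	NVIDIA GPU	GPU-accelerated, good performance	NVIDIA GPUs; requires a CUDA environment
TensorRT	NVIDIA GPU	GPU acceleration plus TensorRT optimization; the fastest option in this article's tests	NVIDIA GPUs when maximum performance matters
DML	Multi-vendor GPUs (AMD/NVIDIA/Intel)	One unified interface on Windows	Windows with GPUs from different vendors
OpenVINO	Intel CPU/iGPU/GPU	Optimized for Intel hardware	Intel hardware on Windows/Linux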

III. Environment Setup

3.1 System Requirements

Component	Minimum	Recommended
Operating system	Windows 10/11, Linux	Windows 11
.NET version	.NET 6.0+	.NET 8.0
CPU	4+ cores	8+ cores
Memory	8 GB	16 GB+
GPU (optional)	NVIDIA RTX 3060+	NVIDIA RTX 4070+

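3.2 Installing the ONNX Runtime NuGet Packages

CPU

    dotnet add package Microsoft.ML.OnnxRuntime

Microsoft.ML.OnnxRuntime bundles the native CPU runtime. Microsoft.ML.OnnxRuntime.Managed on its own contains only the managed C# bindings.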

CUDA

    dotnet add package Microsoft.ML.OnnxRuntime.Gpu.Windows

Note: the package must match the CUDA version installed on the system:

• CUDA 11.x → OnnxRuntime.Gpu (older releases)
• CUDA 12.x → OnnxRuntime.Gpu.Windows (newer releases)

DML

    dotnet add package Microsoft.ML.OnnxRuntime.DirectML

TensorRT

    dotnet add package Microsoft.ML.OnnxRuntime.Gpu.Windows

The TensorRT execution provider uses the same GPU package as CUDA, but TensorRT itself must be installed separately.

3.3 CUDA Setup (If Needed)

1. Go to the NVIDIA CUDA download page: https://developer.nvidia.com/cuda-downloads
2. Download and install CUDA 12.x
3. Verify the installation:

    nvcc --version

3.4 Dependency Files

Copy the following DLLs into the directory the program runs from:

CPU mode

No additional DLLs are required.

CUDA mode

    cuda_runtime.dll
    cudnn64_8.dll
    cudnn_ops_infer64_8.dll
    cudnn_cnn_infer64_8.dll

DML mode

    directml.dll
    onnxruntime_providers_shared.dll
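
A missing native DLL usually surfaces only as a load failure at startup, so it can help to check the run directory up front. The following is a minimal sketch of such a check. `NativeDependencyCheck` and `FindMissing` are hypothetical helper names and are not part of DeploySharp or ONNX Runtime; the file names to check come from the lists above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class NativeDependencyCheck
{
    // Returns the names from requiredFiles that do not exist in the given directory.
    // An empty result means every listed file was found.
    public static List<string> FindMissing(string directory, IEnumerable<string> requiredFiles)
    {
        return requiredFiles
            .Where(name => !File.Exists(Path.Combine(directory, name)))
            .ToList();
    }
}
```

For DML mode, for example, the call would be `NativeDependencyCheck.FindMissing(AppContext.BaseDirectory, new[] { "directml.dll", "onnxruntime_providers_shared.dll" })`, and any names it returns still need to be copied.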

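IV. Model Preparation

PP-OCR model layout

    ppocrv5/
    ├── PP-OCRv5_mobile_det_onnx.onnx    # text detection model
    ├── PP-OCRv5_mobile_cls_onnx.onnx    # text orientation classification model
    ├── PP-OCRv5_mobile_rec_onnx.onnx    # text recognition model
    └── ppocrv5_dict.txt                 # recognition dictionary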

V. CPU Inference

5.1 Creating the Configuration

    using DeploySharp.Data;
    using DeploySharp.Engine;
    using DeploySharp.Model;

    // Create the PP-OCR v5 configuration
    PaddleOCRConfig config = PaddleOCRConfig.GetPPOCRv5Config(
        detModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_det_onnx.onnx",
        clsModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_cls_onnx.onnx",
        recModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_rec_onnx_combined.onnx",
        recDictPath: @"E:\Model\ppocrv5\ppocrv5_dict.txt"
    );

    // Select the inference engine and device
    config.GlobalInferenceBackend = InferenceBackend.OnnxRuntime;
    config.GlobalDeviceType = DeviceType.CPU;
    config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Cpu;

5.2 Complete Example

    using System;
    using DeploySharp.Data;
    using DeploySharp.Engine;
    using DeploySharp.Log;
    using DeploySharp.Model;
    using OpenCvSharp;
    using System.Diagnostics;

    namespace PaddleOCR.ONNX.CPU.Demo
    {
        class Program
        {
            static void Main(string[] args)
            {
                MyLogger.SetLevel(Log.LogLevel.ERROR);

                // Load the input image
                string imagePath = @"E:\Data\ocr\demo_1.jpg";
                Mat img = Cv2.ImRead(imagePath);
                if (img.Empty())
                {
                    Console.WriteLine("Failed to read the image!");
                    return;
                }

                // Create the configuration
                PaddleOCRConfig config = PaddleOCRConfig.GetPPOCRv5Config(
                    detModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_det_onnx.onnx",
                    clsModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_cls_onnx.onnx",
                    recModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_rec_onnx_combined.onnx",
                    recDictPath: @"E:\Model\ppocrv5\ppocrv5_dict.txt"
                );

                // CPU inference settings
                config.GlobalInferenceBackend = InferenceBackend.OnnxRuntime;
                config.GlobalDeviceType = DeviceType.CPU;
                config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Cpu;
                config.MaxConcurrency = 4;
                config.GlobalMaxBatchSize = 1;
                config.RecConfig.InferImageHeight = 48;
                config.RecConfig.MaxImageWidth = 320;

                // Create the predictor
                using (PaddleOcrPredictor predictor = new PaddleOcrPredictor(config))
                {
                    Console.WriteLine("Models loaded!");

                    // Warm-up run (not timed)
                    predictor.Predict(img);

                    // Timed run
                    Stopwatch sw = Stopwatch.StartNew();
                    OcrResult result = predictor.Predict(img);
                    sw.Stop();

                    // Print the results
                    Console.WriteLine("\n========== Recognition Results ==========");
                    Console.WriteLine(result.TextContentsToString());
                    Console.WriteLine($"\nTotal time: {sw.ElapsedMilliseconds} ms");
                    predictor.PrintTimeProfiling();

                    // Visualize the results
                    Mat resultMat = Visualize.DrawOcrResult(img, result, new VisualizeOptions(1.0f));
                    Cv2.ImShow("Result", resultMat);
                    Cv2.WaitKey();
                }
            }
        }
    }

5.3 Performance

Device	Latency	Notes
AMD Ryzen 7 5800H	~656 ms	8 cores, no GPU
Intel Core i7-12700H	~550 ms	12 cores, no GPU

VI. CUDA Inference

6.1 Prerequisites

1. Make sure the NVIDIA GPU driver is installed
2. Install CUDA 12.x
3. Copy the CUDA-related DLLs into the program directory

6.2 Creating the Configuration

    PaddleOCRConfig config = PaddleOCRConfig.GetPPOCRv5Config(
        detModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_det_onnx.onnx",
        clsModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_cls_onnx.onnx",
        recModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_rec_onnx_combined.onnx",
        recDictPath: @"E:\Model\ppocrv5\ppocrv5_dict.txt"
    );

    // CUDA inference settings
    config.GlobalInferenceBackend = InferenceBackend.OnnxRuntime;
    config.GlobalDeviceType = DeviceType.GPU0;
    config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Cuda;
    config.MaxConcurrency = 4;
    config.GlobalMaxBatchSize = 4;

6.3 Complete Example

    using System;
    using DeploySharp.Data;
    using DeploySharp.Engine;
    using DeploySharp.Log;
    using DeploySharp.Model;
    using OpenCvSharp;
    using System.Diagnostics;

    namespace PaddleOCR.ONNX.CUDA.Demo
    {
        class Program
        {
            static void Main(string[] args)
            {
                MyLogger.SetLevel(Log.LogLevel.ERROR);

                // Load the input image
                string imagePath = @"E:\Data\ocr\demo_1.jpg";
                Mat img = Cv2.ImRead(imagePath);
                if (img.Empty())
                {
                    Console.WriteLine("Failed to read the image!");
                    return;
                }

                // Create the configuration
                PaddleOCRConfig config = PaddleOCRConfig.GetPPOCRv5Config(
                    detModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_det_onnx.onnx",
                    clsModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_cls_onnx.onnx",
                    recModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_rec_onnx_combined.onnx",
                    recDictPath: @"E:\Model\ppocrv5\ppocrv5_dict.txt"
                );

                // CUDA inference settings
                config.GlobalInferenceBackend = InferenceBackend.OnnxRuntime;
                config.GlobalDeviceType = DeviceType.GPU0;
                config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Cuda;
                config.MaxConcurrency = 4;
                config.GlobalMaxBatchSize = 4;
                config.RecConfig.InferImageHeight = 48;
                config.RecConfig.MaxImageWidth = 320;

                // Create the predictor
                using (PaddleOcrPredictor predictor = new PaddleOcrPredictor(config))
                {
                    Console.WriteLine("Models loaded!");

                    // Warm-up run (not timed)
                    predictor.Predict(img);

                    // Timed run
                    Stopwatch sw = Stopwatch.StartNew();
                    OcrResult result = predictor.Predict(img);
                    sw.Stop();

                    // Print the results
                    Console.WriteLine("\n========== Recognition Results ==========");
                    Console.WriteLine(result.TextContentsToString());
                    Console.WriteLine($"\nTotal time: {sw.ElapsedMilliseconds} ms");
                    predictor.PrintTimeProfiling();

                    // Visualize the results
                    Mat resultMat = Visualize.DrawOcrResult(img, result, new VisualizeOptions(1.0f));
                    Cv2.ImShow("Result", resultMat);
                    Cv2.WaitKey();
                }
            }
        }
    }

6.4 Performance

Device	Latency	Notes
NVIDIA RTX 3060	~93 ms	CUDA 12
NVIDIA RTX 4070	~65 ms	CUDA 12
NVIDIA RTX 4090	~45 ms	CUDA 12

VII. DML Inference

7.1 About DML

DirectML (DML) is a high-performance hardware acceleration interface for Windows that works across GPUs from AMD, NVIDIA, and Intel.

7.2 Creating the Configuration

    PaddleOCRConfig config = PaddleOCRConfig.GetPPOCRv5Config(
        detModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_det_onnx.onnx",
        clsModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_cls_onnx.onnx",
        recModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_rec_onnx_combined.onnx",
        recDictPath: @"E:\Model\ppocrv5\ppocrv5_dict.txt"
    );

    // DML inference settings
    config.GlobalInferenceBackend = InferenceBackend.OnnxRuntime;
    config.GlobalDeviceType = DeviceType.GPU0;
    config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Dml;
    config.MaxConcurrency = 2;
    config.GlobalMaxBatchSize = 2;

7.3 Complete Example

    using System;
    using DeploySharp.Data;
    using DeploySharp.Engine;
    using DeploySharp.Log;
    using DeploySharp.Model;
    using OpenCvSharp;
    using System.Diagnostics;

    namespace PaddleOCR.ONNX.DML.Demo
    {
        class Program
        {
            static void Main(string[] args)
            {
                MyLogger.SetLevel(Log.LogLevel.ERROR);

                // Load the input image
                string imagePath = @"E:\Data\ocr\demo_1.jpg";
                Mat img = Cv2.ImRead(imagePath);
                if (img.Empty())
                {
                    Console.WriteLine("Failed to read the image!");
                    return;
                }

                // Create the configuration
                PaddleOCRConfig config = PaddleOCRConfig.GetPPOCRv5Config(
                    detModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_det_onnx.onnx",
                    clsModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_cls_onnx.onnx",
                    recModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_rec_onnx_combined.onnx",
                    recDictPath: @"E:\Model\ppocrv5\ppocrv5_dict.txt"
                );

                // DML inference settings
                config.GlobalInferenceBackend = InferenceBackend.OnnxRuntime;
                config.GlobalDeviceType = DeviceType.GPU0;
                config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Dml;
                config.MaxConcurrency = 2;
                config.GlobalMaxBatchSize = 2;
                config.RecConfig.InferImageHeight = 48;
                config.RecConfig.MaxImageWidth = 320;

                // Create the predictor
                using (PaddleOcrPredictor predictor = new PaddleOcrPredictor(config))
                {
                    Console.WriteLine("Models loaded!");

                    // Warm-up run (not timed)
                    predictor.Predict(img);

                    // Timed run
                    Stopwatch sw = Stopwatch.StartNew();
                    OcrResult result = predictor.Predict(img);
                    sw.Stop();

                    // Print the results
                    Console.WriteLine("\n========== Recognition Results ==========");
                    Console.WriteLine(result.TextContentsToString());
                    Console.WriteLine($"\nTotal time: {sw.ElapsedMilliseconds} ms");
                    predictor.PrintTimeProfiling();

                    // Visualize the results
                    Mat resultMat = Visualize.DrawOcrResult(img, result, new VisualizeOptions(1.0f));
                    Cv2.ImShow("Result", resultMat);
                    Cv2.WaitKey();
                }
            }
        }
    }

7.4 Performance

Device	Latency	Notes
NVIDIA RTX 3060	~114 ms	DML
NVIDIA RTX 4070	~75 ms	DML
AMD RX 6800	~95 ms	DML
Intel Arc A750	~130 ms	DML

VIII. TensorRT Inference

8.1 Prerequisites

1. Install CUDA 12.x
2. Install TensorRT 8.x
3. Configure the environment variables so that the TensorRT libraries can be found (for example, by adding the TensorRT lib directory to PATH)

8.2 Creating the Configuration

    PaddleOCRConfig config = PaddleOCRConfig.GetPPOCRv5Config(
        detModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_det_onnx.onnx",
        clsModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_cls_onnx.onnx",
        recModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_rec_onnx_combined.onnx",
        recDictPath: @"E:\Model\ppocrv5\ppocrv5_dict.txt"
    );

    // TensorRT inference settings
    config.GlobalInferenceBackend = InferenceBackend.OnnxRuntime;
    config.GlobalDeviceType = DeviceType.GPU0;
    config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.TensorRt;
    config.MaxConcurrency = 4;
    config.GlobalMaxBatchSize = 4;

Note: on the first inference, ONNX Runtime automatically compiles the ONNX model into a TensorRT engine. This can take several minutes.

8.3 Complete Example

    using System;
    using DeploySharp.Data;
    using DeploySharp.Engine;
    using DeploySharp.Log;
    using DeploySharp.Model;
    using OpenCvSharp;
    using System.Diagnostics;

    namespace PaddleOCR.ONNX.TensorRT.Demo
    {
        class Program
        {
            static void Main(string[] args)
            {
                MyLogger.SetLevel(Log.LogLevel.ERROR);

                // Load the input image
                string imagePath = @"E:\Data\ocr\demo_1.jpg";
                Mat img = Cv2.ImRead(imagePath);
                if (img.Empty())
                {
                    Console.WriteLine("Failed to read the image!");
                    return;
                }

                // Create the configuration
                PaddleOCRConfig config = PaddleOCRConfig.GetPPOCRv5Config(
                    detModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_det_onnx.onnx",
                    clsModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_cls_onnx.onnx",
                    recModelPath: @"E:\Model\ppocrv5\PP-OCRv5_mobile_rec_onnx_combined.onnx",
                    recDictPath: @"E:\Model\ppocrv5\ppocrv5_dict.txt"
                );

                // TensorRT inference settings
                config.GlobalInferenceBackend = InferenceBackend.OnnxRuntime;
                config.GlobalDeviceType = DeviceType.GPU0;
                config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.TensorRt;
                config.MaxConcurrency = 4;
                config.GlobalMaxBatchSize = 4;
                config.RecConfig.InferImageHeight = 48;
                config.RecConfig.MaxImageWidth = 320;

                // Create the predictor
                using (PaddleOcrPredictor predictor = new PaddleOcrPredictor(config))
                {
                    Console.WriteLine("Models loaded!");

                    // Warm-up run (the first run builds the TensorRT engine and takes a long time)
                    Console.WriteLine("Warming up (the first run builds the TensorRT engine, please wait)...");
                    predictor.Predict(img);
                    Console.WriteLine("Warm-up finished!");

                    // Timed run
                    Stopwatch sw = Stopwatch.StartNew();
                    OcrResult result = predictor.Predict(img);
                    sw.Stop();

                    // Print the results
                    Console.WriteLine("\n========== Recognition Results ==========");
                    Console.WriteLine(result.TextContentsToString());
                    Console.WriteLine($"\nTotal time: {sw.ElapsedMilliseconds} ms");
                    predictor.PrintTimeProfiling();

                    // Visualize the results
                    Mat resultMat = Visualize.DrawOcrResult(img, result, new VisualizeOptions(1.0f));
                    Cv2.ImShow("Result", resultMat);
                    Cv2.WaitKey();
                }
            }
        }
    }

8.4 Performance

Device	Latency	Notes
NVIDIA RTX 3060	~52 ms	TensorRT
NVIDIA RTX 4070	~35 ms	TensorRT
NVIDIA RTX 4090	~25 ms	TensorRT

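IX. Performance Comparison and Tuning

9.1 Performance Comparison

The table below compares the backends on the same test image. Relative performance is the CPU latency divided by the backend's latency:

Execution provider	Device	Latency	Relative performance
CPU	AMD Ryzen 7 5800H	656 ms	1.0x
DML	NVIDIA RTX 3060	114 ms	5.75x
DML	Intel Arc 140V	331 ms	1.98x
CUDA	NVIDIA RTX 3060	93 ms	7.05x
TensorRT	NVIDIA RTX 3060	52 ms	12.6x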
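
The relative performance column can be reproduced by dividing the CPU baseline latency by each backend's latency. A small sketch of that arithmetic follows; `Speedup` is a hypothetical helper name used only for illustration.

```csharp
using System;

public static class Speedup
{
    // Relative performance = baseline latency / backend latency.
    // Both latencies must be in the same unit (milliseconds in the table above).
    public static double Relative(double baselineMs, double backendMs)
    {
        if (baselineMs <= 0)
            throw new ArgumentOutOfRangeException(nameof(baselineMs), "Latency must be positive.");
        if (backendMs <= 0)
            throw new ArgumentOutOfRangeException(nameof(backendMs), "Latency must be positive.");
        return baselineMs / backendMs;
    }
}
```

With the 656 ms CPU baseline, `Speedup.Relative(656, 93)` evaluates to roughly 7.05 for CUDA, and `Speedup.Relative(656, 52)` to roughly 12.6 for TensorRT, matching the table.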

9.2 Tuning Recommendations

Concurrency

    // Tune the concurrency to the hardware.
    // GPU inference: 2-4 is recommended
    config.MaxConcurrency = 4;
    // CPU inference: use the number of CPU cores
    config.MaxConcurrency = 8;
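
Rather than hard-coding the value, the rule of thumb above can be computed at startup. This is a sketch under stated assumptions: `ConcurrencyHint` is a hypothetical helper, not a DeploySharp API, and it uses the logical processor count reported by .NET, which can be higher than the physical core count on CPUs with SMT.

```csharp
using System;

public static class ConcurrencyHint
{
    // Suggests a MaxConcurrency value from the rule of thumb in the text:
    // a small fixed number (clamped to 2-4) on GPU, one worker per core on CPU.
    public static int Suggest(bool useGpu, int logicalCores, int gpuWorkers = 4)
    {
        if (useGpu)
            return Math.Clamp(gpuWorkers, 2, 4);
        return Math.Max(1, logicalCores);
    }
}
```

A CPU deployment could then set `config.MaxConcurrency = ConcurrencyHint.Suggest(false, Environment.ProcessorCount);`. The result is a starting point to benchmark from, not a guaranteed optimum.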

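Batching

    // GPU inference: a larger batch size is recommended
    config.GlobalMaxBatchSize = 4;
    // CPU inference: keep the batch size at 1
    config.GlobalMaxBatchSize = 1;

Model input size

    // Adjust the recognition model's input size
    config.RecConfig.InferImageHeight = 48;   // a smaller height speeds up inference
    config.RecConfig.MaxImageWidth = 320;     // cap the width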
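
To see how these two settings interact, it helps to work out the width a text crop ends up with. The sketch below assumes the usual PP-OCR recognition preprocessing, in which a crop is scaled to the target height with its aspect ratio kept and the width is then capped. Whether DeploySharp applies exactly this rule is an assumption, and `RecInputSize` is a hypothetical helper.

```csharp
using System;

public static class RecInputSize
{
    // Width of a text crop after scaling it to targetHeight with the aspect
    // ratio preserved, capped at maxWidth (and never smaller than 1 pixel).
    public static int ResizedWidth(int cropWidth, int cropHeight,
                                   int targetHeight = 48, int maxWidth = 320)
    {
        if (cropWidth <= 0)
            throw new ArgumentOutOfRangeException(nameof(cropWidth));
        if (cropHeight <= 0)
            throw new ArgumentOutOfRangeException(nameof(cropHeight));

        int scaled = (int)Math.Ceiling(cropWidth * (double)targetHeight / cropHeight);
        return Math.Min(Math.Max(scaled, 1), maxWidth);
    }
}
```

Under this assumption a 200x40 crop becomes 240 pixels wide, while a 1000x50 crop would scale to 960 pixels and is capped at 320, which is why long text lines are the ones affected by the width limit.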

Warm-up

    // Run 1-2 warm-up inferences before timing
    for (int i = 0; i < 2; i++)
    {
        predictor.Predict(img);
    }
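
The warm-up advice combines naturally with repeated timing, since a single timed run can be noisy. The following is a minimal sketch of a generic timing helper. `Benchmark.MedianMs` is a hypothetical name, and reporting the median of several runs is a choice made here for illustration, not something the measurements in this article are stated to use.

```csharp
using System;
using System.Diagnostics;

public static class Benchmark
{
    // Runs the action warmupRuns times without timing, then timedRuns times
    // with timing, and returns the median elapsed time in milliseconds.
    public static double MedianMs(Action action, int warmupRuns = 2, int timedRuns = 5)
    {
        if (timedRuns < 1)
            throw new ArgumentOutOfRangeException(nameof(timedRuns));

        for (int i = 0; i < warmupRuns; i++)
            action();

        var samples = new double[timedRuns];
        for (int i = 0; i < timedRuns; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            samples[i] = sw.Elapsed.TotalMilliseconds;
        }

        Array.Sort(samples);
        int mid = timedRuns / 2;
        return timedRuns % 2 == 1
            ? samples[mid]
            : (samples[mid - 1] + samples[mid]) / 2.0;
    }
}
```

With the predictor from the examples in this article, the call would look like `double ms = Benchmark.MedianMs(() => predictor.Predict(img));`.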

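X. FAQ

Q1: What should I check when CUDA inference fails?

A: Check the following:

1. The correct CUDA version is installed
2. The CUDA-related DLLs are in the program directory
3. The GPU driver is up to date
4. The GPU supports CUDA

Q2: DML inference is slow. What can I do?

A: Suggestions:

1. Make sure the GPU driver is up to date
2. Reduce the concurrency and the batch size
3. On an NVIDIA GPU, try CUDA or TensorRT instead

Q3: Why is the first TensorRT inference so slow?

A: This is expected.

On the first inference, ONNX Runtime automatically compiles the ONNX model into a TensorRT engine, which can take several minutes. Once the engine is built, later inferences are much faster.

Q4: How do I switch between execution providers?

A: Change the configuration. The examples in this article also set config.GlobalDeviceType to DeviceType.CPU for the CPU provider and to DeviceType.GPU0 for the GPU providers:

    // CPU
    config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Cpu;
    // CUDA
    config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Cuda;
    // DML
    config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.Dml;
    // TensorRT
    config.GlobalOnnxRuntimeDeviceType = OnnxRuntimeDeviceType.TensorRt;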

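Q5: How do I choose the best execution provider?

A: Choose by hardware and requirements:

Scenario	Recommended backend
No GPU, cross-platform	CPU
NVIDIA GPU, quick deployment	CUDA
NVIDIA GPU, maximum performance	TensorRT
Windows, AMD GPU	DML
Windows, GPUs from multiple vendors	DML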
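
The table can also be turned into a simple fallback rule applied at startup. The sketch below picks the first provider from a performance-ordered preference list that appears in a list of available providers. `ProviderChooser` is a hypothetical helper. The strings are the provider names ONNX Runtime itself reports (in C#, `OrtEnv.Instance().GetAvailableProviders()` returns them), and mapping the chosen name to DeploySharp's `OnnxRuntimeDeviceType` is left to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProviderChooser
{
    // Preference order, fastest first, following the performance comparison
    // in this article.
    private static readonly string[] Preference =
    {
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "DmlExecutionProvider",
        "CPUExecutionProvider",
    };

    // Returns the first preferred provider found in the available list,
    // or the CPU provider when none of the preferred names is present.
    public static string Choose(IEnumerable<string> available)
    {
        var set = new HashSet<string>(available, StringComparer.OrdinalIgnoreCase);
        return Preference.FirstOrDefault(p => set.Contains(p)) ?? "CPUExecutionProvider";
    }
}
```

Note that a provider being listed as available only means the installed package was built with it. The matching driver, CUDA, or TensorRT installation is still required, so a production version would probably also catch session creation failures and fall back to the next entry.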

XI. Getting the Software

11.1 Source Code

The DeploySharp project is fully open source:

https://github.com/guojin-yan/DeploySharp.git

11.2 Demo Programs

Console demo:

demos/DeploySharp.OpenCvSharp.PaddleOcr.Demo

Desktop application demo:

applications/.NET 8.0/JYPPX.DeploySharp.OpenCvSharp.PaddleOcr

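Conclusion

After reading this article, you should be able to deploy the PP-OCR v4/v5 models end to end with DeploySharp and ONNX Runtime. ONNX Runtime is Microsoft's high-performance inference engine and supports a wide range of execution providers and hardware platforms, which makes it a strong choice for .NET developers deploying OCR.

If you run into problems, you can reach us through GitHub Issues or the QQ group (945057948).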

Author: Guojin Yan
Published: April 2026

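[Article Statement]

This article is based mainly on the author's own research and practice. Some passages were polished with the help of AI tools. Because of technical limitations, it may contain errors or omissions, and corrections from readers are welcome. If any content inadvertently infringes your rights, please contact us through our official account, and we will verify and handle the matter promptly. Thank you for your understanding and support!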