Overview

Purpose and Scope

This document provides a comprehensive overview of PaddleOCR 3.x, an industry-leading OCR and document AI engine. It covers the system's architecture, core capabilities, major pipelines, and the evolution from version 2.x to 3.x. This overview is intended for developers and technical users who need to understand how PaddleOCR is structured and how its components interact.

For specific pipeline usage details, see pages 2.1 through 2.7. For deployment strategies, refer to section 5. For training and model development, see section 4.

What is PaddleOCR 3.x

PaddleOCR is a production-ready OCR and document AI engine built on the PaddlePaddle 3.0+ framework. It converts documents and images into structured formats (JSON, Markdown) that downstream AI applications can consume directly. The system supports 111 languages and provides end-to-end solutions from text extraction to intelligent document understanding.

System Positioning:

PaddleOCR operates as a layered system with four main tiers:

User Interfaces: CLI (paddleocr command), Python API (PaddleOCR class), annotation GUI (PPOCRLabel), and MCP Server for agent integration
Core Pipeline System: Five major pipeline families (PP-OCRv5, PP-StructureV3, PaddleOCR-VL, PP-ChatOCRv4, PP-DocTranslation)
PaddleX Integration Layer: Unified inference engine (paddlex[ocr-core]) providing pipeline registry and deployment infrastructure
Framework Layer: PaddlePaddle 3.0+ with CINN compiler and multi-device support

Key Characteristics:

Multi-Pipeline Architecture: Supports text recognition (PP-OCRv5), complex document parsing (PP-StructureV3), VLM-based understanding (PaddleOCR-VL), intelligent extraction (PP-ChatOCRv4), and document translation
Production-Ready Tools: Complete ML lifecycle support including training system (tools/train.py), inference engines, and deployment infrastructure for high-performance, service, and on-device scenarios
Hardware Flexibility: Compatible with GPU, CPU, XPU, NPU, MLU, and DCU through abstracted device support
Open Source: Apache 2.0 license with 60,000+ GitHub stars, integrated into leading projects like MinerU, RAGFlow, pathway, and cherry-studio

Sources: README.md:1-50, docs/index.en.md:1-50, Diagram 1 (Complete System Overview)

Version Evolution: From 2.x to 3.x

PaddleOCR 3.x represents a major architectural redesign from the 2.x series. The upgrade addresses technical debt accumulated over four years of rapid feature growth and introduces modern capabilities for document AI applications.

Why the Major Version Upgrade?

Technical Challenges in 2.x:

Architecture designed for lightweight OCR could not scale to complex document understanding
Code duplication and inconsistent interfaces across modules
Incompatibility with latest PaddlePaddle features
High maintenance cost due to bridging layers and workarounds

New Requirements:

Integration with Transformer-based vision-language models
Support for large language model (LLM) collaboration
Compatibility with PaddlePaddle 3.0 features (CINN compiler, improved hardware support)
Unified interface for training, inference, and deployment

Major Changes in 3.x

[Diagram: Version Evolution Timeline]

1. New Model Pipelines and Capabilities:

PP-OCRv5 (from PP-OCRv4): Enhanced accuracy (+13%), multi-text-type support (simplified/traditional Chinese, English, Japanese, Pinyin in single model), expanded to 111 languages
PP-StructureV3 (from PP-StructureV2): SOTA document parsing with 20 element categories (vs. 10), improved table recognition (SLANeXt), cross-page merging, chart-to-table conversion
PP-ChatOCRv4 (new): LLM-powered information extraction (+15% accuracy), native ERNIE 4.5 integration, multi-page PDF support, Q&A capability
PaddleOCR-VL (new): 0.9B-parameter VLM offered as an alternative to the multi-model pipelines, 111 languages, 94.5% on OmniDocBench, minimal resource consumption
PP-DocTranslation (new): Document translation pipeline combining PP-StructureV3 + ERNIE 4.5

2. Unified Deployment Architecture:

Integration with PaddleX framework (paddlex[ocr-core]) for underlying capabilities
Consistent Python API and CLI interfaces across all pipelines via the PaddleXPipelineWrapper base class (paddleocr/_pipelines/base.py:54)
Modular design: pipelines compose reusable foundation modules via PaddleXPredictorWrapper (paddleocr/_models/base.py:30)
Support for high-performance inference (enable_hpi), service deployment (Docker Compose), and on-device execution (Paddle Lite)

3. Framework Compatibility:

Full PaddlePaddle 3.0+ support (CINN compiler, latest optimizations)
Standardized model naming conventions (e.g., PP-OCRv5_server_det, PP-OCRv5_mobile_rec)
Improved training workflows via tools/train.py and tools/program.py

Breaking Changes:

PaddleOCR.ocr() method no longer accepts det, rec parameters (use TextDetection, TextRecognition classes from paddleocr._models instead)
show_log parameter replaced by new logging system controlled by environment variable (see paddleocr._env)
use_onnx superseded by enable_hpi high-performance inference flag
PPStructure class removed, replaced by PPStructureV3 (paddleocr._pipelines.pp_structurev3)
Legacy PP-StructureV2 system now under ppstructure/ for migration reference only
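
The breaking changes above can be turned into a small migration aid. The following is a minimal sketch, not part of PaddleOCR: `migrate_kwargs` is a hypothetical helper that takes a dict of 2.x-style keyword arguments (constructor or `ocr()` arguments alike) and applies the renames and removals listed above. Treating `use_onnx=True` as equivalent to `enable_hpi=True` is an assumption based on the "superseded by" wording.

```python
def migrate_kwargs(old_kwargs):
    """Translate 2.x-style keyword arguments into 3.x-style ones.

    Hypothetical helper for illustration; returns (new_kwargs, notes).
    """
    new_kwargs = dict(old_kwargs)
    notes = []

    # show_log was replaced by the new logging system, so it is dropped.
    if "show_log" in new_kwargs:
        new_kwargs.pop("show_log")
        notes.append("show_log removed: logging is controlled by an environment variable")

    # use_onnx is superseded by enable_hpi (assumed to be a direct mapping).
    if "use_onnx" in new_kwargs:
        new_kwargs["enable_hpi"] = bool(new_kwargs.pop("use_onnx"))
        notes.append("use_onnx mapped to enable_hpi")

    # det / rec are no longer accepted by PaddleOCR.ocr().
    for name in ("det", "rec"):
        if name in new_kwargs:
            new_kwargs.pop(name)
            notes.append(f"{name} removed: use TextDetection / TextRecognition instead")

    return new_kwargs, notes
```

Calling it on `{"lang": "en", "show_log": False, "use_onnx": True}` yields `{"lang": "en", "enable_hpi": True}` plus two notes describing what changed.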

Sources: docs/update/upgrade_notes.en.md:1-80, README.md:81-236, paddleocr/_pipelines/base.py:54-110, paddleocr/_models/base.py:30-83, Diagram 2 (Model Ecosystem and Version Evolution)

Core Capabilities and Pipelines

PaddleOCR 3.x organizes functionality into five major pipeline families, each addressing specific document understanding needs. The system evolved from the 2.x series with significant improvements in accuracy, language support, and architectural design.

Pipeline Evolution and Current State

Model Ecosystem Progression:

Series	Latest Version	Key Capabilities	Evolution Path
PP-OCR	v5 (2025)	5 text types, 13% accuracy ↑, 111 languages	v2→v3→v4→v5
PP-Structure	v3 (2025)	20 layout types, SLANeXt tables	v1→v2→v3 (v2 deprecated)
PaddleOCR-VL	1.5 (2026)	0.9B params, 94.5% OmniDocBench, NaViT + ERNIE-4.5	1.0→1.5
PP-ChatOCR	v4 (2025)	ERNIE 4.5 integration, multi-page PDF	v3→v4
PP-DocTranslation	Current	Structure + LLM translation	New in 3.x

Specialized Pipelines:

Formula Recognition: LaTeX-OCR, UniMERNet models
Table Recognition V2: PP-TableMagic, SLANeXt for wired/wireless tables
Seal Recognition: PP-OCRv4_seal_det for curved text
Document Preprocessing: Orientation classification (PP-LCNet_x1_0_doc_ori), unwarping (UVDoc)

Pipeline Architecture

[Diagram: Pipeline Layer Mapping to Code Entities]

Pipeline Capabilities Summary

Pipeline	Class Name	Primary Use Case	Key Features
PP-OCRv5	`PaddleOCR`	Universal text recognition	Single model for 5 text types (Simplified/Traditional Chinese, English, Japanese, Pinyin), 13% accuracy improvement
PP-StructureV3	`PPStructureV3`	Complex document parsing	20 layout categories, Markdown/JSON output, cross-page table merging, chart-to-table conversion
PP-ChatOCRv4	`PPChatOCRv4Doc`	Intelligent extraction	ERNIE 4.5 LLM integration, 15% accuracy gain, multi-page PDF support, Q&A capability
PaddleOCR-VL	`PaddleOCRVL`	VLM-based parsing	0.9B parameter VLM, 111 languages, 94.5% on OmniDocBench, alternative to pipeline approach
PP-DocTranslation	`PPDocTranslation`	Document translation	Structure-preserving translation using PP-StructureV3 + ERNIE 4.5, Markdown output

Sources: paddleocr/_pipelines/ocr.py:1-50, paddleocr/_pipelines/pp_structurev3.py:1-50, paddleocr/_pipelines/base.py:54-110, paddleocr/_models/base.py:30-83, README.md:51-76, Diagram 2 (Model Ecosystem and Version Evolution)

System Architecture

High-Level Component Organization

Sources: paddleocr/_pipelines/base.py:54-136, paddleocr/_common_args.py:31-94, pyproject.toml:54-56

Component Responsibilities

User Interface Layer:

CLI Entry Point [paddleocr.__main__:console_entry]: Argument parsing, subcommand routing to pipeline executors
Python API: Direct class instantiation for programmatic access
MCP Server: Model Context Protocol integration for agent applications (e.g., Claude Desktop)

Pipeline Orchestration Layer:

PaddleXPipelineWrapper (paddleocr/_pipelines/base.py:54-110): Base class for all pipelines; manages PaddleX config merging, pipeline creation, resource cleanup
Common Args Parser (paddleocr/_common_args.py:31-57): Validates and normalizes device, precision, optimization settings
Config Management: Exports/imports PaddleX YAML configs for advanced customization

PaddleX Integration Layer:

create_pipeline(): Factory function instantiating PaddleX pipeline from config
create_predictor(): Factory function for single-model predictors
Pipeline Config: YAML-based configuration merged from defaults + user overrides

Inference Backend Layer:

Paddle Inference: Primary inference engine with native PaddlePaddle support
TensorRT: GPU acceleration via NVIDIA TensorRT subgraph engine
MKL-DNN: CPU optimization using Intel Math Kernel Library for Deep Neural Networks
ONNX Runtime: Cross-platform inference via exported ONNX models

Hardware Abstraction:

Device Parser [paddlex.utils.device]: Parses device strings (gpu:0,1, npu, etc.) and handles multi-device setup
Hardware Backends: Abstracted access to heterogeneous computing devices

Sources: paddleocr/_pipelines/base.py:54-136, paddleocr/_common_args.py:31-150, paddleocr/_models/base.py:30-83

Key Components and Code Entities

Pipeline Wrapper Pattern

All pipelines inherit from PaddleXPipelineWrapper and follow a consistent pattern:

Key Methods:

__init__(): Parses common args (device, precision, etc.), merges PaddleX config, creates underlying pipeline
predict_iter() / predict(): Inference interface; predict_iter() yields results for streaming, predict() returns list
export_paddlex_config_to_yaml(): Serializes current config to YAML for inspection/editing
close(): Releases resources (important for proper cleanup)
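
As an illustration of how these methods fit together, here is a minimal sketch of one inference call with guaranteed cleanup. `run_pipeline` is a hypothetical helper; it assumes `paddleocr` and a PaddlePaddle build are installed and that `predict_iter()` accepts an image path as its first argument. The import is deferred so the function can be defined without those packages.

```python
def run_pipeline(image_path, device="cpu"):
    """Run the OCR pipeline on one input and return the results as a list.

    Hypothetical helper; needs `paddleocr` and PaddlePaddle at call time.
    """
    from paddleocr import PaddleOCR  # deferred: heavy optional dependency

    pipeline = PaddleOCR(device=device)  # __init__ parses the common args
    try:
        # predict_iter() yields results one at a time; collecting them into
        # a list mirrors what predict() is described as returning.
        return list(pipeline.predict_iter(image_path))
    finally:
        pipeline.close()  # release resources even if inference fails
```

Consuming `predict_iter()` directly instead of building a list keeps memory flat when the input expands into many pages or images.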

Template Method Pattern:

_paddlex_pipeline_name: Property returning PaddleX registration name (e.g., "OCR", "PP-StructureV3")
_get_paddlex_config_overrides(): Builds config overrides from constructor parameters
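
The template-method arrangement can be sketched with stand-in classes. This is a simplified illustration under stated assumptions, not the code in paddleocr/_pipelines/base.py: `fake_create_pipeline`, `PipelineWrapperSketch`, and `OCRSketch` are hypothetical names, and the fake factory only echoes its inputs instead of running a model.

```python
from abc import ABC, abstractmethod


def fake_create_pipeline(name, overrides):
    """Stand-in for PaddleX's create_pipeline(); echoes inputs, runs no model."""
    def pipeline(inputs):
        for item in inputs:
            yield {"pipeline": name, "input": item, "config": overrides}
    return pipeline


class PipelineWrapperSketch(ABC):
    def __init__(self, **params):
        self._params = params
        # The base class drives creation; subclasses supply the pipeline
        # name and the config overrides through the two hooks below.
        self._pipeline = fake_create_pipeline(
            self._paddlex_pipeline_name, self._get_paddlex_config_overrides()
        )

    @property
    @abstractmethod
    def _paddlex_pipeline_name(self): ...

    @abstractmethod
    def _get_paddlex_config_overrides(self): ...

    def predict_iter(self, inputs):
        yield from self._pipeline(inputs)

    def predict(self, inputs):
        return list(self.predict_iter(inputs))


class OCRSketch(PipelineWrapperSketch):
    @property
    def _paddlex_pipeline_name(self):
        return "OCR"

    def _get_paddlex_config_overrides(self):
        # Forward only the parameters the caller actually set.
        return {k: v for k, v in self._params.items() if v is not None}
```

With this shape, adding a pipeline means writing only the two hooks; argument handling, creation, and the `predict` / `predict_iter` pair stay in one place.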

Sources: paddleocr/_pipelines/base.py:54-110

Model Predictor Pattern

Individual modules (e.g., text detection, table recognition) use PaddleXPredictorWrapper:

Key Differences from Pipeline Wrapper:

Accepts model_name and model_dir for model selection/custom models
No config management (single-model inference is simpler)
Each predictor class defines a default_model_name property
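
A minimal sketch of the model-selection behavior described above, using stand-in classes rather than the real wrapper. The class names are hypothetical, and both the default model shown for the detection sketch and the meaning of `model_dir=None` are assumptions made for illustration.

```python
class PredictorWrapperSketch:
    # Each concrete predictor overrides this with its own default.
    default_model_name = None

    def __init__(self, model_name=None, model_dir=None):
        # Fall back to the class default when no model is named explicitly.
        self.model_name = model_name or self.default_model_name
        # model_dir points at custom weights; None is assumed to mean
        # "use the official model for model_name".
        self.model_dir = model_dir


class TextDetectionSketch(PredictorWrapperSketch):
    default_model_name = "PP-OCRv5_server_det"  # assumed default
```

Because there is only one model per predictor, selection reduces to these two arguments, which is why no config merging step is needed here.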

Sources: paddleocr/_models/base.py:30-83, paddleocr/_models/text_detection.py:1-50, paddleocr/_models/text_recognition.py:1-50

Common Arguments and Configuration

All pipelines and predictors accept common inference arguments:

Parameter	Type	Default	Description
`device`	`str`	Auto-detect	Device specification: `cpu`, `gpu`, `gpu:0`, `gpu:0,1`, `npu`, `xpu`
`enable_hpi`	`bool`	Pipeline-specific	Enable high-performance inference (ONNX backend)
`use_tensorrt`	`bool`	`False`	Use Paddle Inference TensorRT subgraph engine
`precision`	`str`	`"fp32"`	TensorRT precision: `fp32`, `fp16`
`enable_mkldnn`	`bool`	`True`	Use MKL-DNN for CPU acceleration
`mkldnn_cache_capacity`	`int`	`10`	MKL-DNN cache capacity
`cpu_threads`	`int`	`10`	Number of CPU threads for inference
`enable_cinn`	`bool`	`False`	Use CINN compiler

Processing Flow:

Argument Parsing (paddleocr/_common_args.py:31-57): Validates and normalizes common args
Device Resolution [paddlex.utils.device]: Auto-detects GPU if available, falls back to CPU
Backend Configuration (paddleocr/_common_args.py:60-94): Creates PaddlePredictorOption with run mode (paddle, trt_fp32, trt_fp16) and optimization settings
Pipeline/Predictor Initialization: Passes prepared config to PaddleX
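
The backend-configuration step can be sketched as a pure function. This is a simplified illustration, not the logic in paddleocr/_common_args.py: `resolve_run_mode` is a hypothetical name, it covers only the three run modes named above, and it assumes TensorRT modes apply only to `gpu` devices.

```python
SUPPORTED_PRECISIONS = ("fp32", "fp16")


def resolve_run_mode(device, use_tensorrt=False, precision="fp32"):
    """Map common args to a run-mode string: paddle, trt_fp32, or trt_fp16."""
    if precision not in SUPPORTED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    # "gpu:0,1" -> "gpu"; a bare "cpu" is returned unchanged.
    device_type = device.split(":", 1)[0]
    if use_tensorrt and device_type == "gpu":
        return f"trt_{precision}"
    # Everything else falls back to plain Paddle Inference.
    return "paddle"
```

For example, `resolve_run_mode("gpu:0", use_tensorrt=True, precision="fp16")` gives `"trt_fp16"`, while the same flags on `"cpu"` give `"paddle"`.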

Constants: paddleocr/_constants.py:1-23

Sources: paddleocr/_common_args.py:31-150, paddleocr/_constants.py:1-23

Integration with PaddleX

PaddleOCR 3.x is built on top of PaddleX, a low-code development framework. This integration provides unified infrastructure for inference deployment while keeping PaddleOCR's OCR-focused API surface clean.

Relationship Diagram

Version Compatibility

PaddleOCR Version	PaddleX Version	PaddlePaddle Version
`3.0.0`	`3.0.0`	`>= 3.0.0`
`3.0.1`	`3.0.1`	`>= 3.0.0`
`3.0.2`	`3.0.2`	`>= 3.0.0`
`3.0.3`	`>= 3.0.3`	`>= 3.0.0`
`3.1.x`	`>= 3.1.0, < 3.2.0`	`>= 3.0.0`
`3.2.x`	`>= 3.2.0, < 3.3.0`	`>= 3.0.0`
`3.3.x`	`>= 3.3.0, < 3.4.0`	`>= 3.0.0`
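
The table can be encoded as a small lookup, for example to validate an environment before deployment. A minimal sketch: `required_paddlex` is a hypothetical helper, and reading the bare `3.0.0` to `3.0.2` entries as exact pins is an assumption.

```python
def required_paddlex(paddleocr_version):
    """Return the PaddleX requirement string for a PaddleOCR 3.x version."""
    major, minor, patch = (int(part) for part in paddleocr_version.split("."))
    if major == 3 and minor == 0 and patch in (0, 1, 2):
        return f"== 3.0.{patch}"  # assumed to be an exact pin
    if major == 3 and minor == 0 and patch == 3:
        return ">= 3.0.3"
    if major == 3 and minor in (1, 2, 3):
        return f">= 3.{minor}.0, < 3.{minor + 1}.0"
    raise ValueError(f"{paddleocr_version} is not covered by the table")
```

Versions outside the table raise `ValueError` rather than guessing a range.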

PaddleX Pipeline Registration Names

PaddleOCR pipeline classes map to PaddleX internal pipeline names:

PaddleOCR Class	PaddleX Pipeline Name
`PaddleOCR`	`OCR`
`PPStructureV3`	`PP-StructureV3`
`PPChatOCRv4Doc`	`PP-ChatOCRv4-doc`
`PaddleOCRVL`	`PaddleOCR-VL`
`TableRecognitionV2`	`table_recognition_v2`
`FormulaRecognition`	`formula_recognition`
`SealRecognition`	`seal_recognition`
`DocPreprocessor`	`doc_preprocessor`
`DocUnderstanding`	`doc_understanding`
`PPDocTranslation`	`PP-DocTranslation`

Configuration Export/Import

PaddleX uses YAML configuration files for advanced settings. PaddleOCR exposes this through:

Export Configuration:

Load Configuration:
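
Both steps can be sketched together. The helpers below are hypothetical wrappers around the `export_paddlex_config_to_yaml()` method and the `paddlex_config` constructor argument mentioned in this document; passing a YAML file path to `paddlex_config` is an assumption. Imports are deferred because `paddleocr` and PaddlePaddle must be installed for the calls to work.

```python
def export_config(yaml_path, device="cpu"):
    """Write the pipeline's current PaddleX config to yaml_path."""
    from paddleocr import PaddleOCR  # deferred: heavy optional dependency

    pipeline = PaddleOCR(device=device)
    try:
        pipeline.export_paddlex_config_to_yaml(yaml_path)
    finally:
        pipeline.close()
    return yaml_path


def load_from_config(yaml_path, device="cpu"):
    """Create a pipeline from a (possibly hand-edited) PaddleX YAML config."""
    from paddleocr import PaddleOCR

    # Assumption: paddlex_config accepts the path of the exported file.
    return PaddleOCR(paddlex_config=yaml_path, device=device)
```

A typical round trip is to export once, edit the YAML by hand, then pass the edited file to `load_from_config()`.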

Configuration Merging Logic (paddleocr/_pipelines/base.py:90-100):

Load default PaddleX config for pipeline
Apply user-provided paddlex_config (if any)
Override with pipeline-specific params from constructor
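
The precedence order above can be sketched with plain dictionaries. This is an illustration under stated assumptions rather than the actual implementation: `deep_merge` and `build_config` are hypothetical names, the config is assumed to be a nested dict, and each later layer is assumed to override only the keys it sets.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where override's values win, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def build_config(default_config, paddlex_config=None, constructor_overrides=None):
    """Apply the three layers in order: defaults, user config, constructor params."""
    config = copy.deepcopy(default_config)
    if paddlex_config is not None:
        config = deep_merge(config, paddlex_config)
    if constructor_overrides:
        config = deep_merge(config, constructor_overrides)
    return config
```

Under these assumptions, a value passed to the constructor always beats the same key in a user YAML, which in turn beats the pipeline default.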

Sources: docs/version3.x/paddleocr_and_paddlex.en.md:1-96, paddleocr/_pipelines/base.py:54-110

User Access Patterns and Deployment

PaddleOCR 3.x provides multiple entry points tailored to different skill levels and use cases, with clear progression paths from development to production deployment.

Entry Points and User Journey

[Diagram: Entry Points by User Type]

Training, Inference, and Deployment Pipeline

[Diagram: Complete ML Lifecycle]

Deployment Characteristics

Option	Entry Point	Use Case	Performance	Hardware Support
Quick Start	`pip install paddleocr`	Experimentation, prototyping	Basic	CPU/GPU
CLI	`paddleocr ocr -i image.png`	Scripting, batch processing	Basic-High	CPU/GPU/XPU/NPU
Python API	`PaddleOCR(enable_hpi=True)`	Application integration	High (with HPI)	CPU/GPU/XPU/NPU
Service Deployment	Docker Compose + HTTP API	Web services, microservices	High	GPU/CPU/XPU/NPU/MLU/DCU
C++ Local	deploy/cpp_infer/	Embedded systems, standalone apps	High	GPU/CPU
On-Device	Paddle Lite	Mobile apps, edge computing	Medium	ARM CPU/GPU
MCP Server	paddleocr._deployment.mcp	AI agent integration (Claude Desktop)	Medium	GPU/CPU

Configuration Options Across Entry Points:

Model Selection: Server vs. mobile models (e.g., PP-OCRv5_server_rec vs. PP-OCRv5_mobile_rec)
Language Choice: 111 languages supported via lang parameter
Hardware Target: Device specification via device parameter (cpu, gpu, gpu:0,1, xpu, npu)
Precision Mode: FP32/FP16 via the precision and use_tensorrt parameters

Sources: tools/train.py:1-50, tools/export_model.py:1-50, paddleocr/__main__.py:1-40, README.md:107-146, Diagram 3 (User Journey and Access Patterns), Diagram 4 (Training, Inference, and Deployment Pipeline)

Hardware Support and Acceleration Frameworks

PaddleOCR 3.x supports diverse hardware platforms through PaddlePaddle's hardware abstraction layer and specialized acceleration frameworks. Different pipelines have varying hardware requirements based on their computational characteristics.

Hardware and Acceleration Architecture

[Diagram: Traditional Pipeline (PP-StructureV3) vs VLM Pipeline (PaddleOCR-VL)]

Hardware Configuration

Device String Format:

Single device: cpu, gpu, gpu:0, npu, xpu
Multiple devices: gpu:0,1,2,3 (parallel inference where supported)
Parsed by paddlex.utils.device
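
The device string format can be illustrated with a small parser. This is a stand-alone sketch, not the parser in paddlex.utils.device: `parse_device_sketch` is a hypothetical name, and returning `None` for "no explicit IDs" is a design choice made here.

```python
def parse_device_sketch(device):
    """Split 'gpu:0,1' into ('gpu', [0, 1]); 'cpu' becomes ('cpu', None)."""
    device_type, sep, ids = device.partition(":")
    if not device_type:
        raise ValueError(f"missing device type in {device!r}")
    if not sep:
        return device_type, None  # no explicit device IDs given
    try:
        device_ids = [int(part) for part in ids.split(",")]
    except ValueError:
        raise ValueError(f"invalid device ids in {device!r}") from None
    return device_type, device_ids
```

Malformed strings such as `"gpu:"` or `"gpu:a"` raise `ValueError` instead of being silently treated as a single default device.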

Optimization by Platform and Pipeline:

Hardware	Traditional Pipelines	VLM Pipelines (PaddleOCR-VL)	Configuration
NVIDIA GPU (7.0 ≤ CC < 8.0)	TensorRT subgraph, CUDA, fp16	Base inference	`device="gpu"`, `use_tensorrt=True`
NVIDIA GPU (CC ≥ 8.0)	TensorRT subgraph, CUDA, fp16	vLLM acceleration	`device="gpu"`, acceleration auto-selected
NVIDIA GPU (8.0 ≤ CC < 12.0)	TensorRT subgraph, CUDA, fp16	SGLang acceleration	`device="gpu"`
CPU (x64/ARM)	MKL-DNN (oneDNN), OpenMP threading	Base inference	`device="cpu"`, `enable_mkldnn=True`
Kunlunxin XPU	FastDeploy plugin	FastDeploy plugin	`device="xpu"`
Ascend NPU	FastDeploy plugin	vLLM/SGLang support	`device="npu"`
HYGON DCU	FastDeploy plugin	FastDeploy plugin	`device="dcu"`
MetaX GPU	FastDeploy plugin	FastDeploy plugin	`device="metax"`
Iluvatar GPU	FastDeploy plugin	FastDeploy plugin	`device="iluvatar"`
Apple Silicon	MKL-DNN on CPU	MLX-VLM acceleration	`device="cpu"`

Comparison Metrics:

Aspect	Traditional Pipelines	VLM Pipelines
Speed	Fast (optimized multi-model)	Variable (benefits from acceleration)
Accuracy	High (specialized models)	94.5% on OmniDocBench
Complexity	4-5 separate models	Single 0.9B model
Languages	37+ (PP-OCRv5)	111 (PaddleOCR-VL)
GPU Requirements	Any NVIDIA (CC ≥ 7.0)	CC ≥ 8.0 for acceleration

Device Auto-Detection Logic:

Checks PaddlePaddle framework for device support via paddle.device.is_compiled_with_cuda()
Prefers GPU if available and compiled, falls back to CPU
Respects explicit user device specification via device parameter
Handled by paddleocr/_common_args.py:60-94
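
The auto-detection rules above can be sketched as follows. This is a simplified illustration, not the library's code: `pick_device` is a hypothetical helper, the extra `device_count()` check is an assumption about what "available" means, and a missing PaddlePaddle install is treated as CPU-only so the function stays usable without the framework.

```python
def pick_device(requested=None):
    """Return the device string to use, honoring an explicit request first."""
    if requested is not None:
        return requested  # an explicit user choice always wins
    try:
        import paddle
    except ImportError:
        return "cpu"  # assumption: no framework means CPU-only
    # Prefer GPU only if the build has CUDA support and a device is visible.
    if paddle.device.is_compiled_with_cuda() and paddle.device.cuda.device_count() > 0:
        return "gpu"
    return "cpu"
```

Checking the device count in addition to the compile flag avoids selecting `gpu` on a CUDA build that runs on a machine without a GPU.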

Sources: paddleocr/_common_args.py:60-94, README.md:20-22, README.md:53-56, Diagram 6 (PaddleOCR-VL vs Traditional Pipeline Comparison)

Installation and Dependencies

Package Structure

PaddleOCR is distributed as a Python package with optional dependency groups:

paddleocr/
├── Core dependencies (always installed)
│   ├── paddlex[ocr-core] (3.3.0)
│   ├── PyYAML
│   └── requests
│
└── Optional dependency groups
    ├── [all] - Full functionality
    ├── [doc-parser] - PP-StructureV3, PaddleOCR-VL
    ├── [ie] - Information extraction (PP-ChatOCRv4)
    └── [trans] - Document translation (PP-DocTranslation)

Installation Commands:
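
As a minimal sketch, the dependency groups listed above translate into pip requirement strings as shown below. `pip_install_command` is a hypothetical helper that only builds the command; it does not run it, and PaddlePaddle itself is assumed to be installed separately.

```python
import sys

# Optional dependency groups named in the package structure above.
EXTRAS = ("all", "doc-parser", "ie", "trans")


def pip_install_command(extra=None):
    """Build the pip command for paddleocr, optionally with one extra."""
    if extra is not None and extra not in EXTRAS:
        raise ValueError(f"unknown extra: {extra!r}")
    requirement = "paddleocr" if extra is None else f"paddleocr[{extra}]"
    # Using the current interpreter keeps pip bound to the active environment.
    return [sys.executable, "-m", "pip", "install", requirement]
```

For instance, `pip_install_command("doc-parser")` ends with `paddleocr[doc-parser]`, and the resulting list can be passed to `subprocess.run()` as is.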

Dependency Isolation:

Installing PaddleOCR does NOT install all of PaddleX's dependencies
Only OCR-specific PaddleX dependencies are included
Keeps installation size small (in one test on Python 3.10/Linux/x64, the installed size grew only from 717 MB to 738 MB)

Sources: pyproject.toml:1-76, requirements.txt:1-18, README.md:244-258, docs/version3.x/paddleocr_and_paddlex.en.md:23-24

Module Organization

File Structure Overview

paddleocr/
├── __init__.py                 # Package exports
├── __main__.py                 # CLI entry point (console_entry)
├── _constants.py               # Default configuration values
├── _common_args.py             # Common argument parsing
├── _abstract.py                # Abstract base classes
│
├── _pipelines/                 # Pipeline implementations
│   ├── base.py                 # PaddleXPipelineWrapper
│   ├── ocr.py                  # PaddleOCR (PP-OCRv5)
│   ├── pp_structurev3.py       # PPStructureV3
│   ├── pp_chatocrv4_doc.py     # PPChatOCRv4Doc
│   ├── paddleocr_vl.py         # PaddleOCRVL
│   ├── doc_preprocessor.py     # DocPreprocessor
│   └── ...
│
├── _models/                    # Individual module implementations
│   ├── base.py                 # PaddleXPredictorWrapper
│   ├── text_detection.py       # TextDetection
│   ├── text_recognition.py     # TextRecognition
│   ├── layout_detection.py     # LayoutDetection
│   └── ...
│
└── _utils/                     # Utility modules
    ├── cli.py                  # CLI helpers
    └── logging.py              # Logging configuration

Key Design Patterns:

Wrapper Pattern: PaddleXPipelineWrapper and PaddleXPredictorWrapper provide consistent interface over PaddleX
Factory Pattern: create_pipeline() and create_predictor() from PaddleX
Template Method: Subclasses implement _paddlex_pipeline_name, _get_paddlex_config_overrides(), etc.
Strategy Pattern: Different inference backends (Paddle Inference, TensorRT, ONNX) configurable via options

Sources: paddleocr/_pipelines/base.py:1-136, paddleocr/_models/base.py:1-108, paddleocr/__main__.py:1-50

Testing and Quality Assurance

PaddleOCR 3.x includes comprehensive testing infrastructure:

CI/CD Pipeline

GitHub Actions Workflow (.github/workflows/tests.yaml:1-64):

Runs on push to main and release/* branches
Skips tests if only docs/config files changed
Python 3.10 on Ubuntu
Caches dependencies for faster runs
Installs test dependencies: pytest, paddlepaddle==3.2.0, paddleocr[all]

Test Organization

tests/
├── pipelines/
│   ├── test_pp_structurev3.py      # PP-StructureV3 tests
│   ├── test_pp_chatocrv4_doc.py    # PP-ChatOCRv4 tests
│   └── ...
│
├── models/
│   ├── test_text_detection.py      # TextDetection tests
│   └── ...
│
└── testing_utils.py                # Shared test utilities

Test Markers:

@pytest.mark.resource_intensive: Heavy tests (skipped by default)
Parameterized tests for different configurations
Monkeypatching for testing parameter forwarding without full inference

Example Test Pattern:
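
The parameter-forwarding style of test can be sketched without running any model. The example below is illustrative and is not taken from the repository's test suite: it uses the standard library's `unittest.mock` in place of pytest's `monkeypatch` fixture, and `WrapperUnderTest`, `_HeavyBackend`, and `some_flag` are hypothetical stand-ins.

```python
from unittest import mock


class _HeavyBackend:
    def predict(self, data, **kwargs):
        raise RuntimeError("real inference should not run in this test")


class WrapperUnderTest:
    """Thin wrapper that forwards predict() arguments to its backend."""

    def __init__(self):
        self.backend = _HeavyBackend()

    def predict(self, data, **kwargs):
        return self.backend.predict(data, **kwargs)


def test_predict_forwards_params():
    wrapper = WrapperUnderTest()
    # Swap the expensive backend call for a recorder with a canned result.
    with mock.patch.object(wrapper.backend, "predict", return_value=["ok"]) as fake:
        result = wrapper.predict("img.png", some_flag=False)
    assert result == ["ok"]
    # The wrapper must pass both the input and the keyword through unchanged.
    fake.assert_called_once_with("img.png", some_flag=False)
```

Because the backend is replaced, the test checks only the wrapper's contract and stays fast enough to run on every push.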

Sources: .github/workflows/tests.yaml:1-64, tests/pipelines/test_pp_structurev3.py:1-81, tests/pipelines/test_pp_chatocrv4_doc.py:1-81

Documentation Structure

PaddleOCR 3.x uses MkDocs for documentation with internationalization support:

Documentation Architecture

Configuration: mkdocs.yml:1-389

Material theme with search, navigation features
i18n plugin for Chinese (default) and English
Mermaid diagrams, code highlighting, mathematical expressions
Git revision dates and committer information

Content Organization:

docs/
├── index.md / index.en.md          # Home pages
├── quick_start.md / quick_start.en.md  # Getting started
├── version3.x/                     # Current version docs
│   ├── installation.md
│   ├── pipeline_usage/             # Pipeline tutorials
│   ├── module_usage/               # Module reference
│   ├── deployment/                 # Deployment guides
│   └── ...
├── version2.x/                     # Legacy docs
└── update/                         # Version upgrade guides

Navigation Structure (mkdocs.yml:290-389):

Home, Installation, Quick Start
Pipeline-specific tutorials (PP-OCRv5, PP-StructureV3, etc.)
Module reference (text detection, recognition, layout, etc.)
Deployment guides (HPI, C++, serving, on-device)
Multi-hardware usage
Community and contribution guidelines

Sources: mkdocs.yml:1-389, docs/index.en.md:1-88, docs/quick_start.en.md:1-197

This overview provides a comprehensive introduction to PaddleOCR 3.x architecture, capabilities, and design patterns. For specific implementation details, refer to the section-specific pages linked throughout this document.