This document provides a comprehensive overview of PaddleOCR 3.x, an industry-leading OCR and document AI engine. It covers the system's architecture, core capabilities, major pipelines, and the evolution from version 2.x to 3.x. This overview is intended for developers and technical users who need to understand how PaddleOCR is structured and how its components interact.
For specific pipeline usage details, see pages 2.1 through 2.7. For deployment strategies, refer to section 5. For training and model development, see section 4.
PaddleOCR is a production-ready OCR and document AI engine built on the PaddlePaddle 3.0+ framework. It converts documents and images into structured, AI-friendly data formats (JSON, Markdown) with industry-leading accuracy. The system supports 111 languages and provides end-to-end solutions from text extraction to intelligent document understanding.
System Positioning:
PaddleOCR operates as a layered system with four main tiers:
- CLI (`paddleocr` command), Python API (`PaddleOCR` class), Web UI (PPOCRLabel), and MCP Server for agent integration
- PaddleX integration (`paddlex[ocr-core]`) providing pipeline registry and deployment infrastructure

Key Characteristics:
Sources: README.md:1-50, docs/index.en.md:1-50, Diagram 1 (Complete System Overview)
PaddleOCR 3.x represents a major architectural redesign from the 2.x series. The upgrade addresses technical debt accumulated over four years of rapid feature growth and introduces modern capabilities for document AI applications.
Technical Challenges in 2.x:
New Requirements:
Version Evolution Timeline:
1. New Model Pipelines and Capabilities:
2. Unified Deployment Architecture:
- `PaddleXPipelineWrapper` base class (paddleocr/_pipelines/base.py:54)
- `PaddleXPredictorWrapper` (paddleocr/_models/base.py:30)
- … `enable_hpi`), service deployment (Docker Compose), and on-device execution (Paddle Lite)

3. Framework Compatibility:
- … `PP-OCRv5_server_det`, `PP-OCRv5_mobile_rec`)

Breaking Changes:
- The `PaddleOCR.ocr()` method no longer accepts the `det` and `rec` parameters (use the `TextDetection` and `TextRecognition` classes from `paddleocr._models` instead)
- The `show_log` parameter is replaced by a new logging system controlled by an environment variable (see `paddleocr._env`)
- `use_onnx` is superseded by the `enable_hpi` high-performance inference flag
- The `PPStructure` class is removed and replaced by `PPStructureV3` (`paddleocr._pipelines.pp_structurev3`)

Sources: docs/update/upgrade_notes.en.md:1-80, README.md:81-236, paddleocr/_pipelines/base.py:54-110, paddleocr/_models/base.py:30-83, Diagram 2 (Model Ecosystem and Version Evolution)
PaddleOCR 3.x organizes functionality into five major pipeline families, each addressing specific document understanding needs. The system evolved from the 2.x series with significant improvements in accuracy, language support, and architectural design.
Model Ecosystem Progression:
| Series | Latest Version | Key Capabilities | Evolution Path |
|---|---|---|---|
| PP-OCR | v5 (2025) | 5 text types, 13% accuracy ↑, 111 languages | v2→v3→v4→v5 |
| PP-Structure | v3 (2025) | 20 layout types, SLANeXt tables | v1→v2→v3 (v2 deprecated) |
| PaddleOCR-VL | 1.5 (2026) | 0.9B params, 94.5% OmniDocBench, NaViT + ERNIE-4.5 | 1.0→1.5 |
| PP-ChatOCR | v4 (2025) | ERNIE 4.5 integration, multi-page PDF | v3→v4 |
| PP-DocTranslation | Current | Structure + LLM translation | New in 3.x |
Specialized Pipelines:
Pipeline Layer Mapping to Code Entities:
| Pipeline | Class Name | Primary Use Case | Key Features |
|---|---|---|---|
| PP-OCRv5 | PaddleOCR | Universal text recognition | Single model for 5 text types (Simplified/Traditional Chinese, English, Japanese, Pinyin), 13% accuracy improvement |
| PP-StructureV3 | PPStructureV3 | Complex document parsing | 20 layout categories, Markdown/JSON output, cross-page table merging, chart-to-table conversion |
| PP-ChatOCRv4 | PPChatOCRv4Doc | Intelligent extraction | ERNIE 4.5 LLM integration, 15% accuracy gain, multi-page PDF support, Q&A capability |
| PaddleOCR-VL | PaddleOCRVL | VLM-based parsing | 0.9B parameter VLM, 111 languages, 94.5% on OmniDocBench, alternative to pipeline approach |
| PP-DocTranslation | PPDocTranslation | Document translation | Structure-preserving translation using PP-StructureV3 + ERNIE 4.5, Markdown output |
Sources: paddleocr/_pipelines/ocr.py:1-50, paddleocr/_pipelines/pp_structurev3.py:1-50, paddleocr/_pipelines/base.py:54-110, paddleocr/_models/base.py:30-83, README.md:51-76, Diagram 2 (Model Ecosystem and Version Evolution)

Sources: paddleocr/_pipelines/base.py:54-136, paddleocr/_common_args.py:31-94, pyproject.toml:54-56
User Interface Layer:
- … `paddleocr.__main__:console_entry`: argument parsing and subcommand routing to pipeline executors

Pipeline Orchestration Layer:
- `PaddleXPipelineWrapper` (paddleocr/_pipelines/base.py:54-110): base class for all pipelines; manages PaddleX config merging, pipeline creation, and resource cleanup

PaddleX Integration Layer:
- `create_pipeline()`: factory function that instantiates a PaddleX pipeline from a config
- `create_predictor()`: factory function for single-model predictors

Inference Backend Layer:
Hardware Abstraction:
- … `gpu:0,1`, `npu`, etc.) and handles multi-device setup

Sources: paddleocr/_pipelines/base.py:54-136, paddleocr/_common_args.py:31-150, paddleocr/_models/base.py:30-83
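The CLI layer's argument parsing and subcommand routing can be sketched with the standard library's `argparse`. This is a minimal sketch, not PaddleOCR's actual `console_entry`: the executor functions, the `EXECUTORS` registry, the `pp_structurev3` subcommand name, and the `--device` flag are hypothetical; only the `ocr` subcommand and `-i` flag come from the `paddleocr ocr -i image.png` usage shown in this document.

```python
import argparse
from typing import Callable, Dict, List, Optional


# Hypothetical executors standing in for real pipeline runners; each one
# receives the parsed arguments and returns a short description of the call.
def run_ocr(args: argparse.Namespace) -> str:
    return f"OCR on {args.input} (device={args.device})"


def run_pp_structurev3(args: argparse.Namespace) -> str:
    return f"PP-StructureV3 on {args.input} (device={args.device})"


# Subcommand name -> executor (hypothetical registry).
EXECUTORS: Dict[str, Callable[[argparse.Namespace], str]] = {
    "ocr": run_ocr,
    "pp_structurev3": run_pp_structurev3,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="paddleocr")
    subparsers = parser.add_subparsers(dest="subcommand", required=True)
    for name in EXECUTORS:
        sub = subparsers.add_parser(name)
        # Flags shared by every subcommand in this sketch.
        sub.add_argument("-i", "--input", required=True)
        sub.add_argument("--device", default=None)
    return parser


def console_entry(argv: Optional[List[str]] = None) -> str:
    """Hypothetical stand-in for paddleocr.__main__:console_entry."""
    args = build_parser().parse_args(argv)
    # Route to the executor registered for the chosen subcommand.
    return EXECUTORS[args.subcommand](args)
```

For example, `console_entry(["ocr", "-i", "image.png"])` parses the arguments and dispatches to `run_ocr`. A registry keyed by subcommand name keeps routing to a single dictionary lookup, so adding a pipeline means adding one entry.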
All pipelines inherit from PaddleXPipelineWrapper and follow a consistent pattern:
Key Methods:
- `__init__()`: parses common args (device, precision, etc.), merges the PaddleX config, and creates the underlying pipeline
- `predict_iter()` / `predict()`: inference interface; `predict_iter()` yields results one at a time for streaming, while `predict()` returns a list
- `export_paddlex_config_to_yaml()`: serializes the current config to YAML for inspection or editing
- `close()`: releases resources (important for proper cleanup)

Template Method Pattern (hooks implemented by each pipeline subclass):
- `_paddlex_pipeline_name`: property returning the PaddleX registration name (e.g., "OCR", "PP-StructureV3")
- `_get_paddlex_config_overrides()`: builds config overrides from constructor parameters

Sources: paddleocr/_pipelines/base.py:54-110
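The construction flow and template-method hooks can be sketched without PaddleX. In this minimal, self-contained sketch the class names `PipelineWrapperSketch` and `OCRSketch`, the `text_detection_model_name` parameter, and the flat config dictionary are illustrative assumptions. The real `PaddleXPipelineWrapper` merges a full PaddleX config and calls `create_pipeline()` rather than fabricating results.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Optional


class PipelineWrapperSketch(ABC):
    """Hypothetical, simplified analogue of PaddleXPipelineWrapper."""

    def __init__(self, paddlex_config: Optional[Dict[str, Any]] = None) -> None:
        # Template method: the base class fixes the construction steps and
        # asks the subclass for its pipeline name and config overrides.
        config: Dict[str, Any] = {"pipeline_name": self._paddlex_pipeline_name}
        config.update(paddlex_config or {})
        config.update(self._get_paddlex_config_overrides())
        self._config = config
        self._closed = False

    @property
    @abstractmethod
    def _paddlex_pipeline_name(self) -> str: ...

    @abstractmethod
    def _get_paddlex_config_overrides(self) -> Dict[str, Any]: ...

    def predict_iter(self, inputs: List[str]) -> Iterator[Dict[str, Any]]:
        # Streaming interface: yield one (placeholder) result per input.
        for item in inputs:
            yield {"input": item, "pipeline": self._config["pipeline_name"]}

    def predict(self, inputs: List[str]) -> List[Dict[str, Any]]:
        # Eager interface: materialize the streaming results into a list.
        return list(self.predict_iter(inputs))

    def close(self) -> None:
        self._closed = True


class OCRSketch(PipelineWrapperSketch):
    def __init__(self, text_detection_model_name: Optional[str] = None, **kwargs: Any) -> None:
        # Store subclass params first: the base __init__ calls the hooks.
        self._text_detection_model_name = text_detection_model_name
        super().__init__(**kwargs)

    @property
    def _paddlex_pipeline_name(self) -> str:
        return "OCR"

    def _get_paddlex_config_overrides(self) -> Dict[str, Any]:
        # Only explicitly set constructor arguments become overrides.
        overrides: Dict[str, Any] = {}
        if self._text_detection_model_name is not None:
            overrides["text_detection_model_name"] = self._text_detection_model_name
        return overrides
```

Because the base class owns the construction order, each concrete pipeline only supplies a name and its overrides; the abstract hooks make the base class itself non-instantiable.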
Individual modules (e.g., text detection, table recognition) use PaddleXPredictorWrapper:
Key Differences from Pipeline Wrapper:
- Accepts `model_name` and `model_dir` for model selection and custom models
- Each module defines a `default_model_name` property

Sources: paddleocr/_models/base.py:30-83, paddleocr/_models/text_detection.py:1-50, paddleocr/_models/text_recognition.py:1-50
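The `model_name` / `default_model_name` fallback can be sketched as follows. The classes are hypothetical stand-ins for `PaddleXPredictorWrapper` and `TextDetection`. The default `PP-OCRv5_server_det` and the meaning given to `model_dir=None` are assumptions for illustration.

```python
from typing import Any, Dict, Optional


class PredictorWrapperSketch:
    """Hypothetical, simplified analogue of PaddleXPredictorWrapper."""

    def __init__(self, model_name: Optional[str] = None, model_dir: Optional[str] = None) -> None:
        # Fall back to the module's default model when none is requested.
        self.model_name = model_name if model_name is not None else self.default_model_name
        # Assumed semantics: None means "use the stock weights for model_name";
        # a path points at custom (e.g. fine-tuned) weights instead.
        self.model_dir = model_dir

    @property
    def default_model_name(self) -> str:
        raise NotImplementedError("each module must define its default model")

    def describe(self) -> Dict[str, Any]:
        return {"model_name": self.model_name, "model_dir": self.model_dir}


class TextDetectionSketch(PredictorWrapperSketch):
    @property
    def default_model_name(self) -> str:
        return "PP-OCRv5_server_det"  # assumed default, for illustration only
```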
All pipelines and predictors accept common inference arguments:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `device` | str | Auto-detect | Device specification: `cpu`, `gpu`, `gpu:0`, `gpu:0,1`, `npu`, `xpu` |
| `enable_hpi` | bool | Pipeline-specific | Enable high-performance inference (automatic backend selection, which can include ONNX Runtime) |
| `use_tensorrt` | bool | False | Use the Paddle Inference TensorRT subgraph engine |
| `precision` | str | "fp32" | TensorRT precision: `fp32`, `fp16` |
| `enable_mkldnn` | bool | True | Use MKL-DNN (oneDNN) for CPU acceleration |
| `mkldnn_cache_capacity` | int | 10 | MKL-DNN cache capacity |
| `cpu_threads` | int | 10 | Number of CPU threads for inference |
| `enable_cinn` | bool | False | Use the CINN compiler |
Processing Flow:
- … `PaddlePredictorOption` with run mode (`paddle`, `trt_fp32`, `trt_fp16`) and optimization settings

Constants: paddleocr/_constants.py:1-23

Sources: paddleocr/_common_args.py:31-150, paddleocr/_constants.py:1-23
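The run-mode step of the processing flow can be sketched as a small pure function. `select_run_mode()` is a hypothetical helper, not PaddleOCR's code. It models only the three run modes listed above and assumes that TensorRT is honored on GPU devices only.

```python
from typing import Tuple

SUPPORTED_PRECISIONS: Tuple[str, ...] = ("fp32", "fp16")


def select_run_mode(device: str, use_tensorrt: bool = False, precision: str = "fp32") -> str:
    """Hypothetical helper: map common args to one of the listed run modes."""
    if precision not in SUPPORTED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    # Assumption: TensorRT applies only to GPU devices ("gpu", "gpu:0", ...);
    # every other case uses the plain Paddle Inference run mode.
    if use_tensorrt and device.split(":")[0] == "gpu":
        return f"trt_{precision}"
    return "paddle"
```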
PaddleOCR 3.x is built on top of PaddleX, a low-code development framework. This integration provides unified infrastructure for inference deployment while keeping PaddleOCR's OCR-focused API surface clean.
| PaddleOCR Version | PaddleX Version | PaddlePaddle Version |
|---|---|---|
| 3.0.0 | 3.0.0 | >= 3.0.0 |
| 3.0.1 | 3.0.1 | >= 3.0.0 |
| 3.0.2 | 3.0.2 | >= 3.0.0 |
| 3.0.3 | >= 3.0.3 | >= 3.0.0 |
| 3.1.x | >= 3.1.0, < 3.2.0 | >= 3.0.0 |
| 3.2.x | >= 3.2.0, < 3.3.0 | >= 3.0.0 |
| 3.3.x | >= 3.3.0, < 3.4.0 | >= 3.0.0 |
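The table can be turned into a quick compatibility check. The helpers below are hypothetical and simply transcribe the rows above. They handle plain `X.Y.Z` version strings only (no pre-release or post-release tags) and reject PaddleOCR versions the table does not list. The PaddlePaddle column (`>= 3.0.0` in every row) is left out for brevity.

```python
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def paddlex_range(paddleocr_version: str) -> Tuple[Version, Optional[Version]]:
    """Return (inclusive lower bound, exclusive upper bound or None) for PaddleX."""
    major, minor, patch = parse_version(paddleocr_version)
    if (major, minor) == (3, 0) and patch <= 2:
        # 3.0.0-3.0.2: PaddleX must match exactly.
        return (3, 0, patch), (3, 0, patch + 1)
    if (major, minor, patch) == (3, 0, 3):
        # 3.0.3: PaddleX >= 3.0.3, no upper bound listed.
        return (3, 0, 3), None
    if (major, minor) in {(3, 1), (3, 2), (3, 3)}:
        # 3.1.x-3.3.x: any PaddleX release in the same minor series.
        return (major, minor, 0), (major, minor + 1, 0)
    raise ValueError(f"PaddleOCR {paddleocr_version} is not covered by the table")


def is_compatible(paddleocr_version: str, paddlex_version: str) -> bool:
    lower, upper = paddlex_range(paddleocr_version)
    version = parse_version(paddlex_version)
    return version >= lower and (upper is None or version < upper)
```

Tuples compare element by element, so `(3, 2, 5) < (3, 3, 0)` holds without any custom comparison code.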
PaddleOCR pipeline classes map to PaddleX internal pipeline names:
| PaddleOCR Class | PaddleX Pipeline Name |
|---|---|
| `PaddleOCR` | `OCR` |
| `PPStructureV3` | `PP-StructureV3` |
| `PPChatOCRv4Doc` | `PP-ChatOCRv4-doc` |
| `PaddleOCRVL` | `PaddleOCR-VL` |
| `TableRecognitionV2` | `table_recognition_v2` |
| `FormulaRecognition` | `formula_recognition` |
| `SealRecognition` | `seal_recognition` |
| `DocPreprocessor` | `doc_preprocessor` |
| `DocUnderstanding` | `doc_understanding` |
| `PPDocTranslation` | `PP-DocTranslation` |
PaddleX uses YAML configuration files for advanced settings. PaddleOCR exposes this through:
Export Configuration:
Load Configuration:
Configuration Merging Logic (paddleocr/_pipelines/base.py:90-100):
- … `paddlex_config` (if any)

Sources: docs/version3.x/paddleocr_and_paddlex.en.md:1-96, paddleocr/_pipelines/base.py:54-110
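The merge step can be illustrated with a recursive dictionary merge. This is a minimal sketch under stated assumptions. The following are illustrative and not taken from `base.py`:

- the precedence order: pipeline defaults, then `paddlex_config`, then explicit constructor arguments
- the dotted-key override format
- the skipping of `None` arguments
- the example keys, such as `SubModules.TextDetection.model_name`

```python
import copy
from typing import Any, Dict, Optional


def deep_merge(base: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `update` into a copy of `base`; `update` wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def overrides_from_dotted(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Expand {"A.B.c": v} into {"A": {"B": {"c": v}}}, skipping None values."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        if value is None:
            continue  # unset constructor argument: keep the config's value
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def build_config(
    defaults: Dict[str, Any],
    paddlex_config: Optional[Dict[str, Any]] = None,
    constructor_args: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    # Assumed precedence: defaults < paddlex_config < explicit constructor args.
    config = deep_merge(defaults, paddlex_config or {})
    return deep_merge(config, overrides_from_dotted(constructor_args or {}))
```

In the real library, the YAML round trip uses `export_paddlex_config_to_yaml()` to write the current config. An edited version is then passed back through the `paddlex_config` argument.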
PaddleOCR 3.x provides multiple entry points tailored to different skill levels and use cases, with clear progression paths from development to production deployment.
Entry Points by User Type:
Complete ML Lifecycle:
| Option | Entry Point | Use Case | Performance | Hardware Support |
|---|---|---|---|---|
| Quick Start | pip install paddleocr | Experimentation, prototyping | Basic | CPU/GPU |
| CLI | paddleocr ocr -i image.png | Scripting, batch processing | Basic-High | CPU/GPU/XPU/NPU |
| Python API | PaddleOCR(enable_hpi=True) | Application integration | High (with HPI) | CPU/GPU/XPU/NPU |
| Service Deployment | Docker Compose + HTTP API | Web services, microservices | High | GPU/CPU/XPU/NPU/MLU/DCU |
| C++ Local | deploy/cpp_infer/ | Embedded systems, standalone apps | High | GPU/CPU |
| On-Device | Paddle Lite | Mobile apps, edge computing | Medium | ARM CPU/GPU |
| MCP Server | paddleocr._deployment.mcp | AI agent integration (Claude Desktop) | Medium | GPU/CPU |
Configuration Options Across Entry Points:
- … `PP-OCRv5_server_rec` vs. `PP-OCRv5_mobile_rec`)
- `lang` parameter
- `device` parameter (`cpu`, `gpu`, `gpu:0,1`, `xpu`, `npu`)
- `precision` and `use_tensorrt` parameters

Sources: tools/train.py:1-50, tools/export_model.py:1-50, paddleocr/__main__.py:1-40, README.md:107-146, Diagram 3 (User Journey and Access Patterns), Diagram 4 (Training, Inference, and Deployment Pipeline)
PaddleOCR 3.x supports diverse hardware platforms through PaddlePaddle's hardware abstraction layer and specialized acceleration frameworks. Different pipelines have varying hardware requirements based on their computational characteristics.
Traditional Pipeline (PP-StructureV3) vs VLM Pipeline (PaddleOCR-VL):
Device String Format:
- Single device: `cpu`, `gpu`, `gpu:0`, `npu`, `xpu`
- Multiple devices: `gpu:0,1,2,3` (parallel inference where supported)

Optimization by Platform and Pipeline:
| Hardware | Traditional Pipelines | VLM Pipelines (PaddleOCR-VL) | Configuration |
|---|---|---|---|
| NVIDIA GPU (CC ≥ 7.0) | TensorRT subgraph, CUDA, fp16 | Base inference (CC < 8.0) | device="gpu", use_tensorrt=True |
| NVIDIA GPU (CC ≥ 8.0) | TensorRT subgraph, CUDA, fp16 | vLLM acceleration | device="gpu", acceleration auto-selected |
| NVIDIA GPU (8.0 ≤ CC < 12.0) | TensorRT subgraph, CUDA, fp16 | SGLang acceleration | device="gpu" |
| CPU (x64/ARM) | MKL-DNN (oneDNN), OpenMP threading | Base inference | device="cpu", enable_mkldnn=True |
| Kunlunxin XPU | FastDeploy plugin | FastDeploy plugin | device="xpu" |
| Ascend NPU | FastDeploy plugin | vLLM/SGLang support | device="npu" |
| HYGON DCU | FastDeploy plugin | FastDeploy plugin | device="dcu" |
| MetaX GPU | FastDeploy plugin | FastDeploy plugin | device="metax" |
| Iluvatar GPU | FastDeploy plugin | FastDeploy plugin | device="iluvatar" |
| Apple Silicon | MKL-DNN on CPU | MLX-VLM acceleration | device="cpu" |
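The NVIDIA rows for PaddleOCR-VL overlap: vLLM is listed from CC 8.0 upward, while SGLang is listed only below CC 12.0. Reading these rows as a function makes the overlap explicit. `vlm_acceleration_options()` is a hypothetical helper that transcribes the table. It is not how PaddleOCR selects a backend at runtime.

```python
from typing import List


def vlm_acceleration_options(compute_capability: float) -> List[str]:
    """Return the PaddleOCR-VL acceleration frameworks the table lists for an
    NVIDIA GPU of the given compute capability; [] means base inference only."""
    if compute_capability < 7.0:
        raise ValueError("the table lists no NVIDIA support below CC 7.0")
    options: List[str] = []
    if compute_capability >= 8.0:
        options.append("vllm")
    if 8.0 <= compute_capability < 12.0:
        options.append("sglang")
    return options
```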
Comparison Metrics:
| Aspect | Traditional Pipelines | VLM Pipelines |
|---|---|---|
| Speed | Fast (optimized multi-model) | Variable (benefits from acceleration) |
| Accuracy | High (specialized models) | 94.5% on OmniDocBench |
| Complexity | 4-5 separate models | Single 0.9B model |
| Languages | 37+ (PP-OCRv5) | 111 (PaddleOCR-VL) |
| GPU Requirements | NVIDIA, CC ≥ 7.0 | CC ≥ 8.0 for acceleration |
Device Auto-Detection Logic:
- … `device` parameter

Sources: paddleocr/_common_args.py:60-94, README.md:20-22, README.md:53-56, Diagram 6 (PaddleOCR-VL vs Traditional Pipeline Comparison)
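Device strings and the fallback behavior can be sketched as follows. Both helpers are hypothetical. The assumption that an unset `device` resolves to `gpu:0` when a GPU is present and to `cpu` otherwise is illustrative. The `gpu_available` flag stands in for whatever query the real code makes against the installed PaddlePaddle build.

```python
from typing import List, Optional, Tuple


def parse_device(device: str) -> Tuple[str, Optional[List[int]]]:
    """Split 'gpu:0,1' into ('gpu', [0, 1]); a bare type like 'cpu' gives ('cpu', None)."""
    device_type, sep, ids = device.partition(":")
    if not sep:
        return device_type, None
    return device_type, [int(i) for i in ids.split(",")]


def resolve_device(device: Optional[str], gpu_available: bool) -> str:
    # Assumed auto-detection: an explicit device always wins; otherwise prefer
    # the first GPU when one is available and fall back to CPU.
    if device is not None:
        return device
    return "gpu:0" if gpu_available else "cpu"
```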
PaddleOCR is distributed as a Python package with optional dependency groups:
paddleocr/
├── Core dependencies (always installed)
│ ├── paddlex[ocr-core] (3.3.0)
│ ├── PyYAML
│ └── requests
│
└── Optional dependency groups
├── [all] - Full functionality
├── [doc-parser] - PP-StructureV3, PaddleOCR-VL
├── [ie] - Information extraction (PP-ChatOCRv4)
└── [trans] - Document translation (PP-DocTranslation)
Installation Commands:
Dependency Isolation:
Sources: pyproject.toml:1-76, requirements.txt:1-18, README.md:244-258, docs/version3.x/paddleocr_and_paddlex.en.md:23-24
paddleocr/
├── __init__.py # Package exports
├── __main__.py # CLI entry point (console_entry)
├── _constants.py # Default configuration values
├── _common_args.py # Common argument parsing
├── _abstract.py # Abstract base classes
│
├── _pipelines/ # Pipeline implementations
│ ├── base.py # PaddleXPipelineWrapper
│ ├── ocr.py # PaddleOCR (PP-OCRv5)
│ ├── pp_structurev3.py # PPStructureV3
│ ├── pp_chatocrv4_doc.py # PPChatOCRv4Doc
│ ├── paddleocr_vl.py # PaddleOCRVL
│ ├── doc_preprocessor.py # DocPreprocessor
│ └── ...
│
├── _models/ # Individual module implementations
│ ├── base.py # PaddleXPredictorWrapper
│ ├── text_detection.py # TextDetection
│ ├── text_recognition.py # TextRecognition
│ ├── layout_detection.py # LayoutDetection
│ └── ...
│
└── _utils/ # Utility modules
├── cli.py # CLI helpers
└── logging.py # Logging configuration
Key Design Patterns:
- Wrapper pattern: `PaddleXPipelineWrapper` and `PaddleXPredictorWrapper` provide a consistent interface over PaddleX
- Factory pattern: instances are created through PaddleX's `create_pipeline()` and `create_predictor()`
- Template Method pattern: subclasses implement hooks such as `_paddlex_pipeline_name`, `_get_paddlex_config_overrides()`, etc.

Sources: paddleocr/_pipelines/base.py:1-136, paddleocr/_models/base.py:1-108, paddleocr/__main__.py:1-50
PaddleOCR 3.x includes comprehensive testing infrastructure:
GitHub Actions Workflow (.github/workflows/tests.yaml:1-64):
- … `main` and `release/*` branches
- … `pytest`, `paddlepaddle==3.2.0`, `paddleocr[all]`

tests/
├── pipelines/
│ ├── test_pp_structurev3.py # PP-StructureV3 tests
│ ├── test_pp_chatocrv4_doc.py # PP-ChatOCRv4 tests
│ └── ...
│
├── models/
│ ├── test_text_detection.py # TextDetection tests
│ └── ...
│
└── testing_utils.py # Shared test utilities
Test Markers:
- `@pytest.mark.resource_intensive`: heavy tests (skipped by default)

Example Test Pattern:

Sources: .github/workflows/tests.yaml:1-64, tests/pipelines/test_pp_structurev3.py:1-81, tests/pipelines/test_pp_chatocrv4_doc.py:1-81
PaddleOCR 3.x uses MkDocs for documentation with internationalization support:
Configuration: mkdocs.yml:1-389
Content Organization:
docs/
├── index.md / index.en.md # Home pages
├── quick_start.md / quick_start.en.md # Getting started
├── version3.x/ # Current version docs
│ ├── installation.md
│ ├── pipeline_usage/ # Pipeline tutorials
│ ├── module_usage/ # Module reference
│ ├── deployment/ # Deployment guides
│ └── ...
├── version2.x/ # Legacy docs
└── update/ # Version upgrade guides
Navigation Structure (mkdocs.yml:290-389):

Sources: mkdocs.yml:1-389, docs/index.en.md:1-88, docs/quick_start.en.md:1-197
This overview provides a comprehensive introduction to PaddleOCR 3.x architecture, capabilities, and design patterns. For specific implementation details, refer to the section-specific pages linked throughout this document.