How to Extract Text from Images with Python | PIL Image Text Recognition Tutorial

February 12, 2026, 15:22 • Cloud Servers

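To recognize text in images with Python, a common approach is to pair Pillow (PIL) with the Tesseract OCR engine through the pytesseract library: Pillow loads the image and Tesseract performs the recognition. The detailed steps follow.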

Step 1: Install dependencies

Install Tesseract OCR (the core engine):
- Windows: download the installer from UB-Mannheim/tesseract
- macOS: brew install tesseract
- Linux (Debian/Ubuntu): sudo apt install tesseract-ocr
- Language packs (for example, if you need Chinese):
  sudo apt install tesseract-ocr-chi-sim (Simplified Chinese)
  sudo apt install tesseract-ocr-chi-tra (Traditional Chinese)

Install the Python libraries:
```
pip install pillow pytesseract
```
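
Before moving on, it can help to confirm that the engine is actually reachable from Python. The following is a minimal sketch that uses only the standard library. The helper names `find_tesseract` and `installed_languages` are made up for this illustration, and the parsing assumes that `tesseract --list-langs` prints a header line starting with "List of" followed by one language code per line.

```python
import shutil
import subprocess


def find_tesseract():
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def installed_languages():
    """Return the language codes reported by `tesseract --list-langs` (empty list if unavailable)."""
    exe = find_tesseract()
    if exe is None:
        return []
    result = subprocess.run(
        [exe, "--list-langs"], capture_output=True, text=True, check=False
    )
    # Some versions print the list to stderr rather than stdout, so read both.
    lines = (result.stdout + "\n" + result.stderr).splitlines()
    # Assumption: the header line starts with "List of"; every other
    # non-empty line is treated as a language code such as "eng" or "chi_sim".
    return [
        line.strip()
        for line in lines
        if line.strip() and not line.startswith("List of")
    ]
```

If `find_tesseract()` returns `None`, the executable is not on PATH; on Windows you can instead assign its full path to `pytesseract.pytesseract.tesseract_cmd`. If `chi_sim` is missing from the list returned by `installed_languages()`, the Chinese language pack still needs to be installed.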

Step 2: Python example code

```
from PIL import Image
import pytesseract

# Set the Tesseract path (on Windows, point this at the install location)
# pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Open the image
image = Image.open('your_image.jpg')  # replace with the path to your image

# Recognize text (English by default)
text = pytesseract.image_to_string(image)

# Recognize Simplified Chinese
# text = pytesseract.image_to_string(image, lang='chi_sim')

print("Recognition result:")
print(text)
```
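
Images that mix Chinese and English are common in practice. Tesseract accepts several language codes joined with `+`, and pytesseract passes the `lang` string through to the engine. The sketch below assumes that both the `chi_sim` and `eng` language data are installed; `build_lang` and `ocr_mixed` are hypothetical helper names, not part of pytesseract.

```python
def build_lang(*codes):
    """Join Tesseract language codes with '+', e.g. ('chi_sim', 'eng') -> 'chi_sim+eng'."""
    cleaned = [code.strip() for code in codes if code and code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def ocr_mixed(image_path, *codes):
    """Run OCR on the file at image_path using all of the given language codes."""
    # Imported here so that build_lang stays usable without the OCR dependencies.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=build_lang(*codes))
```

A call such as `ocr_mixed('your_image.jpg', 'chi_sim', 'eng')` would then recognize both scripts in one pass. Adding languages tends to slow recognition down, so it is usually worth listing only the ones you expect to see.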

Troubleshooting

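Chinese recognition is inaccurate:
- Make sure the Chinese language pack is installed (for example, tesseract-ocr-chi-sim).
- Pass the lang='chi_sim' parameter.
- Improve the image quality (sharp, free of glare, photographed straight on).
Error "tesseract is not installed":
- Check that tesseract is on the system PATH.
- On Windows, set the path manually (uncomment the tesseract_cmd line in the example code).
Improving recognition accuracy:
- Preprocess the image: convert to grayscale, binarize, reduce noise.
- Adjust the image: use an image-processing library (such as OpenCV) to enhance contrast.
- Recognize a specific region: crop it with image.crop((left, upper, right, lower)). The box holds the coordinates of two corners, not a width and a height.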
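
Region cropping is easy to get wrong, because Pillow's `Image.crop` expects a `(left, upper, right, lower)` box rather than a position plus a size. The following sketch converts an x/y/width/height rectangle into that form before running OCR on the cropped region; `box_from_xywh` and `ocr_region` are hypothetical helper names introduced for this illustration.

```python
def box_from_xywh(x, y, width, height):
    """Convert a top-left corner plus a size into Pillow's (left, upper, right, lower) box."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return (x, y, x + width, y + height)


def ocr_region(image, x, y, width, height, lang="eng"):
    """Crop a rectangle out of a Pillow image and run OCR on that region only."""
    # Imported here so that box_from_xywh stays usable without pytesseract.
    import pytesseract

    region = image.crop(box_from_xywh(x, y, width, height))
    return pytesseract.image_to_string(region, lang=lang)
```

For example, `box_from_xywh(10, 20, 100, 50)` yields `(10, 20, 110, 70)`, which selects a region 100 pixels wide and 50 pixels high.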

Preprocessing example

```
from PIL import Image, ImageFilter
import pytesseract

# Open the image and preprocess it
image = Image.open('your_image.jpg')
image = image.convert('L')  # convert to grayscale
image = image.filter(ImageFilter.SHARPEN)  # sharpen
image = image.point(lambda x: 0 if x < 140 else 255)  # binarize with a fixed threshold of 140

# Recognize the text
text = pytesseract.image_to_string(image, lang='chi_sim')
print(text)
```
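
A fixed cutoff such as 140 works for some images but not for others. One common alternative is Otsu's method, which picks the threshold from the image's own histogram by maximizing the variance between the dark and light classes. This is a different technique from the fixed threshold in the example above, shown here as a pure-Python sketch; `otsu_threshold` and `binarize_otsu` are hypothetical helper names, and `binarize_otsu` assumes it receives a Pillow image.

```python
def otsu_threshold(histogram):
    """Return the gray level t that maximizes between-class variance.

    `histogram` is a sequence of pixel counts indexed by gray level (256 entries
    for an 8-bit image). Pixels with value <= t form the dark class.
    """
    total = sum(histogram)
    if total == 0:
        raise ValueError("histogram is empty")
    sum_all = sum(level * count for level, count in enumerate(histogram))

    weight_dark = 0   # number of pixels at or below the candidate threshold
    sum_dark = 0      # sum of gray levels of those pixels
    best_t, best_variance = 0, -1.0
    for t, count in enumerate(histogram):
        weight_dark += count
        sum_dark += t * count
        if weight_dark == 0:
            continue  # no dark pixels yet, nothing to separate
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_t, best_variance = t, variance
    return best_t


def binarize_otsu(image):
    """Convert a Pillow image to grayscale and binarize it at the Otsu threshold."""
    gray = image.convert("L")
    t = otsu_threshold(gray.histogram())
    return gray.point(lambda value: 255 if value > t else 0)
```

The image returned by `binarize_otsu` can be passed to `pytesseract.image_to_string` in place of the fixed-threshold image. For photos with uneven lighting, a single global threshold may still not be enough.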

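Alternative: third-party APIs

If local recognition is not good enough, you can use an online OCR API:

Baidu OCR: high-accuracy Chinese recognition (with a free quota)
Google Vision: good results on English text
Tencent OCR: supports multiple languages

Note: with online APIs you need to handle network requests and consider privacy, since the images are uploaded to a third party.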

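Images are generated by an AI model; if they infringe your rights, please contact the administrator. Author: 酷小编. If you repost this article, please credit the source: https://www.kufanyun.com/ask/294241.html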
