 
 
Pix2Text (P2T) aims to be a free and open-source Python alternative to Mathpix, and it can already accomplish Mathpix's core functionality. P2T can recognize layouts, tables, images, text, and mathematical formulas, and integrate all of this content into Markdown. It can also convert an entire PDF file (which may contain scanned images or any other content) into Markdown. The text recognition engine of Pix2Text supports 80+ languages, including English, Simplified Chinese, Traditional Chinese, and Vietnamese.
Pix2Text (P2T) integrates the following models:
  • Text Recognition Engine: Supports 80+ languages such as English, Simplified Chinese, Traditional Chinese, Vietnamese, etc. For English and Simplified Chinese recognition, it uses the open-source OCR tool CnOCR, while for other languages, it uses the open-source OCR tool EasyOCR.
  • Mathematical Formula Detection Model (MFD): The formula detection model from CnSTD.
Several models are contributed by other open-source authors, and their contributions are highly appreciated.
 
For detailed explanations, please refer to the Models.

Online Service

 
Everyone can use the P2T Online Service for free, with a daily limit of 10,000 characters per account, which should be sufficient for normal use. Please refrain from bulk API calls, as machine resources are limited, and this could prevent others from accessing the service.
 
Due to hardware constraints, the Online Service currently supports only Simplified Chinese and English. To try the models in other languages, please use the Online Demo below.

Demo 🤗

 
You can also try the Online Demo to see how P2T performs in various languages. However, the demo runs on lower-spec hardware and may be slower. For Simplified Chinese or English images, the P2T Online Service is recommended.

Documentation


Available Models

P2T includes two kinds of models: Math Formula Detection (MFD) and Math Formula Recognition (MFR). For details, see the project description. By default, P2T uses free open-source models and downloads them automatically on first use. Beyond the free models, I continue to optimize the models, and the latest versions must be purchased before they can be downloaded and used. If you are not deploying locally, it is recommended to use the P2T Online Service directly, as it always runs the most recent models.
 
The current models (compatible with Pix2Text V1.1 and V1.0) used in the Online Service are:
  • MFD: version-20230613
  • MFR: version-1.0 (updated: 2024-02-26)
The paid models used in the Online Service perform better than the open-source models. If you need to deploy the P2T service on your own, it's advisable to purchase the same models used in the Online Service.
 
To thank our Planet Members for their support, all models (only for personal use) are available at a 20% discount for Planet Members. To purchase, add the assistant as a friend, and after arranging payment, the assistant will provide the model files directly. Note: No discounts are offered for the enterprise versions.
 
Things to note before purchasing:
📌
Make sure you've successfully run Pix2Text using the open-source models. Otherwise, after downloading the paid models, you might encounter problems getting them to work. Detailed installation and usage instructions can be found in the Pix2Text project documentation. If you face any issues, feel free to comment here or join the group chat to communicate with me. However, please note that helping you to get the code running is not within the services provided by the Planet host (refer to Planet Description).
📌
For personal use, please follow the “Individual Purchase” column of the tables. For business or commercial use, please follow the “Commercial Purchase” column, or contact the author (Email: breezedeus AT gmail.com).

Purchasing the Math Formula Detection (MFD) models

If you wish to purchase only the MFD model, please use the following link to make your purchase. However, if you also need to buy the MFR model, please visit the Model Store and buy both models together; there is no need to buy the MFD model separately in this case.
 
Available MFD models are listed in the table below. For detailed descriptions, see Pix2Text’s New YoloV7 MFD Model.
| Model Version | Commercial Purchase | Individual Purchase | For Planet Members | Free Download |
| --- | --- | --- | --- | --- |
| YoloV7_Tiny Open-source Model | ✖️ | ✖️ | ✔️ | ✔️ |
| version-20230208 | ✖️ | ✔️ Bilibili | ✔️ Free | ✖️ |
| version-20230613 | | | ✔️ 20% off | ✖️ |
 
Instructions after purchase can be found here.
📌
These models are compatible with both Pix2Text V1.1 and V1.0.

Purchasing the Math Formula Recognition (MFR) models

If you are a personal user, you can purchase the [Personal Use Only] model from Lemon Squeezy. This model is for personal use only and cannot be used for commercial purposes. The product includes only the ONNX version of the model, not the PyTorch version. For enterprise commercial use, or if you need an invoice, please refer to the instructions below.
 
Pix2Text V1.1/V1.0 comes in two enterprise editions. The differences between them are shown in the figure below.
  • Enterprise Basic Edition: A one-time purchase; new models must be purchased separately. It may be used only internally within the company or to provide free services externally (for example, by educational institutions), and cannot be used to offer paid services.
  • Enterprise Pro Subscription Edition: An annual subscription with free access to all new models during the subscription period. It also includes a PyTorch version of the model, so enterprises can fine-tune it on their own data, and it permits the provision of paid services.
For more detailed information, please visit the Model Store (specific details are available on the product detail pages).
[Figure: comparison of the Enterprise Basic Edition and the Enterprise Pro Subscription Edition]
 
Purchase links can be found in the Model Store.
📌
These models are compatible with both Pix2Text V1.1 and V1.0.

Usage Instructions After Purchase

After purchasing the Enterprise Basic Edition through the Model Store, you can download two compressed model files. The file whose name starts with p2t-mfd- is the MFD (Math Formula Detection) model, and the one whose name starts with p2t-mfr- is the MFR (Math Formula Recognition) model. Unzipping the MFD file yields a folder named yolov7-model that contains the model file, for example mfd-yolov7-20230613.pt. Suppose the path to this file is abc/def/yolov7-model/mfd-yolov7-20230613.pt. Unzipping the MFR file yields a folder named mfr-pro-onnx, which contains the model file and related configuration files. Assume the path to the mfr-pro-onnx folder is abc/def/mfr-pro-onnx.
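The unpacking step can be sketched with the Python standard library alone. This is only an illustration: the helper name unpack_p2t_models is hypothetical, and it assumes the downloads are .zip archives whose names start with the p2t-mfd- and p2t-mfr- prefixes mentioned above.

```python
import zipfile
from pathlib import Path


def unpack_p2t_models(download_dir, target_dir):
    """Unzip the purchased archives and report which model each one holds.

    Hypothetical helper: assumes the downloads are .zip files named with
    the p2t-mfd- / p2t-mfr- prefixes described in the text.
    """
    download_dir, target_dir = Path(download_dir), Path(target_dir)
    prefixes = {"p2t-mfd-": "mfd", "p2t-mfr-": "mfr"}
    found = {}
    for archive in sorted(download_dir.glob("*.zip")):
        for prefix, kind in prefixes.items():
            if archive.name.startswith(prefix):
                # Extract into target_dir, keeping the folder layout
                # stored in the archive (e.g. yolov7-model/...).
                with zipfile.ZipFile(archive) as zf:
                    zf.extractall(target_dir)
                found[kind] = archive.name
                break
    return found
```

Calling unpack_p2t_models("downloads", "abc/def") would then leave the yolov7-model and mfr-pro-onnx folders under abc/def, provided the archives contain those folders at their top level.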
 
If you are using P2T V1.0, please refer to: Pix2Text V1.0 New Release: The Best Open-Source Formula Recognition Model.
When initializing Pix2Text, pass the parameters as follows. The usage after initialization is the same as the open-source model, and the structure of the detection and recognition results is also the same.
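As a rough sketch of what passing these parameters might look like, the snippet below builds a configuration dictionary from the two paths assumed above. The helper name build_p2t_config is hypothetical, and the key names (text_formula, mfd, formula, model_type, model_fp, model_name, model_backend, model_dir) as well as the Pix2Text.from_config entry point are assumptions about the V1.1 interface; please verify them against the Pix2Text project documentation for your installed version.

```python
def build_p2t_config(mfd_model_fp, mfr_model_dir):
    """Build a config dict pointing Pix2Text at the purchased models.

    Hypothetical helper. All key names below are assumptions about the
    Pix2Text V1.1 configuration layout, not a confirmed specification.
    """
    text_formula_config = {
        "mfd": {
            "model_type": "yolov7",  # assumed: full YoloV7, not YoloV7_Tiny
            "model_fp": str(mfd_model_fp),
        },
        "formula": {
            "model_name": "mfr-pro",  # assumed name of the paid MFR model
            "model_backend": "onnx",
            "model_dir": str(mfr_model_dir),
        },
    }
    return {"text_formula": text_formula_config}


total_config = build_p2t_config(
    "abc/def/yolov7-model/mfd-yolov7-20230613.pt",
    "abc/def/mfr-pro-onnx",
)

# With Pix2Text installed, initialization would then look roughly like
# this (assumed V1.1 entry point; left commented out, not executed here):
# from pix2text import Pix2Text
# p2t = Pix2Text.from_config(total_configs=total_config)
```

Keeping the paths in one place like this makes it easy to switch between the open-source defaults and the purchased files.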
 
If you purchase the Enterprise Pro Subscription Edition, you will have access to more model files (currently 5), including the PyTorch version of MFR and the latest paid model of CnOCR (text OCR) (both ONNX and PyTorch versions), which has better recognition performance for English and Simplified Chinese text. Use the following method to input the corresponding model.
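A minimal sketch of adding the CnOCR text model to such a configuration is shown below. The helper with_text_config, the text and text_formula keys, and the example path are all hypothetical; rec_model_fp and rec_model_backend mirror CnOCR constructor parameters, but how Pix2Text forwards them is an assumption to check against the project documentation.

```python
import copy


def with_text_config(total_config, text_config=None):
    """Return a copy of total_config with the text OCR settings set or removed.

    Hypothetical helper; the "text_formula" / "text" layout is assumed.
    """
    new_config = copy.deepcopy(total_config)
    section = new_config.setdefault("text_formula", {})
    if text_config is None:
        # No CnOCR model: drop the entry so the default text engine is used.
        section.pop("text", None)
    else:
        section["text"] = dict(text_config)
    return new_config


# Hypothetical example values; replace them with your own file locations.
text_config = {
    "rec_model_backend": "onnx",
    "rec_model_fp": "abc/def/cnocr-model/model.onnx",  # hypothetical path
}
base_config = {"text_formula": {"mfd": {}, "formula": {}}}
pro_config = with_text_config(base_config, text_config)
# For languages other than English and Simplified Chinese, leave it out:
other_languages_config = with_text_config(pro_config, None)
```

The function copies its input, so the same base configuration can be reused for both the CnOCR and the non-CnOCR setups.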
📌
The CnOCR text model only supports English and Simplified Chinese. If you need to recognize text in other languages, do not use the CnOCR model. Simply remove the text_config from the code above.
 

Code Repo

 
📌
P2T uses CnOCR or EasyOCR to recognize the text part in images. For more information on CnOCR, refer to this link.
 