---
title: Doc2Page - Document to Webpage Converter
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.47.2
app_file: app.py
pinned: false
license: apache-2.0
short_description: Convert docs to webpages using PaddleOCR and ERNIE
---
Doc2Page - Document to Webpage Converter
Convert your PDF documents or images into beautiful, responsive HTML webpages!
Features
- Smart OCR: Extract text from PDFs and images using PaddleOCR
- AI Enhancement: Transform content into well-structured HTML using ERNIE
- Beautiful Output: Generate responsive, styled webpages with modern CSS
- Easy Deployment: Optional one-click deployment to GitHub Pages
- Mobile Friendly: Responsive design that works on all devices
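Responsive output of this kind mostly comes down to a viewport meta tag plus fluid layout CSS. The following is a minimal, hypothetical page template (PAGE_TEMPLATE and render_page are illustrative names, not taken from app.py) that wraps extracted paragraphs in such a page and escapes the text so OCR output cannot inject markup:

```python
import html

# Hypothetical template: the viewport meta tag, a max-width container, and a
# media query are the minimum needed for a page that reflows on small screens.
# Literal CSS braces are doubled so str.format leaves them intact.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>{title}</title>
<style>
  body {{ font-family: system-ui, sans-serif; line-height: 1.6; margin: 0; }}
  main {{ max-width: 48rem; margin: 0 auto; padding: 1.5rem; }}
  img, table {{ max-width: 100%; }}
  @media (max-width: 600px) {{ main {{ padding: 1rem; }} }}
</style>
</head>
<body>
<main>
{body}
</main>
</body>
</html>
"""


def render_page(title: str, paragraphs: list[str]) -> str:
    """Wrap non-empty text paragraphs in <p> tags inside the responsive template."""
    body = "\n".join(
        f"<p>{html.escape(p)}</p>" for p in paragraphs if p.strip()
    )
    return PAGE_TEMPLATE.format(title=html.escape(title), body=body)
```

For example, render_page("Demo", ["Hello <world>"]) yields a page whose body contains <p>Hello &lt;world&gt;</p>.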
How It Works
- Upload: Drop your PDF or image file
- Extract: PaddleOCR extracts text and structure
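- Transform: ERNIE converts the extracted content into styled HTML (a built-in fallback generator is used when ERNIE credentials are not configured)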
- Deploy: Optionally publish to GitHub Pages
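These steps can be wired together as a small pipeline in which the ERNIE step is optional. This is a hypothetical sketch (Pipeline, extract, enhance, and fallback are illustrative names, not the app's actual code); deployment is left out because it only runs on request:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Pipeline:
    """Hypothetical wiring of the Upload -> Extract -> Transform steps.

    extract:  turns raw file bytes (plus the filename) into text or markdown,
              e.g. by calling an OCR service.
    fallback: template-based HTML generator that needs no external service.
    enhance:  optional LLM step (e.g. ERNIE); None when no credentials are set.
    """

    extract: Callable[[bytes, str], str]
    fallback: Callable[[str], str]
    enhance: Optional[Callable[[str], str]] = None

    def convert(self, data: bytes, filename: str) -> str:
        content = self.extract(data, filename)
        if self.enhance is not None:
            try:
                return self.enhance(content)
            except Exception:
                # Assumption: any enhancement failure degrades to the fallback
                # generator instead of failing the whole conversion.
                pass
        return self.fallback(content)
```

Keeping the enhancement step as an optional callable means the app still produces a page when ERNIE is unconfigured or unavailable.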
Supported Formats
- PDFs: .pdf
- Images: .png, .jpg, .jpeg, .bmp, .tiff
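A simple way to enforce this list is to check the file extension before sending anything to the OCR service. The helper classify_upload is a hypothetical name; the extension sets mirror the formats listed above, compared case-insensitively:

```python
from pathlib import Path

# Extensions from the "Supported Formats" list.
PDF_EXTENSIONS = {".pdf"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tiff"}


def classify_upload(filename: str) -> str:
    """Return "pdf" or "image" for a supported file, otherwise raise ValueError."""
    suffix = Path(filename).suffix.lower()
    if suffix in PDF_EXTENSIONS:
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

Rejecting unsupported files up front gives the user a clear error instead of an opaque OCR failure.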
Quick Start
- Upload a document using the file picker
- Click "Convert to Webpage"
- Preview your generated webpage
- Download the HTML file
- Optionally deploy to GitHub Pages
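The optional GitHub Pages step could be implemented with GitHub's REST endpoint for creating or updating repository file contents. This is an assumption about how deployment might work, not a description of the app's code: build_pages_upload is a hypothetical name, the gh-pages branch default is illustrative, and the repository must already be configured to publish from that branch:

```python
import base64
import json
import urllib.request


def build_pages_upload(owner: str, repo: str, path: str, html_text: str,
                       token: str, branch: str = "gh-pages") -> urllib.request.Request:
    """Build (but do not send) a GitHub "create or update file contents" request."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
    payload = {
        "message": f"Publish {path}",
        # The contents API expects the file body base64-encoded.
        "content": base64.b64encode(html_text.encode("utf-8")).decode("ascii"),
        "branch": branch,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Sending the request with urllib.request.urlopen(req) creates the file; overwriting an existing file additionally requires the current blob's sha in the payload.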
Configuration
Set up your credentials with a .env file:
- Copy the example environment file:
```bash
cp .env.example .env
```
- Edit the .env file with your credentials:
```bash
# Required API Configuration for PP-StructureV3
API_URL=your_pp_structurev3_api_url
API_TOKEN=your_api_token
# Optional ERNIE API Configuration for enhanced HTML generation
ERNIE_CLIENT_ID=your_client_id_here
ERNIE_CLIENT_SECRET=your_client_secret_here
```
Note: The .env file is loaded automatically when the application starts. Without ERNIE credentials, the app uses its built-in fallback HTML generator instead of ERNIE.
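To illustrate how these settings might drive that fallback, here is a stdlib-only sketch. This README does not show the app's actual loader (a library such as python-dotenv is a common choice); parse_env and load_settings are hypothetical names, and the parser deliberately ignores quoting and escape rules:

```python
import os


def parse_env(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blank lines and # comments, no quoting rules."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(text: str) -> dict:
    """Merge .env values (real environment variables win) and flag ERNIE support."""
    env = {**parse_env(text), **os.environ}
    missing = [k for k in ("API_URL", "API_TOKEN") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    # ERNIE is enabled only when both optional credentials are present.
    use_ernie = bool(env.get("ERNIE_CLIENT_ID") and env.get("ERNIE_CLIENT_SECRET"))
    return {"api_url": env["API_URL"], "api_token": env["API_TOKEN"], "use_ernie": use_ernie}
```

When use_ernie is False, the conversion would go straight to the fallback HTML generator.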
Technical Stack
- Frontend: Gradio for the web interface
- OCR Engine: PP-StructureV3 API (PaddlePaddle)
- AI Processing: ERNIE 4.5-X1.1-Preview (optional)
- Image Processing: Pillow
Example Use Cases
- Convert research papers to web format
- Digitize scanned documents
- Create web-friendly versions of presentations
- Transform printed materials to responsive websites
- Archive documents in searchable HTML format
License
This project is licensed under the Apache 2.0 License.