---
title: Doc2Page - Document to Webpage Converter
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.47.2
app_file: app.py
pinned: false
license: apache-2.0
short_description: Convert docs to webpages using PaddleOCR and ERNIE
---
# Doc2Page - Document to Webpage Converter
Convert your PDF documents or images into beautiful, responsive HTML webpages!
## Features
- **Smart OCR**: Extract text from PDFs and images using PaddleOCR
- **AI Enhancement**: Transform content into well-structured HTML using ERNIE
- **Beautiful Output**: Generate responsive, styled webpages with modern CSS
- **Easy Deployment**: Optional one-click deployment to GitHub Pages
- **Mobile Friendly**: Responsive design that works on all devices
## How It Works
1. **Upload**: Drop your PDF or image file
2. **Extract**: PaddleOCR extracts text and structure
3. **Transform**: ERNIE converts to beautiful HTML
4. **Deploy**: Optionally publish to GitHub Pages
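As a rough illustration of the Transform step, here is a minimal sketch of turning extracted text into a standalone HTML page without any AI model. The function name `text_to_html` and the "blank line separates paragraphs" rule are assumptions made for this example; this is not the code in `app.py`.

```python
import html


def text_to_html(title: str, text: str) -> str:
    """Wrap plain extracted text in a minimal, mobile-friendly HTML page.

    Hypothetical helper: paragraphs are assumed to be separated by blank lines.
    """
    # Split on blank lines and drop empty chunks
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # Escape the text so characters like < and & cannot break the markup
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n<head>\n'
        '<meta charset="utf-8">\n'
        '<meta name="viewport" content="width=device-width, initial-scale=1">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"<h1>{html.escape(title)}</h1>\n{body}\n"
        "</body>\n</html>\n"
    )
```

Escaping the OCR output before embedding it matters because scanned text can contain characters that would otherwise be read as HTML.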
## Supported Formats
- **PDFs**: `.pdf`
- **Images**: `.png`, `.jpg`, `.jpeg`, `.bmp`, `.tiff`
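If you want to check a file before uploading it, the list above can be expressed as a small validator. This is a sketch only: `classify_upload` is a hypothetical helper, and the app may validate uploads differently.

```python
from pathlib import Path

# Extensions taken from the "Supported Formats" list
PDF_EXTENSIONS = {".pdf"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tiff"}


def classify_upload(filename: str) -> str:
    """Return "pdf" or "image"; raise ValueError for anything else."""
    # Compare case-insensitively so "SCAN.PNG" is accepted too
    suffix = Path(filename).suffix.lower()
    if suffix in PDF_EXTENSIONS:
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```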
## Quick Start
1. Upload a document using the file picker
2. Click "Convert to Webpage"
3. Preview your generated webpage
4. Download the HTML file
5. Optionally deploy to GitHub Pages
## Configuration
**Setup using a `.env` file:**
1. Copy the example environment file:
```bash
cp .env.example .env
```
2. Edit the `.env` file with your credentials:
```bash
# Required API Configuration for PP-StructureV3
API_URL=your_pp_structurev3_api_url
API_TOKEN=your_api_token
# Optional ERNIE API Configuration for enhanced HTML generation
ERNIE_CLIENT_ID=your_client_id_here
ERNIE_CLIENT_SECRET=your_client_secret_here
```
**Note:** The `.env` file is loaded automatically when the application starts. Without ERNIE credentials, the app falls back to a built-in HTML generator.
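The required/optional split can be sketched as follows. The variable names come from the `.env` example; the function `read_config`, its return shape, and the error message are assumptions for illustration and may not match how `app.py` reads its settings.

```python
import os
from typing import Mapping, Optional


def read_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Validate settings and decide whether ERNIE can be used (hypothetical helper)."""
    # Default to the process environment, which holds the values from .env once loaded
    env = os.environ if env is None else env

    # API_URL and API_TOKEN are required for PP-StructureV3
    missing = [key for key in ("API_URL", "API_TOKEN") if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))

    # ERNIE is optional: enable it only when both credentials are present
    use_ernie = bool(env.get("ERNIE_CLIENT_ID") and env.get("ERNIE_CLIENT_SECRET"))

    return {
        "api_url": env["API_URL"],
        "api_token": env["API_TOKEN"],
        "use_ernie": use_ernie,
    }
```

Passing a plain dictionary instead of the real environment makes the check easy to try out without touching your `.env` file.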
## Technical Stack
- **Frontend**: Gradio for the web interface
- **OCR Engine**: PP-StructureV3 API (PaddlePaddle)
- **AI Processing**: ERNIE 4.5-X1.1-Preview (optional)
- **Image Processing**: Pillow
## Example Use Cases
- Convert research papers to web format
- Digitize scanned documents
- Create web-friendly versions of presentations
- Transform printed materials to responsive websites
- Archive documents in searchable HTML format
## License
This project is licensed under the Apache 2.0 License.