---
title: Doc2Page - Document to Webpage Converter
emoji: 🏄
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.47.2
app_file: app.py
pinned: false
license: apache-2.0
short_description: Convert docs to webpages using PaddleOCR and ERNIE
---

# 📄➡️🌐 Doc2Page - Document to Webpage Converter

Convert your PDF documents or images into beautiful, responsive HTML webpages!

## ✨ Features

- 📖 **Smart OCR**: Extract text from PDFs and images using PaddleOCR
- 🤖 **AI Enhancement**: Transform content into well-structured HTML using ERNIE (optional)
- 🎨 **Beautiful Output**: Generate responsive, styled webpages with modern CSS
- 🚀 **Easy Deployment**: Optional one-click deployment to GitHub Pages
- 📱 **Mobile Friendly**: Responsive design that works on all devices

## 🔧 How It Works

1. **Upload**: Drop your PDF or image file
2. **Extract**: PaddleOCR (via the PP-StructureV3 API) extracts text and document structure
3. **Transform**: ERNIE converts the extracted content into HTML (without ERNIE credentials, a built-in fallback generator is used)
4. **Deploy**: Optionally publish to GitHub Pages

## 📁 Supported Formats

- **PDFs**: `.pdf`
- **Images**: `.png`, `.jpg`, `.jpeg`, `.bmp`, `.tiff`

## 🚀 Quick Start

1. Upload a document using the file picker
2. Click "Convert to Webpage"
3. Preview your generated webpage
4. Download the HTML file
5. Optionally deploy to GitHub Pages

## ⚙️ Configuration

**Set up using a `.env` file:**

1. Copy the example environment file:

   ```bash
   cp .env.example .env
   ```

2. Edit the `.env` file with your credentials:

   ```bash
   # Required API Configuration for PP-StructureV3
   API_URL=your_pp_structurev3_api_url
   API_TOKEN=your_api_token

   # Optional ERNIE API Configuration for enhanced HTML generation
   ERNIE_CLIENT_ID=your_client_id_here
   ERNIE_CLIENT_SECRET=your_client_secret_here
   ```

**Note:** The `.env` file is loaded automatically when the application starts. Without ERNIE credentials, the app falls back to a built-in HTML generator.
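If you want to confirm your `.env` before starting the app, a pre-flight check can be sketched in shell. The `check_env` function below is hypothetical and not part of Doc2Page; it assumes the file contains plain `KEY=value` lines that a POSIX shell can source (as in the example above), and it only mirrors the rule stated in this section: `API_URL` and `API_TOKEN` are required, while missing ERNIE credentials mean the fallback generator is used.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of Doc2Page): verifies that a .env
# file defines the variables described in the Configuration section.
# Assumes plain KEY=value lines that a POSIX shell can source.
check_env() {
  env_file="${1:-.env}"
  if [ ! -f "$env_file" ]; then
    echo "missing $env_file (copy .env.example first)" >&2
    return 1
  fi
  # '.' searches PATH for names without a slash, so force a relative path.
  case "$env_file" in
    */*) ;;
    *) env_file="./$env_file" ;;
  esac
  # Run in a subshell so the sourced values do not leak into the caller.
  (
    # Ignore any values already in the environment; check the file only.
    unset API_URL API_TOKEN ERNIE_CLIENT_ID ERNIE_CLIENT_SECRET
    . "$env_file"
    status=0
    for var in API_URL API_TOKEN; do
      eval "val=\${$var:-}"   # indirect lookup of the variable named in $var
      if [ -z "$val" ]; then
        echo "required variable $var is not set" >&2
        status=1
      fi
    done
    if [ -z "${ERNIE_CLIENT_ID:-}" ] || [ -z "${ERNIE_CLIENT_SECRET:-}" ]; then
      echo "ERNIE credentials not set: fallback HTML generator will be used"
    else
      echo "ERNIE credentials found"
    fi
    exit "$status"
  )
}
```

For example, `check_env .env` reports whether ERNIE is configured and returns a non-zero status if a required variable is missing. Because sourcing executes the file as shell code, only run it on a `.env` you trust; the app's own loader may parse the file differently (quoting rules, for instance), so treat this as a convenience check rather than a guarantee.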
## 🏗️ Technical Stack

- **Frontend**: Gradio for the web interface
- **OCR Engine**: PP-StructureV3 API (PaddlePaddle)
- **AI Processing**: ERNIE 4.5-X1.1-Preview (optional)
- **Image Processing**: Pillow

## 📝 Example Use Cases

- Convert research papers to web format
- Digitize scanned documents
- Create web-friendly versions of presentations
- Transform printed materials to responsive websites
- Archive documents in searchable HTML format

## 📄 License

This project is licensed under the Apache 2.0 License.