Updated the MarkItDown project metadata, including new SDK and Python version information. Added instructions for deploying to Hugging Face Spaces, covering setup of secrets and environment variables. Changed the dependencies in requirements.txt to match the new library versions.
- README.md +41 -7
- requirements.txt +2 -2
- spaces_metadata.yaml +3 -3
README.md
CHANGED
|
@@ -1,8 +1,20 @@
| 1 |
# 🚀 MarkItDown Testing Platform
|
| 2 |
|
| 3 |
**Enterprise-Grade Document Conversion Testing with AI-Powered Analysis**
|
| 4 |
|
| 5 |
-
[](https://huggingface.co/spaces/
|
| 6 |
[](https://www.python.org/downloads/)
|
| 7 |
[](https://opensource.org/licenses/MIT)
|
| 8 |
|
|
@@ -23,7 +35,7 @@ A comprehensive testing platform for Microsoft's MarkItDown document conversion
|
|
| 23 |
|
| 24 |
### Using the Hugging Face Space
|
| 25 |
|
| 26 |
-
1. **Visit the Space**: [MarkItDown Testing Platform](https://huggingface.co/spaces/
|
| 27 |
2. **Upload Document**: Drag & drop or select your document
|
| 28 |
3. **Configure Analysis**: Enter Gemini API key for AI analysis (optional)
|
| 29 |
4. **Process**: Click "Process Document" and review results
|
|
@@ -78,11 +90,11 @@ A comprehensive testing platform for Microsoft's MarkItDown document conversion
|
|
| 78 |
|
| 79 |
### Key Dependencies
|
| 80 |
```python
|
| 81 |
-
gradio>=4.
|
| 82 |
-
markitdown[all]>=0.1.0
|
| 83 |
-
google-genai>=
|
| 84 |
-
plotly>=5.17.0
|
| 85 |
-
pandas>=1.5.0
|
| 86 |
```
|
| 87 |
|
| 88 |
## 📊 Analysis Capabilities
|
|
@@ -164,6 +176,28 @@ export MAX_FILE_SIZE="52428800" # 50MB in bytes
|
|
| 164 |
export PROCESSING_TIMEOUT="300" # 5 minutes
|
| 165 |
```
|
| 166 |
|
| 167 |
## 📚 API Reference
|
| 168 |
|
| 169 |
### Core Processing Pipeline
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: MarkItDownTestingPlatform
|
| 3 |
+
emoji: 📊
|
| 4 |
+
colorFrom: pink
|
| 5 |
+
colorTo: gray
|
| 6 |
+
sdk: gradio
|
| 7 |
+
sdk_version: 4.44.0
|
| 8 |
+
app_file: app.py
|
| 9 |
+
pinned: false
|
| 10 |
+
short_description: Enterprise-Grade Document Conversion Testing with AI-Powered
|
| 11 |
+
---
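The front matter added above declares `sdk_version: 4.44.0`, while `spaces_metadata.yaml` in this same change sets `sdk_version: "4.44.1"`. A minimal sketch of reading such a block so the two values can be compared is shown here. It assumes flat `key: value` pairs only, and `parse_front_matter` is a hypothetical helper, not part of this repository or of any Hugging Face library.

```python
from typing import Dict


def parse_front_matter(text: str) -> Dict[str, str]:
    """Read flat `key: value` pairs from a leading '---' ... '---' block.

    Hypothetical helper for illustration: nested YAML, lists, and
    multi-line values are not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file

    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter: stop before the document body
        key, separator, value = line.partition(":")
        if separator:
            # Drop surrounding whitespace and optional quotes.
            fields[key.strip()] = value.strip().strip("\"'")
    return fields
```

Calling it on the README text and reading `fields.get("sdk_version")` gives a value that can be checked against the one in `spaces_metadata.yaml`.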
|
| 12 |
+
|
| 13 |
# 🚀 MarkItDown Testing Platform
|
| 14 |
|
| 15 |
**Enterprise-Grade Document Conversion Testing with AI-Powered Analysis**
|
| 16 |
|
| 17 |
+
[](https://huggingface.co/spaces/DocSA/MarkItDownTestingPlatform)
|
| 18 |
[](https://www.python.org/downloads/)
|
| 19 |
[](https://opensource.org/licenses/MIT)
|
| 20 |
|
|
|
|
| 35 |
|
| 36 |
### Using the Hugging Face Space
|
| 37 |
|
| 38 |
+
1. **Visit the Space**: [MarkItDown Testing Platform](https://huggingface.co/spaces/DocSA/MarkItDownTestingPlatform)
|
| 39 |
2. **Upload Document**: Drag & drop or select your document
|
| 40 |
3. **Configure Analysis**: Enter Gemini API key for AI analysis (optional)
|
| 41 |
4. **Process**: Click "Process Document" and review results
|
|
|
|
| 90 |
|
| 91 |
### Key Dependencies
|
| 92 |
```python
|
| 93 |
+
gradio>=4.44.0 # Gradio interface (HF Spaces compatible)
|
| 94 |
+
markitdown[all]>=0.1.0 # Microsoft conversion engine
|
| 95 |
+
google-genai>=1.0.0 # Gemini integration (new client)
|
| 96 |
+
plotly>=5.17.0 # Interactive visualizations
|
| 97 |
+
pandas>=1.5.0 # Data processing
|
| 98 |
```
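The README lists `gradio>=4.44.0`, while `requirements.txt` in this change additionally caps it with `<5.0.0`. The sketch below illustrates what such a range means for a given installed version. It handles plain dotted numeric versions only, and `parse_version` and `satisfies` are hypothetical helpers; pip itself applies the full version-specifier rules.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '4.44.0' into a tuple of ints.

    Short versions are padded with zeros so '4.44' compares equal to '4.44.0'.
    """
    parts = tuple(int(part) for part in text.split("."))
    return parts + (0,) * (3 - len(parts))


def satisfies(installed: str, minimum: str, upper: Optional[str] = None) -> bool:
    """Return True when minimum <= installed, and installed < upper if given."""
    version = parse_version(installed)
    if version < parse_version(minimum):
        return False
    if upper is not None and version >= parse_version(upper):
        return False
    return True
```

For example, `satisfies("4.44.1", "4.44.0", "5.0.0")` holds, while `satisfies("5.0.0", "4.44.0", "5.0.0")` does not, because the upper bound is exclusive.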
|
| 99 |
|
| 100 |
## 📊 Analysis Capabilities
|
|
|
|
| 176 |
export PROCESSING_TIMEOUT="300" # 5 minutes
|
| 177 |
```
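The exported variables `MAX_FILE_SIZE` and `PROCESSING_TIMEOUT` are plain strings by the time the application sees them. One possible way to read them safely is sketched below. `read_int_env` is a hypothetical helper, and the defaults simply mirror the values shown in the README (52428800 bytes and 300 seconds); how `app.py` actually reads them is not shown in this diff.

```python
import os


def read_int_env(name: str, default: int) -> int:
    """Read a positive integer from the environment, falling back to `default`.

    Hypothetical helper: unset, empty, non-numeric, or non-positive values
    all yield the default instead of raising.
    """
    raw = os.environ.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


# Defaults mirror the README: 50 MB in bytes and 5 minutes in seconds.
MAX_FILE_SIZE = read_int_env("MAX_FILE_SIZE", 52428800)
PROCESSING_TIMEOUT = read_int_env("PROCESSING_TIMEOUT", 300)
```

Falling back to a default keeps a mistyped variable from crashing the Space at startup, at the cost of hiding the typo; logging the fallback would be a reasonable addition.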
|
| 178 |
|
| 179 |
+
### Deploying to Hugging Face Spaces
|
| 180 |
+
|
| 181 |
+
1. **Create the Space**
|
| 182 |
+
- Open [huggingface.co/spaces/new](https://huggingface.co/spaces/new)
|
| 183 |
+
- Choose the **Gradio** SDK, the name `DocSA/MarkItDownTestingPlatform`, and the **Python 3.11** runtime
|
| 184 |
+
- `app_file` must remain `app.py`
|
| 185 |
+
|
| 186 |
+
2. **Push the code**
|
| 187 |
+
```bash
|
| 188 |
+
git remote add hf https://huggingface.co/spaces/DocSA/MarkItDownTestingPlatform
|
| 189 |
+
git push hf main
|
| 190 |
+
```
|
| 191 |
+
|
| 192 |
+
3. **Configure secrets and environment variables**
|
| 193 |
+
- Add the `GEMINI_API_KEY` secret (Settings → Repository secrets → Add)
|
| 194 |
+
- Additional (non-secret) variables: `MAX_FILE_SIZE_MB=50`, `PROCESSING_TIMEOUT=300`, `APP_VERSION=2.0.0-enterprise`
|
| 195 |
+
|
| 196 |
+
4. **Runtime notes**
|
| 197 |
+
- Gemini analysis is disabled by default; the user enables it manually
|
| 198 |
+
- Default settings: analysis type **Content Summary**, model **Gemini 2.0 Flash**
|
| 199 |
+
- Gemini quota limits are handled by automatic fallback models
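The last runtime note says Gemini quota limits are handled by automatic fallback models, but the diff does not show how. The general pattern could look roughly like the sketch below. `QuotaExceeded`, `call_with_fallback`, and the model names in the usage note are all hypothetical stand-ins; the real client call and its error type are supplied by the caller and are not taken from this repository.

```python
from typing import Callable, Optional, Sequence, Tuple


class QuotaExceeded(Exception):
    """Hypothetical error standing in for a provider quota or rate-limit failure."""


def call_with_fallback(
    models: Sequence[str],
    call: Callable[[str], str],
) -> Tuple[str, str]:
    """Try each model in order; return (model, result) from the first success.

    Only QuotaExceeded triggers a fallback. Any other exception propagates
    unchanged so that real bugs are not masked.
    """
    last_error: Optional[QuotaExceeded] = None
    for model in models:
        try:
            return model, call(model)
        except QuotaExceeded as exc:
            last_error = exc  # quota hit: move on to the next model
    raise RuntimeError("all models exhausted their quota") from last_error
```

A caller would pass an ordered list such as `["primary-model", "backup-model"]` (placeholder names) together with a function that sends the request to the given model and raises `QuotaExceeded` when the provider reports a quota error.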
|
| 200 |
+
|
| 201 |
## 📚 API Reference
|
| 202 |
|
| 203 |
### Core Processing Pipeline
|
requirements.txt
CHANGED
|
@@ -2,11 +2,11 @@
|
|
| 2 |
# Strategic dependency selection for enterprise-grade reliability
|
| 3 |
|
| 4 |
# Core Framework Dependencies
|
| 5 |
-
gradio>=4.
|
| 6 |
markitdown[all]>=0.1.0 # Microsoft's document conversion engine
|
| 7 |
|
| 8 |
# LLM Integration - Gemini Focus
|
| 9 |
-
google-genai>=
|
| 10 |
google-auth>=2.0.0 # Authentication for Google services
|
| 11 |
|
| 12 |
# Data Processing & Visualization
|
|
|
|
| 2 |
# Strategic dependency selection for enterprise-grade reliability
|
| 3 |
|
| 4 |
# Core Framework Dependencies
|
| 5 |
+
gradio>=4.44.0,<5.0.0 # UI framework - aligned with production deployment
|
| 6 |
markitdown[all]>=0.1.0 # Microsoft's document conversion engine
|
| 7 |
|
| 8 |
# LLM Integration - Gemini Focus
|
| 9 |
+
google-genai>=1.0.0 # Google Gemini API client (latest)
|
| 10 |
google-auth>=2.0.0 # Authentication for Google services
|
| 11 |
|
| 12 |
# Data Processing & Visualization
|
spaces_metadata.yaml
CHANGED
|
@@ -6,9 +6,9 @@ emoji: "🚀"
|
|
| 6 |
colorFrom: "blue"
|
| 7 |
colorTo: "purple"
|
| 8 |
sdk: "gradio"
|
| 9 |
-
sdk_version: "4.
|
| 10 |
app_file: "app.py"
|
| 11 |
-
python_version: "3.
|
| 12 |
|
| 13 |
# Space configuration
|
| 14 |
models:
|
|
@@ -74,4 +74,4 @@ custom:
|
|
| 74 |
max_file_size: "50MB (HF Spaces free tier)"
|
| 75 |
processing_timeout: "5 minutes"
|
| 76 |
memory_optimization: "Stateless architecture with automatic cleanup"
|
| 77 |
-
concurrent_processing: "Async pipeline with resource management"
|
|
|
|
| 6 |
colorFrom: "blue"
|
| 7 |
colorTo: "purple"
|
| 8 |
sdk: "gradio"
|
| 9 |
+
sdk_version: "4.44.1"
|
| 10 |
app_file: "app.py"
|
| 11 |
+
python_version: "3.11"
|
| 12 |
|
| 13 |
# Space configuration
|
| 14 |
models:
|
|
|
|
| 74 |
max_file_size: "50MB (HF Spaces free tier)"
|
| 75 |
processing_timeout: "5 minutes"
|
| 76 |
memory_optimization: "Stateless architecture with automatic cleanup"
|
| 77 |
+
concurrent_processing: "Async pipeline with resource management"
|