# Unified AI Services

A comprehensive AI platform that integrates Named Entity Recognition (NER), Optical Character Recognition (OCR), and Retrieval-Augmented Generation (RAG) services into a unified application.

## 🌟 Features

### Core Services
- **NER Service** (Port 8500): Advanced named entity recognition with relationship extraction
- **OCR Service** (Port 8400): Document processing with Azure Document Intelligence
- **RAG Service** (Port 8401): Vector search and document retrieval
- **Unified App** (Port 8000): Coordinated workflows and service management

### Key Capabilities
- ✅ Multi-language support (Thai + English)
- ✅ Complex relationship extraction
- ✅ Entity deduplication
- ✅ Graph database exports (Neo4j, GraphML, GEXF)
- ✅ Vector search with semantic similarity
- ✅ Document processing (PDF, images, text)
- ✅ Real-time service health monitoring
- ✅ Unified workflows combining all services
- ✅ Comprehensive API documentation

## 🚀 Quick Start

### Prerequisites
- Python 3.8 or higher
- PostgreSQL with vector extension support
- Azure OpenAI account
- Azure Document Intelligence account
- DeepSeek API account (for advanced NER)

### Automated Setup

1. **Clone and navigate to the project directory**
   ```bash
   cd unified-ai-services
   ```

2. **Run the automated setup**
   ```bash
   python setup.py
   ```

   This will:
   - Check your Python environment
   - Create necessary directories
   - Help you configure .env file
   - Install dependencies
   - Validate configuration
   - Create startup scripts

3. **Start the unified application**
   ```bash
   python app.py
   ```

   Or use the generated scripts:
   - Windows: `start_services.bat`
   - Unix/Linux/Mac: `./start_services.sh`

4. **Run comprehensive tests**
   ```bash
   python test_unified.py
   ```

   Or use the generated scripts:
   - Windows: `run_tests.bat`
   - Unix/Linux/Mac: `./run_tests.sh`

### Manual Setup

If you prefer manual setup:

1. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

2. **Create .env file** (copy from .env.example)
   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

3. **Set up directories**
   ```bash
   mkdir -p services exports logs temp tests data
   ```

4. **Place service files in the services directory**
   ```
   services/
   ├── ner_service.py
   ├── ocr_service.py
   └── rag_service.py
   ```

## 📁 Project Structure

```
unified-ai-services/
├── app.py                   # Main unified application
├── configs.py               # Centralized configuration
├── setup.py                 # Automated setup script
├── requirements.txt         # Python dependencies
├── test_unified.py          # Comprehensive test suite
├── .env                     # Environment configuration
├── services/                # Individual service files
│   ├── ner_service.py       # NER service implementation
│   ├── ocr_service.py       # OCR service implementation
│   └── rag_service.py       # RAG service implementation
├── exports/                 # Generated export files
├── logs/                    # Application logs
├── temp/                    # Temporary files
├── tests/                   # Additional test files
└── data/                    # Data files
```

## ⚙️ Configuration

### Environment Variables

The system uses a `.env` file for configuration. Key variables include:

#### Server Configuration
```bash
HOST=0.0.0.0
DEBUG=True
MAIN_PORT=8000
NER_PORT=8500
OCR_PORT=8400
RAG_PORT=8401
```

#### Database Configuration
```bash
POSTGRES_HOST=your-postgres-server.com
POSTGRES_PORT=5432
POSTGRES_USER=your-username
POSTGRES_PASSWORD=your-password
POSTGRES_DATABASE=postgres
```

#### Azure OpenAI Configuration
```bash
AZURE_OPENAI_ENDPOINT=https://your-openai.openai.azure.com/
AZURE_OPENAI_API_KEY=your-api-key
EMBEDDING_MODEL=text-embedding-3-large
```

#### DeepSeek Configuration
```bash
DEEPSEEK_ENDPOINT=https://your-deepseek-endpoint/
DEEPSEEK_API_KEY=your-deepseek-key
DEEPSEEK_MODEL=DeepSeek-R1-0528
```

#### Azure Document Intelligence Configuration
```bash
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-di.cognitiveservices.azure.com/
AZURE_DOCUMENT_INTELLIGENCE_KEY=your-di-key
```

#### Azure Storage Configuration
```bash
AZURE_STORAGE_ACCOUNT_URL=https://yourstorage.blob.core.windows.net/
AZURE_BLOB_SAS_TOKEN=your-sas-token
BLOB_CONTAINER=historylog
```
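
As a rough illustration of how these variables might be consumed, the sketch below reads the four port settings from the environment and falls back to the defaults listed under Server Configuration. The helper name `load_ports` is hypothetical and is not taken from the project; the actual `configs.py` may load and validate settings differently.

```python
import os

# Default ports as documented in the Server Configuration section.
DEFAULT_PORTS = {
    "MAIN_PORT": 8000,
    "NER_PORT": 8500,
    "OCR_PORT": 8400,
    "RAG_PORT": 8401,
}


def load_ports(env=None):
    """Return the service ports, preferring values found in `env`.

    Hypothetical helper for illustration only; not part of the project.
    """
    env = os.environ if env is None else env
    ports = {}
    for name, default in DEFAULT_PORTS.items():
        raw = env.get(name)
        # Unset or empty variables fall back to the documented default.
        ports[name] = int(raw) if raw else default
    return ports
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.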

## 🔧 API Documentation

Once running, access the interactive API documentation:
- **Unified API**: http://localhost:8000/docs
- **NER Service**: http://localhost:8500/docs
- **OCR Service**: http://localhost:8400/docs
- **RAG Service**: http://localhost:8401/docs

## 🎯 API Usage Examples

### 1. Unified Analysis (Text + RAG Indexing)

```python
import asyncio

import httpx


async def unified_analysis():
    # Request body for the unified NER + RAG indexing workflow
    data = {
        "text": "Your text content here...",
        "extract_relationships": True,
        "include_embeddings": False,
        "generate_graph_files": True,
        "export_formats": ["neo4j", "json"],
        "enable_rag_indexing": True,
        "rag_title": "My Document",
        "rag_keywords": ["keyword1", "keyword2"]
    }

    async with httpx.AsyncClient() as client:
        response = await client.post("http://localhost:8000/analyze/unified", json=data)
        response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
        return response.json()


if __name__ == "__main__":
    print(asyncio.run(unified_analysis()))
```

### 2. Combined Search with NER Analysis

```python
import httpx


async def combined_search():
    # Search parameters; include_ner_analysis also runs NER on the results
    data = {
        "query": "search query here",
        "limit": 10,
        "similarity_threshold": 0.2,
        "include_ner_analysis": True
    }

    async with httpx.AsyncClient() as client:
        response = await client.post("http://localhost:8000/search/combined", json=data)
        return response.json()
```

### 3. File Upload Analysis

```python
import httpx


async def analyze_file():
    # Form fields are sent as strings alongside the multipart file upload
    data = {
        "extract_relationships": "true",
        "generate_graph_files": "true",
        "export_formats": "neo4j,json"
    }

    # Open the file in a context manager so the handle is closed after the upload
    with open("document.pdf", "rb") as f:
        files = {"file": ("document.pdf", f, "application/pdf")}
        async with httpx.AsyncClient() as client:
            response = await client.post("http://localhost:8000/ner/analyze/file", files=files, data=data)
            return response.json()
```

## 🧪 Testing

### Comprehensive Test Suite

The project includes comprehensive tests covering:
- ✅ Service health checks
- ✅ Individual service functionality
- ✅ Unified workflow testing
- ✅ Service proxy functionality
- ✅ Error handling and resilience
- ✅ Performance testing
- ✅ File upload/download testing

Run tests with:
```bash
python test_unified.py
```

### Individual Service Tests

Test individual services:
```bash
# Test NER service
python test_ner.py

# Test RAG service
python test_rag.py
```

### Quick Health Check

```bash
curl http://localhost:8000/health
```

## 🔍 Monitoring and Health Checks

### Health Endpoints
- **Unified System**: `GET /health`
- **Individual Services**: `GET /ner/health`, `GET /ocr/health`, `GET /rag/health`
- **Detailed Status**: `GET /status`
- **Service Discovery**: `GET /services`
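
The health endpoints above can be polled from a small script. The following is a minimal sketch using only the standard library; it assumes the unified app is reachable at `http://localhost:8000` and that a healthy endpoint answers with HTTP 200. The function names are illustrative and not part of the project.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # unified app, per the ports listed earlier
HEALTH_PATHS = ["/health", "/ner/health", "/ocr/health", "/rag/health"]


def health_urls(base_url=BASE_URL):
    """Build the full URL for each documented health endpoint."""
    return [base_url.rstrip("/") + path for path in HEALTH_PATHS]


def check(url, timeout=5.0):
    """Return (url, ok, detail); network errors are reported, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return url, resp.status == 200, body
    except (urllib.error.URLError, OSError) as exc:
        # Covers refused connections, timeouts, and non-2xx responses.
        return url, False, str(exc)
```

Calling `check(url)` for each entry of `health_urls()` gives a quick up/down overview without needing the test suite.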

### Monitoring Features
- Real-time service health monitoring
- Response time tracking
- Service uptime monitoring
- Error rate tracking
- Resource usage monitoring

## 📊 Service Architecture

```mermaid
graph TB
    Client[Client Applications]

    subgraph "Unified AI Services (Port 8000)"
        UA[Unified App]
        Proxy[Service Proxies]
        Health[Health Monitor]
    end

    subgraph "Core Services"
        NER[NER Service<br/>Port 8500]
        OCR[OCR Service<br/>Port 8400]
        RAG[RAG Service<br/>Port 8401]
    end

    subgraph "External Services"
        Azure[Azure Services]
        DeepSeek[DeepSeek API]
        DB[(PostgreSQL)]
    end

    Client --> UA
    UA --> Proxy
    Proxy --> NER
    Proxy --> OCR
    Proxy --> RAG

    NER --> Azure
    NER --> DeepSeek
    NER --> DB

    OCR --> Azure

    RAG --> Azure
    RAG --> DB
    RAG --> OCR
```
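
To make the proxy layer in the diagram more concrete, here is a small sketch of prefix-based routing. It assumes the unified app forwards a path such as `/ner/health` to the NER service by stripping the first path segment, which matches the health endpoints documented above but is otherwise a guess about how `app.py` behaves. `UPSTREAMS` and `resolve_upstream` are hypothetical names.

```python
# Upstream base URLs, using the default ports from this README.
UPSTREAMS = {
    "ner": "http://localhost:8500",
    "ocr": "http://localhost:8400",
    "rag": "http://localhost:8401",
}


def resolve_upstream(path):
    """Map a unified-app path such as '/ner/health' to an upstream URL.

    Returns None when the first path segment is not a known service,
    meaning the unified app would handle the request itself.
    """
    prefix, _, rest = path.lstrip("/").partition("/")
    base = UPSTREAMS.get(prefix)
    if base is None:
        return None
    return f"{base}/{rest}"
```

Keeping the mapping in one dictionary means a new service only needs one extra entry rather than a new routing branch.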

## 🛠️ Development

### Adding New Features

1. **Service Modifications**: Update individual service files in `services/`
2. **Unified Workflows**: Modify `app.py` for new combined workflows
3. **Configuration**: Update `configs.py` for new settings
4. **Tests**: Add tests to `test_unified.py`

### Debugging

1. **Check Service Logs**: Services log to console
2. **Health Checks**: Use `/health` endpoints
3. **Configuration**: Run `python configs.py` to validate
4. **Database**: Check PostgreSQL connectivity
5. **Azure Services**: Verify API keys and endpoints

### Service Management

Start individual services for development:
```bash
# Start NER service only
cd services && python ner_service.py

# Start OCR service only
cd services && python ocr_service.py

# Start RAG service only
cd services && python rag_service.py
```

## 🚨 Troubleshooting

### Common Issues

#### 1. Services Won't Start
- Check port availability: `netstat -an | grep :8000`
- Verify Python dependencies: `pip list`
- Check .env configuration: `python configs.py`
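
On systems without `netstat` or `grep`, the port check can be done from Python instead. This is a minimal sketch that simply tries to open a TCP connection; `port_in_use` is an illustrative helper, not something shipped with the project.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

Running it for 8000, 8500, 8400, and 8401 before startup shows whether another process already occupies one of the default ports.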

#### 2. Database Connection Issues
- Verify PostgreSQL is running
- Check connection string in .env
- Test connectivity: `python -c "import asyncio, asyncpg; asyncio.run(asyncpg.connect('your-connection-string'))"`
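
The connection string for that check can be assembled from the `POSTGRES_*` variables in `.env`. The sketch below is one way to do it, assuming a standard `postgresql://` URL is what the services expect; `build_postgres_dsn` is a hypothetical helper name.

```python
import os
from urllib.parse import quote


def build_postgres_dsn(env=None):
    """Assemble a postgresql:// URL from the POSTGRES_* variables."""
    env = os.environ if env is None else env
    # Percent-encode credentials so characters like '@' or '/' stay unambiguous.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    host = env["POSTGRES_HOST"]
    port = env.get("POSTGRES_PORT", "5432")
    database = env.get("POSTGRES_DATABASE", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

A missing required variable raises `KeyError`, which points directly at the entry to add to `.env`.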

#### 3. Azure Service Issues
- Verify API keys and endpoints
- Check Azure service status
- Review rate limits and quotas

#### 4. Performance Issues
- Monitor resource usage: `top` or Task Manager
- Check database performance
- Review log files for errors

### Error Codes

- **500**: Internal service error
- **503**: Service unavailable
- **400**: Bad request (check input data)
- **422**: Validation error
- **404**: Endpoint not found
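
Client code can turn these status codes into a hint and a retry decision. The following sketch treats only 503 as transient; that choice is an assumption for illustration, and the names `ERROR_HINTS`, `RETRYABLE`, and `describe_error` are not part of the project.

```python
# Short descriptions taken from the list above.
ERROR_HINTS = {
    400: "Bad request (check input data)",
    404: "Endpoint not found",
    422: "Validation error",
    500: "Internal service error",
    503: "Service unavailable",
}

# Assumption: only 503 is treated as transient and worth retrying.
RETRYABLE = {503}


def describe_error(status_code):
    """Return (hint, should_retry) for an HTTP status code."""
    hint = ERROR_HINTS.get(status_code, "Unexpected status")
    return hint, status_code in RETRYABLE
```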

## 📈 Performance Optimization

### Recommended Settings

#### Production Configuration
```bash
DEBUG=False
MAX_FILE_SIZE=50
REQUEST_TIMEOUT=300
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
```
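
To show how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, here is a minimal sketch of overlapping chunking. It assumes both values are measured in characters, which the README does not state; the RAG service may count tokens or split on sentence boundaries instead. `chunk_text` is an illustrative name.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters. Measuring in
    characters is an assumption; the services may count tokens instead.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the values above, each new chunk starts 800 characters after the previous one, so a 2,500-character document yields three chunks. A larger overlap preserves more context across chunk boundaries at the cost of more chunks to embed and store.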

#### Database Optimization
- Use connection pooling
- Configure appropriate indexes
- Monitor query performance
- Schedule regular maintenance

#### Service Optimization
- Enable caching where appropriate
- Use async operations
- Optimize batch processing
- Monitor memory usage

## 🔐 Security Considerations

### API Security
- Implement authentication/authorization as needed
- Use HTTPS in production
- Validate all input data
- Apply rate limiting

### Data Security
- Secure database connections (SSL)
- Encrypt sensitive data
- Apply security updates regularly
- Monitor access logs

### Azure Security
- Rotate API keys regularly
- Use managed identities where possible
- Monitor usage and costs
- Follow Azure security best practices

## 📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🀝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Run the test suite
6. Submit a pull request

## 📞 Support

For support and questions:
1. Check this README for common issues
2. Review the test suite for usage examples
3. Check service logs for error details
4. Verify configuration with `python configs.py`

## 🎯 Roadmap

### Current Version (1.0.0)
- βœ… Unified service integration
- βœ… Comprehensive testing
- βœ… Multi-language support
- βœ… Graph database exports

### Future Enhancements
- πŸ”„ Advanced caching mechanisms
- πŸ”„ Enhanced monitoring and analytics
- πŸ”„ Additional export formats
- πŸ”„ Improved error recovery
- πŸ”„ Performance optimizations
- πŸ”„ Additional language support