	# Advanced RAG Chatbot - User Guide

	## What's New?

	### 1. Multiple Images & Texts Support in `/index` API

	The `/index` endpoint now supports indexing multiple texts and images in a single request (max 10 each).

	Before:
	```python
	# Old: only one text and one image per request
	data = {
	    'id': 'doc1',
	    'text': 'Single text',
	}
	files = {'image': open('image.jpg', 'rb')}
	```

	After:
	```python
	import requests

	# New: multiple texts and images (max 10 each)
	data = {
	    'id': 'doc1',
	    'texts': ['Text 1', 'Text 2', 'Text 3'],  # Up to 10
	}
	files = [
	    ('images', open('image1.jpg', 'rb')),
	    ('images', open('image2.jpg', 'rb')),
	    ('images', open('image3.jpg', 'rb')),  # Up to 10
	]
	response = requests.post('http://localhost:8000/index', data=data, files=files)
	```

	Example with cURL:
	```bash
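	# Sample texts: "Music event in Hanoi", "Takes place on 20/10/2025",
	# "Venue: National Convention Center". Image filenames are placeholders.
	curl -X POST "http://localhost:8000/index" \
	  -F "id=event123" \
	  -F "texts=Sự kiện âm nhạc tại Hà Nội" \
	  -F "texts=Diễn ra vào ngày 20/10/2025" \
	  -F "texts=Địa điểm: Trung tâm Hội nghị Quốc gia" \
	  -F "images=@image1.jpg" \
	  -F "images=@image2.jpg" \
	  -F "images=@image3.jpg"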
	```
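
When the request is built programmatically, it can help to enforce the 10-item limit on the client before anything is sent. The following is a minimal sketch of that idea: `build_index_request` and `MAX_ITEMS` are illustrative names that are not part of the service, and it assumes the server accepts the `(filename, bytes)` upload form that `requests` supports for multipart fields.

```python
from pathlib import Path

MAX_ITEMS = 10  # per-request limit for texts and for images, as stated in this guide


def build_index_request(doc_id, texts=(), image_paths=()):
    """Return (data, files) suitable for requests.post(url, data=data, files=files)."""
    texts = list(texts)
    paths = [Path(p) for p in image_paths]
    if len(texts) > MAX_ITEMS:
        raise ValueError(f"at most {MAX_ITEMS} texts per request, got {len(texts)}")
    if len(paths) > MAX_ITEMS:
        raise ValueError(f"at most {MAX_ITEMS} images per request, got {len(paths)}")
    data = {"id": doc_id, "texts": texts}
    # Reading the bytes up front means no file handles are left open.
    files = [("images", (p.name, p.read_bytes())) for p in paths]
    return data, files
```

A call such as `data, files = build_index_request('doc1', ['Text 1'], ['image1.jpg'])` would then be followed by `requests.post('http://localhost:8000/index', data=data, files=files)`.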

	### 2. Advanced RAG Pipeline in `/chat` API

	The `/chat` endpoint now runs a multi-stage RAG pipeline to improve response quality:

	#### Key Improvements:

	1. Query Expansion: Automatically expands your question with variations
	2. Multi-Query Retrieval: Searches with multiple query variants
	3. Reranking: Re-scores results for better relevance
	4. Contextual Compression: Keeps only the most relevant parts
	5. Better Prompt Engineering: Prompts optimized for the LLM
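
To make the first four stages concrete, here is a toy, self-contained sketch of how they can fit together. It is not the service's implementation: a simple token-overlap score stands in for embedding similarity, the synonym table stands in for whatever generates query variants, and every function name is hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(query, text):
    """Toy relevance score: fraction of query tokens that occur in the text."""
    q = tokenize(query)
    return len(q & tokenize(text)) / len(q) if q else 0.0


def expand_query(query, synonyms):
    """Query expansion: the original query plus one variant per synonym swap."""
    variants = [query]
    for word, alt in synonyms.items():
        if word in query:
            variants.append(query.replace(word, alt))
    return variants


def multi_query_retrieve(queries, docs, top_k):
    """Multi-query retrieval: score every variant, keep each document's best score."""
    best = {}
    for q in queries:
        for doc_id, text in docs.items():
            best[doc_id] = max(best.get(doc_id, 0.0), overlap_score(q, text))
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


def rerank(query, candidates, docs, score_threshold):
    """Reranking: re-score candidates against the original query, drop weak ones."""
    rescored = [(doc_id, overlap_score(query, docs[doc_id])) for doc_id, _ in candidates]
    kept = [(doc_id, score) for doc_id, score in rescored if score >= score_threshold]
    return sorted(kept, key=lambda kv: kv[1], reverse=True)


def compress(query, text):
    """Contextual compression: keep only sentences sharing a token with the query."""
    q = tokenize(query)
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(s for s in sentences if q & tokenize(s))


docs = {
    "a": "The festival starts on 15 November. Parking is limited.",
    "b": "Tickets cost 500000 VND.",
    "c": "The concert venue is Thong Nhat Park.",
}
query = "when does the festival start"

queries = expand_query(query, {"festival": "concert"})
candidates = multi_query_retrieve(queries, docs, top_k=2)
reranked = rerank(query, candidates, docs, score_threshold=0.3)
context = [compress(query, docs[doc_id]) for doc_id, _ in reranked]
print(context)
```

In this sketch the expanded variant pulls in document `c`, reranking against the original query removes it again, and compression drops the parking sentence from document `a`.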

	#### How to Use:

	Basic Usage (Auto-enabled):
	```python
	import requests

	response = requests.post('http://localhost:8000/chat', json={
	    'message': 'Dao có nguy hiểm không?',  # "Are knives dangerous?"
	    'use_rag': True,
	    'use_advanced_rag': True,  # Default: True
	    'hf_token': 'hf_xxxxx',
	})

	result = response.json()
	print("Response:", result['response'])
	print("RAG Stats:", result['rag_stats'])  # See pipeline statistics
	```

	Advanced Configuration:
	```python
	import requests

	response = requests.post('http://localhost:8000/chat', json={
	    'message': 'Làm sao để tạo event mới?',  # "How do I create a new event?"
	    'use_rag': True,
	    'use_advanced_rag': True,

	    # RAG pipeline options
	    'use_query_expansion': True,  # Expand query with variations
	    'use_reranking': True,        # Rerank results
	    'use_compression': True,      # Compress context
	    'score_threshold': 0.5,       # Min relevance score (0-1)
	    'top_k': 5,                   # Number of documents to retrieve

	    # LLM options
	    'max_tokens': 512,
	    'temperature': 0.7,
	    'hf_token': 'hf_xxxxx',
	})
	```

	Disable Advanced RAG (Use Basic):
	```python
	import requests

	response = requests.post('http://localhost:8000/chat', json={
	    'message': 'Your question',
	    'use_rag': True,
	    'use_advanced_rag': False,  # Use basic RAG
	})
	```

	## API Changes Summary

	### `/index` Endpoint

	Old Parameters:
	- `id`: str (required)
	- `text`: str (required)
	- `image`: UploadFile (optional)

	New Parameters:
	- `id`: str (required)
	- `texts`: List[str] (optional, max 10)
	- `images`: List[UploadFile] (optional, max 10)

	Response (the message reads "Successfully indexed document doc123 with 3 texts and 2 images"):
	```json
	{
	  "success": true,
	  "id": "doc123",
	  "message": "Đã index thành công document doc123 với 3 texts và 2 images"
	}
	```

	### `/chat` Endpoint

	New Parameters:
	- `use_advanced_rag`: bool (default: True) - Enable advanced RAG
	- `use_query_expansion`: bool (default: True) - Expand query
	- `use_reranking`: bool (default: True) - Rerank results
	- `use_compression`: bool (default: True) - Compress context
	- `score_threshold`: float (default: 0.5) - Min relevance score
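
On the client side, these flags can be collected in one place so the defaults are explicit. The dataclass below is only a sketch: `ChatOptions` and `to_payload` are hypothetical names, the defaults are copied from the list above, and the 0-1 range check is an assumption based on how `score_threshold` is described in this guide.

```python
from dataclasses import asdict, dataclass


@dataclass
class ChatOptions:
    """Client-side mirror of the documented /chat flags and their defaults."""
    use_advanced_rag: bool = True
    use_query_expansion: bool = True
    use_reranking: bool = True
    use_compression: bool = True
    score_threshold: float = 0.5

    def __post_init__(self):
        # Assumption: the threshold is a relevance score in the range 0-1.
        if not 0.0 <= self.score_threshold <= 1.0:
            raise ValueError("score_threshold must be between 0 and 1")

    def to_payload(self, message, **extra):
        """Build the JSON body; extra carries other fields such as top_k or hf_token."""
        return {"message": message, "use_rag": True, **asdict(self), **extra}
```

For example, `ChatOptions(score_threshold=0.6).to_payload('Your question', top_k=3)` yields a dict that can be passed as `json=` to `requests.post`.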

	Response (New):
	```json
	{
	  "response": "AI generated answer...",
	  "context_used": [...],
	  "timestamp": "2025-10-29T...",
	  "rag_stats": {
	    "original_query": "Your question",
	    "expanded_queries": ["Query variant 1", "Query variant 2"],
	    "initial_results": 10,
	    "after_rerank": 5,
	    "after_compression": 5
	  }
	}
	```
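
The `rag_stats` object can be reduced to a short per-stage summary, which is handy when tuning `top_k` or `score_threshold`. This sketch assumes the three counters are document counts after each stage, which is how the field names read; `summarize_rag_stats` is an illustrative helper, not part of the API.

```python
def summarize_rag_stats(stats):
    """Return how many documents each pipeline stage dropped and how many remain."""
    initial = stats["initial_results"]
    reranked = stats["after_rerank"]
    compressed = stats["after_compression"]
    return {
        "query_variants": len(stats["expanded_queries"]),
        "dropped_by_rerank": initial - reranked,
        "dropped_by_compression": reranked - compressed,
        "final_documents": compressed,
    }


# Sample values taken from the response shown above.
stats = {
    "original_query": "Your question",
    "expanded_queries": ["Query variant 1", "Query variant 2"],
    "initial_results": 10,
    "after_rerank": 5,
    "after_compression": 5,
}
summary = summarize_rag_stats(stats)
print(summary)
```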

	## Complete Examples

	### Example 1: Index Multiple Social Media Posts

	```python
	import requests

	# Index a social media event with multiple posts and images
	data = {
	    'id': 'event_festival_2025',
	    'texts': [
	        'Festival âm nhạc quốc tế Hà Nội 2025',  # Hanoi International Music Festival 2025
	        'Ngày 15-17 tháng 11 năm 2025',  # 15-17 November 2025
	        'Địa điểm: Công viên Thống Nhất',  # Venue: Thống Nhất Park
	        'Line-up: Sơn Tùng MTP, Đen Vâu, Hoàng Thùy Linh',
	        'Giá vé từ 500.000đ - 2.000.000đ',  # Tickets from 500,000 to 2,000,000 VND
	    ],
	}

	files = [
	    ('images', open('poster_festival.jpg', 'rb')),
	    ('images', open('lineup.jpg', 'rb')),
	    ('images', open('venue_map.jpg', 'rb')),
	]

	response = requests.post('http://localhost:8000/index', data=data, files=files)
	print(response.json())
	```

	### Example 2: Advanced RAG Chat

	```python
	import requests

	# Chat with advanced RAG
	chat_response = requests.post('http://localhost:8000/chat', json={
	    # "When and where does the Hanoi music festival take place?"
	    'message': 'Festival âm nhạc Hà Nội diễn ra khi nào và ở đâu?',
	    'use_rag': True,
	    'use_advanced_rag': True,
	    'top_k': 3,
	    'score_threshold': 0.6,
	    'hf_token': 'your_hf_token_here',
	})

	result = chat_response.json()
	print("Answer:", result['response'])
	print("\nRetrieved Context:")
	for ctx in result['context_used']:
	    print(f"- [{ctx['id']}] Confidence: {ctx['confidence']:.2%}")

	print("\nRAG Pipeline Stats:")
	print(f"- Original query: {result['rag_stats']['original_query']}")
	print(f"- Query variants: {result['rag_stats']['expanded_queries']}")
	print(f"- Documents retrieved: {result['rag_stats']['initial_results']}")
	print(f"- After reranking: {result['rag_stats']['after_rerank']}")
	```

	## Performance Comparison

	| Feature | Basic RAG | Advanced RAG |
	|---------|-----------|--------------|
	| Query Understanding | Single query | Multiple query variants |
	| Retrieval Method | Direct vector search | Multi-query + hybrid |
	| Result Ranking | Score from DB | Reranked with semantic similarity |
	| Context Quality | Full text | Compressed, relevant parts only |
	| Response Accuracy | Good | Better |
	| Response Time | Faster | Slightly slower but better quality |

	## When to Use What?

	Use Basic RAG when:
	- You need fast response time
	- Queries are straightforward
	- Context is already well-structured

	Use Advanced RAG when:
	- You need higher accuracy
	- Queries are complex or ambiguous
	- Context documents are long
	- You want better relevance

	## Troubleshooting

	### Error: "Tối đa 10 texts" ("Maximum 10 texts")
	The request contains more than 10 texts. Send at most 10 texts per request.

	### Error: "Tối đa 10 images" ("Maximum 10 images")
	The request contains more than 10 images. Send at most 10 images per request.
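
If a single source really has more than 10 texts, one option is to split them into batches and send each batch as its own request. This guide does not say whether re-using an `id` appends to or replaces the existing document, so the sketch below gives each batch its own id with a hypothetical `-partN` suffix. Both helper names are illustrative.

```python
def chunked(items, size=10):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def batch_index_payloads(doc_id, texts, size=10):
    """Build one /index form payload per batch, each with its own suffixed id."""
    return [
        {"id": f"{doc_id}-part{n}", "texts": batch}
        for n, batch in enumerate(chunked(texts, size), start=1)
    ]
```

Each resulting dict can be passed as `data=` to `requests.post('http://localhost:8000/index', ...)`. The same `chunked` helper works for image paths.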

	### RAG stats show 0 results
	Your `score_threshold` may be too high, so every retrieved document is filtered out. Try a lower value (e.g., 0.3-0.5).
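
Lowering the threshold can also be automated on the client. The sketch below retries with progressively lower values until some context comes back. It assumes that an empty `context_used` list is the signal for "0 results", and `chat_with_fallback` is a hypothetical helper. The `send` argument is any callable that posts the payload and returns the parsed JSON, which keeps the retry logic independent of the HTTP client.

```python
def chat_with_fallback(send, payload, thresholds=(0.5, 0.4, 0.3)):
    """Retry a chat request with lower score_threshold values until context is found.

    `send` takes the JSON payload and returns the parsed response dict, e.g.
    lambda body: requests.post('http://localhost:8000/chat', json=body).json()
    """
    if not thresholds:
        raise ValueError("thresholds must not be empty")
    result = None
    for threshold in thresholds:
        result = send({**payload, "score_threshold": threshold})
        if result.get("context_used"):
            # Some context survived filtering, so stop lowering the threshold.
            return result, threshold
    # Nothing was retrieved even at the lowest threshold; return the last attempt.
    return result, thresholds[-1]
```

The function returns both the response and the threshold that produced it, so the caller can log which value was needed.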

	## Next Steps

	To further improve RAG, consider:

	1. Add BM25 Hybrid Search: Combine dense + sparse retrieval
	2. Use Cross-Encoder for Reranking: Better than embedding similarity
	3. Implement Query Decomposition: Break complex queries into sub-queries
	4. Add Citation/Source Tracking: Show which document each fact comes from
	5. Integrate RAG-Anything: For advanced multimodal document processing
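
For item 1, a common way to combine a dense (vector) ranking with a sparse (BM25) ranking is reciprocal rank fusion (RRF). RRF is not described elsewhere in this guide; it is shown here only as one possible starting point. The sketch takes already-ranked lists of document ids, and the ids and rankings in the example are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. k=60 is the constant commonly used for RRF.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


dense_ranking = ["a", "b", "c"]   # e.g. order returned by vector search
sparse_ranking = ["b", "c", "a"]  # e.g. order returned by BM25
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Because RRF uses only ranks, it avoids having to normalize dense and sparse scores onto a common scale.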

	For RAG-Anything integration (more complex), see: https://github.com/HKUDS/RAG-Anything