# CodeMind

A CLI tool for intelligent document analysis and commit message generation, using EmbeddingGemma-300m for embeddings, FAISS for vector storage, and Phi-2 for text generation.

## Features

- **Document Indexing**: Embed and index documents for semantic search
- **Semantic Search**: Find relevant documents using natural language queries
- **Smart Commit Messages**: Generate meaningful commit messages from staged git changes
- **RAG (Retrieval-Augmented Generation)**: Answer questions using indexed document context
## Setup

### Prerequisites

- Windows 11
- Conda (Miniconda or Anaconda)
- Git

### Installation
1. **Create a Conda environment:**

   ```bash
   conda create -n codemind python=3.9
   conda activate codemind
   ```

2. **Clone the repository:**

   ```bash
   git clone https://github.com/devjas1/codemind.git
   cd codemind
   ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```
4. **Download models:**

   **Embedding Model (EmbeddingGemma-300m):**

   - Download from Hugging Face: `google/embeddinggemma-300m`
   - Place in the `./models/embeddinggemma-300m/` directory

   **Generation Model (Phi-2 GGUF):**

   - Download the quantized Phi-2 model: `phi-2.Q4_0.gguf`
   - Place in the `./models/` directory
   - Download from: [Microsoft Phi-2 GGUF](https://huggingface.co/microsoft/phi-2-gguf) or similar quantized versions
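Optionally, both downloads can be scripted with the `huggingface_hub` package (an extra install, not in `requirements.txt`). This is only a sketch: it assumes the repo IDs above are correct and that you have accepted any model license on Hugging Face first.

```python
# Hedged helper: fetch both models with huggingface_hub.
# Verify the repo IDs on Hugging Face before running; the GGUF repo
# in particular may live under a different account.
from huggingface_hub import hf_hub_download, snapshot_download

# Embedding model: a full directory of model files
snapshot_download(
    repo_id="google/embeddinggemma-300m",
    local_dir="./models/embeddinggemma-300m",
)

# Generation model: a single quantized GGUF file
hf_hub_download(
    repo_id="microsoft/phi-2-gguf",  # assumption: swap for the repo that hosts phi-2.Q4_0.gguf
    filename="phi-2.Q4_0.gguf",
    local_dir="./models",
)
```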
### Directory Structure

```
CodeMind/
├── cli.py                    # Main CLI entry point
├── config.yaml               # Configuration file
├── requirements.txt          # Python dependencies
├── models/                   # Model storage
│   ├── embeddinggemma-300m/  # Embedding model directory
│   └── phi-2.Q4_0.gguf       # Phi-2 quantized model file
├── src/                      # Core modules
│   ├── config_loader.py      # Configuration management
│   ├── embedder.py           # Document embedding
│   ├── retriever.py          # Semantic search
│   ├── generator.py          # Text generation
│   └── diff_analyzer.py      # Git diff analysis
├── docs/                     # Documentation
└── vector_cache/             # FAISS index storage (auto-created)
```
## Usage

### Initialize Document Index

Index documents from a directory for semantic search:

```bash
python cli.py init ./docs/
```

This will:

- Embed all documents in the specified directory
- Create a FAISS index in `vector_cache/`
- Save metadata for retrieval
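Under the hood this amounts to roughly the following, using the libraries from `requirements.txt`. The file names and pipeline here are illustrative, not CodeMind's actual internals:

```python
import os

import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("./models/embeddinggemma-300m")
docs = ["first document text", "second document text"]  # stand-ins for files in ./docs/

# Normalized float32 vectors, shape (n, 768) per config.yaml
embeddings = model.encode(docs, normalize_embeddings=True)

# Inner product on unit vectors is cosine similarity
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

os.makedirs("vector_cache", exist_ok=True)
faiss.write_index(index, "vector_cache/docs.index")  # hypothetical file name
```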
### Semantic Search

Search for relevant documents using natural language:

```bash
python cli.py search "how to configure the model"
```

Returns ranked results with similarity scores.
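In terms of the same libraries, a search is roughly the sketch below. This is illustrative only; the real retriever in `src/retriever.py` also applies `similarity_threshold` from `config.yaml`:

```python
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("./models/embeddinggemma-300m")
index = faiss.read_index("vector_cache/docs.index")  # built by `init`

query = model.encode(["how to configure the model"], normalize_embeddings=True)
scores, ids = index.search(query, 5)  # 5 = retrieval.top_k in config.yaml
for rank, (score, doc_id) in enumerate(zip(scores[0], ids[0]), start=1):
    print(f"{rank}. doc {doc_id} (cosine similarity {score:.2f})")
```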
### Ask Questions (RAG)

Get answers based on your indexed documents:

```bash
python cli.py ask "What are the configuration options?"
```

Uses retrieval-augmented generation to provide contextual answers.
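The shape of that pipeline, assuming `llama-cpp-python` as listed in the dependencies. The prompt template here is a guess for illustration, not the one in `src/generator.py`:

```python
from llama_cpp import Llama

# Stand-ins for the top_k chunks returned by the search step
retrieved_chunks = ["Configuration lives in config.yaml ...", "Options include ..."]

llm = Llama(model_path="./models/phi-2.Q4_0.gguf", n_ctx=2048)
prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n" + "\n\n".join(retrieved_chunks) + "\n\n"
    "Question: What are the configuration options?\nAnswer:"
)
out = llm(prompt, max_tokens=512)  # max_tokens from config.yaml
print(out["choices"][0]["text"].strip())
```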
### Git Commit Message Generation

Generate intelligent commit messages from staged changes:

```bash
# Preview commit message without applying
python cli.py commit --preview

# Show staged files and analysis without generating message
python cli.py commit --dry-run

# Generate and apply commit message
python cli.py commit --apply
```
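Conceptually, `--apply` reads the staged diff and asks the model for a conventional-commit summary. A minimal sketch of that flow follows; the prompt wording is an assumption, not CodeMind's actual template in `src/diff_analyzer.py`:

```python
import subprocess

from llama_cpp import Llama

# Read the staged diff (what `git commit` would include)
diff = subprocess.run(
    ["git", "diff", "--cached"], capture_output=True, text=True, check=True
).stdout

llm = Llama(model_path="./models/phi-2.Q4_0.gguf", n_ctx=2048)
prompt = (
    "Write a conventional-commit message (imperative mood, subject <= 72 chars) "
    f"for this staged diff:\n\n{diff}\n\nCommit message:"
)
message = llm(prompt, max_tokens=64)["choices"][0]["text"].strip()

# Roughly what --apply does; --preview would just print `message`
subprocess.run(["git", "commit", "-m", message], check=True)
```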
### Start API Server (Future Feature)

```bash
python cli.py serve --port 8000
```

_Note: API server functionality is planned for future releases._
## Configuration

Edit `config.yaml` to customize behavior:

```yaml
embedding:
  model_path: "./models/embeddinggemma-300m"
  dim: 768
  truncate_to: 128

generator:
  model_path: "./models/phi-2.Q4_0.gguf"
  quantization: "Q4_0"
  max_tokens: 512
  n_ctx: 2048

retrieval:
  vector_store: "faiss"
  top_k: 5
  similarity_threshold: 0.75

commit:
  tone: "imperative"
  style: "conventional"
  max_length: 72

logging:
  verbose: true
  telemetry: false
```
### Configuration Options

- **embedding.model_path**: Path to the EmbeddingGemma-300m model directory
- **generator.model_path**: Path to the Phi-2 GGUF model file
- **generator.max_tokens**: Maximum tokens for generation
- **generator.n_ctx**: Context window size for Phi-2
- **retrieval.top_k**: Number of documents to retrieve for context
- **retrieval.similarity_threshold**: Minimum similarity score for results
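A module reads these values with PyYAML along these lines; see `src/config_loader.py` for the real implementation, this is only a minimal sketch:

```python
import yaml

with open("config.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

top_k = config["retrieval"]["top_k"]                     # 5
threshold = config["retrieval"]["similarity_threshold"]  # 0.75
print(f"retrieving top {top_k} documents above {threshold}")
```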
## Dependencies

- `sentence-transformers>=2.2.2` - Document embedding
- `faiss-cpu>=1.7.4` - Vector similarity search
- `llama-cpp-python>=0.2.23` - Phi-2 model inference (Windows compatible)
- `typer>=0.9.0` - CLI framework
- `PyYAML>=6.0` - Configuration file parsing
## Troubleshooting

### Model Loading Issues

If you encounter model loading errors:

1. **Embedding Model**: Ensure `embeddinggemma-300m` is a directory containing all model files
2. **Phi-2 Model**: Ensure `phi-2.Q4_0.gguf` is a single GGUF file
3. **Paths**: All paths in `config.yaml` should be relative to the project root
### Memory Issues

For systems with limited RAM:

- Use Q4_0 quantization for Phi-2 (already configured)
- Reduce `n_ctx` in `config.yaml` if needed (see the sketch after this list)
- Process documents in smaller batches
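Lowering the context window shrinks the model's KV cache and saves memory. The corresponding `llama-cpp-python` call (normally driven by `config.yaml`, shown directly here for illustration):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/phi-2.Q4_0.gguf",
    n_ctx=1024,  # halved from the 2048 set in config.yaml
)
```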
### Windows-Specific Issues

- Ensure your `llama-cpp-python` version supports Windows
- Use PowerShell or Command Prompt for CLI commands
- Check file path separators in the configuration
## Development

To verify that all modules import cleanly:

```bash
python -c "from src import *; print('All modules imported successfully')"
```

To run in development mode:

```bash
python cli.py --help
```
## License

[Insert your license information here]

## Contributing

[Insert contribution guidelines here]