# LinkedIn Profile Enhancer - File-by-File Technical Guide
## Current File Analysis & Architecture
---
## **Entry Point Files**
### **app.py** - Main Gradio Application
**Purpose**: Primary web interface built on the Gradio framework with streamlined one-click enhancement
**Architecture**: Modern UI with a single-button workflow that automatically handles all processing steps
**Key Components**:
```python
class LinkedInEnhancerGradio:
    def __init__(self):
        self.orchestrator = ProfileOrchestrator()
        self.current_profile_data = None
        self.current_analysis = None
        self.current_suggestions = None
```
**Core Method - Enhanced Profile Processing**:
```python
def enhance_linkedin_profile(
    self, linkedin_url: str, job_description: str = ""
) -> Tuple[str, str, str, str, str, str, str, str, Optional[Image.Image]]:
    # Complete automation pipeline:
    # 1. Extract profile data via Apify
    # 2. Analyze the profile automatically
    # 3. Generate AI suggestions automatically
    # 4. Format all results for display
    # Returns: status, basic_info, about, experience, details,
    #          analysis, keywords, suggestions, image
```
**UI Features**:
- **Single Action Button**: "Enhance LinkedIn Profile" handles the entire workflow
- **Automatic Processing**: No manual steps required for analysis or suggestions
- **Tabbed Results Interface**:
  - Basic Information with profile image
  - About Section display
  - Experience breakdown
  - Education & Skills overview
  - Analysis Results with scoring
  - Enhancement Suggestions from AI
  - Export & Download functionality
- **API Status Testing**: Real-time connection verification for Apify and OpenAI
- **Comprehensive Export**: Downloadable markdown reports with all data and suggestions

**Interface Workflow**:
1. User enters a LinkedIn URL plus an optional job description
2. User clicks "Enhance LinkedIn Profile"
3. System automatically scrapes → analyzes → generates suggestions
4. Results are displayed across organized tabs
5. User can export a comprehensive report
### **streamlit_app.py** - Alternative Streamlit Interface
**Purpose**: Data-visualization-focused interface for analytics and detailed insights
**Key Features**:
- **Advanced Visualizations**: Plotly charts for profile metrics
- **Sidebar Controls**: Input management and API status
- **Interactive Dashboard**: Multi-tab analytics interface
- **Session State Management**: Persistent data across refreshes

**Streamlit Layout Structure**:
```python
def main():
    # Header with gradient styling
    # Sidebar: input controls, API status, examples
    # Main dashboard tabs:
    #   - Profile Analysis: metrics, charts, scoring
    #   - Scraped Data: raw profile information
    #   - Enhancement Suggestions: AI-generated content
    #   - Implementation Roadmap: action items
```
---
## **Core Agent System**
### **agents/orchestrator.py** - Central Workflow Coordinator
**Purpose**: Manages the complete enhancement workflow using the Facade pattern
**Architecture Role**: Single entry point that coordinates all agents
**Class Structure**:
```python
class ProfileOrchestrator:
    def __init__(self):
        self.scraper = ScraperAgent()            # LinkedIn data extraction
        self.analyzer = AnalyzerAgent()          # Profile analysis engine
        self.content_generator = ContentAgent()  # AI content generation
        self.memory = MemoryManager()            # Session & cache management
```
**Enhanced Workflow** (`enhance_profile` method):
1. **Cache Management**: `force_refresh` option to clear old data
2. **Data Extraction**: `scraper.extract_profile_data(linkedin_url)`
3. **Profile Analysis**: `analyzer.analyze_profile(profile_data, job_description)`
4. **AI Suggestions**: `content_generator.generate_suggestions(analysis, job_description)`
5. **Memory Storage**: `memory.store_session(linkedin_url, session_data)`
6. **Result Formatting**: Structured output for UI consumption

**Key Features**:
- **URL Validation**: Ensures data consistency and proper formatting
- **Error Recovery**: Comprehensive exception handling with user-friendly messages
- **Progress Tracking**: Detailed logging for debugging and monitoring
- **Cache Control**: Smart refresh mechanisms to ensure data accuracy
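The six-step workflow above can be sketched as a Facade that delegates to stubbed agents. This is a minimal illustration of the coordination pattern, not the actual implementation; the stub return values are invented for demonstration.

```python
# Facade-pattern sketch of ProfileOrchestrator with stubbed agents.
# Class and method names mirror the guide; the stub logic is illustrative only.
class ScraperAgent:
    def extract_profile_data(self, url):
        return {"url": url, "headline": "Data Engineer"}

class AnalyzerAgent:
    def analyze_profile(self, profile, job_description=""):
        return {"completeness": 0.8, "profile": profile}

class ContentAgent:
    def generate_suggestions(self, analysis, job_description=""):
        return {"headlines": ["Senior Data Engineer | Python & AWS"]}

class MemoryManager:
    def __init__(self):
        self.sessions = {}
    def store_session(self, url, data):
        self.sessions[url] = data

class ProfileOrchestrator:
    def __init__(self):
        self.scraper = ScraperAgent()
        self.analyzer = AnalyzerAgent()
        self.content_generator = ContentAgent()
        self.memory = MemoryManager()

    def enhance_profile(self, linkedin_url, job_description=""):
        # Steps 2-5 of the workflow: extract, analyze, suggest, store.
        profile = self.scraper.extract_profile_data(linkedin_url)
        analysis = self.analyzer.analyze_profile(profile, job_description)
        suggestions = self.content_generator.generate_suggestions(analysis, job_description)
        self.memory.store_session(linkedin_url, {
            "profile_data": profile,
            "analysis": analysis,
            "suggestions": suggestions,
        })
        return analysis, suggestions
```

The Facade keeps the UI layer simple: callers only ever touch `enhance_profile`, never the individual agents.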
### **agents/scraper_agent.py** - LinkedIn Data Extraction
**Purpose**: Extracts comprehensive profile data using Apify's LinkedIn scraper
**API Integration**: Apify REST API with the `dev_fusion~linkedin-profile-scraper` actor
**Key Methods**:
```python
def extract_profile_data(self, linkedin_url: str) -> Dict[str, Any]:
    # Main extraction with timeout handling and error recovery

def test_apify_connection(self) -> bool:
    # Connectivity and authentication verification

def _process_apify_data(self, raw_data: Dict, url: str) -> Dict[str, Any]:
    # Converts the raw Apify response to the standardized profile format
```
**Extracted Data Structure** (20+ fields):
- **Basic Information**: name, headline, location, about, connections, followers
- **Professional Details**: current job_title, company_name, industry, company_size
- **Experience Array**: positions with titles, companies, durations, descriptions, current status
- **Education Array**: schools, degrees, fields of study, years, grades
- **Skills Array**: technical and professional skills with categorization
- **Additional Data**: certifications, languages, volunteer work, honors, projects
- **Media Assets**: profile images (standard and high-quality), company logos

**Error Handling Scenarios**:
- **401 Unauthorized**: Invalid Apify API token guidance
- **404 Not Found**: Actor availability or LinkedIn URL issues
- **429 Rate Limited**: API quota management and retry logic
- **Timeout Errors**: Long scraping operations (30-60 seconds typical)
- **Data Quality**: Validation of extracted fields and completeness
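A simple way to implement the status-code guidance above is a lookup table from code to user-facing message. This is a hedged sketch; the `handle_scrape_error` helper and the exact message wording are illustrative, not the project's actual code.

```python
# Map the documented HTTP error scenarios to user-friendly guidance.
ERROR_GUIDANCE = {
    401: "Invalid Apify API token - check the APIFY_API_TOKEN environment variable.",
    404: "Actor not found or the LinkedIn URL is invalid.",
    429: "Apify rate limit reached - wait for quota reset or retry with backoff.",
}

def handle_scrape_error(status_code: int) -> str:
    # Fall back to a generic message for unmapped status codes.
    return ERROR_GUIDANCE.get(status_code, f"Scraping failed with HTTP {status_code}.")
```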
### **agents/analyzer_agent.py** - Advanced Profile Analysis Engine
**Purpose**: Multi-dimensional profile analysis with weighted scoring algorithms
**Analysis Domains**: Completeness assessment, content quality, job matching, keyword optimization
**Core Analysis Pipeline**:
```python
def analyze_profile(self, profile_data: Dict, job_description: str = "") -> Dict[str, Any]:
    # Master analysis orchestrator returning comprehensive insights

def _calculate_completeness(self, profile_data: Dict) -> float:
    # Weighted scoring algorithm with configurable section weights

def _calculate_job_match(self, profile_data: Dict, job_description: str) -> float:
    # Multi-factor job compatibility analysis with synonym matching

def _analyze_keywords(self, profile_data: Dict, job_description: str) -> Dict:
    # Advanced keyword extraction and optimization recommendations

def _assess_content_quality(self, profile_data: Dict) -> Dict:
    # Content quality metrics using action words and professional language patterns
```
**Scoring Algorithms**:

**Completeness Scoring** (0-100% with weighted sections):
```python
completion_weights = {
    'basic_info': 0.20,     # Name, headline, location, about presence
    'about_section': 0.25,  # Professional summary quality and length
    'experience': 0.25,     # Work history completeness and descriptions
    'skills': 0.15,         # Skills count and relevance
    'education': 0.15       # Educational background completeness
}
```

**Job Match Scoring** (multi-factor analysis):
- **Skills Overlap** (40%): Technical and professional skills alignment
- **Experience Relevance** (30%): Work history relevance to the target role
- **Keyword Density** (20%): Industry terminology and buzzword matching
- **Education Match** (10%): Educational background relevance

**Content Quality Assessment**:
- **Action Words Count**: Impact verbs (managed, developed, led, implemented)
- **Quantifiable Results**: Presence of metrics, percentages, achievements
- **Professional Language**: Industry-appropriate terminology usage
- **Description Quality**: Completeness and detail level of experience descriptions
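Putting the completeness weights to work: the score is the weighted sum of per-section checks, scaled to 0-100. The weights come from the guide; the per-section pass/fail heuristics (about length, skill count, and so on) are simplifying assumptions for illustration.

```python
# Weighted completeness score: each section contributes its weight
# when the corresponding heuristic check passes.
COMPLETION_WEIGHTS = {
    'basic_info': 0.20,
    'about_section': 0.25,
    'experience': 0.25,
    'skills': 0.15,
    'education': 0.15,
}

def calculate_completeness(profile: dict) -> float:
    # Each check returns True when the section looks filled in.
    checks = {
        'basic_info': all(profile.get(k) for k in ('name', 'headline', 'location')),
        'about_section': len(profile.get('about', '')) >= 100,
        'experience': len(profile.get('experience', [])) > 0,
        'skills': len(profile.get('skills', [])) >= 5,
        'education': len(profile.get('education', [])) > 0,
    }
    score = sum(COMPLETION_WEIGHTS[s] * float(ok) for s, ok in checks.items())
    return round(score * 100, 1)  # 0-100 scale
```

A production version would grade each section on a continuum (e.g. partial credit for a short about section) rather than pass/fail.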
### **agents/content_agent.py** - AI Content Generation Engine
**Purpose**: Generates professional content enhancements using OpenAI GPT-4o-mini
**AI Integration**: Structured prompt engineering with context-aware content generation
**Content Generation Pipeline**:
```python
def generate_suggestions(self, analysis: Dict, job_description: str = "") -> Dict[str, Any]:
    # Master content generation orchestrator

def _generate_ai_content(self, analysis: Dict, job_description: str) -> Dict:
    # AI-powered content creation with structured prompts

def _generate_headlines(self, profile_data: Dict, job_description: str) -> List[str]:
    # Creates 3-5 optimized professional headlines (120-character limit)

def _generate_about_section(self, profile_data: Dict, job_description: str) -> str:
    # Compelling professional summary with a clear value proposition
```

**AI Content Types Generated**:
1. **Professional Headlines**: 3-5 optimized alternatives with keyword integration
2. **Enhanced About Sections**: Compelling narrative with a clear value proposition
3. **Experience Descriptions**: Action-oriented, results-focused bullet points
4. **Skills Optimization**: Industry-relevant skill recommendations
5. **Keyword Integration**: SEO-optimized professional terminology suggestions

**OpenAI Configuration**:
```python
model = "gpt-4o-mini"  # Cost-effective, high-quality model choice
max_tokens = 500       # Balanced response length
temperature = 0.7      # Optimal creativity vs. consistency balance
```

**Prompt Engineering Strategy**:
- **Context Inclusion**: Profile data + target job requirements
- **Output Structure**: Consistent formatting for easy parsing
- **Constraint Definition**: Character limits, professional tone requirements
- **Quality Guidelines**: Professional, appropriate, industry-specific content
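The strategy above (context inclusion, output structure, constraint definition) can be seen in a prompt-builder function. This is a hedged sketch: the `build_headline_prompt` helper and the template wording are my own illustration, not the project's actual prompts.

```python
# Assemble a headline prompt from profile context plus explicit
# structure and constraint instructions.
def build_headline_prompt(profile: dict, job_description: str) -> str:
    return (
        "You are a LinkedIn branding expert.\n"
        f"Current headline: {profile.get('headline', 'N/A')}\n"       # context inclusion
        f"Top skills: {', '.join(profile.get('skills', [])[:5])}\n"
        f"Target role: {job_description or 'not specified'}\n"
        "Write 3-5 alternative headlines, one per line, "             # output structure
        "each under 120 characters, in a professional tone."          # constraints
    )
```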
---
## **Memory & Data Management**
### **memory/memory_manager.py** - Session & Persistence Layer
**Purpose**: Manages temporary session data and persistent storage with smart caching
**Storage Strategy**: Hybrid approach combining in-memory sessions with JSON persistence
**Key Capabilities**:
```python
def store_session(self, profile_url: str, data: Dict[str, Any]) -> None:
    # Store session data keyed by LinkedIn URL

def get_session(self, profile_url: str) -> Optional[Dict[str, Any]]:
    # Retrieve cached session data with timestamp validation

def force_refresh_session(self, profile_url: str) -> None:
    # Clear the cache to force fresh data extraction

def clear_session_cache(self, profile_url: str = None) -> None:
    # Selective or complete cache clearing
```

**Session Data Structure**:
```python
session_data = {
    'timestamp': '2025-01-XX XX:XX:XX',
    'profile_url': 'https://linkedin.com/in/username',
    'data': {
        'profile_data': {...},   # Raw scraped LinkedIn data
        'analysis': {...},       # Scoring and analysis results
        'suggestions': {...},    # AI-generated enhancement suggestions
        'job_description': '...' # Target job requirements
    }
}
```

**Memory Management Features**:
- **URL-Based Isolation**: Each LinkedIn profile has a separate session space
- **Automatic Timestamping**: Data freshness tracking and expiration
- **Smart Cache Invalidation**: Intelligent refresh based on URL changes
- **Persistence Layer**: JSON-based storage for cross-session data retention
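A minimal version of this hybrid cache (in-memory sessions, JSON persistence, timestamp-based expiry) might look like the following. The one-hour TTL and the flat JSON file layout are assumptions for illustration.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict, Optional

class MemoryManager:
    """Session cache keyed by LinkedIn URL, persisted to a JSON file."""

    def __init__(self, cache_file: str = "sessions.json", ttl_seconds: int = 3600):
        self.cache_file = Path(cache_file)
        self.ttl = ttl_seconds
        self.sessions: Dict[str, Dict[str, Any]] = {}

    def store_session(self, profile_url: str, data: Dict[str, Any]) -> None:
        self.sessions[profile_url] = {"timestamp": time.time(), "data": data}
        # Persist to JSON so data survives restarts.
        self.cache_file.write_text(json.dumps(self.sessions))

    def get_session(self, profile_url: str) -> Optional[Dict[str, Any]]:
        entry = self.sessions.get(profile_url)
        if entry and time.time() - entry["timestamp"] < self.ttl:
            return entry["data"]
        return None  # missing or expired

    def force_refresh_session(self, profile_url: str) -> None:
        # Drop the cached entry so the next request scrapes fresh data.
        self.sessions.pop(profile_url, None)
```

Keying by URL gives the per-profile isolation described above for free: two users analyzing different profiles never see each other's cached data.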
---
## **Utility Components**
### **utils/linkedin_parser.py** - Data Processing & Standardization
**Purpose**: Cleans and standardizes raw LinkedIn data for consistent processing
**Processing Functions**: Text normalization, date parsing, skill categorization, URL validation
**Key Processing Operations**:
```python
def clean_profile_data(self, raw_data: Dict[str, Any]) -> Dict[str, Any]:
    # Master data cleaning orchestrator

def _clean_experience_list(self, experience_list: List) -> List[Dict]:
    # Standardize work experience entries with duration calculation

def _parse_date_range(self, date_string: str) -> Dict:
    # Parse various date formats to the ISO standard

def _categorize_skills(self, skills_list: List[str]) -> Dict:
    # Intelligent skill grouping by category
```
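Date-range parsing is the trickiest of these operations, since LinkedIn renders ranges like "Jan 2020 - Mar 2023" or "Jun 2021 - Present". A sketch under those format assumptions (the real parser likely handles more variants):

```python
import re
from typing import Dict, Optional

# Month-name lookup for converting "Jan 2020" style dates to ISO "2020-01".
MONTHS = {m: i for i, m in enumerate(
    ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
     'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], start=1)}

def parse_date_range(date_string: str) -> Dict[str, Optional[str]]:
    parts = [p.strip() for p in date_string.split(' - ')]

    def to_iso(part: str) -> Optional[str]:
        if part.lower() == 'present':
            return None  # still-ongoing position
        m = re.match(r'([A-Za-z]{3})\w* (\d{4})', part)  # matches "Jan" or "January"
        if not m:
            return None
        return f"{m.group(2)}-{MONTHS.get(m.group(1).title(), 1):02d}"

    start = to_iso(parts[0]) if parts else None
    end = to_iso(parts[1]) if len(parts) > 1 else None
    return {'start': start, 'end': end,
            'is_current': len(parts) > 1 and parts[1].lower() == 'present'}
```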
**Skill Categorization System**:
```python
skill_categories = {
    'technical': ['Python', 'JavaScript', 'React', 'AWS', 'Docker', 'SQL'],
    'management': ['Leadership', 'Project Management', 'Agile', 'Team Building'],
    'marketing': ['SEO', 'Social Media', 'Content Marketing', 'Analytics'],
    'design': ['UI/UX', 'Figma', 'Adobe Creative', 'Design Thinking'],
    'business': ['Strategy', 'Operations', 'Sales', 'Business Development']
}
```
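Using the category table above, `_categorize_skills` reduces to an inverted lookup. This sketch assumes case-insensitive matching and an extra `'other'` bucket for unrecognized skills; both are my simplifications.

```python
SKILL_CATEGORIES = {
    'technical': ['Python', 'JavaScript', 'React', 'AWS', 'Docker', 'SQL'],
    'management': ['Leadership', 'Project Management', 'Agile', 'Team Building'],
    'marketing': ['SEO', 'Social Media', 'Content Marketing', 'Analytics'],
    'design': ['UI/UX', 'Figma', 'Adobe Creative', 'Design Thinking'],
    'business': ['Strategy', 'Operations', 'Sales', 'Business Development'],
}

def categorize_skills(skills: list) -> dict:
    # Invert the table: lowercase skill name -> category.
    lookup = {s.lower(): cat for cat, lst in SKILL_CATEGORIES.items() for s in lst}
    grouped = {cat: [] for cat in SKILL_CATEGORIES}
    grouped['other'] = []  # catch-all for skills outside the known lists
    for skill in skills:
        grouped[lookup.get(skill.lower(), 'other')].append(skill)
    return grouped
```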
### **utils/job_matcher.py** - Advanced Job Compatibility Analysis
**Purpose**: Sophisticated job matching with configurable weighted scoring
**Matching Strategy**: Multi-dimensional analysis with industry context awareness
**Scoring Configuration**:
```python
match_weights = {
    'skills': 0.4,      # 40% - Technical/professional skills compatibility
    'experience': 0.3,  # 30% - Relevant work experience and seniority
    'keywords': 0.2,    # 20% - Industry terminology alignment
    'education': 0.1    # 10% - Educational background relevance
}
```

**Advanced Matching Features**:
- **Synonym Recognition**: Handles skill variations (JS/JavaScript, ML/Machine Learning)
- **Experience Weighting**: Recent and relevant experience is valued higher
- **Industry Context**: Sector-specific terminology and role requirements
- **Seniority Analysis**: Career progression and leadership experience consideration
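Synonym recognition boils down to normalizing skill names before set intersection. A hedged sketch of the skills component of the match score; the synonym table and scoring formula are illustrative assumptions.

```python
# Canonicalize common skill aliases before comparing sets.
SYNONYMS = {
    'js': 'javascript',
    'ml': 'machine learning',
    'k8s': 'kubernetes',
}

def normalize(skill: str) -> str:
    s = skill.strip().lower()
    return SYNONYMS.get(s, s)

def skills_match_score(profile_skills, required_skills) -> float:
    # Fraction of required skills covered by the profile, after normalization.
    profile = {normalize(s) for s in profile_skills}
    required = {normalize(s) for s in required_skills}
    if not required:
        return 1.0  # no requirements means nothing is missing
    return len(profile & required) / len(required)
```

This per-dimension score would then be multiplied by the `skills` weight (0.4) and summed with the other weighted dimensions.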
---
## **AI Prompt Engineering System**
### **prompts/agent_prompts.py** - Structured Prompt Library
**Purpose**: Organized, reusable prompts for consistent AI output quality
**Structure**: Modular prompt classes for different content enhancement types
**Prompt Categories**:
```python
class ContentPrompts:
    def __init__(self):
        self.headline_prompts = HeadlinePrompts()      # LinkedIn headline optimization
        self.about_prompts = AboutPrompts()            # Professional summary enhancement
        self.experience_prompts = ExperiencePrompts()  # Job description improvements
        self.general_prompts = GeneralPrompts()        # Overall profile suggestions
```

**Prompt Engineering Principles**:
- **Context Awareness**: Include relevant profile data and target role information
- **Output Formatting**: Specify the desired structure, length, and professional tone
- **Constraint Management**: Character limits, industry standards, LinkedIn best practices
- **Quality Examples**: High-quality reference content for AI model guidance
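A representative template from the headline prompt set shows these principles applied: profile context is interpolated, the output structure is spelled out, and the 120-character constraint is stated explicitly.

```python
HEADLINE_ANALYSIS = """
Analyze this LinkedIn headline and provide improvement suggestions:
Current headline: "{headline}"
Target role: "{target_role}"
Key skills: {skills}
Consider:
1. Keyword optimization for the target role
2. Value proposition clarity
3. Professional branding
4. Character limit (120 chars max)
5. Industry-specific terms
Provide 3-5 alternative headline suggestions.
"""
```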
---
## **Configuration & Dependencies**
### **requirements.txt** - Current Dependencies
**Purpose**: Python package management for production deployment
**Core Dependencies**:
```txt
gradio          # Primary web UI framework
streamlit       # Alternative UI for data visualization
requests        # HTTP client for API integrations
openai          # AI content generation
apify-client    # LinkedIn scraping service
plotly          # Interactive data visualizations
Pillow          # Image processing for profile pictures
pandas          # Data manipulation and analysis
numpy           # Numerical computations
python-dotenv   # Environment variable management
pydantic        # Data validation and serialization
```

**Framework Rationale**:
- **Gradio**: Rapid prototyping, easy sharing, demo-friendly interface
- **Streamlit**: Superior data visualization capabilities, analytics dashboard
- **OpenAI**: High-quality AI content generation with cost efficiency
- **Apify**: Specialized LinkedIn scraping with legal compliance
- **Plotly**: Professional interactive charts and visualizations
---
## **Enhanced Export & Reporting System**
### **Comprehensive Markdown Export**
**Purpose**: Generate downloadable reports with complete analysis and suggestions
**File Format**: Professional markdown reports compatible with GitHub, Notion, and text editors
**Export Content Structure**:
```markdown
# LinkedIn Profile Enhancement Report
## Executive Summary
## Basic Profile Information (formatted table)
## Current About Section
## Professional Experience (detailed breakdown)
## Education & Skills Analysis
## AI Analysis Results (scoring, strengths, weaknesses)
## Keyword Analysis (found vs. missing)
## AI-Powered Enhancement Suggestions
- Professional Headlines (multiple options)
- Enhanced About Section
- Experience Description Ideas
## Recommended Action Items
- Immediate Actions (this week)
- Medium-term Goals (this month)
- Long-term Strategy (next 3 months)
## Additional Resources & Next Steps
```

**Download Features**:
- **Timestamped Filenames**: Organized file management
- **Complete Data**: All extracted, analyzed, and generated content
- **Action Planning**: Structured implementation roadmap
- **Professional Formatting**: Ready for sharing with mentors and colleagues
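The timestamped-filename and section-assembly mechanics can be sketched as below. The `build_report` helper is my own simplified illustration of the export step, covering only a few of the sections listed above.

```python
from datetime import datetime

def build_report(profile: dict, analysis: dict, suggestions: dict) -> tuple:
    # Timestamped filename for organized file management.
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    filename = f"linkedin_report_{timestamp}.md"
    lines = [
        "# LinkedIn Profile Enhancement Report",
        "## Executive Summary",
        f"Profile completeness: {analysis.get('completeness', 'N/A')}%",
        "## Basic Profile Information",
        f"| Field | Value |\n|---|---|\n| Name | {profile.get('name', '')} |",
        "## AI-Powered Enhancement Suggestions",
    ]
    lines += [f"- {h}" for h in suggestions.get('headlines', [])]
    return filename, "\n".join(lines)
```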
---
## **Current System Architecture**
### **Streamlined User Experience**
- **One-Click Enhancement**: A single button handles the entire workflow automatically
- **Real-Time Processing**: Live status updates during 30-60 second operations
- **Comprehensive Results**: All data, analysis, and suggestions in organized tabs
- **Professional Export**: Downloadable reports for implementation planning

### **Technical Performance**
- **Profile Extraction**: 95%+ success rate for public LinkedIn profiles
- **Processing Time**: 45-90 seconds end-to-end (API-dependent)
- **AI Content Quality**: Professional, context-aware suggestions
- **System Reliability**: Robust error handling and graceful degradation

### **Production Readiness Features**
- **API Integration**: Robust external service management (Apify, OpenAI)
- **Error Recovery**: Comprehensive exception handling with user guidance
- **Session Management**: Smart caching and data persistence
- **Security Practices**: Environment variable management, input validation
- **Monitoring**: Detailed logging and performance tracking

This guide reflects the current streamlined architecture, with enhanced automation, comprehensive export functionality, and production-ready features for professional LinkedIn profile enhancement.
---
## **Key Differentiators**
### **Current Implementation Advantages**
1. **Fully Automated Workflow**: One-click enhancement replaces multi-step processes
2. **Real LinkedIn Data**: Actual profile scraping rather than mock-data demonstrations
3. **Comprehensive AI Integration**: Context-aware content generation with professional quality
4. **Dual UI Frameworks**: Demonstrates versatility with Gradio and Streamlit
5. **Production Export**: Professional markdown reports ready for implementation
6. **Smart Caching**: Efficient session management with intelligent refresh capabilities

This technical guide provides comprehensive insight into the LinkedIn Profile Enhancer architecture, enabling detailed technical discussions and code reviews.
| ``` | |
| **Main Workflow** (`enhance_profile` method): | |
| 1. **Data Extraction**: `self.scraper.extract_profile_data(linkedin_url)` | |
| 2. **Profile Analysis**: `self.analyzer.analyze_profile(profile_data, job_description)` | |
| 3. **Content Generation**: `self.content_generator.generate_suggestions(analysis, job_description)` | |
| 4. **Memory Storage**: `self.memory.store_session(linkedin_url, session_data)` | |
| 5. **Output Formatting**: `self._format_output(analysis, suggestions)` | |
| **Key Features**: | |
| - **Error Recovery**: Comprehensive exception handling | |
| - **Cache Management**: Force refresh capabilities | |
| - **URL Validation**: Ensures data consistency | |
| - **Progress Tracking**: Detailed logging for debugging | |
| ### **agents/scraper_agent.py** - LinkedIn Data Extraction | |
| **Purpose**: Extracts profile data using Apify's LinkedIn scraper | |
| **API Integration**: Apify REST API with `dev_fusion~linkedin-profile-scraper` actor | |
| **Key Methods**: | |
| ```python | |
| def extract_profile_data(self, linkedin_url: str) -> Dict[str, Any]: | |
| # Main extraction method with comprehensive error handling | |
| # Returns: Structured profile data with 20+ fields | |
| def test_apify_connection(self) -> bool: | |
| # Tests API connectivity and authentication | |
| def _process_apify_data(self, raw_data: Dict, url: str) -> Dict[str, Any]: | |
| # Converts raw Apify response to standardized format | |
| ``` | |
| **Data Processing Pipeline**: | |
| 1. **URL Validation**: Clean and normalize LinkedIn URLs | |
| 2. **API Configuration**: Set up Apify run parameters | |
| 3. **Data Extraction**: POST request to Apify API with timeout handling | |
| 4. **Response Processing**: Convert raw data to standardized format | |
| 5. **Quality Validation**: Ensure data completeness and accuracy | |
| **Extracted Data Fields**: | |
| - **Basic Info**: name, headline, location, about, connections, followers | |
| - **Professional**: job_title, company_name, company_industry, company_size | |
| - **Experience**: Array of positions with titles, companies, durations, descriptions | |
| - **Education**: Array of degrees with schools, fields, years, grades | |
| - **Skills**: Array of skills with endorsement data | |
| - **Additional**: certifications, languages, volunteer experience, honors | |
| **Error Handling**: | |
| - **401 Unauthorized**: Invalid API token guidance | |
| - **404 Not Found**: Actor availability issues | |
| - **429 Rate Limited**: Too many requests handling | |
| - **Timeout**: Long scraping operation management | |
| ### **agents/analyzer_agent.py** - Profile Analysis Engine | |
| **Purpose**: Analyzes profile data and calculates various performance metrics | |
| **Analysis Domains**: Completeness, content quality, job matching, keyword optimization | |
| **Core Analysis Methods**: | |
| ```python | |
| def analyze_profile(self, profile_data: Dict, job_description: str = "") -> Dict[str, Any]: | |
| # Main analysis orchestrator | |
| def _calculate_completeness(self, profile_data: Dict) -> float: | |
| # Weighted scoring: Profile(20%) + About(25%) + Experience(25%) + Skills(15%) + Education(15%) | |
| def _calculate_job_match(self, profile_data: Dict, job_description: str) -> float: | |
| # Multi-factor job compatibility analysis | |
| def _analyze_keywords(self, profile_data: Dict, job_description: str) -> Dict: | |
| # Keyword extraction and optimization analysis | |
| def _assess_content_quality(self, profile_data: Dict) -> Dict: | |
| # Content quality metrics using action words and professional language | |
| ``` | |
| **Scoring Algorithms**: | |
| **Completeness Scoring** (0-100%): | |
| ```python | |
| weights = { | |
| 'basic_info': 0.20, # name, headline, location | |
| 'about_section': 0.25, # professional summary | |
| 'experience': 0.25, # work history | |
| 'skills': 0.15, # technical/professional skills | |
| 'education': 0.15 # educational background | |
| } | |
| ``` | |
| **Job Match Scoring** (0-100%): | |
| - **Skills Overlap**: Compare profile skills with job requirements | |
| - **Experience Relevance**: Analyze work history against job needs | |
| - **Keyword Density**: Match professional terminology | |
| - **Industry Alignment**: Assess sector compatibility | |
| **Content Quality Assessment**: | |
| - **Action Words**: Count of impact verbs (led, managed, developed, etc.) | |
| - **Quantifiable Results**: Presence of metrics and achievements | |
| - **Professional Language**: Industry-appropriate terminology | |
| - **Description Completeness**: Adequate detail in experience descriptions | |
| ### **agents/content_agent.py** - AI Content Generation | |
| **Purpose**: Generates enhanced content suggestions using OpenAI GPT-4o-mini | |
| **AI Integration**: OpenAI API with structured prompt engineering | |
| **Content Generation Pipeline**: | |
| ```python | |
| def generate_suggestions(self, analysis: Dict, job_description: str = "") -> Dict[str, Any]: | |
| # Orchestrates all content generation tasks | |
| def _generate_ai_content(self, analysis: Dict, job_description: str) -> Dict: | |
| # AI-powered content creation using OpenAI | |
| def _generate_headlines(self, profile_data: Dict, job_description: str) -> List[str]: | |
| # Creates 3-5 alternative professional headlines | |
| def _generate_about_section(self, profile_data: Dict, job_description: str) -> str: | |
| # Creates compelling professional summary | |
| ``` | |
| **AI Content Types**: | |
| 1. **Professional Headlines**: 3-5 optimized alternatives (120 char limit) | |
| 2. **Enhanced About Sections**: Compelling narrative with value proposition | |
| 3. **Experience Descriptions**: Action-oriented bullet points | |
| 4. **Skills Optimization**: Industry-relevant skill suggestions | |
| 5. **Keyword Integration**: SEO-optimized professional terminology | |
| **Prompt Engineering Strategy**: | |
| - **Context Awareness**: Include profile data and target job requirements | |
| - **Output Structure**: Consistent formatting for easy parsing | |
| - **Token Optimization**: Cost-effective prompt design | |
| - **Quality Control**: Guidelines for professional, appropriate content | |
| **OpenAI Configuration**: | |
| ```python | |
| model = "gpt-4o-mini" # Cost-effective, high-quality model | |
| max_tokens = 500 # Reasonable response length | |
| temperature = 0.7 # Balanced creativity vs consistency | |
| ``` | |
| --- | |
| ## π§ **Memory & Data Management** | |
| ### **memory/memory_manager.py** - Session & Persistence | |
| **Purpose**: Manages temporary session data and persistent storage | |
| **Storage Strategy**: Hybrid approach with session memory and JSON persistence | |
| **Key Capabilities**: | |
| ```python | |
| def store_session(self, profile_url: str, data: Dict[str, Any]) -> None: | |
| # Store temporary session data keyed by LinkedIn URL | |
| def get_session(self, profile_url: str) -> Optional[Dict[str, Any]]: | |
| # Retrieve cached session data | |
| def store_persistent(self, key: str, data: Any) -> None: | |
| # Store data permanently in JSON files | |
| def clear_session_cache(self, profile_url: str = None) -> None: | |
| # Clear cache for specific URL or all sessions | |
| ``` | |
| **Data Management Features**: | |
| - **Session Isolation**: Each LinkedIn URL has separate session data | |
| - **Automatic Timestamping**: Track data freshness and creation time | |
| - **Cache Invalidation**: Smart cache clearing based on URL changes | |
| - **Persistence Layer**: JSON-based storage for historical data | |
| - **Memory Optimization**: Configurable data retention policies | |
| **Storage Structure**: | |
| ```python | |
| session_data = { | |
| 'timestamp': '2025-01-XX XX:XX:XX', | |
| 'profile_url': 'https://linkedin.com/in/username', | |
| 'data': { | |
| 'profile_data': {...}, # Raw scraped data | |
| 'analysis': {...}, # Analysis results | |
| 'suggestions': {...}, # Enhancement suggestions | |
| 'job_description': '...' # Target job description | |
| } | |
| } | |
| ``` | |
| --- | |
| ## π οΈ **Utility Components** | |
| ### **utils/linkedin_parser.py** - Data Processing & Cleaning | |
| **Purpose**: Standardizes and cleans raw LinkedIn data | |
| **Processing Functions**: Text normalization, date parsing, skill categorization | |
| **Key Methods**: | |
| ```python | |
| def clean_profile_data(self, raw_data: Dict[str, Any]) -> Dict[str, Any]: | |
| # Main data cleaning orchestrator | |
| def _clean_experience_list(self, experience_list: List) -> List[Dict]: | |
| # Standardize work experience entries | |
| def _parse_date_range(self, date_string: str) -> Dict: | |
| # Parse various date formats to standardized structure | |
| def _categorize_skills(self, skills_list: List[str]) -> Dict: | |
| # Group skills by category (technical, management, marketing, design) | |
| ``` | |
| **Data Cleaning Operations**: | |
| - **Text Normalization**: Remove extra whitespace, special characters | |
| - **Date Standardization**: Parse various date formats to ISO standard | |
| - **Skill Categorization**: Group skills into technical, management, marketing, design | |
| - **Experience Timeline**: Calculate durations and identify current positions | |
| - **Education Parsing**: Extract degrees, fields of study, graduation years | |
| - **URL Validation**: Ensure proper LinkedIn URL formatting | |
| **Skill Categories**: | |
| ```python | |
| skill_categories = { | |
| 'technical': ['python', 'javascript', 'java', 'react', 'aws', 'docker'], | |
| 'management': ['leadership', 'project management', 'team management', 'agile'], | |
| 'marketing': ['seo', 'social media', 'content marketing', 'analytics'], | |
| 'design': ['ui/ux', 'photoshop', 'figma', 'adobe', 'design thinking'] | |
| } | |
| ``` | |
| ### **utils/job_matcher.py** - Job Compatibility Analysis | |
| **Purpose**: Advanced job matching algorithms with weighted scoring | |
| **Matching Strategy**: Multi-dimensional analysis with configurable weights | |
| **Scoring Configuration**: | |
| ```python | |
| weight_config = { | |
| 'skills': 0.4, # 40% - Technical and professional skills match | |
| 'experience': 0.3, # 30% - Relevant work experience | |
| 'keywords': 0.2, # 20% - Industry terminology alignment | |
| 'education': 0.1 # 10% - Educational background relevance | |
| } | |
| ``` | |
| **Key Algorithms**: | |
| ```python | |
| def calculate_match_score(self, profile_data: Dict, job_description: str) -> Dict[str, Any]: | |
| # Main job matching orchestrator with weighted scoring | |
| def _extract_job_requirements(self, job_description: str) -> Dict: | |
| # Parse job posting to extract skills, experience, education requirements | |
| def _calculate_skills_match(self, profile_skills: List, required_skills: List) -> float: | |
| # Skills compatibility with synonym matching | |
| def _analyze_experience_relevance(self, profile_exp: List, job_requirements: Dict) -> float: | |
| # Work experience relevance analysis | |
| ``` | |
| **Matching Features**: | |
| - **Synonym Recognition**: Handles skill variations (JavaScript/JS, Python/Django) | |
| - **Experience Weighting**: Recent experience valued higher | |
| - **Industry Context**: Sector-specific terminology matching | |
| - **Education Relevance**: Degree and field of study consideration | |
| - **Comprehensive Scoring**: Detailed breakdown of match factors | |
---
## **AI Prompt System**
### **prompts/agent_prompts.py** - Structured AI Prompts
**Purpose**: Organized prompt engineering for consistent AI output
**Structure**: Modular prompt classes for different content types
**Prompt Categories**:
```python
class ContentPrompts:
    def __init__(self):
        self.headline_prompts = HeadlinePrompts()      # LinkedIn headline optimization
        self.about_prompts = AboutPrompts()            # Professional summary creation
        self.experience_prompts = ExperiencePrompts()  # Experience description enhancement
        self.general_prompts = GeneralPrompts()        # General improvement suggestions
```
**Prompt Engineering Principles**:
- **Context Inclusion**: Always provide relevant profile data
- **Output Structure**: Specify the desired format and length
- **Constraint Definition**: Character limits and professional-tone requirements
- **Example Provision**: Include high-quality examples for reference
- **Industry Adaptation**: Tailor prompts to the detected industry/role
**Sample Prompt Structure**:
```python
HEADLINE_ANALYSIS = """
Analyze this LinkedIn headline and provide improvement suggestions:

Current headline: "{headline}"
Target role: "{target_role}"
Key skills: {skills}

Consider:
1. Keyword optimization for the target role
2. Value proposition clarity
3. Professional branding
4. Character limit (120 chars max)
5. Industry-specific terms

Provide 3-5 alternative headline suggestions.
"""
```
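Because the template is a plain Python string with named placeholders, it can be filled with `str.format` before being sent to the model; the example values below are illustrative:

```python
HEADLINE_ANALYSIS = """
Analyze this LinkedIn headline and provide improvement suggestions:
Current headline: "{headline}"
Target role: "{target_role}"
Key skills: {skills}
Provide 3-5 alternative headline suggestions.
"""

# Fill the placeholders with profile data before the API call.
prompt = HEADLINE_ANALYSIS.format(
    headline="Software Engineer",
    target_role="Senior Backend Engineer",
    skills=", ".join(["Python", "AWS", "PostgreSQL"]),
)
```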
---
## **Configuration & Documentation**
### **requirements.txt** - Dependency Management
**Purpose**: Python package dependencies for the project
**Key Dependencies**:
```txt
gradio>=3.35.0        # Primary web UI framework
streamlit>=1.25.0     # Alternative web UI
openai>=1.0.0         # AI content generation
requests>=2.31.0      # HTTP client for external APIs
python-dotenv>=1.0.0  # Environment variable management
plotly>=5.15.0        # Data visualization
pandas>=2.0.0         # Data manipulation
Pillow>=10.0.0        # Image processing
```
### **README.md** - Project Overview
**Purpose**: High-level project documentation
**Content**: Installation, usage, features, API requirements
### **CLEANUP_SUMMARY.md** - Development Notes
**Purpose**: Code refactoring and cleanup documentation
**Content**: Optimization history, technical-debt resolution
---
## **Data Storage Structure**
### **data/** Directory
**Purpose**: Runtime data storage and caching
**Contents**:
- `persistent_data.json`: Long-term storage
- Session cache files
- Temporary processing data
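A minimal sketch of how `persistent_data.json` might be read and written, assuming plain JSON on disk; the helper names are illustrative, not the project's actual API:

```python
import json
from pathlib import Path

DATA_FILE = Path('data/persistent_data.json')

def load_data():
    """Return the persisted dict, or an empty dict on first run."""
    if DATA_FILE.exists():
        return json.loads(DATA_FILE.read_text(encoding='utf-8'))
    return {}

def save_data(data):
    """Write the dict back to disk, creating data/ if needed."""
    DATA_FILE.parent.mkdir(parents=True, exist_ok=True)
    DATA_FILE.write_text(json.dumps(data, indent=2), encoding='utf-8')

save_data({'profiles_processed': 3})
restored = load_data()
```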
### **Profile Analysis Outputs**
**Generated Files**: `profile_analysis_[username]_[timestamp].md`
**Purpose**: Permanent record of analysis results
**Format**: Markdown reports with comprehensive insights
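The naming convention above can be produced with a small helper; the function name and timestamp format are assumptions for illustration:

```python
from datetime import datetime

def report_filename(username):
    """Build profile_analysis_[username]_[timestamp].md for a given profile."""
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    return f"profile_analysis_{username}_{timestamp}.md"

name = report_filename('janedoe')
```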
---
## **Development & Testing**
### **Testing Capabilities**
**Command Line Testing**:
```bash
python app.py --test        # Full API integration test
python app.py --quick-test  # Connectivity verification
```
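A hypothetical sketch of how those two flags could be dispatched with `argparse`; the real `app.py` may parse its arguments differently:

```python
import argparse

def build_parser():
    """Define the test-mode flags documented above."""
    parser = argparse.ArgumentParser(description='LinkedIn Profile Enhancer')
    parser.add_argument('--test', action='store_true',
                        help='Run the full API integration test')
    parser.add_argument('--quick-test', action='store_true',
                        help='Verify API connectivity only')
    return parser

# argparse converts --quick-test to the attribute quick_test.
args = build_parser().parse_args(['--quick-test'])
```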
**Test Coverage**:
- **API Connectivity**: Apify and OpenAI authentication
- **Data Extraction**: Profile scraping functionality
- **Analysis Pipeline**: Scoring and assessment algorithms
- **Content Generation**: AI suggestion quality
- **End-to-End Workflow**: Complete enhancement process
### **Debugging Features**
- **Comprehensive Logging**: Detailed operation tracking
- **Progress Indicators**: Real-time status updates
- **Error Messages**: Actionable failure guidance
- **Data Validation**: Quality assurance at each step
- **Performance Monitoring**: Processing-time tracking
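The logging and performance-monitoring features above can be combined in a small timing context manager, sketched here with assumed names:

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')
logger = logging.getLogger('enhancer')

durations = {}  # step name -> elapsed seconds, for performance monitoring

@contextmanager
def timed_step(name):
    """Log the start and end of one pipeline step and record its duration."""
    start = time.perf_counter()
    logger.info('starting %s', name)
    try:
        yield
    finally:
        durations[name] = time.perf_counter() - start
        logger.info('%s finished in %.2fs', name, durations[name])

with timed_step('profile extraction'):
    time.sleep(0.01)  # placeholder for real work
```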
---
## **Production Considerations**
### **Scalability Enhancements**
- **Database Integration**: Replace JSON storage with PostgreSQL/MongoDB
- **Queue System**: Implement Celery for background processing
- **Caching Layer**: Add Redis for improved performance
- **Load Balancing**: Multi-instance deployment capability
- **Monitoring**: Add comprehensive logging and alerting
### **Security Improvements**
- **API Key Rotation**: Automated credential management
- **Rate Limiting**: Per-user API usage controls
- **Input Sanitization**: Enhanced validation and cleaning
- **Audit Logging**: Security-event tracking
- **Data Encryption**: Protection of sensitive information
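As one example of the rate-limiting item above, here is a minimal in-memory sliding-window limiter; a production deployment would more likely back this with Redis, and the class is a sketch rather than existing project code:

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Allow at most max_calls per user within a sliding window of seconds."""

    def __init__(self, max_calls, window_seconds):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls = defaultdict(deque)  # user_id -> recent call timestamps

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.calls[user_id]
        while q and now - q[0] > self.window:  # drop calls outside the window
            q.popleft()
        if len(q) < self.max_calls:
            q.append(now)
            return True
        return False

limiter = RateLimiter(max_calls=2, window_seconds=60)
# Third call within the window is rejected.
results = [limiter.allow('u1', now=t) for t in (0, 1, 2)]
```

Limits are tracked per `user_id`, so one heavy user cannot exhaust another user's quota.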
This file-by-file breakdown provides deep technical insight into every component of the LinkedIn Profile Enhancer system, supporting technical interviews and code reviews.