Model Card: Apollo Astralis 8B
Summary
- 8B reasoning model with a warm, collaborative assistant style; built on Qwen/Qwen3-8B via LoRA.
- Demonstrates a +36-point overall gain over base Qwen3-8B on a lightweight standard benchmark suite (manual-verified): 93% (13/14) vs. 57% (8/14).
- VRRE semantic evaluation: 22% under automated scoring (an answer-extraction artifact) vs. 89% manual-verified.
- Conservative fine-tuning (292 examples) preserves reasoning quality; training loss fell from 0.91 to 0.39.
- Distributed as PEFT adapters; GGUF Q4_K_M also available for local/Ollama.
Key Specifications
- Name: Apollo Astralis 8B
- Base Model: Qwen/Qwen3-8B
- Type: Causal LM with LoRA adapters (rank 16, alpha 32, ~67M trainable params); a config sketch follows this list
- Context Window: 40K tokens (inherited from base)
- License: Apache 2.0
- Release: October 2025
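A minimal sketch of a PEFT `LoraConfig` matching the specifications above. The target modules and dropout are assumptions (the attention/MLP projections typical of Qwen-family models); the authoritative values ship in the released adapter_config.json.

```python
# Sketch of a LoraConfig matching the specs above (rank 16, alpha 32).
# target_modules and lora_dropout are assumptions, not stated in this card;
# the released adapter_config.json records the actual values.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                # rank 16
    lora_alpha=32,       # alpha 32
    lora_dropout=0.05,   # assumed value
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```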
Intended Use
Appropriate
- Reasoning-intensive tasks with step-by-step explanations (math, logic, structured analysis)
- Educational assistance, research support, code reasoning, tutoring
Out of Scope
- Professional legal/medical/financial advice or high-stakes decisions without human oversight
- Contexts requiring strictly formal/neutral tone throughout
Training Overview
- Methodology: Conservative LoRA on a proven reasoning baseline (V3), layering personality without degrading capability; a hedged setup sketch follows this list.
- Data: 292 curated examples spanning mathematical/logical reasoning and collaborative tone.
- Hardware/Runtime: 1× RTX 3060 (12GB), ~2 hours, bfloat16.
- Outcome: Stable convergence (0.91 → 0.39) and preserved reasoning behavior.
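For orientation, a sketch of such a setup with Transformers + PEFT. The card states bfloat16 and a 12GB GPU but not the quantization scheme, so a 4-bit QLoRA-style load is assumed here (8B bf16 weights alone exceed 12GB); all hyperparameters other than precision and the LoRA rank/alpha are illustrative assumptions, not the card's actual values.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Assumption: 4-bit quantized base with bf16 compute, to fit an 8B model
# on a 12GB card; the card states bfloat16 but not the quantization scheme.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B", quantization_config=bnb, device_map="auto"
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
model.print_trainable_parameters()  # should report roughly 67M trainable params

args = TrainingArguments(
    output_dir="apollo-astralis-8b",
    per_device_train_batch_size=1,   # assumed: small batch for 12GB VRAM
    gradient_accumulation_steps=8,   # assumed
    num_train_epochs=3,              # assumed
    learning_rate=2e-4,              # assumed: common LoRA default
    bf16=True,                       # bfloat16, as stated in the card
    logging_steps=10,
)
```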
Evaluation Highlights
Lightweight benchmark suite (manual-verified; compared with base Qwen3-8B; deltas in percentage points):

| Benchmark (n) | Base Qwen3-8B | Apollo | Δ (pp) |
|---|---|---|---|
| MMLU (5) | 40% (2/5) | 100% (5/5) | +60 |
| GSM8K (4) | 75% (3/4) | 100% (4/4) | +25 |
| HellaSwag (2) | 50% (1/2) | 50% (1/2) | 0 |
| ARC (3) | 67% (2/3) | 100% (3/3) | +33 |
| Overall (14) | 57% (8/14) | 93% (13/14) | +36 |
VRRE (VANTA Research Reasoning Evaluation)
- Automated: 22% (2/9); the scorer pulled answers from the model's reasoning blocks rather than its final conclusions.
- Manual-verified: 89% (8/9) with clear, step-by-step reasoning and consistent tone.
Note: models that surface chain-of-thought often require tailored answer extraction to be scored accurately; the sketch below illustrates one approach.
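As a hedged illustration (the `<think>...</think>` delimiters are an assumption based on the Qwen3 chat format, not a documented detail of VRRE), an extractor that strips the reasoning block and scores only the final conclusion:

```python
# Sketch: skip the chain-of-thought block and keep only the final answer.
# The <think>...</think> delimiters are an assumption based on the Qwen3
# chat format; adapt to the actual trace markers used by the model.
import re

def extract_final_answer(output: str) -> str:
    # Drop any reasoning block so matching runs against conclusions only.
    without_reasoning = re.sub(r"<think>.*?</think>", "", output, flags=re.DOTALL)
    # Take the last non-empty line as the model's final answer.
    lines = [line.strip() for line in without_reasoning.splitlines() if line.strip()]
    return lines[-1] if lines else ""

print(extract_final_answer("<think>2 + 2 gives 4...</think>\nThe answer is 4."))
# -> "The answer is 4."
```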
Artifacts and Deployment
- Adapters: PEFT (adapter_model.safetensors, adapter_config.json)
- Quantization: GGUF Q4_K_M (~4.7GB) for local and Ollama deployment
- Usage and Modelfiles: see README.md (quick starts, conservative/unlimited variants)
- Programmatic and integrations: see USAGE_GUIDE.md (Transformers + PEFT, FastAPI/Gradio); a minimal loading sketch follows this list
- Merging and conversion: see MERGE_GUIDE.md (merge adapters, convert to GGUF)
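A minimal loading-and-merging sketch, assuming the standard Transformers + PEFT flow; the repo id matches the Hugging Face link below, and USAGE_GUIDE.md / MERGE_GUIDE.md remain the authoritative references.

```python
# Sketch: load the PEFT adapters on the base model for inference, then
# optionally merge them for export (see MERGE_GUIDE.md for the full flow).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
model = PeftModel.from_pretrained(base, "vanta-research/apollo-astralis-8b")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "A train travels 120 km in 90 minutes. What is its average speed?"}],
    tokenize=False, add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# Optional: merge adapters into the base weights before GGUF conversion.
merged = model.merge_and_unload()
merged.save_pretrained("apollo-astralis-8b-merged")
```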
Limitations
- Explanatory style can be verbose; downstream systems should extract final conclusions post-generation (see the extraction sketch above).
- Optimized for English reasoning; not tuned for highly specialized domains or creative writing.
- Intended for educational use; verify outputs in critical applications and maintain human oversight.
Citation
```bibtex
@misc{apollo-astralis-8b-2025,
  title={Apollo Astralis 8B: Conservative LoRA Fine-tuning for Reasoning and Personality},
  author={VANTA Research},
  year={2025},
  url={https://huggingface.co/vanta-research/apollo-astralis-8b},
  note={Base: Qwen/Qwen3-8B}
}
```
Contact
- Email: [email protected]
- Hugging Face: vanta-research/apollo-astralis-8b
- GitHub: vanta-research/apollo-astralis-8b