Upload README.md with huggingface_hub
README.md CHANGED
---
library_name: transformers
pipeline_tag: robotics
tags:
- robotics
- foundation-model
- gr00t
- dual-camera
- robot-learning
- manipulation
- embodied-ai
model_type: gr00t
datasets:
- so101_wave_300k_dualcam
language:
- en
base_model_relation: finetune
widget:
- example_title: "Robot Manipulation"
  text: "Dual camera robotics control for manipulation tasks"
---

# GR00T Wave: Dual Camera Robotics Foundation Model

## Model Overview

GR00T Wave is a robotics foundation model fine-tuned on dual-camera manipulation data from the SO101 Wave dataset. It pairs synchronized dual-camera visual input with robot state to predict manipulation actions.

## Key Features

- **Dual Camera Input**: Processes synchronized dual-camera feeds for enhanced spatial understanding
- **Foundation Model Architecture**: Built on the GR00T framework for robust robotics applications
- **300K Training Steps**: Extensive training on high-quality manipulation demonstrations
- **Manipulation Focused**: Optimized for robotic manipulation and control tasks

## Model Details

- **Model Type**: GR00T Robotics Foundation Model
- **Training Data**: SO101 Wave 300K Dual Camera Dataset
- **Architecture**: Transformer-based with dual camera encoders
- **Training Steps**: 300,000 steps with checkpoints at 150K and 300K
- **Input Modalities**: Dual RGB cameras, robot state
- **Output**: Robot actions and control commands
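
The modalities above can be sketched as a minimal observation/action layout. All key names, image resolutions, and dimensions in this snippet are illustrative assumptions for intuition only; the model's actual schema is defined by its data configuration, not by this sketch.

```python
import numpy as np

# Hypothetical observation layout for a dual-camera manipulation policy.
# Key names, resolution (224x224), and the 6-dim state are assumptions.
observation = {
    "video.camera_left": np.zeros((1, 224, 224, 3), dtype=np.uint8),   # camera 1 RGB frame
    "video.camera_right": np.zeros((1, 224, 224, 3), dtype=np.uint8),  # camera 2 RGB frame
    "state.joint_positions": np.zeros((1, 6), dtype=np.float32),       # proprioceptive state
}

# The policy outputs a short horizon of action vectors, e.g. 16 future
# steps of 6-dim joint targets (both numbers are assumptions).
action_horizon, action_dim = 16, 6
actions = np.zeros((action_horizon, action_dim), dtype=np.float32)

for key, value in observation.items():
    print(key, value.shape)
print("actions:", actions.shape)
```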

## Usage

```python
from transformers import AutoModel

# Load the model (remote code provides the GR00T architecture)
model = AutoModel.from_pretrained("cagataydev/gr00t-wave", trust_remote_code=True)

# Model is ready for robotics inference
# Note: this model requires a specialized robotics inference pipeline
```

## Training Configuration

- **Base Model**: GR00T N1.5-3B
- **Dataset**: SO101 Wave 300K Dual Camera
- **Training Framework**: Custom robotics training pipeline
- **Batch Size**: Optimized for dual camera inputs
- **Optimization**: AdamW with custom learning rate scheduling
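
The exact schedule behind "custom learning rate scheduling" is not specified here; a common pattern for long robotics training runs is linear warmup followed by cosine decay. The step counts and learning rates below are illustrative assumptions, not the values used for this model:

```python
import math

def lr_at_step(step, max_steps=300_000, warmup_steps=1_000,
               base_lr=1e-4, min_lr=1e-6):
    """Warmup-then-cosine learning-rate schedule (illustrative values)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps              # linear warmup from 0
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))   # decays 1 -> 0
    return min_lr + (base_lr - min_lr) * cosine

print(lr_at_step(0))        # start of warmup
print(lr_at_step(1_000))    # peak learning rate
print(lr_at_step(300_000))  # floor at end of training
```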

## Model Files

The repository contains:

- **SafeTensors Model Files**:
  - `model-00001-of-00002.safetensors` (4.7GB)
  - `model-00002-of-00002.safetensors` (2.4GB)
- **Configuration Files**:
  - `config.json`
  - `model.safetensors.index.json`
- **Training Checkpoints**:
  - `checkpoint-150000/` (16GB)
  - `checkpoint-300000/` (16GB)
- **Training Metadata**:
  - `trainer_state.json`
  - `training_args.bin`
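
`model.safetensors.index.json` follows the standard sharded-SafeTensors layout: a `weight_map` that points each tensor name at the shard file storing it. A miniature sketch of how a loader reads it (the tensor names here are invented; the real index lists thousands of entries):

```python
# Miniature stand-in for model.safetensors.index.json (tensor names invented).
index = {
    "metadata": {"total_size": 7_100_000_000},
    "weight_map": {
        "backbone.embed.weight": "model-00001-of-00002.safetensors",
        "action_head.proj.weight": "model-00002-of-00002.safetensors",
    },
}

# Which shard holds a given parameter?
shard = index["weight_map"]["action_head.proj.weight"]
print(shard)  # model-00002-of-00002.safetensors

# Group parameters by shard, as a sharded loader would:
by_shard = {}
for name, file in index["weight_map"].items():
    by_shard.setdefault(file, []).append(name)
print(sorted(by_shard))
```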

## Evaluation

The model has been evaluated on standard robotics manipulation benchmarks with the following approach:

- **Evaluation Steps**: 150 per checkpoint
- **Trajectory Count**: 5 trajectories per evaluation
- **Data Configuration**: SO100 dual camera setup
- **Metrics**: Success rate, manipulation accuracy, and task completion
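
For intuition, the success-rate metric over a handful of rollouts reduces to a simple ratio. The trajectory outcomes below are made up for illustration, not reported results:

```python
# Success rate over a small batch of evaluation trajectories
# (task names and outcomes are illustrative, not reported results).
trajectories = [
    {"task": "wave_pick", "success": True},
    {"task": "wave_pick", "success": True},
    {"task": "wave_place", "success": False},
    {"task": "wave_place", "success": True},
    {"task": "wave_handover", "success": True},
]

success_rate = sum(t["success"] for t in trajectories) / len(trajectories)
print(f"success rate: {success_rate:.0%}")  # 80%
```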

## Applications

This model is suitable for:

- **Robotic Manipulation**: Pick and place operations
- **Dual Camera Systems**: Tasks requiring stereo vision
- **Manufacturing Automation**: Assembly and quality control
- **Research**: Foundation for robotics research and development

## Technical Specifications

- **Model Size**: ~7.1GB (SafeTensors format)
- **Total Repository Size**: ~40GB (including checkpoints)
- **Inference Requirements**: GPU with sufficient VRAM for transformer inference
- **Framework Compatibility**: Transformers, PyTorch
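
As a sanity check on the listed sizes, the parameter count can be roughly estimated from the SafeTensors payload, assuming most weights are stored in 16-bit precision (an assumption; some tensors may be kept in fp32). The result is consistent with the GR00T N1.5-3B base model plus additional heads:

```python
# Rough parameter-count estimate from the shard sizes listed above,
# assuming 2 bytes per parameter (16-bit weights -- an assumption).
shard_bytes = [4.7e9, 2.4e9]        # sizes from the "Model Files" section
total_bytes = sum(shard_bytes)       # ~7.1 GB total payload
params_estimate = total_bytes / 2    # bytes -> 16-bit parameters

print(f"~{params_estimate / 1e9:.2f}B parameters")
```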

## Installation

```bash
# Install required dependencies
pip install transformers torch torchvision
pip install huggingface_hub

# Log in to Hugging Face (required for private model access)
huggingface-cli login
```

## Limitations

- Requires a specialized robotics inference pipeline
- Optimized for specific dual camera configurations
- Performance may vary across robot platforms
- Requires adequate computational resources for real-time inference

## Model Card

This model card summarizes the GR00T Wave model's capabilities, limitations, and intended use cases.

## Ethical Considerations

This model is designed for robotics research and industrial applications. Users should ensure:

- Safe deployment in robotics systems
- Appropriate safety measures for physical robot control
- Compliance with relevant safety standards
- Responsible use in manufacturing and research environments

## Version History

- **v1.0**: Initial release with 300K step training
- **Checkpoints**: Available at 150K and 300K training steps

## Support

For technical questions and implementation support, please refer to the model documentation and community resources.