Upload README.md with huggingface_hub
README.md CHANGED
---
library_name: transformers
pipeline_tag: robotics
tags:
- robotics
- foundation-model
- gr00t
- dual-camera
- robot-learning
- manipulation
- embodied-ai
model_type: gr00t
datasets:
- so101_wave_300k_dualcam
language:
- en
base_model_relation: finetune
widget:
- example_title: "Robot Manipulation"
  text: "Dual camera robotics control for manipulation tasks"
---

# GR00T Wave: Dual Camera Robotics Foundation Model

## Model Overview

GR00T Wave is a robotics foundation model fine-tuned on dual-camera manipulation data from the SO101 Wave dataset. It pairs synchronized dual-camera visual input with robot state to predict manipulation actions.

## Key Features

- **Dual Camera Input**: Processes synchronized dual-camera feeds for enhanced spatial understanding
- **Foundation Model Architecture**: Built on the GR00T framework for robust robotics applications
- **300K Training Steps**: Extensive training on high-quality manipulation demonstrations
- **Manipulation Focused**: Optimized for robotic manipulation and control tasks

## Model Details

- **Model Type**: GR00T Robotics Foundation Model
- **Training Data**: SO101 Wave 300K Dual Camera Dataset
- **Architecture**: Transformer-based with dual camera encoders
- **Training Steps**: 300,000 steps with checkpoints at 150K and 300K
- **Input Modalities**: Dual RGB cameras, robot state
- **Output**: Robot actions and control commands
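
The modalities above can be sketched as a minimal observation/action layout. All key names, image resolutions, and dimensions in this snippet are illustrative assumptions for intuition only; the model's actual schema is defined by its data configuration, not by this sketch.

```python
import numpy as np

# Hypothetical observation layout for a dual-camera manipulation policy.
# Key names, resolution (224x224), and the 6-dim state are assumptions.
observation = {
    "video.camera_left": np.zeros((1, 224, 224, 3), dtype=np.uint8),   # camera 1 RGB frame
    "video.camera_right": np.zeros((1, 224, 224, 3), dtype=np.uint8),  # camera 2 RGB frame
    "state.joint_positions": np.zeros((1, 6), dtype=np.float32),       # proprioceptive state
}

# The policy outputs a short horizon of action vectors, e.g. 16 future
# steps of 6-dim joint targets (both numbers are assumptions).
action_horizon, action_dim = 16, 6
actions = np.zeros((action_horizon, action_dim), dtype=np.float32)

for key, value in observation.items():
    print(key, value.shape)
print("actions:", actions.shape)
```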

## Usage

```python
from transformers import AutoModel

# Load the model (remote code provides the GR00T architecture)
model = AutoModel.from_pretrained("cagataydev/gr00t-wave", trust_remote_code=True)

# Model is ready for robotics inference
# Note: this model requires a specialized robotics inference pipeline
```

## Training Configuration

- **Base Model**: GR00T N1.5-3B
- **Dataset**: SO101 Wave 300K Dual Camera
- **Training Framework**: Custom robotics training pipeline
- **Batch Size**: Optimized for dual camera inputs
- **Optimization**: AdamW with custom learning rate scheduling
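
The exact schedule behind "custom learning rate scheduling" is not specified here; a common pattern for long robotics training runs is linear warmup followed by cosine decay. The step counts and learning rates below are illustrative assumptions, not the values used for this model:

```python
import math

def lr_at_step(step, max_steps=300_000, warmup_steps=1_000,
               base_lr=1e-4, min_lr=1e-6):
    """Warmup-then-cosine learning-rate schedule (illustrative values)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps              # linear warmup from 0
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))   # decays 1 -> 0
    return min_lr + (base_lr - min_lr) * cosine

print(lr_at_step(0))        # start of warmup
print(lr_at_step(1_000))    # peak learning rate
print(lr_at_step(300_000))  # floor at end of training
```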

## Model Files

The repository contains:

- **SafeTensors Model Files**:
  - `model-00001-of-00002.safetensors` (4.7GB)
  - `model-00002-of-00002.safetensors` (2.4GB)
- **Configuration Files**:
  - `config.json`
  - `model.safetensors.index.json`
- **Training Checkpoints**:
  - `checkpoint-150000/` (16GB)
  - `checkpoint-300000/` (16GB)
- **Training Metadata**:
  - `trainer_state.json`
  - `training_args.bin`
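
`model.safetensors.index.json` follows the standard sharded-SafeTensors layout: a `weight_map` that points each tensor name at the shard file storing it. A miniature sketch of how a loader reads it (the tensor names here are invented; the real index lists thousands of entries):

```python
# Miniature stand-in for model.safetensors.index.json (tensor names invented).
index = {
    "metadata": {"total_size": 7_100_000_000},
    "weight_map": {
        "backbone.embed.weight": "model-00001-of-00002.safetensors",
        "action_head.proj.weight": "model-00002-of-00002.safetensors",
    },
}

# Which shard holds a given parameter?
shard = index["weight_map"]["action_head.proj.weight"]
print(shard)  # model-00002-of-00002.safetensors

# Group parameters by shard, as a sharded loader would:
by_shard = {}
for name, file in index["weight_map"].items():
    by_shard.setdefault(file, []).append(name)
print(sorted(by_shard))
```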

## Evaluation

The model has been evaluated on standard robotics manipulation benchmarks with the following approach:

- **Evaluation Steps**: 150 per checkpoint
- **Trajectory Count**: 5 trajectories per evaluation
- **Data Configuration**: SO100 dual camera setup
- **Metrics**: Success rate, manipulation accuracy, and task completion
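
For intuition, the success-rate metric over a handful of rollouts reduces to a simple ratio. The trajectory outcomes below are made up for illustration, not reported results:

```python
# Success rate over a small batch of evaluation trajectories
# (task names and outcomes are illustrative, not reported results).
trajectories = [
    {"task": "wave_pick", "success": True},
    {"task": "wave_pick", "success": True},
    {"task": "wave_place", "success": False},
    {"task": "wave_place", "success": True},
    {"task": "wave_handover", "success": True},
]

success_rate = sum(t["success"] for t in trajectories) / len(trajectories)
print(f"success rate: {success_rate:.0%}")  # 80%
```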

## Applications

This model is suitable for:

- **Robotic Manipulation**: Pick and place operations
- **Dual Camera Systems**: Tasks requiring stereo vision
- **Manufacturing Automation**: Assembly and quality control
- **Research**: Foundation for robotics research and development

## Technical Specifications

- **Model Size**: ~7.1GB (SafeTensors format)
- **Total Repository Size**: ~40GB (including checkpoints)
- **Inference Requirements**: GPU with sufficient VRAM for transformer inference
- **Framework Compatibility**: Transformers, PyTorch
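
As a sanity check on the listed sizes, the parameter count can be roughly estimated from the SafeTensors payload, assuming most weights are stored in 16-bit precision (an assumption; some tensors may be kept in fp32). The result is consistent with the GR00T N1.5-3B base model plus additional heads:

```python
# Rough parameter-count estimate from the shard sizes listed above,
# assuming 2 bytes per parameter (16-bit weights -- an assumption).
shard_bytes = [4.7e9, 2.4e9]        # sizes from the "Model Files" section
total_bytes = sum(shard_bytes)       # ~7.1 GB total payload
params_estimate = total_bytes / 2    # bytes -> 16-bit parameters

print(f"~{params_estimate / 1e9:.2f}B parameters")
```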

## Installation

```bash
# Install required dependencies
pip install transformers torch torchvision
pip install huggingface_hub

# Log in to Hugging Face (required for private model access)
huggingface-cli login
```

## Limitations

- Requires a specialized robotics inference pipeline
- Optimized for specific dual camera configurations
- Performance may vary across robot platforms
- Requires adequate computational resources for real-time inference

## Model Card

This model card summarizes the GR00T Wave model's capabilities, limitations, and intended use cases.

## Ethical Considerations

This model is designed for robotics research and industrial applications. Users should ensure:

- Safe deployment in robotics systems
- Appropriate safety measures for physical robot control
- Compliance with relevant safety standards
- Responsible use in manufacturing and research environments

## Version History

- **v1.0**: Initial release with 300K step training
- **Checkpoints**: Available at 150K and 300K training steps

## Support

For technical questions and implementation support, please refer to the model documentation and community resources.