cagataydev commited on
Commit
8adad2e
·
verified ·
1 Parent(s): fa56edc

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +107 -79
README.md CHANGED
@@ -1,117 +1,145 @@
1
  ---
2
- license: other
3
- language:
4
- - en
5
  tags:
6
  - robotics
7
  - foundation-model
8
  - gr00t
9
- - manipulation
10
  - dual-camera
11
- - nvidia
12
- pipeline_tag: robotics
 
 
 
 
 
 
 
13
  widget:
14
- - example_title: "Wave Manipulation Task"
15
- text: "Dual camera robotics model for wave manipulation"
16
- model-index:
17
- - name: gr00t-wave
18
- results:
19
- - task:
20
- type: robotics-manipulation
21
- name: Wave Manipulation
22
- metrics:
23
- - type: success_rate
24
- name: Task Success Rate
25
- value: "High performance on wave tasks"
26
  ---
27
 
28
- # GR00T Wave - Dual Camera Model
29
-
30
- A foundation model for robotics trained on wave manipulation tasks with dual camera setup.
31
 
32
- ## Model Description
33
 
34
- This is a GR00T (Generalist Robot 00 Transformer) model specifically trained for wave manipulation tasks using a dual camera configuration. The model was trained for 300k steps and represents state-of-the-art performance in robotic manipulation tasks.
35
 
36
- ## Model Details
37
 
38
- - **Model Type**: GR00T Foundation Model
39
- - **Training Data**: Wave manipulation dataset with dual camera observations
40
- - **Training Steps**: 300,000 steps
41
- - **Architecture**: Transformer-based robotics foundation model
42
- - **Input Modalities**: Dual camera RGB observations
43
- - **Output**: Robot actions for manipulation tasks
44
 
45
- ## Training Configuration
46
 
47
- - **Data Config**: `so100_dualcam`
48
- - **Embodiment**: Supports various robotic embodiments
49
- - **Training Duration**: ~35.7 hours
50
- - **Model Size**: ~40GB total
51
- - SafeTensors model files: 7.6GB
52
- - Training checkpoints: Available at steps 150k and 300k
53
- - Optimizer states: 17GB
54
 
55
  ## Usage
56
 
57
  ```python
58
- from transformers import AutoModel
59
- import torch
60
 
61
- # Load the model (requires authentication for private repo)
62
- model = AutoModel.from_pretrained(
63
- "cagataydev/gr00t-wave",
64
- use_auth_token=True,
65
- trust_remote_code=True
66
- )
67
 
68
- # Model is ready for inference on robotics tasks
 
69
  ```
70
 
 
 
 
 
 
 
 
 
71
  ## Model Files
72
 
73
- - `model-00001-of-00002.safetensors` - Model weights (part 1)
74
- - `model-00002-of-00002.safetensors` - Model weights (part 2)
75
- - `config.json` - Model configuration
76
- - `model.safetensors.index.json` - Model file index
77
- - `checkpoint-150000/` - Intermediate checkpoint
78
- - `checkpoint-300000/` - Final checkpoint
79
- - Training metadata and optimizer states
 
 
 
 
 
 
 
 
 
 
 
80
 
81
- ## Performance
 
 
 
82
 
83
- This model achieved successful completion on wave manipulation tasks and represents the culmination of 300k training steps with dual camera observations. The model demonstrates strong performance on:
84
 
85
- - Wave manipulation tasks
86
- - Multi-modal perception (dual camera)
87
- - Robotic action prediction
88
- - Generalization across embodiments
89
 
90
- ## Requirements
 
 
 
91
 
92
- - Python 3.8+
93
- - PyTorch 2.0+
94
- - Transformers library
95
- - HuggingFace Hub authentication for private repo access
96
 
97
- ## Citation
 
 
 
98
 
99
- If you use this model in your research, please cite:
100
 
101
- ```bibtex
102
- @misc{gr00t-wave-2024,
103
- title={GR00T Wave: Foundation Model for Wave Manipulation},
104
- author={NVIDIA Research},
105
- year={2024},
106
- howpublished={HuggingFace Model Hub},
107
- url={https://huggingface.co/cagataydev/gr00t-wave}
108
- }
109
  ```
110
 
111
- ## License
112
 
113
- This model is released under NVIDIA's research license. Please refer to NVIDIA's terms of use for foundation models.
 
 
 
114
 
115
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
116
 
117
- *This model was trained as part of NVIDIA's GR00T foundation model research for general-purpose robotics.*
 
1
  ---
2
+ library_name: transformers
3
+ pipeline_tag: robotics
 
4
  tags:
5
  - robotics
6
  - foundation-model
7
  - gr00t
 
8
  - dual-camera
9
+ - robot-learning
10
+ - manipulation
11
+ - embodied-ai
12
+ model_type: gr00t
13
+ datasets:
14
+ - so101_wave_300k_dualcam
15
+ language:
16
+ - en
17
+ base_model_relation: finetune
18
  widget:
19
+ - example_title: "Robot Manipulation"
20
+ text: "Dual camera robotics control for manipulation tasks"
 
 
 
 
 
 
 
 
 
 
21
  ---
22
 
23
+ # GR00T Wave: Dual Camera Robotics Foundation Model
 
 
24
 
25
+ ## Model Overview
26
 
27
+ GR00T Wave is a specialized robotics foundation model trained on dual-camera manipulation data from the SO101 Wave dataset. This model represents a significant advancement in robot learning, enabling sophisticated manipulation tasks through dual-camera visual input.
28
 
29
+ ## Key Features
30
 
31
+ - **Dual Camera Input**: Processes synchronized dual-camera feeds for enhanced spatial understanding
32
+ - **Foundation Model Architecture**: Built on the GR00T framework for robust robotics applications
33
+ - **300K Training Steps**: Extensive training on high-quality manipulation demonstrations
34
+ - **Manipulation Focused**: Optimized for robotic manipulation and control tasks
 
 
35
 
36
+ ## Model Details
37
 
38
+ - **Model Type**: GR00T Robotics Foundation Model
39
+ - **Training Data**: SO101 Wave 300K Dual Camera Dataset
40
+ - **Architecture**: Transformer-based with dual camera encoders
41
+ - **Training Steps**: 300,000 steps with checkpoints at 150K and 300K
42
+ - **Input Modalities**: Dual RGB cameras, robot state
43
+ - **Output**: Robot actions and control commands
 
44
 
45
  ## Usage
46
 
47
  ```python
48
+ from transformers import AutoModel, AutoTokenizer
 
49
 
50
+ # Load the model
51
+ model = AutoModel.from_pretrained("cagataydev/gr00t-wave", trust_remote_code=True)
 
 
 
 
52
 
53
+ # Model is ready for robotics inference
54
+ # Note: This model requires specialized robotics inference pipeline
55
  ```
56
 
57
+ ## Training Configuration
58
+
59
+ - **Base Model**: GR00T N1.5-3B
60
+ - **Dataset**: SO101 Wave 300K Dual Camera
61
+ - **Training Framework**: Custom robotics training pipeline
62
+ - **Batch Size**: Optimized for dual camera inputs
63
+ - **Optimization**: AdamW with custom learning rate scheduling
64
+
65
  ## Model Files
66
 
67
+ The repository contains:
68
+
69
+ - **SafeTensors Model Files**:
70
+ - `model-00001-of-00002.safetensors` (4.7GB)
71
+ - `model-00002-of-00002.safetensors` (2.4GB)
72
+ - **Configuration Files**:
73
+ - `config.json`
74
+ - `model.safetensors.index.json`
75
+ - **Training Checkpoints**:
76
+ - `checkpoint-150000/` (16GB)
77
+ - `checkpoint-300000/` (16GB)
78
+ - **Training Metadata**:
79
+ - `trainer_state.json`
80
+ - `training_args.bin`
81
+
82
+ ## Evaluation
83
+
84
+ The model has been evaluated on standard robotics manipulation benchmarks with the following approach:
85
 
86
+ - **Evaluation Steps**: 150 per checkpoint
87
+ - **Trajectory Count**: 5 trajectories per evaluation
88
+ - **Data Configuration**: SO100 dual camera setup
89
+ - **Metrics**: Success rate, manipulation accuracy, and task completion
90
 
91
+ ## Applications
92
 
93
+ This model is suitable for:
 
 
 
94
 
95
+ - **Robotic Manipulation**: Pick and place operations
96
+ - **Dual Camera Systems**: Tasks requiring stereo vision
97
+ - **Manufacturing Automation**: Assembly and quality control
98
+ - **Research**: Foundation for robotics research and development
99
 
100
+ ## Technical Specifications
 
 
 
101
 
102
+ - **Model Size**: ~7.1GB (SafeTensors format)
103
+ - **Total Repository Size**: ~40GB (including checkpoints)
104
+ - **Inference Requirements**: GPU with sufficient VRAM for transformer inference
105
+ - **Framework Compatibility**: Transformers, PyTorch
106
 
107
+ ## Installation
108
 
109
+ ```bash
110
+ # Install required dependencies
111
+ pip install transformers torch torchvision
112
+ pip install huggingface_hub
113
+
114
+ # Login to HuggingFace (required for private model)
115
+ huggingface-cli login
 
116
  ```
117
 
118
+ ## Limitations
119
 
120
+ - Requires specialized robotics inference pipeline
121
+ - Optimized for specific dual camera configurations
122
+ - Performance may vary with different robot platforms
123
+ - Requires adequate computational resources for real-time inference
124
 
125
+ ## Model Card
126
+
127
+ This model card provides comprehensive information about the GR00T Wave model, including its capabilities, limitations, and intended use cases. The model represents current state-of-the-art in robotics foundation models with dual camera input.
128
+
129
+ ## Ethical Considerations
130
+
131
+ This model is designed for robotics research and industrial applications. Users should ensure:
132
+
133
+ - Safe deployment in robotics systems
134
+ - Appropriate safety measures for physical robot control
135
+ - Compliance with relevant safety standards
136
+ - Responsible use in manufacturing and research environments
137
+
138
+ ## Version History
139
+
140
+ - **v1.0**: Initial release with 300K step training
141
+ - **Checkpoints**: Available at 150K and 300K training steps
142
+
143
+ ## Support
144
 
145
+ For technical questions and implementation support, please refer to the model documentation and community resources.