Upload folder using huggingface_hub

- README.md +43 -49
- server/RLVE_Gym_environment.py +4 -4

README.md CHANGED

@@ -1,5 +1,5 @@
1   ---
2 - title:
3   emoji: π‘
4   colorFrom: purple
5   colorTo: blue

@@ -11,13 +11,13 @@ tags:
11   - openenv
12   ---
13
14 - #
15
16 -
17
18   ## Quick Start
19
20 - The simplest way to use
21
22   ```python
23   from RLVE_Gym import RlveGymAction, RlveGymEnv

@@ -28,17 +28,21 @@ try:
28
29       # Reset
30       result = RLVE_Gymenv.reset()
31 -     print(f"
32 -
33 -
34 -
35 -
36 -
37 -
38 -
39 -
40 -
41 -
42
43   finally:
44       # Always clean up

@@ -117,29 +121,37 @@ The deployed space includes:
117
118   ## Environment Details
119
120   ### Action
121   **RlveGymAction**: Contains a single field
122 - - `
123
124   ### Observation
125 - **RlveGymObservation**:
126 - - `
127 - - `
128 - - `
129 - - `
130 - - `
131 -
132 - ### Reward
133 - The reward is calculated as: `message_length × 0.1`
134 - - "Hi" → reward: 0.2
135 - - "Hello, World!" → reward: 1.3
136 - - Empty message → reward: 0.0
137
138   ## Advanced Usage
139
140   ### Connecting to an Existing Server
141
142 - If you already have
143
144   ```python
145   from RLVE_Gym import RlveGymEnv

@@ -149,7 +161,7 @@ RLVE_Gymenv = RlveGymEnv(base_url="<ENV_HTTP_URL_HERE>")
149
150   # Use as normal
151   result = RLVE_Gymenv.reset()
152 - result = RLVE_Gymenv.step(RlveGymAction(
153   ```
154
155   Note: When connecting to an existing server, `RLVE_Gymenv.close()` will NOT stop the server.

@@ -177,22 +189,4 @@ Run the server locally for development:
177
178   ```bash
179   uvicorn server.app:app --reload
180 - ```
181 -
182 - ## Project Structure
183 -
184 - ```
185 - RLVE_Gym/
186 - ├── __init__.py # Module exports
187 - ├── README.md # This file
188 - ├── openenv.yaml # OpenEnv manifest
189 - ├── pyproject.toml # Project metadata and dependencies
190 - ├── uv.lock # Locked dependencies (generated)
191 - ├── client.py # RlveGymEnv client implementation
192 - ├── models.py # Action and Observation models
193 - └── server/
194 -     ├── __init__.py # Server module exports
195 -     ├── RLVE_Gym_environment.py # Core environment logic
196 -     ├── app.py # FastAPI application
197 -     └── Dockerfile # Container image definition
198 - ```

@@ -1,5 +1,5 @@
1   ---
2 + title: RlveGym Environment Server
3   emoji: π‘
4   colorFrom: purple
5   colorTo: blue

@@ -11,13 +11,13 @@ tags:
11   - openenv
12   ---
13
14 + # RlveGym Environment
15
16 + This package contains a collection of 400 verifiable environments from RLVE-Gym, introduced by the paper [*RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments*](https://arxiv.org/abs/2511.07317) (original GitHub repository is [here](https://github.com/Zhiyuan-Zeng/RLVE)).
17
18   ## Quick Start
19
20 + The simplest way to use RlveGym environment is through the `RlveGymEnv` class:
21
22   ```python
23   from RLVE_Gym import RlveGymAction, RlveGymEnv

@@ -28,17 +28,21 @@ try:
28
29       # Reset
30       result = RLVE_Gymenv.reset()
31 +     print(f"Problem Prompt: {result.observation.problem_input}")
32 +     # Or:
33 +     print(f"Problem Prompt (from the environment's state): {RLVE_Gymenv.state().problem_input}")
34 +
35 +     # Send multiple outputs
36 +     outputs = [
37 +         "Wrong Format",
38 +         r"<answer>0</answer>", # Wrong Answer
39 +         r"<answer>" + str(RLVE_Gymenv.problem.parameter["reference_answer"]) + r"</answer>", # Correct Answer
40 +     ]
41 +
42 +     for output in outputs:
43 +         result = RLVE_Gymenv.step(RlveGymAction(output = output))
44 +         print(f"Sent: '{output}'")
45 +         print(f"Result: `{result}`")
46
47   finally:
48       # Always clean up

@@ -117,29 +121,37 @@ The deployed space includes:
121
122   ## Environment Details
123
124 + ### Environment Initialization
125 +
126 + Please check [here](server/RLVE_Gym_environment.py) for detailed usage:
127 + - `environment_identifier` (str) - The environment's identifier. Check [here](server/Gym/environments/__init__.py) for detailed usage.
128 + - `difficulty` (int) - The difficulty of generated problems.
129 + - `answer_markers` (Tuple[str] of length 2) - How the environment extracts the final answer from a model output.
130 + - `seed` (int) - The initial seed to use when generating the first problem. Whenever `reset()` is called, the seed will be incremented by 1.
131 +
132   ### Action
133   **RlveGymAction**: Contains a single field
134 + - `output` (str) - The model's output to get verified.
135 +
136 + ### State
137 + **RlveGymState**:
138 + - `seed` (int) - The seed to use when running `reset()`.
139 + - `problem_input` (Optional[str]) - The input of the problem; if it is `None`, it means that the problem generation has not been run, or it failed.
140 + - `num_samples` (int) and `sum_accuracy` (int) - The statistics of the result of `step(action)` so far for the current problem (the number of outputs sent to the verifier and the number of correct ones).
141
142   ### Observation
143 + **RlveGymObservation**:
144 + - `problem_input` (Optional[str]) - The input of the problem; if it is `None`, it means that the problem generation has not been run, or it failed.
145 + - `verifier_result` (Optional[dict]) - Contains `reward` as the raw reward, `accuracy` as the 0/1 correctness, and `format_score` as the 0/1 format correctness.
146 + - `success` (bool) - `True` or `False` indicates whether the operation succeeds.
147 + - `message` (str) - The explanation of `success`.
148 + - `reward` (Optional[float]) - The value is `verifier_result["reward"]`.
149
150   ## Advanced Usage
151
152   ### Connecting to an Existing Server
153
154 + If you already have an RlveGymEnv server running, you can connect directly:
155
156   ```python
157   from RLVE_Gym import RlveGymEnv

@@ -149,7 +161,7 @@ RLVE_Gymenv = RlveGymEnv(base_url="<ENV_HTTP_URL_HERE>")
161
162   # Use as normal
163   result = RLVE_Gymenv.reset()
164 + result = RLVE_Gymenv.step(RlveGymAction(output="Hello!"))
165   ```
166
167   Note: When connecting to an existing server, `RLVE_Gymenv.close()` will NOT stop the server.

@@ -177,22 +189,4 @@ Run the server locally for development:
189
190   ```bash
191   uvicorn server.app:app --reload
192 + ```

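For context on the new `answer_markers` parameter: the README states only that it is a length-2 tuple controlling how the environment extracts the final answer from a model output, and the Quick Start wraps answers in `<answer>...</answer>`. The following is a minimal sketch of marker-based extraction under those assumptions. It is not RLVE-Gym's actual code; the helper name `extract_answer`, the default markers, and the rule of taking the last opening marker are illustrative choices.

```python
from typing import Optional, Tuple

# Assumed default, mirroring the <answer>...</answer> outputs in the Quick Start.
DEFAULT_MARKERS: Tuple[str, str] = ("<answer>", "</answer>")


def extract_answer(output: str, markers: Tuple[str, str] = DEFAULT_MARKERS) -> Optional[str]:
    """Hypothetical helper: return the text between the last opening marker
    and the first closing marker after it, or None if either is missing."""
    start_marker, end_marker = markers
    start = output.rfind(start_marker)
    if start == -1:
        return None  # no opening marker: treat as a format error
    start += len(start_marker)
    end = output.find(end_marker, start)
    if end == -1:
        return None  # opening marker without a matching closing marker
    return output[start:end].strip()


if __name__ == "__main__":
    for output in ["Wrong Format", "<answer>0</answer>", "Reasoning... <answer> 42 </answer>"]:
        print(f"{output!r} -> {extract_answer(output)!r}")
```

Returning `None` rather than an empty string lets a verifier tell a format failure (the `"Wrong Format"` output) apart from a well-formed but incorrect answer (`<answer>0</answer>`), which lines up with the separate `format_score` and `accuracy` entries listed under Observation.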
server/RLVE_Gym_environment.py CHANGED

@@ -54,8 +54,8 @@ class RlveGymEnvironment(Environment):
54           Returns:
55               problem_input: The generated problem input string (or None if generation failed)
56               verifier_result: None
57 -             success: Boolean indicating
58 -             message:
59           """
60           if (self.environment_identifier not in identifier2environment) or (
61               self.environment_identifier not in identifier2controller

@@ -140,8 +140,8 @@ class RlveGymEnvironment(Environment):
140          Returns:
141              problem_input: The problem input string from the current state
142              verifier_result: Result of the verification containing accuracy and other metrics
143 -             success: Boolean indicating
144 -             message:
145          """
146          if self.problem is None:
147              return RlveGymObservation(

@@ -54,8 +54,8 @@ class RlveGymEnvironment(Environment):
54           Returns:
55               problem_input: The generated problem input string (or None if generation failed)
56               verifier_result: None
57 +             success: Boolean indicating whether the reset was successful
58 +             message: The result of the reset
59           """
60           if (self.environment_identifier not in identifier2environment) or (
61               self.environment_identifier not in identifier2controller

@@ -140,8 +140,8 @@ class RlveGymEnvironment(Environment):
140          Returns:
141              problem_input: The problem input string from the current state
142              verifier_result: Result of the verification containing accuracy and other metrics
143 +             success: Boolean indicating whether the step (verification) was successful
144 +             message: The result of the step
145          """
146          if self.problem is None:
147              return RlveGymObservation(

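The docstring changes above define `success` and `message` on the observations returned by `reset()` and `step()`, and the new README lists the matching State and Observation fields. To make that contract concrete, here is a toy, dependency-free stand-in. Everything in it is hypothetical: `ToyEnv`, `ToyState`, `ToyObservation`, the "output the seed" problem, the `<answer>` format check, and the 1.0/0.0 reward are illustrative, not RLVE-Gym's generators or verifiers. The order of the seed increment and the per-problem reset of `num_samples`/`sum_accuracy` are likewise assumptions read from the README wording.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyState:
    """Mirrors the RlveGymState fields listed in the README."""
    seed: int
    problem_input: Optional[str] = None
    num_samples: int = 0
    sum_accuracy: int = 0


@dataclass
class ToyObservation:
    """Mirrors the RlveGymObservation fields listed in the README."""
    problem_input: Optional[str]
    verifier_result: Optional[dict]
    success: bool
    message: str
    reward: Optional[float]


class ToyEnv:
    """Hypothetical stand-in whose 'problem' is to output the seed it was generated from."""

    def __init__(self, seed: int) -> None:
        self.state = ToyState(seed=seed)
        self.reference_answer: Optional[str] = None

    def reset(self) -> ToyObservation:
        # Assumption: the current seed generates the problem, then the seed advances by 1.
        self.reference_answer = str(self.state.seed)
        self.state.problem_input = f"Output the number {self.state.seed}."
        self.state.seed += 1
        # Assumption: statistics cover only the current problem, so they restart at zero.
        self.state.num_samples = 0
        self.state.sum_accuracy = 0
        return ToyObservation(self.state.problem_input, None, True, "Problem generated.", None)

    def step(self, output: str) -> ToyObservation:
        if self.state.problem_input is None:
            return ToyObservation(None, None, False, "No problem yet; call reset() first.", None)
        # Hypothetical verifier: format = wrapped in <answer>...</answer>, accuracy = exact match.
        well_formed = output.startswith("<answer>") and output.endswith("</answer>")
        answer = output[len("<answer>"):-len("</answer>")] if well_formed else None
        accuracy = int(answer == self.reference_answer)
        verifier_result = {
            "reward": float(accuracy),  # hypothetical reward: 1.0 if correct, else 0.0
            "accuracy": accuracy,
            "format_score": int(well_formed),
        }
        self.state.num_samples += 1
        self.state.sum_accuracy += accuracy
        # The observation's reward repeats verifier_result["reward"], as the README describes.
        return ToyObservation(
            self.state.problem_input, verifier_result, True, "Output verified.", verifier_result["reward"]
        )


if __name__ == "__main__":
    env = ToyEnv(seed=7)
    print(env.reset().problem_input)
    for output in ["Wrong Format", "<answer>0</answer>", "<answer>7</answer>"]:
        print(output, "->", env.step(output).verifier_result)
    print(env.state)
```

Keeping the full verifier output in a dict and copying only its `reward` into the observation mirrors the README's statement that `reward` is `verifier_result["reward"]`, so a training loop can read one scalar while the per-criterion scores stay available for logging.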