ZhiyuanZeng committed on
Commit eae0874 · verified · 1 Parent(s): 1cbc0f5

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +43 -49
  2. server/RLVE_Gym_environment.py +4 -4
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: Rlve Gym Environment Server
  emoji: 📑
  colorFrom: purple
  colorTo: blue
@@ -11,13 +11,13 @@ tags:
  - openenv
  ---

- # Rlve Gym Environment

- A simple test environment that echoes back messages. Perfect for testing the env APIs as well as demonstrating environment usage patterns.

  ## Quick Start

- The simplest way to use the Rlve Gym environment is through the `RlveGymEnv` class:

  ```python
  from RLVE_Gym import RlveGymAction, RlveGymEnv
@@ -28,17 +28,21 @@ try:

      # Reset
      result = RLVE_Gymenv.reset()
-     print(f"Reset: {result.observation.echoed_message}")
-
-     # Send multiple messages
-     messages = ["Hello, World!", "Testing echo", "Final message"]
-
-     for msg in messages:
-         result = RLVE_Gymenv.step(RlveGymAction(message=msg))
-         print(f"Sent: '{msg}'")
-         print(f" → Echoed: '{result.observation.echoed_message}'")
-         print(f" → Length: {result.observation.message_length}")
-         print(f" → Reward: {result.reward}")

  finally:
      # Always clean up
@@ -117,29 +121,37 @@ The deployed space includes:

  ## Environment Details

  ### Action
  **RlveGymAction**: Contains a single field
- - `message` (str) - The message to echo back

  ### Observation
- **RlveGymObservation**: Contains the echo response and metadata
- - `echoed_message` (str) - The message echoed back
- - `message_length` (int) - Length of the message
- - `reward` (float) - Reward based on message length (length × 0.1)
- - `done` (bool) - Always False for echo environment
- - `metadata` (dict) - Additional info like step count
-
- ### Reward
- The reward is calculated as: `message_length × 0.1`
- - "Hi" → reward: 0.2
- - "Hello, World!" → reward: 1.3
- - Empty message → reward: 0.0

  ## Advanced Usage

  ### Connecting to an Existing Server

- If you already have a Rlve Gym environment server running, you can connect directly:

  ```python
  from RLVE_Gym import RlveGymEnv
@@ -149,7 +161,7 @@ RLVE_Gymenv = RlveGymEnv(base_url="<ENV_HTTP_URL_HERE>")

  # Use as normal
  result = RLVE_Gymenv.reset()
- result = RLVE_Gymenv.step(RlveGymAction(message="Hello!"))
  ```

  Note: When connecting to an existing server, `RLVE_Gymenv.close()` will NOT stop the server.
@@ -177,22 +189,4 @@ Run the server locally for development:

  ```bash
  uvicorn server.app:app --reload
- ```
-
- ## Project Structure
-
- ```
- RLVE_Gym/
- ├── __init__.py  # Module exports
- ├── README.md  # This file
- ├── openenv.yaml  # OpenEnv manifest
- ├── pyproject.toml  # Project metadata and dependencies
- ├── uv.lock  # Locked dependencies (generated)
- ├── client.py  # RlveGymEnv client implementation
- ├── models.py  # Action and Observation models
- └── server/
-     ├── __init__.py  # Server module exports
-     ├── RLVE_Gym_environment.py  # Core environment logic
-     ├── app.py  # FastAPI application
-     └── Dockerfile  # Container image definition
- ```
 
  ---
+ title: RlveGym Environment Server
  emoji: 📑
  colorFrom: purple
  colorTo: blue
 
  - openenv
  ---

+ # RlveGym Environment

+ This package contains a collection of 400 verifiable environments from RLVE-Gym, introduced by the paper [*RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments*](https://arxiv.org/abs/2511.07317) (the original GitHub repository is [here](https://github.com/Zhiyuan-Zeng/RLVE)).

  ## Quick Start

+ The simplest way to use the RlveGym environment is through the `RlveGymEnv` class:

  ```python
  from RLVE_Gym import RlveGymAction, RlveGymEnv
 
      # Reset
      result = RLVE_Gymenv.reset()
+     print(f"Problem Prompt: {result.observation.problem_input}")
+     # Or:
+     print(f"Problem Prompt (from the environment's state): {RLVE_Gymenv.state().problem_input}")
+
+     # Send multiple outputs
+     outputs = [
+         "Wrong Format",
+         r"<answer>0</answer>",  # Wrong Answer
+         r"<answer>" + str(RLVE_Gymenv.problem.parameter["reference_answer"]) + r"</answer>",  # Correct Answer
+     ]
+
+     for output in outputs:
+         result = RLVE_Gymenv.step(RlveGymAction(output=output))
+         print(f"Sent: '{output}'")
+         print(f"Result: `{result}`")

  finally:
      # Always clean up
 
  ## Environment Details

+ ### Environment Initialization
+
+ Please check [here](server/RLVE_Gym_environment.py) for detailed usage:
+ - `environment_identifier` (str) - The environment's identifier. Check [here](server/Gym/environments/__init__.py) for detailed usage.
+ - `difficulty` (int) - The difficulty of generated problems.
+ - `answer_markers` (Tuple[str, str]) - How the environment extracts the final answer from a model output.
+ - `seed` (int) - The initial seed to use when generating the first problem. Each call to `reset()` increments the seed by 1.
+
  ### Action
  **RlveGymAction**: Contains a single field
+ - `output` (str) - The model's output to be verified.
+
+ ### State
+ **RlveGymState**:
+ - `seed` (int) - The seed to use when running `reset()`.
+ - `problem_input` (Optional[str]) - The input of the problem; `None` means that problem generation has not been run or has failed.
+ - `num_samples` (int) and `sum_accuracy` (int) - Statistics of the `step(action)` results so far for the current problem: the number of outputs sent to the verifier and the number of correct ones.
 
  ### Observation
+ **RlveGymObservation**:
+ - `problem_input` (Optional[str]) - The input of the problem; `None` means that problem generation has not been run or has failed.
+ - `verifier_result` (Optional[dict]) - Contains `reward` (the raw reward), `accuracy` (0/1 correctness), and `format_score` (0/1 format correctness).
+ - `success` (bool) - Whether the operation succeeded.
+ - `message` (str) - An explanation of `success`.
+ - `reward` (Optional[float]) - The value of `verifier_result["reward"]`.
 
  ## Advanced Usage

  ### Connecting to an Existing Server

+ If you already have an RlveGymEnv server running, you can connect directly:

  ```python
  from RLVE_Gym import RlveGymEnv
 
  # Use as normal
  result = RLVE_Gymenv.reset()
+ result = RLVE_Gymenv.step(RlveGymAction(output="Hello!"))
  ```

  Note: When connecting to an existing server, `RLVE_Gymenv.close()` will NOT stop the server.
 
  ```bash
  uvicorn server.app:app --reload
+ ```
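As a rough illustration of the `answer_markers` parameter described in the README changes above, the sketch below pulls a final answer out of a model output. It assumes the markers are `("<answer>", "</answer>")`, as in the Quick Start outputs, and that the last marked span is the one that counts. `extract_answer` is a hypothetical helper written for this note, not the function RLVE-Gym itself uses.

```python
from typing import Optional, Tuple


def extract_answer(
    output: str,
    answer_markers: Tuple[str, str] = ("<answer>", "</answer>"),
) -> Optional[str]:
    """Return the text between the last pair of markers, or None if absent.

    Hypothetical helper for illustration; the real verifier may differ.
    """
    open_marker, close_marker = answer_markers
    start = output.rfind(open_marker)
    if start == -1:
        return None  # no opening marker: treated here as a format error
    start += len(open_marker)
    end = output.find(close_marker, start)
    if end == -1:
        return None  # opening marker without a matching closing marker
    return output[start:end].strip()
```

Under these assumptions, the "Wrong Format" output from the Quick Start yields `None`, while the two marked outputs yield the string between the markers.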
 
 
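The `num_samples` and `sum_accuracy` fields documented for `RlveGymState` above can be combined into a running accuracy for the current problem. The sketch below is a minimal, self-contained illustration: `StateSnapshot` is a hypothetical stand-in that mirrors the documented field names, not the package's own class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StateSnapshot:
    """Hypothetical stand-in mirroring the documented RlveGymState fields."""

    seed: int
    problem_input: Optional[str]
    num_samples: int
    sum_accuracy: int


def running_accuracy(state: StateSnapshot) -> Optional[float]:
    """Fraction of verified outputs that were correct for the current problem."""
    if state.num_samples == 0:
        return None  # nothing has been sent to the verifier yet
    return state.sum_accuracy / state.num_samples
```

For the three Quick Start outputs, if each one counts as a sample, the state would presumably show `num_samples = 3` and `sum_accuracy = 1`.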
 
server/RLVE_Gym_environment.py CHANGED
@@ -54,8 +54,8 @@ class RlveGymEnvironment(Environment):
          Returns:
              problem_input: The generated problem input string (or None if generation failed)
              verifier_result: None
-             success: Boolean indicating if the reset was successful
-             message: Message indicating the result of the reset
          """
          if (self.environment_identifier not in identifier2environment) or (
              self.environment_identifier not in identifier2controller
@@ -140,8 +140,8 @@ class RlveGymEnvironment(Environment):
          Returns:
              problem_input: The problem input string from the current state
              verifier_result: Result of the verification containing accuracy and other metrics
-             success: Boolean indicating if the step was successful
-             message: Message indicating the result of the step
          """
          if self.problem is None:
              return RlveGymObservation(
 
          Returns:
              problem_input: The generated problem input string (or None if generation failed)
              verifier_result: None
+             success: Boolean indicating whether the reset was successful
+             message: The result of the reset
          """
          if (self.environment_identifier not in identifier2environment) or (
              self.environment_identifier not in identifier2controller
 
          Returns:
              problem_input: The problem input string from the current state
              verifier_result: Result of the verification containing accuracy and other metrics
+             success: Boolean indicating whether the step (verification) was successful
+             message: The result of the step
          """
          if self.problem is None:
              return RlveGymObservation(
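
The docstrings above list the same four fields for both `reset()` and `step()`. A caller might branch on them roughly as follows. This is only a sketch: it assumes the observation has been converted to a plain dict with the documented keys, and `describe_observation` is a hypothetical helper, not part of the package.

```python
from typing import Any, Dict, Optional


def describe_observation(obs: Dict[str, Any]) -> str:
    """Summarize an observation given as a plain dict (assumed shape)."""
    if not obs.get("success", False):
        # `message` is documented as the explanation of `success`
        return f"failed: {obs.get('message', '')}"
    result: Optional[Dict[str, Any]] = obs.get("verifier_result")
    if result is None:
        # reset() documents verifier_result as None: nothing verified yet
        return "ok: problem ready, nothing verified yet"
    return (
        f"ok: reward={result['reward']}, "
        f"accuracy={result['accuracy']}, "
        f"format_score={result['format_score']}"
    )
```

Checking `success` first keeps failures such as an unknown environment identifier or a step taken before a problem exists separate from ordinary wrong answers.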