Spaces: ZhiyuanZeng/RLVE_Gym


ZhiyuanZeng committed 8 days ago

Commit eae0874 · verified · 1 Parent(s): 1cbc0f5

Upload folder using huggingface_hub


Files changed (2)

README.md +43 -49
server/RLVE_Gym_environment.py +4 -4

README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: Rlve Gym Environment Server
+title: RlveGym Environment Server
 emoji: 📡
 colorFrom: purple
 colorTo: blue
@@ -11,13 +11,13 @@ tags:
   - openenv
 ---
-# Rlve Gym Environment
-A simple test environment that echoes back messages. Perfect for testing the env APIs as well as demonstrating environment usage patterns.
+# RlveGym Environment
+This package contains a collection of 400 verifiable environments from RLVE-Gym, introduced by the paper [*RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments*](https://arxiv.org/abs/2511.07317) (original GitHub repository is [here](https://github.com/Zhiyuan-Zeng/RLVE)).
 ## Quick Start
-The simplest way to use the Rlve Gym environment is through the `RlveGymEnv` class:
+The simplest way to use RlveGym environment is through the `RlveGymEnv` class:
 ```python
 from RLVE_Gym import RlveGymAction, RlveGymEnv
@@ -28,17 +28,21 @@ try:
     # Reset
     result = RLVE_Gymenv.reset()
-    print(f"Reset: {result.observation.echoed_message}")
-    # Send multiple messages
-    messages = ["Hello, World!", "Testing echo", "Final message"]
-    for msg in messages:
-        result = RLVE_Gymenv.step(RlveGymAction(message=msg))
-        print(f"Sent: '{msg}'")
-        print(f"  → Echoed: '{result.observation.echoed_message}'")
-        print(f"  → Length: {result.observation.message_length}")
-        print(f"  → Reward: {result.reward}")
+    print(f"Problem Prompt: {result.observation.problem_input}")
+    # Or:
+    print(f"Problem Prompt (from the environment's state): {RLVE_Gymenv.state().problem_input}")
+    # Send multiple outputs
+    outputs = [
+        "Wrong Format",
+        r"<answer>0</answer>", # Wrong Answer
+        r"<answer>" + str(RLVE_Gymenv.problem.parameter["reference_answer"]) + r"</answer>", # Correct Answer
+    ]
+    for output in outputs:
+        result = RLVE_Gymenv.step(RlveGymAction(output = output))
+        print(f"Sent: '{output}'")
+        print(f"Result: `{result}`")
 finally:
     # Always clean up
@@ -117,29 +121,37 @@ The deployed space includes:
 ## Environment Details
+### Environment Initialization
+Please check [here](server/RLVE_Gym_environment.py) for detailed usage:
+- `environment_identifier` (str) - The environment's identifier. Check [here](server/Gym/environments/__init__.py) for detailed usage.
+- `difficulty` (int) - The difficulty of generated problems.
+- `answer_markers` (Tuple[str] of length 2) - How the environment extracts the final answer from a model output.
+- `seed` (int) - The initial seed to use when generating the first problem. Whenever `reset()` is called, the seed will be incremented by 1.
 ### Action
 **RlveGymAction**: Contains a single field
-- `message` (str) - The message to echo back
+- `output` (str) - The model's output to get verified.
+### State
+**RlveGymState**:
+- `seed` (int) - The seed to use when running `reset()`.
+- `problem_input` (Optional[str]) - The input of the problem; if it is `None`, it means that the problem generation has not been run, or it failed.
+- `num_samples` (int) and `sum_accuracy` (int) - The statistics of the result of `step(action)` so far for the current problem (the number of outputs sent to the verifier and the number of correct ones).
 ### Observation
-**RlveGymObservation**: Contains the echo response and metadata
-- `echoed_message` (str) - The message echoed back
-- `message_length` (int) - Length of the message
-- `reward` (float) - Reward based on message length (length × 0.1)
-- `done` (bool) - Always False for echo environment
-- `metadata` (dict) - Additional info like step count
-### Reward
-The reward is calculated as: `message_length × 0.1`
-- "Hi" → reward: 0.2
-- "Hello, World!" → reward: 1.3
-- Empty message → reward: 0.0
+**RlveGymObservation**:
+- `problem_input` (Optional[str]) - The input of the problem; if it is `None`, it means that the problem generation has not been run, or it failed.
+- `verifier_result` (Optional[dict]) - Contains `reward` as the raw reward, `accuracy` as the 0/1 correctness, and `format_score` as the 0/1 format correctness.
+- `success` (bool) - `True` or `False` indicates whether the operation succeeds.
+- `message` (str) - The explanation of `success`.
+- `reward` (Optional[float]) - The value is `verifier_result["reward"]`.
 ## Advanced Usage
 ### Connecting to an Existing Server
-If you already have a Rlve Gym environment server running, you can connect directly:
+If you already have an RlveGymEnv server running, you can connect directly:
 ```python
 from RLVE_Gym import RlveGymEnv
@@ -149,7 +161,7 @@ RLVE_Gymenv = RlveGymEnv(base_url="<ENV_HTTP_URL_HERE>")
 # Use as normal
 result = RLVE_Gymenv.reset()
-result = RLVE_Gymenv.step(RlveGymAction(message="Hello!"))
+result = RLVE_Gymenv.step(RlveGymAction(output="Hello!"))
 ```
 Note: When connecting to an existing server, `RLVE_Gymenv.close()` will NOT stop the server.
@@ -177,22 +189,4 @@ Run the server locally for development:
 ```bash
 uvicorn server.app:app --reload
-```
-## Project Structure
-```
-RLVE_Gym/
-├── __init__.py            # Module exports
-├── README.md              # This file
-├── openenv.yaml           # OpenEnv manifest
-├── pyproject.toml         # Project metadata and dependencies
-├── uv.lock                # Locked dependencies (generated)
-├── client.py              # RlveGymEnv client implementation
-├── models.py              # Action and Observation models
-└── server/
-    ├── __init__.py        # Server module exports
-    ├── RLVE_Gym_environment.py  # Core environment logic
-    ├── app.py             # FastAPI application
-    └── Dockerfile         # Container image definition
-```
+```
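The new README says `answer_markers` (a pair of strings) controls how the final answer is extracted from a model output, and the Quick Start sends `<answer>`/`</answer>`-wrapped strings alongside an unwrapped "Wrong Format" one. Below is a minimal sketch of that idea. It is not the RLVE-Gym implementation. The helper names `extract_answer` and `toy_verify` are hypothetical. Taking the last marked span, comparing exact strings, and setting `reward` equal to `accuracy` are all assumptions made for illustration.

```python
from typing import Optional, Tuple


def extract_answer(output: str,
                   answer_markers: Tuple[str, str] = ("<answer>", "</answer>")) -> Optional[str]:
    """Hypothetical helper: return the text between the last start marker and the next end marker."""
    start_marker, end_marker = answer_markers
    # Assumption: the last start marker wins, so earlier drafts in the output are ignored.
    start = output.rfind(start_marker)
    if start == -1:
        return None  # no start marker: treated as a format error
    start += len(start_marker)
    end = output.find(end_marker, start)
    if end == -1:
        return None  # unclosed marker: also treated as a format error
    return output[start:end].strip()


def toy_verify(output: str, reference_answer: str) -> dict:
    """Hypothetical scorer shaped like the documented `verifier_result` dict."""
    answer = extract_answer(output)
    format_score = 1 if answer is not None else 0
    accuracy = 1 if answer is not None and answer == str(reference_answer) else 0
    # Assumption: raw reward equals accuracy; real environments may grade differently.
    return {"reward": float(accuracy), "accuracy": accuracy, "format_score": format_score}


# Mirrors the three Quick Start outputs: bad format, wrong answer, correct answer.
for output in ["Wrong Format", "<answer>0</answer>", "<answer>42</answer>"]:
    print(output, "->", toy_verify(output, "42"))
```

Keeping extraction separate from scoring reflects the README's split between `format_score` (was an answer found?) and `accuracy` (was it right?).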

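The `RlveGymState` fields in the new README describe simple bookkeeping. `reset()` uses the current `seed` and then increments it by 1. `num_samples` and `sum_accuracy` count verified outputs and correct ones "for the current problem". The sketch below models only that bookkeeping. `ToyState`, `on_reset`, `on_step`, and `mean_accuracy` are hypothetical names and are not taken from `server/RLVE_Gym_environment.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyState:
    """Hypothetical stand-in for the documented RlveGymState fields."""
    seed: int
    problem_input: Optional[str] = None
    num_samples: int = 0
    sum_accuracy: int = 0

    def on_reset(self, problem_input: Optional[str]) -> None:
        # The problem was generated with the current seed; advance it by 1 for the next reset().
        self.problem_input = problem_input
        self.seed += 1
        # Statistics are per problem, so a new problem starts from zero.
        self.num_samples = 0
        self.sum_accuracy = 0

    def on_step(self, accuracy: int) -> None:
        # One more output was sent to the verifier; `accuracy` is its 0/1 correctness.
        self.num_samples += 1
        self.sum_accuracy += accuracy

    def mean_accuracy(self) -> float:
        # Guard against division by zero before any step() call.
        return self.sum_accuracy / self.num_samples if self.num_samples else 0.0


state = ToyState(seed=0)
state.on_reset(f"problem generated with seed {state.seed}")
for acc in [0, 0, 1]:  # e.g. wrong format, wrong answer, correct answer
    state.on_step(acc)
print(state.seed, state.num_samples, state.sum_accuracy, state.mean_accuracy())
```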
server/RLVE_Gym_environment.py CHANGED

@@ -54,8 +54,8 @@ class RlveGymEnvironment(Environment):
         Returns:
             problem_input: The generated problem input string (or None if generation failed)
             verifier_result: None
-            success: Boolean indicating if the reset was successful
-            message: Message indicating the result of the reset
+            success: Boolean indicating whether the reset was successful
+            message: The result of the reset
         """
         if (self.environment_identifier not in identifier2environment) or (
             self.environment_identifier not in identifier2controller
@@ -140,8 +140,8 @@ class RlveGymEnvironment(Environment):
         Returns:
             problem_input: The problem input string from the current state
             verifier_result: Result of the verification containing accuracy and other metrics
-            success: Boolean indicating whether the step (verification) was successful
-            message: The result of the step
+            success: Boolean indicating whether the step (verification) was successful
+            message: The result of the step
         """
         if self.problem is None:
             return RlveGymObservation(
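The updated docstrings say that `reset()` and `step()` both report a `success` flag and a `message`. `step()` also begins by checking whether a problem exists (`if self.problem is None:`). The sketch below mirrors that contract using only the observation fields documented in the README. `ToyObservation`, `toy_step`, and `always_wrong` are hypothetical. Using `problem_input is None` in place of the server's `self.problem` check and the exact message strings are also assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyObservation:
    """Hypothetical mirror of the documented RlveGymObservation fields."""
    problem_input: Optional[str]
    verifier_result: Optional[dict]
    success: bool
    message: str
    reward: Optional[float] = None


def toy_step(problem_input: Optional[str], output: str,
             verify: Callable[[str], dict]) -> ToyObservation:
    # Stand-in for `if self.problem is None:`: with no problem there is nothing to verify.
    if problem_input is None:
        return ToyObservation(None, None, False, "No problem available; call reset() first.")
    verifier_result = verify(output)
    # The README states that `reward` is `verifier_result["reward"]`.
    return ToyObservation(problem_input, verifier_result, True, "Verification succeeded.",
                          reward=verifier_result["reward"])


def always_wrong(output: str) -> dict:
    # Hypothetical verifier that accepts the format but never the answer.
    return {"reward": 0.0, "accuracy": 0, "format_score": 1}


print(toy_step(None, "<answer>0</answer>", always_wrong).success)
print(toy_step("What is 1 + 1?", "<answer>0</answer>", always_wrong).reward)
```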