Spaces: ZhiyuanZeng/RLVE_Gym
ZhiyuanZeng committed 7 days ago

Commit 59b6e0f (1 parent: 56405c9)

misc

Files changed (1)

README.md CHANGED

@@ -162,11 +162,11 @@ RLVE_Gymenv = RlveGymEnv.from_docker_image(
 ### Observation
 **RlveGymObservation**:
-- `problem_input` (Optional[str]) - The input of the problem; if it is `None`, it means that the problem generation has not been run, or it failed.
-- `verifier_result` (Optional[dict]) - Contains `reward` as the raw reward, `accuracy` as the 0/1 correctness, and `format_score` as the 0/1 format correctness.
-- `success` (bool) - `True` or `False` indicates whether the operation succeeds.
 - `message` (str) - The explanation of `success`.
-- `reward` (Optional[float]) - The value is `verifier_result["reward"]`.
 ## Advanced Usage

 ### Observation
 **RlveGymObservation**:
+- `problem_input` (Optional[str]) - The input of the problem; if it is `None`, it means that the problem generation has not been run or has failed.
+- `verifier_result` (Optional[dict]) - Contains `reward` as the raw reward, `accuracy` as the 0/1 correctness, and `format_score` as the 0/1 format correctness; if it is `None`, it means that the verification has failed.
+- `success` (bool) - `True` or `False` indicates whether the operation succeeded.
 - `message` (str) - The explanation of `success`.
+- `reward` (Optional[float]) - The value is `verifier_result["reward"]` when `verifier_result` is not `None` (otherwise, `reward` is also `None`).
 ## Advanced Usage
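
The revised field descriptions in this hunk imply that callers should check `verifier_result` for `None` before indexing into it, and that `reward` is `None` in the same case. A minimal sketch of that check follows. `ObservationSketch` and `summarize` are hypothetical names introduced here for illustration; the dataclass only mirrors the documented fields and is not the real `RlveGymObservation` class, and the sample values and messages are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObservationSketch:
    """Hypothetical stand-in for RlveGymObservation (fields taken from the README)."""
    problem_input: Optional[str] = None
    verifier_result: Optional[dict] = None
    success: bool = False
    message: str = ""
    reward: Optional[float] = None


def summarize(obs: ObservationSketch) -> dict:
    """Flatten an observation into plain values, tolerating a missing verifier result."""
    result = obs.verifier_result
    if result is None:
        # Documented case: no verifier result, so reward is None as well.
        return {"verified": False, "reward": None, "accuracy": None, "format_score": None}
    return {
        "verified": True,
        "reward": result["reward"],
        "accuracy": result["accuracy"],
        "format_score": result["format_score"],
    }


# Illustrative observations; the message strings are invented for this sketch.
failed = ObservationSketch(problem_input="2 + 2 = ?", success=False, message="verification failed")
passed = ObservationSketch(
    problem_input="2 + 2 = ?",
    verifier_result={"reward": 1.0, "accuracy": 1, "format_score": 1},
    success=True,
    message="ok",
    reward=1.0,
)

print(summarize(failed))
print(summarize(passed))
```

Checking `verifier_result is None` first, rather than catching a `TypeError` on indexing, keeps the failure path explicit and matches the wording of the updated README.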