Spaces:
Sleeping
Sleeping
| title: BrowserGym Environment Server | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: docker | |
| pinned: false | |
| app_port: 8000 | |
| base_path: /web | |
| tags: | |
| - openenv | |
| - browsergym | |
| - web-automation | |
| - reinforcement-learning | |
| # BrowserGym Environment | |
| BrowserGym is a unified framework for web-based agent tasks that provides access to multiple benchmarks under a single Gymnasium-compatible API. This integration brings the complete training-to-evaluation pipeline for web agents into OpenEnv. | |
| ## Why BrowserGym? | |
| BrowserGym provides a complete pipeline for developing web agents: train on simple tasks, then evaluate on realistic websites. | |
| **What are these benchmarks?** | |
| - **MiniWoB++ (Training)**: 100+ synthetic web tasks like "click this button", "fill out this form", "select from dropdown". Each task is a simple webpage with a clear objective. Fast resets, randomized variations, dense rewards. Perfect for learning basic web navigation skills. **No external setup needed** - tasks run in isolated browser sessions. | |
| - **WebArena (Evaluation)**: 812 tasks on real websites (e-commerce, forums, GitLab, Wikipedia). Tasks like "find the cheapest laptop and add to cart" or "create a merge request for bug #123". Multistep, requires reasoning, sparse rewards. Tests if your agent can handle actual websites. **Requires running 7 backend services** (shopping site, GitLab instance, etc.). | |
| - **VisualWebArena**: Similar to WebArena but requires visual understanding - agents need to interpret images, identify UI elements visually, handle multimodal content. | |
| - **WorkArena**: Enterprise software tasks (CRM, project management, business workflows). Tests automation on corporate-style applications. | |
| **The training β evaluation pipeline:** | |
| 1. Train on MiniWoB (simple, controlled, fast iterations) | |
| 2. Evaluate on WebArena (complex, realistic, measures real-world capability) | |
| **Key advantage**: You can start training immediately with MiniWoB. No need to set up infrastructure just to test if your code works. | |
| ## Quick Start - Training (MiniWoB) | |
| ### No Setup Required! π | |
| ```python | |
| from envs.browsergym_env import BrowserGymEnv, BrowserGymAction | |
| # Create environment for MiniWoB training task | |
| env = BrowserGymEnv.from_docker_image( | |
| "ghcr.io/openenv/browsergym-env:latest", | |
| environment={ | |
| "BROWSERGYM_BENCHMARK": "miniwob", | |
| "BROWSERGYM_TASK_NAME": "click-test", # or "click-button", "click-dialog", etc. | |
| } | |
| ) | |
| # Train your agent! | |
| for episode in range(1000): | |
| result = env.reset() | |
| print(f"Goal: {result.observation.goal}") | |
| done = False | |
| while not done: | |
| # Your agent decides what to do | |
| action_str = agent.get_action(result.observation.text) | |
| action = BrowserGymAction(action_str=action_str) | |
| result = env.step(action) | |
| done = result.done | |
| print(f"Reward: {result.reward}") | |
| env.close() | |
| ``` | |
| ### Available Tasks by Benchmark | |
| #### MiniWoB++ Tasks (Training - 100+ tasks) | |
| MiniWoB tasks are organized by difficulty and type. Here are the main categories: | |
| **Click Tasks** (Basic interaction) | |
| | Task Name | Description | Difficulty | | |
| |-----------|-------------|------------| | |
| | `click-test` | Click a single button | β Easy | | |
| | `click-button` | Click button with specific text | β Easy | | |
| | `click-button-sequence` | Click buttons in order | ββ Medium | | |
| | `click-checkboxes` | Select specific checkboxes | ββ Medium | | |
| | `click-checkboxes-soft` | Select checkboxes (multiple valid) | ββ Medium | | |
| | `click-checkboxes-large` | Many checkboxes to select from | ββ Medium | | |
| | `click-checkboxes-transfer` | Transfer learning variation | ββ Medium | | |
| | `click-dialog` | Click correct button in dialog | β Easy | | |
| | `click-dialog-2` | More complex dialog | ββ Medium | | |
| | `click-link` | Click on a link | β Easy | | |
| | `click-option` | Select from dropdown | ββ Medium | | |
| | `click-pie` | Click on pie chart slice | ββ Medium | | |
| | `click-scroll-list` | Click item in scrollable list | βββ Hard | | |
| | `click-shades` | Click on specific color shade | ββ Medium | | |
| | `click-shape` | Click on specific shape | ββ Medium | | |
| | `click-tab` | Switch between tabs | ββ Medium | | |
| | `click-tab-2` | More complex tab switching | βββ Hard | | |
| | `click-widget` | Click on UI widget | ββ Medium | | |
| **Text Entry Tasks** (Typing and forms) | |
| | Task Name | Description | Difficulty | | |
| |-----------|-------------|------------| | |
| | `enter-text` | Type text into input field | β Easy | | |
| | `enter-text-dynamic` | Dynamic text entry | ββ Medium | | |
| | `enter-text-2` | Multiple text fields | ββ Medium | | |
| | `enter-password` | Fill password field | β Easy | | |
| | `enter-date` | Enter a date | ββ Medium | | |
| | `enter-time` | Enter a time | ββ Medium | | |
| | `login-user` | Complete login form | ββ Medium | | |
| | `login-user-popup` | Login via popup | βββ Hard | | |
| **Navigation Tasks** (Multi-step interaction) | |
| | Task Name | Description | Difficulty | | |
| |-----------|-------------|------------| | |
| | `navigate-tree` | Navigate through tree structure | βββ Hard | | |
| | `search-engine` | Use search interface | ββ Medium | | |
| | `use-autocomplete` | Interact with autocomplete | βββ Hard | | |
| | `book-flight` | Book a flight (complex form) | ββββ Very Hard | | |
| | `choose-date` | Pick date from calendar | βββ Hard | | |
| | `choose-date-easy` | Simplified date picker | ββ Medium | | |
| | `choose-date-medium` | Medium difficulty date picker | βββ Hard | | |
| | `choose-list` | Select from long list | ββ Medium | | |
| **Visual/Spatial Tasks** (Requires visual understanding) | |
| | Task Name | Description | Difficulty | | |
| |-----------|-------------|------------| | |
| | `count-sides` | Count sides of shape | ββ Medium | | |
| | `count-shape` | Count specific shapes | ββ Medium | | |
| | `find-word` | Find word in text | ββ Medium | | |
| | `focus-text` | Focus on text element | β Easy | | |
| | `focus-text-2` | More complex focus task | ββ Medium | | |
| | `grid-coordinate` | Click grid coordinate | ββ Medium | | |
| | `guess-number` | Guess a number game | βββ Hard | | |
| | `identify-shape` | Identify shape type | ββ Medium | | |
| | `read-table` | Extract info from table | βββ Hard | | |
| | `read-table-2` | More complex table reading | βββ Hard | | |
| **Email/Social Tasks** (Realistic scenarios) | |
| | Task Name | Description | Difficulty | | |
| |-----------|-------------|------------| | |
| | `email-inbox` | Manage email inbox | ββββ Very Hard | | |
| | `email-inbox-forward` | Forward emails | ββββ Very Hard | | |
| | `email-inbox-nl` | Natural language email task | ββββ Very Hard | | |
| | `email-inbox-star-reply` | Star and reply to emails | ββββ Very Hard | | |
| | `social-media` | Social media interaction | ββββ Very Hard | | |
| | `social-media-some` | Partial social media task | βββ Hard | | |
| **Total:** 100+ tasks across all categories | |
| **Usage:** | |
| ```python | |
| # Easy task for quick testing | |
| env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "click-test"}) | |
| # Medium difficulty for training | |
| env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "click-checkboxes"}) | |
| # Hard task for evaluation | |
| env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "email-inbox"}) | |
| ``` | |
| #### WebArena Tasks (Evaluation - 812 tasks) | |
| WebArena tasks are organized by website and difficulty. Tasks are numbered 0-811. | |
| **By Website:** | |
| | Website | Task Count | Description | Example Tasks | | |
| |---------|------------|-------------|---------------| | |
| | Shopping | ~200 | E-commerce site | Search products, add to cart, checkout | | |
| | Shopping Admin | ~150 | Admin panel | Manage products, orders, customers | | |
| | Reddit | ~150 | Forum/social | Post, comment, search discussions | | |
| | GitLab | ~200 | Code repository | Create issues, merge requests, review code | | |
| | Wikipedia | ~100 | Knowledge base | Search, read, extract information | | |
| | Map | ~12 | Location service | Find places, get directions | | |
| **By Difficulty:** | |
| | Difficulty | Task Count | Steps Required | Example | | |
| |------------|------------|----------------|---------| | |
| | Easy | ~200 | 1-5 steps | "Find the price of product X" | | |
| | Medium | ~400 | 5-15 steps | "Add cheapest laptop to cart" | | |
| | Hard | ~212 | 15+ steps | "Create merge request for bug fix" | | |
| **Usage:** | |
| ```python | |
| # Task 0 (usually easy) | |
| env = BrowserGymEnv(environment={ | |
| "BROWSERGYM_BENCHMARK": "webarena", | |
| "BROWSERGYM_TASK_NAME": "0", | |
| "SHOPPING": "http://your-server:7770", | |
| # ... other URLs | |
| }) | |
| # Task 156 (GitLab merge request) | |
| env = BrowserGymEnv(environment={ | |
| "BROWSERGYM_BENCHMARK": "webarena", | |
| "BROWSERGYM_TASK_NAME": "156", | |
| # ... URLs | |
| }) | |
| ``` | |
| **Note:** WebArena tasks require the full backend infrastructure. See [WebArena setup guide](https://github.com/web-arena-x/webarena/tree/main/environment_docker). | |
| #### VisualWebArena Tasks (910 tasks) | |
| Similar to WebArena but requires visual understanding. Tasks involve: | |
| - Image-based reasoning | |
| - Visual element identification | |
| - Multimodal interaction (text + images) | |
| #### WorkArena Tasks | |
| Enterprise software automation tasks: | |
| - CRM operations | |
| - Project management | |
| - Business workflows | |
| **Full task lists:** | |
| - [MiniWoB++ tasks](https://github.com/Farama-Foundation/miniwob-plusplus/tree/master/miniwob/environment) | |
| - [WebArena tasks](https://github.com/web-arena-x/webarena/blob/main/config_files/) | |
| - [BrowserGym documentation](https://github.com/ServiceNow/BrowserGym) | |
| ## Evaluation (WebArena) | |
| ### Prerequisites | |
| WebArena requires setting up backend infrastructure. See the [WebArena documentation](https://github.com/web-arena-x/webarena/tree/main/environment_docker). | |
| ### Usage | |
| ```python | |
| from envs.browsergym_env import BrowserGymEnv, BrowserGymAction | |
| # Create environment for WebArena evaluation | |
| env = BrowserGymEnv.from_docker_image( | |
| "ghcr.io/openenv/browsergym-env:latest", | |
| environment={ | |
| "BROWSERGYM_BENCHMARK": "webarena", | |
| "BROWSERGYM_TASK_NAME": "0", # Task ID | |
| # WebArena backend URLs (required) | |
| "SHOPPING": "http://your-server:7770", | |
| "SHOPPING_ADMIN": "http://your-server:7780/admin", | |
| "REDDIT": "http://your-server:9999", | |
| "GITLAB": "http://your-server:8023", | |
| "MAP": "http://your-server:3000", | |
| "WIKIPEDIA": "http://your-server:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing", | |
| "HOMEPAGE": "http://your-server:4399", | |
| } | |
| ) | |
| # Evaluate your trained agent | |
| result = env.reset() | |
| while not result.done: | |
| action_str = agent.get_action(result.observation) | |
| action = BrowserGymAction(action_str=action_str) | |
| result = env.step(action) | |
| print(f"Success: {result.reward}") | |
| env.close() | |
| ``` | |
| ## Building the Docker Image | |
| ### Prerequisites | |
| 1. **Base Image**: Build the OpenEnv base image first: | |
| ```bash | |
| # From the OpenEnv repository root | |
| docker build -t openenv-base:latest -f src/core/containers/images/Dockerfile . | |
| ``` | |
| ### Build the BrowserGym Environment | |
| ```bash | |
| # From the OpenEnv repository root | |
| docker build -t browsergym-env:latest -f src/envs/browsergym_env/server/Dockerfile . | |
| ``` | |
| ### Run the Server | |
| #### For MiniWoB (Training): | |
| ```bash | |
| docker run -p 8000:8000 \ | |
| -e BROWSERGYM_BENCHMARK="miniwob" \ | |
| -e BROWSERGYM_TASK_NAME="click-test" \ | |
| browsergym-env:latest | |
| ``` | |
| #### For WebArena (Evaluation): | |
| ```bash | |
| docker run -p 8000:8000 \ | |
| -e BROWSERGYM_BENCHMARK="webarena" \ | |
| -e BROWSERGYM_TASK_NAME="0" \ | |
| -e SHOPPING="http://your-server:7770" \ | |
| -e SHOPPING_ADMIN="http://your-server:7780/admin" \ | |
| -e REDDIT="http://your-server:9999" \ | |
| -e GITLAB="http://your-server:8023" \ | |
| -e MAP="http://your-server:3000" \ | |
| -e WIKIPEDIA="http://your-server:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing" \ | |
| -e HOMEPAGE="http://your-server:4399" \ | |
| browsergym-env:latest | |
| ``` | |
| ## Environment Details | |
| ### Action | |
| Actions in BrowserGym are natural language strings that describe browser operations: | |
| ```python | |
| from envs.browsergym_env import BrowserGymAction | |
| # Click actions | |
| action = BrowserGymAction(action_str="click('Submit button')") | |
| action = BrowserGymAction(action_str="click('element_id_123')") | |
| # Type actions | |
| action = BrowserGymAction(action_str="fill('username', '[email protected]')") | |
| action = BrowserGymAction(action_str="fill('password', 'secret123')") | |
| # Navigate actions | |
| action = BrowserGymAction(action_str="goto('https://example.com')") | |
| # Keyboard actions | |
| action = BrowserGymAction(action_str="press('Enter')") | |
| action = BrowserGymAction(action_str="press('Tab')") | |
| # Scroll actions | |
| action = BrowserGymAction(action_str="scroll('down')") | |
| ``` | |
| ### Observation | |
| Observations contain multiple modalities: | |
| ```python | |
| result = env.step(action) | |
| obs = result.observation | |
| # Text observations | |
| print(obs.text) # Primary text representation (AXTree or DOM) | |
| print(obs.axtree_txt) # Accessibility tree | |
| print(obs.pruned_html) # Pruned HTML (interactive elements only) | |
| # Page metadata | |
| print(obs.url) # Current URL | |
| print(obs.goal) # Task goal/instruction | |
| # Visual (if enabled) | |
| if obs.screenshot is not None: | |
| print(obs.screenshot.shape) # [height, width, channels] | |
| # Error handling | |
| if obs.last_action_error: | |
| print(f"Action failed: {obs.error}") | |
| # Episode status | |
| print(obs.done) # True if episode ended | |
| print(obs.reward) # Reward for the step | |
| # Access full BrowserGym data (includes timestamps, etc.) | |
| print(obs.metadata["browsergym_obs"]) # Full observation dict from BrowserGym | |
| print(obs.metadata["browsergym_info"]) # Full info dict (timestamps, page state, etc.) | |
| ``` | |
| #### Advanced: Accessing Raw BrowserGym Data | |
| For VisualWebArena or custom training, you may need additional data like timestamps or browser state. The full BrowserGym observation and info dicts are preserved in `metadata`: | |
| ```python | |
| result = env.step(action) | |
| # Access timestamps (if available) | |
| info = result.observation.metadata["browsergym_info"] | |
| if "timestamp" in info: | |
| print(f"Action timestamp: {info['timestamp']}") | |
| # Access additional observation fields | |
| obs_dict = result.observation.metadata["browsergym_obs"] | |
| if "dom_object" in obs_dict: | |
| dom = obs_dict["dom_object"] | |
| # Work with raw DOM object | |
| # Access page performance data | |
| if "performance" in info: | |
| print(f"Page load time: {info['performance']}") | |
| ``` | |
| ### State | |
| The environment state tracks progress: | |
| ```python | |
| state = env.state() | |
| print(f"Benchmark: {state.benchmark}") # 'miniwob', 'webarena', etc. | |
| print(f"Task: {state.task_name}") # Task name/ID | |
| print(f"Episode: {state.episode_id}") # Unique episode ID | |
| print(f"Steps: {state.step_count}") # Number of steps taken | |
| print(f"Total Reward: {state.cum_reward}") # Cumulative reward | |
| print(f"Goal: {state.goal}") # Task instruction | |
| print(f"URL: {state.current_url}") # Current page URL | |
| ``` | |
| ## Configuration | |
| Environment variables: | |
| ### Common Settings | |
| - `BROWSERGYM_BENCHMARK`: Benchmark to use (`miniwob`, `webarena`, `visualwebarena`, `workarena`) | |
| - `BROWSERGYM_TASK_NAME`: Specific task name (optional, will use first available if not set) | |
| - `BROWSERGYM_HEADLESS`: Run browser in headless mode (default: `true`) | |
| - `BROWSERGYM_VIEWPORT_WIDTH`: Browser viewport width (default: `1280`) | |
| - `BROWSERGYM_VIEWPORT_HEIGHT`: Browser viewport height (default: `720`) | |
| - `BROWSERGYM_TIMEOUT`: Action timeout in milliseconds (default: `10000`) | |
| ### WebArena-Specific (only needed for WebArena benchmark) | |
| - `SHOPPING`: Shopping website URL | |
| - `SHOPPING_ADMIN`: Shopping admin panel URL | |
| - `REDDIT`: Reddit-like forum URL | |
| - `GITLAB`: GitLab instance URL | |
| - `MAP`: Map service URL | |
| - `WIKIPEDIA`: Wikipedia instance URL | |
| - `HOMEPAGE`: Homepage URL | |
| ## Supported Benchmarks | |
| ### 1. MiniWoB++ (Training) β Recommended for Training | |
| - **100+ tasks** ranging from simple (click buttons) to complex (form filling, navigation) | |
| - **Fast**: Instant resets, quick episodes | |
| - **Randomized**: Task variations for generalization | |
| - **No setup**: Works out-of-the-box | |
| - **Dense rewards**: Immediate feedback for learning | |
| **Use Case**: Train agents on fundamental web navigation skills | |
| ### 2. WebArena (Evaluation) π Benchmark | |
| - **812 realistic tasks** across 6 websites | |
| - **Complex**: Multi-step reasoning, real web interfaces | |
| - **Requires setup**: Need to run 7 backend services | |
| - **Sparse rewards**: Binary success/failure | |
| - **Evaluation-focused**: Test real-world performance | |
| **Use Case**: Evaluate agents on realistic web tasks | |
| ### 3. VisualWebArena (Evaluation) ποΈ Visual Benchmark | |
| - **910 tasks** requiring visual understanding | |
| - **Multimodal**: Both text and visual observations | |
| - **Requires setup**: Similar to WebArena | |
| - **Challenging**: Requires visual reasoning | |
| **Use Case**: Test visual web navigation capabilities | |
| ### 4. WorkArena (Evaluation) πΌ Enterprise Benchmark | |
| - **Enterprise tasks**: CRM, project management, etc. | |
| - **Realistic workflows**: Real enterprise software | |
| - **Requires setup**: Enterprise software instances | |
| **Use Case**: Evaluate on business automation tasks | |
| ## Typical Training Pipeline | |
| ```python | |
| from envs.browsergym_env import BrowserGymEnv, BrowserGymAction | |
| # Stage 1: Train on MiniWoB (simple tasks, fast) | |
| train_env = BrowserGymEnv.from_docker_image( | |
| "browsergym-env:latest", | |
| environment={ | |
| "BROWSERGYM_BENCHMARK": "miniwob", | |
| "BROWSERGYM_TASK_NAME": "click-button", | |
| } | |
| ) | |
| # Train your agent (RL, imitation learning, etc.) | |
| agent.train(train_env, num_episodes=10000) | |
| train_env.close() | |
| # Stage 2: Evaluate on WebArena (complex tasks, realistic) | |
| eval_env = BrowserGymEnv.from_docker_image( | |
| "browsergym-env:latest", | |
| environment={ | |
| "BROWSERGYM_BENCHMARK": "webarena", | |
| "BROWSERGYM_TASK_NAME": "0", | |
| # ... WebArena URLs | |
| } | |
| ) | |
| # Test performance | |
| success_rate = agent.evaluate(eval_env, num_tasks=812) | |
| print(f"WebArena Success Rate: {success_rate:.2%}") | |
| eval_env.close() | |
| ``` | |
| ## Development & Testing | |
| ### Running Tests | |
| ```bash | |
| # From the OpenEnv repository root | |
| pytest tests/envs/test_browsergym_env.py | |
| ``` | |
| ### Local Development | |
| ```bash | |
| # Install in development mode | |
| cd /path/to/OpenEnv | |
| pip install -e . | |
| # Install BrowserGym | |
| pip install browsergym browsergym-miniwob browsergym-webarena | |
| # Run the server locally | |
| cd src/envs/browsergym_env/server | |
| export BROWSERGYM_BENCHMARK=miniwob | |
| export BROWSERGYM_TASK_NAME=click-test | |
| python app.py | |
| ``` | |
| ## Project Structure | |
| ``` | |
| browsergym_env/ | |
| βββ __init__.py # Module exports | |
| βββ models.py # Action, Observation, State dataclasses | |
| βββ client.py # HTTPEnvClient implementation | |
| βββ README.md # This file | |
| βββ server/ | |
| βββ __init__.py | |
| βββ app.py # FastAPI application | |
| βββ browsergym_environment.py # Environment implementation | |
| βββ Dockerfile # Container specification | |
| βββ requirements.txt # Python dependencies | |
| ``` | |
| ## References | |
| - [BrowserGym GitHub](https://github.com/ServiceNow/BrowserGym) | |
| - [MiniWoB++ Paper](https://arxiv.org/abs/1802.08802) | |
| - [WebArena Paper](https://arxiv.org/abs/2307.13854) | |
| - [WebArena Website](https://webarena.dev/) | |
| - [VisualWebArena Paper](https://jykoh.com/vwa) | |
| - [OpenEnv Documentation](https://github.com/meta-pytorch/OpenEnv) | |