INFO 10-21 08:52:02 [__init__.py:225] Automatically detected platform cuda.
[2025-10-21 08:52:06] INFO __main__.py:429: Passed `--trust_remote_code`, setting environment variable `HF_DATASETS_TRUST_REMOTE_CODE=true`
[2025-10-21 08:52:06] INFO __main__.py:446: Selected Tasks: ['gsm8k']
[2025-10-21 08:52:06] INFO evaluator.py:202: Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
[2025-10-21 08:52:06] INFO evaluator.py:240: Initializing vllm model, with arguments: {'pretrained': '/mnt/nvme2/eldar/for_nvidia/calib1024_damp0.07_obsmse_symTrue', 'tensor_parallel_size': 1, 'trust_remote_code': True}
INFO 10-21 08:52:06 [utils.py:243] non-default args: {'trust_remote_code': True, 'seed': 1234, 'disable_log_stats': True, 'model': '/mnt/nvme2/eldar/for_nvidia/calib1024_damp0.07_obsmse_symTrue'}
INFO 10-21 08:52:06 [model.py:663] Resolved architecture: NemotronHForCausalLM
INFO 10-21 08:52:06 [model.py:1751] Using max model len 131072
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
INFO 10-21 08:52:07 [scheduler.py:225] Chunked prefill is enabled with max_num_batched_tokens=16384.
INFO 10-21 08:52:07 [config.py:324] Disabling cascade attention since it is not supported for hybrid models.
INFO 10-21 08:52:07 [config.py:440] Setting attention block size to 672 tokens to ensure that attention page size is >= mamba page size.
INFO 10-21 08:52:07 [config.py:464] Padding mamba page size by 2.13% to ensure that mamba page size and attention page size are exactly equal.
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:08 [core.py:730] Waiting for init message from front-end.
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:08 [core.py:97] Initializing a V1 LLM engine (v0.11.1rc2.dev191+g80e945298) with config: model='/mnt/nvme2/eldar/for_nvidia/calib1024_damp0.07_obsmse_symTrue', speculative_config=None, tokenizer='/mnt/nvme2/eldar/for_nvidia/calib1024_damp0.07_obsmse_symTrue', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=compressed-tensors, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=1234, served_model_name=/mnt/nvme2/eldar/for_nvidia/calib1024_damp0.07_obsmse_symTrue, enable_prefix_caching=False, chunked_prefill_enabled=True, pooler_config=None, compilation_config={'level': None, 'mode': 3, 'debug_dump_path': None, 'cache_dir': '', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention', 'vllm::sparse_attn_indexer'], 'use_inductor': None, 'compile_sizes': [], 'inductor_compile_config': {'enable_auto_functionalized_v2': False}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'use_cudagraph': True, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [512, 504, 496, 488, 480, 472, 464, 456, 448, 440, 432, 424, 416, 408, 400, 392, 384, 376, 368, 360, 352, 344, 336, 328, 320, 312, 304, 296, 288, 280, 272, 264, 256, 248, 240, 232, 224, 216, 208, 200, 192, 184, 176, 168, 160, 152, 144, 136, 128, 120, 112, 104, 96, 88, 80, 72, 64, 56, 48, 40, 32, 24, 16, 8, 4, 2, 1], 'cudagraph_copy_inputs': False, 'full_cuda_graph': True, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {}, 'max_capture_size': 512, 'local_cache_dir': None}
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:10 [parallel_state.py:1325] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:12 [gpu_model_runner.py:2860] Starting to load model /mnt/nvme2/eldar/for_nvidia/calib1024_damp0.07_obsmse_symTrue...
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:12 [compressed_tensors_wNa16.py:108] Using MacheteLinearKernel for CompressedTensorsWNA16
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:12 [compressed_tensors_wNa16.py:108] Using MarlinLinearKernel for CompressedTensorsWNA16
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:12 [cuda.py:403] Using Flash Attention backend on V1 engine.
(EngineCore_DP0 pid=1751394) Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
(EngineCore_DP0 pid=1751394) Loading safetensors checkpoint shards: 50% Completed | 1/2 [00:02<00:02, 2.02s/it]
(EngineCore_DP0 pid=1751394) Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:02<00:00, 1.28s/it]
(EngineCore_DP0 pid=1751394) Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:02<00:00, 1.39s/it]
(EngineCore_DP0 pid=1751394)
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:15 [default_loader.py:314] Loading weights took 2.83 seconds
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:16 [gpu_model_runner.py:2921] Model loading took 6.0475 GiB and 3.570261 seconds
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:19 [backends.py:609] Using cache directory: /home/eldar/.cache/vllm/torch_compile_cache/5a2b29b2c9/rank_0_0/backbone for vLLM's torch.compile
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:19 [backends.py:623] Dynamo bytecode transform time: 2.31 s
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:20 [backends.py:248] Cache the graph for dynamic shape for later use
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:26 [backends.py:275] Compiling a graph for dynamic shape takes 6.35 s
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:28 [monitor.py:34] torch.compile takes 8.66 s in total
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:30 [gpu_worker.py:316] Available KV cache memory: 60.15 GiB
(EngineCore_DP0 pid=1751394) WARNING 10-21 08:52:30 [kv_cache_utils.py:951] Add 1 padding layers, may waste at most 3.70% KV cache memory
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:30 [kv_cache_utils.py:1201] GPU KV cache size: 492,576 tokens
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:30 [kv_cache_utils.py:1206] Maximum concurrency for 131,072 tokens per request: 28.90x
(EngineCore_DP0 pid=1751394) 2025-10-21 08:52:30,487 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ...
(EngineCore_DP0 pid=1751394) 2025-10-21 08:52:31,099 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends
(EngineCore_DP0 pid=1751394) Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|██████████| 67/67 [00:03<00:00, 21.65it/s]
(EngineCore_DP0 pid=1751394) Capturing CUDA graphs (decode, FULL): 100%|██████████| 67/67 [00:07<00:00, 8.89it/s]
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:42 [gpu_model_runner.py:3848] Graph capturing finished in 11 secs, took -1.97 GiB
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:42 [core.py:243] init engine (profile, create kv cache, warmup model) took 25.63 seconds
(EngineCore_DP0 pid=1751394) INFO 10-21 08:52:43 [gc_utils.py:40] GC Debug Config. enabled:False,top_objects:-1
INFO 10-21 08:52:43 [llm.py:333] Supported tasks: ['generate']
[2025-10-21 08:52:45] INFO evaluator.py:305: gsm8k: Using gen_kwargs: {'until': ['Question:', '</s>', '<|im_end|>'], 'do_sample': False, 'temperature': 0.0}
[2025-10-21 08:52:45] WARNING evaluator.py:324: Overwriting default num_fewshot of gsm8k from 5 to 5
[2025-10-21 08:52:45] INFO task.py:434: Building contexts for gsm8k on rank 0...
100%|██████████| 1319/1319 [00:03<00:00, 419.00it/s]
[2025-10-21 08:52:48] INFO evaluator.py:574: Running generate_until requests
Running generate_until requests: 0%| | 0/1319 [00:00<?, ?it/s]
Adding requests: 100%|██████████| 1319/1319 [00:00<00:00, 13469.99it/s]
Processed prompts: 0%| | 0/1319 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 0%| | 1/1319 [01:10<25:48:00, 70.47s/it, est. speed input: 17.77 toks/s, output: 0.68 toks/s]
Processed prompts: 0%| | 2/1319 [01:10<10:41:55, 29.24s/it, est. speed input: 36.62 toks/s, output: 1.45 toks/s]
Processed prompts: 0%| | 3/1319 [01:11<5:53:20, 16.11s/it, est. speed input: 54.43 toks/s, output: 2.26 toks/s] 
Processed prompts: 0%| | 4/1319 [01:11<3:36:52, 9.90s/it, est. speed input: 73.48 toks/s, output: 3.12 toks/s]
Processed prompts: 0%| | 6/1319 [01:12<1:44:11, 4.76s/it, est. speed input: 104.20 toks/s, output: 4.38 toks/s]
Processed prompts: 1%| | 7/1319 [01:12<1:17:05, 3.53s/it, est. speed input: 123.86 toks/s, output: 5.32 toks/s]
Processed prompts: 1%| | 9/1319 [01:12<43:52, 2.01s/it, est. speed input: 158.37 toks/s, output: 7.04 toks/s] 
Processed prompts: 1%| | 10/1319 [01:12<35:30, 1.63s/it, est. speed input: 173.50 toks/s, output: 7.81 toks/s]
Processed prompts: 1%| | 11/1319 [01:13<27:17, 1.25s/it, est. speed input: 191.70 toks/s, output: 8.77 toks/s]
Processed prompts: 1%| | 14/1319 [01:14<17:49, 1.22it/s, est. speed input: 233.86 toks/s, output: 10.66 toks/s]
Processed prompts: 1%| | 15/1319 [01:14<14:42, 1.48it/s, est. speed input: 249.59 toks/s, output: 11.52 toks/s]
Processed prompts: 1%| | 16/1319 [01:14<12:11, 1.78it/s, est. speed input: 263.71 toks/s, output: 12.19 toks/s]
Processed prompts: 1%|▏ | 17/1319 [01:14<10:03, 2.16it/s, est. speed input: 283.41 toks/s, output: 13.22 toks/s]
Processed prompts: 1%|▏ | 18/1319 [01:15<09:11, 2.36it/s, est. speed input: 299.79 toks/s, output: 14.20 toks/s]
Processed prompts: 2%|▏ | 20/1319 [01:15<05:48, 3.73it/s, est. speed input: 329.62 toks/s, output: 15.81 toks/s]
Processed prompts: 2%|▏ | 22/1319 [01:15<04:25, 4.88it/s, est. speed input: 360.02 toks/s, output: 17.60 toks/s]
Processed prompts: 2%|▏ | 23/1319 [01:15<04:34, 4.73it/s, est. speed input: 374.08 toks/s, output: 18.43 toks/s]
Processed prompts: 2%|▏ | 26/1319 [01:15<02:45, 7.79it/s, est. speed input: 421.70 toks/s, output: 21.16 toks/s]
Processed prompts: 2%|▏ | 32/1319 [01:16<02:05, 10.28it/s, est. speed input: 517.58 toks/s, output: 26.90 toks/s]
Processed prompts: 3%|▎ | 34/1319 [01:16<02:11, 9.77it/s, est. speed input: 548.28 toks/s, output: 28.87 toks/s]
Processed prompts: 3%|▎ | 37/1319 [01:16<02:01, 10.55it/s, est. speed input: 593.48 toks/s, output: 31.65 toks/s]
Processed prompts: 3%|▎ | 39/1319 [01:16<01:56, 11.02it/s, est. speed input: 629.02 toks/s, output: 33.78 toks/s]
Processed prompts: 3%|▎ | 45/1319 [01:17<01:35, 13.29it/s, est. speed input: 720.27 toks/s, output: 39.50 toks/s]
Processed prompts: 4%|▍ | 50/1319 [01:17<01:29, 14.13it/s, est. speed input: 794.92 toks/s, output: 44.19 toks/s]
Processed prompts: 4%|▍ | 53/1319 [01:17<01:32, 13.75it/s, est. speed input: 834.43 toks/s, output: 46.54 toks/s]
Processed prompts: 4%|▍ | 56/1319 [01:18<01:37, 12.90it/s, est. speed input: 874.01 toks/s, output: 48.93 toks/s]
Processed prompts: 4%|▍ | 58/1319 [01:18<01:37, 12.87it/s, est. speed input: 903.84 toks/s, output: 50.81 toks/s]
Processed prompts: 5%|▍ | 62/1319 [01:18<01:28, 14.14it/s, est. speed input: 960.15 toks/s, output: 54.54 toks/s]
Processed prompts: 5%|▍ | 64/1319 [01:18<01:40, 12.45it/s, est. speed input: 992.49 toks/s, output: 56.77 toks/s]
Processed prompts: 5%|▌ | 67/1319 [01:18<01:44, 11.97it/s, est. speed input: 1034.09 toks/s, output: 59.73 toks/s]
Processed prompts: 6%|▌ | 74/1319 [01:19<01:19, 15.58it/s, est. speed input: 1132.72 toks/s, output: 66.57 toks/s]
Processed prompts: 6%|▌ | 80/1319 [01:19<01:07, 18.36it/s, est. speed input: 1213.72 toks/s, output: 72.02 toks/s]
Processed prompts: 6%|▋ | 83/1319 [01:19<01:29, 13.87it/s, est. speed input: 1251.41 toks/s, output: 74.78 toks/s]
Processed prompts: 6%|▋ | 85/1319 [01:20<01:37, 12.59it/s, est. speed input: 1277.96 toks/s, output: 76.77 toks/s]
Processed prompts: 7%|▋ | 92/1319 [01:20<01:21, 15.08it/s, est. speed input: 1366.13 toks/s, output: 82.64 toks/s]
Processed prompts: 7%|▋ | 96/1319 [01:20<01:28, 13.80it/s, est. speed input: 1415.09 toks/s, output: 86.10 toks/s]
Processed prompts: 8%|▊ | 102/1319 [01:21<01:10, 17.37it/s, est. speed input: 1494.44 toks/s, output: 91.58 toks/s]
Processed prompts: 8%|▊ | 105/1319 [01:21<01:25, 14.25it/s, est. speed input: 1531.74 toks/s, output: 94.45 toks/s]
Processed prompts: 8%|▊ | 108/1319 [01:21<01:19, 15.20it/s, est. speed input: 1570.42 toks/s, output: 97.36 toks/s]
Processed prompts: 8%|▊ | 111/1319 [01:21<01:22, 14.58it/s, est. speed input: 1607.85 toks/s, output: 100.23 toks/s]
Processed prompts: 9%|▊ | 115/1319 [01:22<01:30, 13.24it/s, est. speed input: 1655.09 toks/s, output: 103.89 toks/s]
Processed prompts: 9%|▉ | 117/1319 [01:22<01:35, 12.56it/s, est. speed input: 1677.84 toks/s, output: 105.69 toks/s]
Processed prompts: 9%|▉ | 123/1319 [01:22<01:13, 16.26it/s, est. speed input: 1752.07 toks/s, output: 111.35 toks/s]
Processed prompts: 10%|▉ | 128/1319 [01:22<01:23, 14.24it/s, est. speed input: 1809.22 toks/s, output: 115.66 toks/s]
Processed prompts: 10%|▉ | 131/1319 [01:23<01:21, 14.53it/s, est. speed input: 1843.95 toks/s, output: 118.39 toks/s]
Processed prompts: 10%|█ | 133/1319 [01:23<01:35, 12.40it/s, est. speed input: 1865.03 toks/s, output: 120.25 toks/s]
Processed prompts: 10%|█ | 136/1319 [01:23<01:30, 13.12it/s, est. speed input: 1899.95 toks/s, output: 123.10 toks/s]
Processed prompts: 11%|█ | 141/1319 [01:23<01:15, 15.57it/s, est. speed input: 1960.25 toks/s, output: 127.84 toks/s]
Processed prompts: 11%|█ | 145/1319 [01:24<01:16, 15.32it/s, est. speed input: 2006.44 toks/s, output: 131.80 toks/s]
Processed prompts: 11%|█ | 148/1319 [01:24<01:24, 13.91it/s, est. speed input: 2040.74 toks/s, output: 134.79 toks/s]
Processed prompts: 12%|█▏ | 157/1319 [01:24<01:03, 18.29it/s, est. speed input: 2147.91 toks/s, output: 143.77 toks/s]
Processed prompts: 12%|█▏ | 161/1319 [01:25<01:06, 17.31it/s, est. speed input: 2191.84 toks/s, output: 147.36 toks/s]
Processed prompts: 13%|█▎ | 167/1319 [01:25<01:12, 15.92it/s, est. speed input: 2256.79 toks/s, output: 153.04 toks/s]
Processed prompts: 13%|█▎ | 170/1319 [01:25<01:12, 15.87it/s, est. speed input: 2293.66 toks/s, output: 156.20 toks/s]
Processed prompts: 13%|█▎ | 177/1319 [01:26<01:11, 16.06it/s, est. speed input: 2373.31 toks/s, output: 163.37 toks/s]
Processed prompts: 14%|█▍ | 183/1319 [01:26<01:04, 17.69it/s, est. speed input: 2444.95 toks/s, output: 169.88 toks/s]
Processed prompts: 14%|█▍ | 188/1319 [01:26<01:03, 17.95it/s, est. speed input: 2506.09 toks/s, output: 175.46 toks/s]
Processed prompts: 14%|█▍ | 190/1319 [01:26<01:21, 13.91it/s, est. speed input: 2522.54 toks/s, output: 177.21 toks/s]
Processed prompts: 15%|█▍ | 193/1319 [01:27<01:25, 13.17it/s, est. speed input: 2557.21 toks/s, output: 180.25 toks/s]
Processed prompts: 15%|█▌ | 201/1319 [01:27<00:52, 21.32it/s, est. speed input: 2653.40 toks/s, output: 188.90 toks/s]
Processed prompts: 16%|█▌ | 205/1319 [01:27<01:11, 15.53it/s, est. speed input: 2691.38 toks/s, output: 192.74 toks/s]
Processed prompts: 16%|█▌ | 210/1319 [01:28<01:13, 15.17it/s, est. speed input: 2743.94 toks/s, output: 197.89 toks/s]
Processed prompts: 16%|█▋ | 215/1319 [01:28<01:05, 16.78it/s, est. speed input: 2799.65 toks/s, output: 203.41 toks/s]
Processed prompts: 17%|█▋ | 218/1319 [01:28<01:15, 14.53it/s, est. speed input: 2831.05 toks/s, output: 206.57 toks/s]
Processed prompts: 17%|█▋ | 225/1319 [01:28<00:59, 18.33it/s, est. speed input: 2915.51 toks/s, output: 215.04 toks/s]
Processed prompts: 18%|█▊ | 232/1319 [01:29<01:04, 16.85it/s, est. speed input: 2984.72 toks/s, output: 221.81 toks/s]
Processed prompts: 18%|█▊ | 240/1319 [01:29<00:55, 19.45it/s, est. speed input: 3075.50 toks/s, output: 231.09 toks/s]
Processed prompts: 19%|█▉ | 250/1319 [01:30<01:02, 16.97it/s, est. speed input: 3180.17 toks/s, output: 242.14 toks/s]
Processed prompts: 20%|█▉ | 258/1319 [01:30<00:58, 18.05it/s, est. speed input: 3266.12 toks/s, output: 250.53 toks/s]
Processed prompts: 20%|█▉ | 260/1319 [01:31<01:09, 15.35it/s, est. speed input: 3281.94 toks/s, output: 252.36 toks/s]
Processed prompts: 20%|██ | 268/1319 [01:31<00:56, 18.73it/s, est. speed input: 3366.20 toks/s, output: 260.99 toks/s]
Processed prompts: 21%|██ | 271/1319 [01:31<01:13, 14.31it/s, est. speed input: 3385.44 toks/s, output: 262.91 toks/s]
Processed prompts: 21%|██ | 276/1319 [01:32<01:14, 13.92it/s, est. speed input: 3430.18 toks/s, output: 267.83 toks/s]
Processed prompts: 21%|██▏ | 281/1319 [01:32<01:18, 13.17it/s, est. speed input: 3476.18 toks/s, output: 272.80 toks/s]
Processed prompts: 21%|██▏ | 283/1319 [01:32<01:21, 12.72it/s, est. speed input: 3494.15 toks/s, output: 274.97 toks/s]
Processed prompts: 22%|██▏ | 290/1319 [01:33<01:11, 14.40it/s, est. speed input: 3562.58 toks/s, output: 282.76 toks/s]
Processed prompts: 22%|██▏ | 296/1319 [01:33<01:07, 15.14it/s, est. speed input: 3621.84 toks/s, output: 289.66 toks/s]
Processed prompts: 23%|██▎ | 304/1319 [01:33<01:01, 16.49it/s, est. speed input: 3703.97 toks/s, output: 299.21 toks/s]
Processed prompts: 23%|██▎ | 307/1319 [01:34<01:04, 15.79it/s, est. speed input: 3730.36 toks/s, output: 302.44 toks/s]
Processed prompts: 24%|██▎ | 313/1319 [01:34<00:57, 17.64it/s, est. speed input: 3788.96 toks/s, output: 309.20 toks/s]
Processed prompts: 24%|██▍ | 315/1319 [01:34<01:15, 13.38it/s, est. speed input: 3796.49 toks/s, output: 310.49 toks/s]
Processed prompts: 24%|██▍ | 318/1319 [01:35<01:12, 13.89it/s, est. speed input: 3822.60 toks/s, output: 313.65 toks/s]
Processed prompts: 24%|██▍ | 320/1319 [01:35<01:18, 12.66it/s, est. speed input: 3839.43 toks/s, output: 315.66 toks/s]
Processed prompts: 25%|██▍ | 326/1319 [01:35<01:01, 16.14it/s, est. speed input: 3899.50 toks/s, output: 321.62 toks/s]
Processed prompts: 25%|██▍ | 328/1319 [01:35<01:12, 13.62it/s, est. speed input: 3910.89 toks/s, output: 323.27 toks/s]
Processed prompts: 25%|██▌ | 331/1319 [01:35<01:10, 14.08it/s, est. speed input: 3935.78 toks/s, output: 326.39 toks/s]
Processed prompts: 26%|██▌ | 337/1319 [01:36<00:57, 16.96it/s, est. speed input: 3993.16 toks/s, output: 332.44 toks/s]
Processed prompts: 26%|██▌ | 343/1319 [01:36<01:01, 15.85it/s, est. speed input: 4044.36 toks/s, output: 339.14 toks/s]
Processed prompts: 26%|██▋ | 348/1319 [01:36<01:00, 16.14it/s, est. speed input: 4087.46 toks/s, output: 343.86 toks/s]
Processed prompts: 27%|██▋ | 355/1319 [01:37<00:52, 18.40it/s, est. speed input: 4155.69 toks/s, output: 350.68 toks/s]
Processed prompts: 27%|██▋ | 359/1319 [01:37<01:05, 14.70it/s, est. speed input: 4188.13 toks/s, output: 354.94 toks/s]
Processed prompts: 28%|██▊ | 368/1319 [01:38<00:54, 17.60it/s, est. speed input: 4274.36 toks/s, output: 365.42 toks/s]
Processed prompts: 28%|██▊ | 373/1319 [01:38<00:58, 16.30it/s, est. speed input: 4314.24 toks/s, output: 370.46 toks/s]
Processed prompts: 29%|██▊ | 376/1319 [01:38<01:00, 15.68it/s, est. speed input: 4339.02 toks/s, output: 373.30 toks/s]
Processed prompts: 29%|██▉ | 380/1319 [01:38<00:57, 16.23it/s, est. speed input: 4378.54 toks/s, output: 378.59 toks/s]
Processed prompts: 29%|██▉ | 386/1319 [01:39<00:59, 15.80it/s, est. speed input: 4426.61 toks/s, output: 385.35 toks/s]
Processed prompts: 30%|██▉ | 390/1319 [01:39<00:59, 15.70it/s, est. speed input: 4459.40 toks/s, output: 389.99 toks/s]
Processed prompts: 30%|██▉ | 394/1319 [01:39<00:56, 16.27it/s, est. speed input: 4492.10 toks/s, output: 394.01 toks/s]
Processed prompts: 30%|███ | 399/1319 [01:40<00:53, 17.15it/s, est. speed input: 4536.46 toks/s, output: 400.19 toks/s]
Processed prompts: 30%|███ | 401/1319 [01:40<01:03, 14.43it/s, est. speed input: 4545.76 toks/s, output: 401.85 toks/s]
Processed prompts: 31%|███ | 403/1319 [01:40<01:04, 14.19it/s, est. speed input: 4560.55 toks/s, output: 403.89 toks/s]
Processed prompts: 31%|███ | 408/1319 [01:40<00:59, 15.23it/s, est. speed input: 4602.00 toks/s, output: 409.13 toks/s]
Processed prompts: 31%|███▏ | 413/1319 [01:40<00:52, 17.25it/s, est. speed input: 4647.03 toks/s, output: 414.78 toks/s]
Processed prompts: 32%|███▏ | 418/1319 [01:41<01:07, 13.41it/s, est. speed input: 4676.81 toks/s, output: 419.80 toks/s]
Processed prompts: 32%|███▏ | 424/1319 [01:41<00:57, 15.52it/s, est. speed input: 4733.23 toks/s, output: 428.01 toks/s]
Processed prompts: 32%|███▏ | 427/1319 [01:41<01:01, 14.55it/s, est. speed input: 4753.76 toks/s, output: 431.30 toks/s]
Processed prompts: 33%|███▎ | 430/1319 [01:42<01:02, 14.33it/s, est. speed input: 4777.76 toks/s, output: 434.80 toks/s]
Processed prompts: 33%|███▎ | 438/1319 [01:42<00:53, 16.35it/s, est. speed input: 4842.53 toks/s, output: 443.51 toks/s]
Processed prompts: 34%|███▎ | 444/1319 [01:42<00:51, 16.98it/s, est. speed input: 4885.74 toks/s, output: 448.57 toks/s]
Processed prompts: 34%|███▍ | 447/1319 [01:43<00:53, 16.15it/s, est. speed input: 4905.26 toks/s, output: 450.70 toks/s]
Processed prompts: 34%|███▍ | 451/1319 [01:43<00:56, 15.29it/s, est. speed input: 4931.71 toks/s, output: 453.98 toks/s]
Processed prompts: 35%|███▍ | 456/1319 [01:43<00:54, 15.87it/s, est. speed input: 4971.13 toks/s, output: 459.57 toks/s]
Processed prompts: 35%|███▌ | 464/1319 [01:44<00:45, 18.71it/s, est. speed input: 5038.69 toks/s, output: 468.68 toks/s]
Processed prompts: 35%|███▌ | 466/1319 [01:44<00:53, 15.83it/s, est. speed input: 5049.13 toks/s, output: 470.69 toks/s]
Processed prompts: 36%|███▌ | 470/1319 [01:44<00:51, 16.42it/s, est. speed input: 5080.60 toks/s, output: 474.89 toks/s]
Processed prompts: 36%|███▌ | 475/1319 [01:44<00:50, 16.69it/s, est. speed input: 5117.02 toks/s, output: 479.88 toks/s]
Processed prompts: 37%|███▋ | 482/1319 [01:45<00:45, 18.40it/s, est. speed input: 5167.75 toks/s, output: 485.71 toks/s]
Processed prompts: 37%|███▋ | 486/1319 [01:45<00:50, 16.44it/s, est. speed input: 5190.69 toks/s, output: 488.95 toks/s]
Processed prompts: 37%|███▋ | 492/1319 [01:45<00:48, 16.91it/s, est. speed input: 5233.88 toks/s, output: 494.97 toks/s]
Processed prompts: 38%|███▊ | 496/1319 [01:46<00:47, 17.18it/s, est. speed input: 5261.86 toks/s, output: 498.83 toks/s]
Processed prompts: 38%|███▊ | 499/1319 [01:46<00:51, 15.80it/s, est. speed input: 5277.85 toks/s, output: 500.96 toks/s]
Processed prompts: 38%|███▊ | 502/1319 [01:46<00:51, 15.90it/s, est. speed input: 5298.04 toks/s, output: 503.33 toks/s]
Processed prompts: 38%|███▊ | 506/1319 [01:46<00:53, 15.31it/s, est. speed input: 5321.93 toks/s, output: 506.31 toks/s]
Processed prompts: 39%|███▊ | 510/1319 [01:46<00:47, 16.91it/s, est. speed input: 5353.39 toks/s, output: 510.18 toks/s]
Processed prompts: 39%|███▉ | 513/1319 [01:47<00:54, 14.66it/s, est. speed input: 5369.22 toks/s, output: 513.41 toks/s]
Processed prompts: 39%|███▉ | 516/1319 [01:47<00:56, 14.32it/s, est. speed input: 5388.97 toks/s, output: 516.03 toks/s]
Processed prompts: 39%|███▉ | 519/1319 [01:47<00:54, 14.69it/s, est. speed input: 5409.40 toks/s, output: 518.93 toks/s]
Processed prompts: 40%|████ | 528/1319 [01:48<00:45, 17.56it/s, est. speed input: 5475.04 toks/s, output: 528.29 toks/s]
Processed prompts: 40%|████ | 532/1319 [01:48<00:46, 17.06it/s, est. speed input: 5498.86 toks/s, output: 531.46 toks/s]
Processed prompts: 41%|████ | 538/1319 [01:48<00:44, 17.52it/s, est. speed input: 5536.96 toks/s, output: 536.34 toks/s]
Processed prompts: 41%|████ | 540/1319 [01:48<00:50, 15.52it/s, est. speed input: 5545.92 toks/s, output: 538.46 toks/s]
Processed prompts: 41%|████ | 543/1319 [01:49<00:49, 15.78it/s, est. speed input: 5565.19 toks/s, output: 541.84 toks/s]
Processed prompts: 41%|████▏ | 547/1319 [01:49<00:48, 15.90it/s, est. speed input: 5589.30 toks/s, output: 544.92 toks/s]
Processed prompts: 42%|████▏ | 552/1319 [01:49<00:44, 17.22it/s, est. speed input: 5622.22 toks/s, output: 549.03 toks/s]
Processed prompts: 43%|████▎ | 561/1319 [01:49<00:38, 19.71it/s, est. speed input: 5685.94 toks/s, output: 558.24 toks/s]
Processed prompts: 43%|████▎ | 563/1319 [01:50<00:47, 15.98it/s, est. speed input: 5689.23 toks/s, output: 558.88 toks/s]
Processed prompts: 43%|████▎ | 565/1319 [01:50<00:48, 15.57it/s, est. speed input: 5702.64 toks/s, output: 560.77 toks/s]
Processed prompts: 43%|████▎ | 572/1319 [01:50<00:43, 17.31it/s, est. speed input: 5747.61 toks/s, output: 567.09 toks/s]
Processed prompts: 44%|████▎ | 574/1319 [01:50<00:50, 14.82it/s, est. speed input: 5754.02 toks/s, output: 568.26 toks/s]
Processed prompts: 44%|████▍ | 579/1319 [01:51<00:45, 16.44it/s, est. speed input: 5785.70 toks/s, output: 572.28 toks/s]
Processed prompts: 44%|████▍ | 583/1319 [01:51<00:44, 16.50it/s, est. speed input: 5810.34 toks/s, output: 576.10 toks/s]
Processed prompts: 45%|████▍ | 588/1319 [01:51<00:44, 16.37it/s, est. speed input: 5842.46 toks/s, output: 581.13 toks/s]
Processed prompts: 45%|████▍ | 592/1319 [01:51<00:42, 17.01it/s, est. speed input: 5867.91 toks/s, output: 585.48 toks/s]
Processed prompts: 45%|████▌ | 597/1319 [01:52<00:38, 18.85it/s, est. speed input: 5899.85 toks/s, output: 589.15 toks/s]
Processed prompts: 45%|████▌ | 599/1319 [01:52<00:44, 16.36it/s, est. speed input: 5907.98 toks/s, output: 591.34 toks/s]
Processed prompts: 46%|████▌ | 602/1319 [01:52<00:45, 15.84it/s, est. speed input: 5922.60 toks/s, output: 593.31 toks/s]
Processed prompts: 46%|████▌ | 604/1319 [01:52<00:51, 14.00it/s, est. speed input: 5929.95 toks/s, output: 594.57 toks/s]
Processed prompts: 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 614/1319 [01:53<00:36, 19.47it/s, est. speed input: 5996.40 toks/s, output: 603.29 toks/s]
Processed prompts: 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 618/1319 [01:53<00:37, 18.72it/s, est. speed input: 6020.03 toks/s, output: 606.93 toks/s]
Processed prompts: 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 623/1319 [01:53<00:40, 17.09it/s, est. speed input: 6045.52 toks/s, output: 610.77 toks/s]
Processed prompts: 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 628/1319 [01:53<00:38, 18.15it/s, est. speed input: 6078.12 toks/s, output: 615.61 toks/s]
Processed prompts: 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 633/1319 [01:54<00:41, 16.70it/s, est. speed input: 6105.43 toks/s, output: 621.21 toks/s]
Processed prompts: 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 638/1319 [01:54<00:38, 17.82it/s, est. speed input: 6137.06 toks/s, output: 626.28 toks/s]
Processed prompts: 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 641/1319 [01:54<00:38, 17.56it/s, est. speed input: 6154.27 toks/s, output: 628.62 toks/s]
Processed prompts: 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 647/1319 [01:55<00:36, 18.35it/s, est. speed input: 6195.11 toks/s, output: 635.38 toks/s]
Processed prompts: 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 652/1319 [01:55<00:39, 16.73it/s, est. speed input: 6217.92 toks/s, output: 639.09 toks/s]
Processed prompts: 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 655/1319 [01:55<00:38, 17.40it/s, est. speed input: 6236.09 toks/s, output: 641.52 toks/s]
Processed prompts: 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 663/1319 [01:55<00:30, 21.52it/s, est. speed input: 6290.21 toks/s, output: 648.94 toks/s]
Processed prompts: 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 668/1319 [01:56<00:32, 20.20it/s, est. speed input: 6316.19 toks/s, output: 652.65 toks/s]
Processed prompts: 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 673/1319 [01:56<00:32, 20.06it/s, est. speed input: 6343.14 toks/s, output: 655.99 toks/s]
Processed prompts: 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 680/1319 [01:56<00:27, 23.06it/s, est. speed input: 6389.74 toks/s, output: 661.92 toks/s]
Processed prompts: 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 683/1319 [01:56<00:26, 23.85it/s, est. speed input: 6413.43 toks/s, output: 666.35 toks/s]
Processed prompts: 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 686/1319 [01:56<00:25, 24.63it/s, est. speed input: 6434.69 toks/s, output: 670.38 toks/s]
Processed prompts: 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 690/1319 [01:56<00:23, 27.32it/s, est. speed input: 6460.36 toks/s, output: 673.14 toks/s]
Processed prompts: 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 695/1319 [01:57<00:19, 31.86it/s, est. speed input: 6494.02 toks/s, output: 676.79 toks/s]
Processed prompts: 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 699/1319 [01:57<00:18, 33.40it/s, est. speed input: 6520.10 toks/s, output: 679.88 toks/s]
Processed prompts: 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 703/1319 [01:57<00:17, 34.55it/s, est. speed input: 6545.89 toks/s, output: 682.86 toks/s]
Processed prompts: 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 707/1319 [01:57<00:17, 35.49it/s, est. speed input: 6573.65 toks/s, output: 686.28 toks/s]
Processed prompts: 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 714/1319 [01:57<00:13, 44.26it/s, est. speed input: 6626.77 toks/s, output: 693.98 toks/s]
Processed prompts: 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 719/1319 [01:57<00:13, 45.46it/s, est. speed input: 6667.65 toks/s, output: 701.16 toks/s]
Processed prompts: 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 725/1319 [01:57<00:12, 49.17it/s, est. speed input: 6710.58 toks/s, output: 707.32 toks/s]
Processed prompts: 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 731/1319 [01:57<00:11, 52.01it/s, est. speed input: 6753.49 toks/s, output: 712.88 toks/s]
Processed prompts: 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 737/1319 [01:57<00:13, 42.15it/s, est. speed input: 6795.82 toks/s, output: 720.41 toks/s]
Processed prompts: 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 744/1319 [01:58<00:11, 48.88it/s, est. speed input: 6848.99 toks/s, output: 728.26 toks/s]
Processed prompts: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 754/1319 [01:58<00:11, 49.91it/s, est. speed input: 6916.61 toks/s, output: 736.96 toks/s]
Processed prompts: 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 760/1319 [01:58<00:12, 43.06it/s, est. speed input: 6954.45 toks/s, output: 743.49 toks/s]
Processed prompts: 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 767/1319 [01:58<00:13, 40.87it/s, est. speed input: 7000.38 toks/s, output: 750.43 toks/s]
Processed prompts: 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 777/1319 [01:58<00:12, 44.74it/s, est. speed input: 7068.54 toks/s, output: 760.08 toks/s]
Processed prompts: 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 787/1319 [01:58<00:11, 47.54it/s, est. speed input: 7139.59 toks/s, output: 771.52 toks/s]
Processed prompts: 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 796/1319 [01:59<00:10, 48.04it/s, est. speed input: 7209.06 toks/s, output: 781.73 toks/s]
Processed prompts: 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 807/1319 [01:59<00:09, 51.87it/s, est. speed input: 7287.04 toks/s, output: 793.33 toks/s]
Processed prompts: 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 821/1319 [01:59<00:08, 60.41it/s, est. speed input: 7401.99 toks/s, output: 812.34 toks/s]
Processed prompts: 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 829/1319 [01:59<00:08, 56.63it/s, est. speed input: 7455.26 toks/s, output: 820.40 toks/s]
Processed prompts: 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 840/1319 [01:59<00:08, 59.32it/s, est. speed input: 7530.53 toks/s, output: 830.64 toks/s]
Processed prompts: 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 849/1319 [02:00<00:08, 57.93it/s, est. speed input: 7592.88 toks/s, output: 841.15 toks/s]
Processed prompts: 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 867/1319 [02:00<00:06, 73.63it/s, est. speed input: 7726.84 toks/s, output: 862.10 toks/s]
Processed prompts: 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 875/1319 [02:00<00:06, 67.64it/s, est. speed input: 7780.16 toks/s, output: 870.49 toks/s]
Processed prompts: 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 886/1319 [02:00<00:06, 68.87it/s, est. speed input: 7856.19 toks/s, output: 882.66 toks/s]
Processed prompts: 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 899/1319 [02:00<00:05, 74.41it/s, est. speed input: 7952.33 toks/s, output: 899.72 toks/s]
Processed prompts: 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 908/1319 [02:00<00:05, 71.04it/s, est. speed input: 8013.17 toks/s, output: 909.80 toks/s]
Processed prompts: 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 916/1319 [02:00<00:06, 66.95it/s, est. speed input: 8066.71 toks/s, output: 919.66 toks/s]
Processed prompts: 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 925/1319 [02:01<00:05, 66.48it/s, est. speed input: 8121.69 toks/s, output: 927.49 toks/s]
Processed prompts: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 932/1319 [02:01<00:06, 62.19it/s, est. speed input: 8167.02 toks/s, output: 936.12 toks/s]
Processed prompts: 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 944/1319 [02:01<00:05, 69.94it/s, est. speed input: 8252.94 toks/s, output: 952.82 toks/s]
Processed prompts: 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 952/1319 [02:01<00:05, 67.19it/s, est. speed input: 8309.59 toks/s, output: 965.65 toks/s]
Processed prompts: 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 962/1319 [02:01<00:05, 70.23it/s, est. speed input: 8373.41 toks/s, output: 976.18 toks/s]
Processed prompts: 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 978/1319 [02:01<00:03, 86.56it/s, est. speed input: 8487.65 toks/s, output: 997.80 toks/s]
Processed prompts: 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 990/1319 [02:01<00:03, 89.91it/s, est. speed input: 8566.37 toks/s, output: 1012.00 toks/s]
Processed prompts: 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1002/1319 [02:01<00:03, 92.76it/s, est. speed input: 8645.27 toks/s, output: 1026.34 toks/s]
Processed prompts: 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1012/1319 [02:02<00:03, 80.14it/s, est. speed input: 8708.44 toks/s, output: 1039.18 toks/s]
Processed prompts: 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1021/1319 [02:02<00:03, 80.20it/s, est. speed input: 8768.10 toks/s, output: 1052.25 toks/s]
Processed prompts: 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1030/1319 [02:02<00:03, 80.70it/s, est. speed input: 8826.95 toks/s, output: 1064.08 toks/s]
Processed prompts: 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1039/1319 [02:02<00:03, 81.68it/s, est. speed input: 8881.95 toks/s, output: 1072.61 toks/s]
Processed prompts: 79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1048/1319 [02:02<00:03, 82.55it/s, est. speed input: 8936.01 toks/s, output: 1080.66 toks/s]
Processed prompts: 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1057/1319 [02:02<00:03, 73.93it/s, est. speed input: 8987.91 toks/s, output: 1089.55 toks/s]
Processed prompts: 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1065/1319 [02:02<00:03, 67.65it/s, est. speed input: 9032.34 toks/s, output: 1097.17 toks/s]
Processed prompts: 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1078/1319 [02:02<00:03, 77.81it/s, est. speed input: 9114.40 toks/s, output: 1111.32 toks/s]
Processed prompts: 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 1099/1319 [02:03<00:02, 105.51it/s, est. speed input: 9245.91 toks/s, output: 1131.81 toks/s]
Processed prompts: 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1110/1319 [02:03<00:02, 102.71it/s, est. speed input: 9314.36 toks/s, output: 1143.97 toks/s]
Processed prompts: 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 1121/1319 [02:03<00:02, 86.01it/s, est. speed input: 9378.74 toks/s, output: 1157.19 toks/s] 
Processed prompts: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 1131/1319 [02:03<00:02, 80.77it/s, est. speed input: 9438.42 toks/s, output: 1169.76 toks/s]
Processed prompts: 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1140/1319 [02:03<00:02, 82.90it/s, est. speed input: 9496.21 toks/s, output: 1182.14 toks/s]
Processed prompts: 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 1149/1319 [02:03<00:02, 84.54it/s, est. speed input: 9546.21 toks/s, output: 1191.71 toks/s]
Processed prompts: 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1158/1319 [02:03<00:01, 80.78it/s, est. speed input: 9597.11 toks/s, output: 1201.64 toks/s]
Processed prompts: 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 1167/1319 [02:04<00:02, 68.87it/s, est. speed input: 9642.27 toks/s, output: 1210.96 toks/s]
Processed prompts: 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1175/1319 [02:04<00:02, 68.88it/s, est. speed input: 9685.77 toks/s, output: 1220.31 toks/s]
Processed prompts: 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 1183/1319 [02:04<00:02, 52.20it/s, est. speed input: 9723.85 toks/s, output: 1231.32 toks/s]
Processed prompts: 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1190/1319 [02:04<00:02, 54.87it/s, est. speed input: 9764.47 toks/s, output: 1241.35 toks/s]
Processed prompts: 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1203/1319 [02:04<00:01, 69.12it/s, est. speed input: 9846.83 toks/s, output: 1261.52 toks/s]
Processed prompts: 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1211/1319 [02:04<00:01, 69.87it/s, est. speed input: 9895.21 toks/s, output: 1274.43 toks/s]
Processed prompts: 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1219/1319 [02:04<00:01, 71.03it/s, est. speed input: 9939.65 toks/s, output: 1285.92 toks/s]
Processed prompts: 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1228/1319 [02:04<00:01, 75.79it/s, est. speed input: 9993.26 toks/s, output: 1300.13 toks/s]
Processed prompts: 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 1236/1319 [02:05<00:01, 70.97it/s, est. speed input: 10037.30 toks/s, output: 1312.30 toks/s]
Processed prompts: 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1244/1319 [02:05<00:01, 69.07it/s, est. speed input: 10081.58 toks/s, output: 1324.74 toks/s]
Processed prompts: 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 1252/1319 [02:05<00:01, 63.18it/s, est. speed input: 10126.69 toks/s, output: 1339.11 toks/s]
Processed prompts: 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 1259/1319 [02:05<00:00, 64.17it/s, est. speed input: 10165.61 toks/s, output: 1350.83 toks/s]
Processed prompts: 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 1268/1319 [02:05<00:00, 69.59it/s, est. speed input: 10212.27 toks/s, output: 1364.76 toks/s]
Processed prompts: 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 1278/1319 [02:05<00:00, 75.43it/s, est. speed input: 10270.46 toks/s, output: 1382.98 toks/s]
Processed prompts: 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 1292/1319 [02:05<00:00, 91.83it/s, est. speed input: 10353.68 toks/s, output: 1408.87 toks/s]
Processed prompts: 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 1302/1319 [02:05<00:00, 83.46it/s, est. speed input: 10406.57 toks/s, output: 1427.13 toks/s]
Processed prompts: 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 1311/1319 [02:06<00:00, 81.22it/s, est. speed input: 10453.11 toks/s, output: 1443.89 toks/s]
Processed prompts: 100%|██████████| 1319/1319 [02:06<00:00, 10.46it/s, est. speed input: 10493.62 toks/s, output: 1459.18 toks/s]
Running generate_until requests:   0%|          | 1/1319 [02:06<46:13:33, 126.26s/it]
Running generate_until requests: 100%|██████████| 1319/1319 [02:06<00:00, 10.45it/s]
fatal: not a git repository (or any of the parent directories): .git
[2025-10-21 08:54:59] INFO evaluation_tracker.py:280: Output path not provided, skipping saving results aggregated
vllm (pretrained=/mnt/nvme2/eldar/for_nvidia/calib1024_damp0.07_obsmse_symTrue,tensor_parallel_size=1,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8014|±  |0.0110|
|     |       |strict-match    |     5|exact_match|↑  |0.8522|±  |0.0098|
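
The Stderr column can be sanity-checked from the Value column and the 1319 prompts reported in the progress output. The sketch below assumes the reported figure is the ordinary sample standard error of a per-example 0/1 exact-match score; that is an assumption about how the number was produced, not something this log states. The helper name `binary_score_stderr` is hypothetical, and the inputs are the rounded accuracies from the table, so agreement is only expected to about four decimal places.

```python
import math


def binary_score_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n scores that are each 0 or 1.

    Assumes the sample (n - 1) variance estimator; using n instead
    changes the result only in the sixth decimal place at this size.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# Number of gsm8k prompts, taken from the "1319/1319" progress line.
N_SAMPLES = 1319

# Rounded exact_match values copied from the results table.
reported = {
    "flexible-extract": (0.8014, 0.0110),
    "strict-match": (0.8522, 0.0098),
}

for filter_name, (accuracy, logged_stderr) in reported.items():
    estimate = binary_score_stderr(accuracy, N_SAMPLES)
    print(f"{filter_name}: estimated {estimate:.4f}, logged {logged_stderr:.4f}")
```

Under that assumption both estimates round to the logged values, which suggests the Stderr column reflects sampling noise over the 1319 test items only.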