| INFO 10-21 08:52:02 [__init__.py:225] Automatically detected platform cuda. | |
| [2025-10-21 08:52:06] INFO __main__.py:429: Passed `--trust_remote_code`, setting environment variable `HF_DATASETS_TRUST_REMOTE_CODE=true` | |
| [2025-10-21 08:52:06] INFO __main__.py:446: Selected Tasks: ['gsm8k'] | |
| [2025-10-21 08:52:06] INFO evaluator.py:202: Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234 | |
| [2025-10-21 08:52:06] INFO evaluator.py:240: Initializing vllm model, with arguments: {'pretrained': '/mnt/nvme2/eldar/for_nvidia/calib1024_damp0.07_obsmse_symTrue', 'tensor_parallel_size': 1, 'trust_remote_code': True} | |
| INFO 10-21 08:52:06 [utils.py:243] non-default args: {'trust_remote_code': True, 'seed': 1234, 'disable_log_stats': True, 'model': '/mnt/nvme2/eldar/for_nvidia/calib1024_damp0.07_obsmse_symTrue'} | |
| INFO 10-21 08:52:06 [model.py:663] Resolved architecture: NemotronHForCausalLM | |
| INFO 10-21 08:52:06 [model.py:1751] Using max model len 131072 | |
| The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored. | |
| INFO 10-21 08:52:07 [scheduler.py:225] Chunked prefill is enabled with max_num_batched_tokens=16384. | |
| INFO 10-21 08:52:07 [config.py:324] Disabling cascade attention since it is not supported for hybrid models. | |
| INFO 10-21 08:52:07 [config.py:440] Setting attention block size to 672 tokens to ensure that attention page size is >= mamba page size. | |
| INFO 10-21 08:52:07 [config.py:464] Padding mamba page size by 2.13% to ensure that mamba page size and attention page size are exactly equal. | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:08 [core.py:730] Waiting for init message from front-end. | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:08 [core.py:97] Initializing a V1 LLM engine (v0.11.1rc2.dev191+g80e945298) with config: model='/mnt/nvme2/eldar/for_nvidia/calib1024_damp0.07_obsmse_symTrue', speculative_config=None, tokenizer='/mnt/nvme2/eldar/for_nvidia/calib1024_damp0.07_obsmse_symTrue', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=compressed-tensors, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=1234, served_model_name=/mnt/nvme2/eldar/for_nvidia/calib1024_damp0.07_obsmse_symTrue, enable_prefix_caching=False, chunked_prefill_enabled=True, pooler_config=None, compilation_config={'level': None, 'mode': 3, 'debug_dump_path': None, 'cache_dir': '', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention', 'vllm::sparse_attn_indexer'], 'use_inductor': None, 'compile_sizes': [], 'inductor_compile_config': {'enable_auto_functionalized_v2': False}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'use_cudagraph': True, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [512, 504, 496, 488, 480, 472, 464, 456, 448, 440, 432, 424, 416, 408, 400, 392, 384, 376, 368, 360, 352, 344, 336, 328, 320, 312, 304, 296, 288, 280, 272, 264, 256, 248, 240, 232, 224, 216, 208, 200, 192, 184, 176, 168, 160, 152, 144, 136, 128, 120, 112, 104, 96, 88, 80, 72, 64, 56, 48, 40, 32, 24, 16, 8, 4, 2, 1], 'cudagraph_copy_inputs': False, 'full_cuda_graph': True, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {}, 'max_capture_size': 512, 'local_cache_dir': None} | |
| [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0 | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:10 [parallel_state.py:1325] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0 | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:12 [gpu_model_runner.py:2860] Starting to load model /mnt/nvme2/eldar/for_nvidia/calib1024_damp0.07_obsmse_symTrue... | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:12 [compressed_tensors_wNa16.py:108] Using MacheteLinearKernel for CompressedTensorsWNA16 | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:12 [compressed_tensors_wNa16.py:108] Using MarlinLinearKernel for CompressedTensorsWNA16 | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:12 [cuda.py:403] Using Flash Attention backend on V1 engine. | |
| (EngineCore_DP0 pid=1751394) Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:02<00:00, 1.39s/it] | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:15 [default_loader.py:314] Loading weights took 2.83 seconds | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:16 [gpu_model_runner.py:2921] Model loading took 6.0475 GiB and 3.570261 seconds | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:19 [backends.py:609] Using cache directory: /home/eldar/.cache/vllm/torch_compile_cache/5a2b29b2c9/rank_0_0/backbone for vLLM's torch.compile | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:19 [backends.py:623] Dynamo bytecode transform time: 2.31 s | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:20 [backends.py:248] Cache the graph for dynamic shape for later use | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:26 [backends.py:275] Compiling a graph for dynamic shape takes 6.35 s | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:28 [monitor.py:34] torch.compile takes 8.66 s in total | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:30 [gpu_worker.py:316] Available KV cache memory: 60.15 GiB | |
| (EngineCore_DP0 pid=1751394) WARNING 10-21 08:52:30 [kv_cache_utils.py:951] Add 1 padding layers, may waste at most 3.70% KV cache memory | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:30 [kv_cache_utils.py:1201] GPU KV cache size: 492,576 tokens | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:30 [kv_cache_utils.py:1206] Maximum concurrency for 131,072 tokens per request: 28.90x | |
| (EngineCore_DP0 pid=1751394) 2025-10-21 08:52:30,487 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ... | |
| (EngineCore_DP0 pid=1751394) 2025-10-21 08:52:31,099 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends | |
| (EngineCore_DP0 pid=1751394) Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|██████████| 67/67 [00:03<00:00, 21.65it/s] | |
| (EngineCore_DP0 pid=1751394) Capturing CUDA graphs (decode, FULL): 100%|██████████| 67/67 [00:07<00:00, 8.89it/s] | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:42 [gpu_model_runner.py:3848] Graph capturing finished in 11 secs, took -1.97 GiB | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:42 [core.py:243] init engine (profile, create kv cache, warmup model) took 25.63 seconds | |
| (EngineCore_DP0 pid=1751394) INFO 10-21 08:52:43 [gc_utils.py:40] GC Debug Config. enabled:False,top_objects:-1 | |
| INFO 10-21 08:52:43 [llm.py:333] Supported tasks: ['generate'] | |
| [2025-10-21 08:52:45] INFO evaluator.py:305: gsm8k: Using gen_kwargs: {'until': ['Question:', '</s>', '<|im_end|>'], 'do_sample': False, 'temperature': 0.0} | |
| [2025-10-21 08:52:45] WARNING evaluator.py:324: Overwriting default num_fewshot of gsm8k from 5 to 5 | |
| [2025-10-21 08:52:45] INFO task.py:434: Building contexts for gsm8k on rank 0... | |
| 100%|██████████| 1319/1319 [00:03<00:00, 419.00it/s] | |
| [2025-10-21 08:52:48] INFO evaluator.py:574: Running generate_until requests | |
| Running generate_until requests: 0%| | 0/1319 [00:00<?, ?it/s] | |
| Adding requests: 100%|██████████| 1319/1319 [00:00<00:00, 13469.99it/s] | |
| Processed prompts: 46%|█████ | 602/1319 [01:52<00:45, 15.84it/s, est. speed input: 5922.60 toks/s, output: 593.31 toks/s] | |
| Processed prompts: 46%|βββββ | 604/1319 [01:52<00:51, 14.00it/s, est. speed input: 5929.95 toks/s, output: 594.57 toks/s][A | |
| Processed prompts: 47%|βββββ | 614/1319 [01:53<00:36, 19.47it/s, est. speed input: 5996.40 toks/s, output: 603.29 toks/s][A | |
| Processed prompts: 47%|βββββ | 618/1319 [01:53<00:37, 18.72it/s, est. speed input: 6020.03 toks/s, output: 606.93 toks/s][A | |
| Processed prompts: 47%|βββββ | 623/1319 [01:53<00:40, 17.09it/s, est. speed input: 6045.52 toks/s, output: 610.77 toks/s][A | |
| Processed prompts: 48%|βββββ | 628/1319 [01:53<00:38, 18.15it/s, est. speed input: 6078.12 toks/s, output: 615.61 toks/s][A | |
| Processed prompts: 48%|βββββ | 633/1319 [01:54<00:41, 16.70it/s, est. speed input: 6105.43 toks/s, output: 621.21 toks/s][A | |
| Processed prompts: 48%|βββββ | 638/1319 [01:54<00:38, 17.82it/s, est. speed input: 6137.06 toks/s, output: 626.28 toks/s][A | |
| Processed prompts: 49%|βββββ | 641/1319 [01:54<00:38, 17.56it/s, est. speed input: 6154.27 toks/s, output: 628.62 toks/s][A | |
| Processed prompts: 49%|βββββ | 647/1319 [01:55<00:36, 18.35it/s, est. speed input: 6195.11 toks/s, output: 635.38 toks/s][A | |
| Processed prompts: 49%|βββββ | 652/1319 [01:55<00:39, 16.73it/s, est. speed input: 6217.92 toks/s, output: 639.09 toks/s][A | |
| Processed prompts: 50%|βββββ | 655/1319 [01:55<00:38, 17.40it/s, est. speed input: 6236.09 toks/s, output: 641.52 toks/s][A | |
| Processed prompts: 50%|βββββ | 663/1319 [01:55<00:30, 21.52it/s, est. speed input: 6290.21 toks/s, output: 648.94 toks/s][A | |
| Processed prompts: 51%|βββββ | 668/1319 [01:56<00:32, 20.20it/s, est. speed input: 6316.19 toks/s, output: 652.65 toks/s][A | |
| Processed prompts: 51%|βββββ | 673/1319 [01:56<00:32, 20.06it/s, est. speed input: 6343.14 toks/s, output: 655.99 toks/s][A | |
| Processed prompts: 52%|ββββββ | 680/1319 [01:56<00:27, 23.06it/s, est. speed input: 6389.74 toks/s, output: 661.92 toks/s][A | |
| Processed prompts: 52%|ββββββ | 683/1319 [01:56<00:26, 23.85it/s, est. speed input: 6413.43 toks/s, output: 666.35 toks/s][A | |
| Processed prompts: 52%|ββββββ | 686/1319 [01:56<00:25, 24.63it/s, est. speed input: 6434.69 toks/s, output: 670.38 toks/s][A | |
| Processed prompts: 52%|ββββββ | 690/1319 [01:56<00:23, 27.32it/s, est. speed input: 6460.36 toks/s, output: 673.14 toks/s][A | |
| Processed prompts: 53%|ββββββ | 695/1319 [01:57<00:19, 31.86it/s, est. speed input: 6494.02 toks/s, output: 676.79 toks/s][A | |
| Processed prompts: 53%|ββββββ | 699/1319 [01:57<00:18, 33.40it/s, est. speed input: 6520.10 toks/s, output: 679.88 toks/s][A | |
| Processed prompts: 53%|ββββββ | 703/1319 [01:57<00:17, 34.55it/s, est. speed input: 6545.89 toks/s, output: 682.86 toks/s][A | |
| Processed prompts: 54%|ββββββ | 707/1319 [01:57<00:17, 35.49it/s, est. speed input: 6573.65 toks/s, output: 686.28 toks/s][A | |
| Processed prompts: 54%|ββββββ | 714/1319 [01:57<00:13, 44.26it/s, est. speed input: 6626.77 toks/s, output: 693.98 toks/s][A | |
| Processed prompts: 55%|ββββββ | 719/1319 [01:57<00:13, 45.46it/s, est. speed input: 6667.65 toks/s, output: 701.16 toks/s][A | |
| Processed prompts: 55%|ββββββ | 725/1319 [01:57<00:12, 49.17it/s, est. speed input: 6710.58 toks/s, output: 707.32 toks/s][A | |
| Processed prompts: 55%|ββββββ | 731/1319 [01:57<00:11, 52.01it/s, est. speed input: 6753.49 toks/s, output: 712.88 toks/s][A | |
| Processed prompts: 56%|ββββββ | 737/1319 [01:57<00:13, 42.15it/s, est. speed input: 6795.82 toks/s, output: 720.41 toks/s][A | |
| Processed prompts: 56%|ββββββ | 744/1319 [01:58<00:11, 48.88it/s, est. speed input: 6848.99 toks/s, output: 728.26 toks/s][A | |
| Processed prompts: 57%|ββββββ | 754/1319 [01:58<00:11, 49.91it/s, est. speed input: 6916.61 toks/s, output: 736.96 toks/s][A | |
| Processed prompts: 58%|ββββββ | 760/1319 [01:58<00:12, 43.06it/s, est. speed input: 6954.45 toks/s, output: 743.49 toks/s][A | |
| Processed prompts: 58%|ββββββ | 767/1319 [01:58<00:13, 40.87it/s, est. speed input: 7000.38 toks/s, output: 750.43 toks/s][A | |
| Processed prompts: 59%|ββββββ | 777/1319 [01:58<00:12, 44.74it/s, est. speed input: 7068.54 toks/s, output: 760.08 toks/s][A | |
| Processed prompts: 60%|ββββββ | 787/1319 [01:58<00:11, 47.54it/s, est. speed input: 7139.59 toks/s, output: 771.52 toks/s][A | |
| Processed prompts: 60%|ββββββ | 796/1319 [01:59<00:10, 48.04it/s, est. speed input: 7209.06 toks/s, output: 781.73 toks/s][A | |
| Processed prompts: 61%|ββββββ | 807/1319 [01:59<00:09, 51.87it/s, est. speed input: 7287.04 toks/s, output: 793.33 toks/s][A | |
| Processed prompts: 62%|βββββββ | 821/1319 [01:59<00:08, 60.41it/s, est. speed input: 7401.99 toks/s, output: 812.34 toks/s][A | |
| Processed prompts: 63%|βββββββ | 829/1319 [01:59<00:08, 56.63it/s, est. speed input: 7455.26 toks/s, output: 820.40 toks/s][A | |
| Processed prompts: 64%|βββββββ | 840/1319 [01:59<00:08, 59.32it/s, est. speed input: 7530.53 toks/s, output: 830.64 toks/s][A | |
| Processed prompts: 64%|βββββββ | 849/1319 [02:00<00:08, 57.93it/s, est. speed input: 7592.88 toks/s, output: 841.15 toks/s][A | |
| Processed prompts: 66%|βββββββ | 867/1319 [02:00<00:06, 73.63it/s, est. speed input: 7726.84 toks/s, output: 862.10 toks/s][A | |
| Processed prompts: 66%|βββββββ | 875/1319 [02:00<00:06, 67.64it/s, est. speed input: 7780.16 toks/s, output: 870.49 toks/s][A | |
| Processed prompts: 67%|βββββββ | 886/1319 [02:00<00:06, 68.87it/s, est. speed input: 7856.19 toks/s, output: 882.66 toks/s][A | |
| Processed prompts: 68%|βββββββ | 899/1319 [02:00<00:05, 74.41it/s, est. speed input: 7952.33 toks/s, output: 899.72 toks/s][A | |
| Processed prompts: 69%|βββββββ | 908/1319 [02:00<00:05, 71.04it/s, est. speed input: 8013.17 toks/s, output: 909.80 toks/s][A | |
| Processed prompts: 69%|βββββββ | 916/1319 [02:00<00:06, 66.95it/s, est. speed input: 8066.71 toks/s, output: 919.66 toks/s][A | |
| Processed prompts: 70%|βββββββ | 925/1319 [02:01<00:05, 66.48it/s, est. speed input: 8121.69 toks/s, output: 927.49 toks/s][A | |
| Processed prompts: 71%|βββββββ | 932/1319 [02:01<00:06, 62.19it/s, est. speed input: 8167.02 toks/s, output: 936.12 toks/s][A | |
| Processed prompts: 72%|ββββββββ | 944/1319 [02:01<00:05, 69.94it/s, est. speed input: 8252.94 toks/s, output: 952.82 toks/s][A | |
| Processed prompts: 72%|ββββββββ | 952/1319 [02:01<00:05, 67.19it/s, est. speed input: 8309.59 toks/s, output: 965.65 toks/s][A | |
| Processed prompts: 73%|ββββββββ | 962/1319 [02:01<00:05, 70.23it/s, est. speed input: 8373.41 toks/s, output: 976.18 toks/s][A | |
| Processed prompts: 74%|ββββββββ | 978/1319 [02:01<00:03, 86.56it/s, est. speed input: 8487.65 toks/s, output: 997.80 toks/s][A | |
| Processed prompts: 75%|ββββββββ | 990/1319 [02:01<00:03, 89.91it/s, est. speed input: 8566.37 toks/s, output: 1012.00 toks/s][A | |
| Processed prompts: 76%|ββββββββ | 1002/1319 [02:01<00:03, 92.76it/s, est. speed input: 8645.27 toks/s, output: 1026.34 toks/s][A | |
| Processed prompts: 77%|ββββββββ | 1012/1319 [02:02<00:03, 80.14it/s, est. speed input: 8708.44 toks/s, output: 1039.18 toks/s][A | |
| Processed prompts: 77%|ββββββββ | 1021/1319 [02:02<00:03, 80.20it/s, est. speed input: 8768.10 toks/s, output: 1052.25 toks/s][A | |
| Processed prompts: 78%|ββββββββ | 1030/1319 [02:02<00:03, 80.70it/s, est. speed input: 8826.95 toks/s, output: 1064.08 toks/s][A | |
| Processed prompts: 79%|ββββββββ | 1039/1319 [02:02<00:03, 81.68it/s, est. speed input: 8881.95 toks/s, output: 1072.61 toks/s][A | |
| Processed prompts: 79%|ββββββββ | 1048/1319 [02:02<00:03, 82.55it/s, est. speed input: 8936.01 toks/s, output: 1080.66 toks/s][A | |
| Processed prompts: 80%|ββββββββ | 1057/1319 [02:02<00:03, 73.93it/s, est. speed input: 8987.91 toks/s, output: 1089.55 toks/s][A | |
| Processed prompts: 81%|ββββββββ | 1065/1319 [02:02<00:03, 67.65it/s, est. speed input: 9032.34 toks/s, output: 1097.17 toks/s][A | |
| Processed prompts: 82%|βββββββββ | 1078/1319 [02:02<00:03, 77.81it/s, est. speed input: 9114.40 toks/s, output: 1111.32 toks/s][A | |
| Processed prompts: 83%|βββββββββ | 1099/1319 [02:03<00:02, 105.51it/s, est. speed input: 9245.91 toks/s, output: 1131.81 toks/s][A | |
| Processed prompts: 84%|βββββββββ | 1110/1319 [02:03<00:02, 102.71it/s, est. speed input: 9314.36 toks/s, output: 1143.97 toks/s][A | |
| Processed prompts: 85%|βββββββββ | 1121/1319 [02:03<00:02, 86.01it/s, est. speed input: 9378.74 toks/s, output: 1157.19 toks/s] [A | |
| Processed prompts: 86%|βββββββββ | 1131/1319 [02:03<00:02, 80.77it/s, est. speed input: 9438.42 toks/s, output: 1169.76 toks/s][A | |
| Processed prompts: 86%|βββββββββ | 1140/1319 [02:03<00:02, 82.90it/s, est. speed input: 9496.21 toks/s, output: 1182.14 toks/s][A | |
| Processed prompts: 87%|βββββββββ | 1149/1319 [02:03<00:02, 84.54it/s, est. speed input: 9546.21 toks/s, output: 1191.71 toks/s][A | |
| Processed prompts: 88%|βββββββββ | 1158/1319 [02:03<00:01, 80.78it/s, est. speed input: 9597.11 toks/s, output: 1201.64 toks/s][A | |
| Processed prompts: 88%|βββββββββ | 1167/1319 [02:04<00:02, 68.87it/s, est. speed input: 9642.27 toks/s, output: 1210.96 toks/s][A | |
| Processed prompts: 89%|βββββββββ | 1175/1319 [02:04<00:02, 68.88it/s, est. speed input: 9685.77 toks/s, output: 1220.31 toks/s][A | |
| Processed prompts: 90%|βββββββββ | 1183/1319 [02:04<00:02, 52.20it/s, est. speed input: 9723.85 toks/s, output: 1231.32 toks/s][A | |
| Processed prompts: 90%|βββββββββ | 1190/1319 [02:04<00:02, 54.87it/s, est. speed input: 9764.47 toks/s, output: 1241.35 toks/s][A | |
| Processed prompts: 91%|βββββββββ | 1203/1319 [02:04<00:01, 69.12it/s, est. speed input: 9846.83 toks/s, output: 1261.52 toks/s][A | |
| Processed prompts: 92%|ββββββββββ| 1211/1319 [02:04<00:01, 69.87it/s, est. speed input: 9895.21 toks/s, output: 1274.43 toks/s][A | |
| Processed prompts: 92%|ββββββββββ| 1219/1319 [02:04<00:01, 71.03it/s, est. speed input: 9939.65 toks/s, output: 1285.92 toks/s][A | |
| Processed prompts: 93%|ββββββββββ| 1228/1319 [02:04<00:01, 75.79it/s, est. speed input: 9993.26 toks/s, output: 1300.13 toks/s][A | |
| Processed prompts: 94%|ββββββββββ| 1236/1319 [02:05<00:01, 70.97it/s, est. speed input: 10037.30 toks/s, output: 1312.30 toks/s][A | |
| Processed prompts: 94%|ββββββββββ| 1244/1319 [02:05<00:01, 69.07it/s, est. speed input: 10081.58 toks/s, output: 1324.74 toks/s][A | |
| Processed prompts: 95%|ββββββββββ| 1252/1319 [02:05<00:01, 63.18it/s, est. speed input: 10126.69 toks/s, output: 1339.11 toks/s][A | |
| Processed prompts: 95%|ββββββββββ| 1259/1319 [02:05<00:00, 64.17it/s, est. speed input: 10165.61 toks/s, output: 1350.83 toks/s][A | |
| Processed prompts: 96%|ββββββββββ| 1268/1319 [02:05<00:00, 69.59it/s, est. speed input: 10212.27 toks/s, output: 1364.76 toks/s][A | |
| Processed prompts: 97%|ββββββββββ| 1278/1319 [02:05<00:00, 75.43it/s, est. speed input: 10270.46 toks/s, output: 1382.98 toks/s][A | |
| Processed prompts: 98%|ββββββββββ| 1292/1319 [02:05<00:00, 91.83it/s, est. speed input: 10353.68 toks/s, output: 1408.87 toks/s][A | |
| Processed prompts: 99%|ββββββββββ| 1302/1319 [02:05<00:00, 83.46it/s, est. speed input: 10406.57 toks/s, output: 1427.13 toks/s][A | |
| Processed prompts: 99%|ββββββββββ| 1311/1319 [02:06<00:00, 81.22it/s, est. speed input: 10453.11 toks/s, output: 1443.89 toks/s][A | |
| Processed prompts: 100%|██████████| 1319/1319 [02:06<00:00, 10.46it/s, est. speed input: 10493.62 toks/s, output: 1459.18 toks/s] | |
| Running generate_until requests: 100%|██████████| 1319/1319 [02:06<00:00, 10.45it/s] | |
| fatal: not a git repository (or any of the parent directories): .git | |
| [2025-10-21 08:54:59] INFO evaluation_tracker.py:280: Output path not provided, skipping saving results aggregated | |
| vllm (pretrained=/mnt/nvme2/eldar/for_nvidia/calib1024_damp0.07_obsmse_symTrue,tensor_parallel_size=1,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto | |
| |Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr| | |
| |-----|------:|----------------|-----:|-----------|---|-----:|---|-----:| | |
| |gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.8014|± |0.0110| | |
| | | |strict-match | 5|exact_match|↑ |0.8522|± |0.0098| | |