Zhuoning committed
Commit 2c2962d · verified · 1 Parent(s): 2e31242

Update README.md

Files changed (1)
  1. README.md +35 -10
README.md CHANGED
@@ -1,3 +1,26 @@
  # 🎯 General Video Embedder (GVE)
 
  > **One Embedder for All Video Retrieval Scenarios**
@@ -60,11 +83,9 @@ Built on **Qwen2.5-VL** and trained only with LoRA with **13M** collected and sy
  1. Loading model
 
  ```python
- from transformers import AutoModel, AutoProcessor
-
- model_path = '.'
- model = AutoModel.from_pretrained(model_path, trust_remote_code=True, device_map='auto', low_cpu_mem_usage=True, torch_dtype='bfloat16')
- processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True, add_eos_token=True)
  processor.tokenizer.padding_side = 'left'
  ```
 
@@ -111,9 +132,13 @@ embedding = F.normalize(outputs['last_hidden_state'][:, -1, :], p=2, dim=1)
  ## 📚 Citation
 
  ```bibtex
- @inproceedings{guo2025general-video-embedding,
- title={Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum},
- author={Zhuoning Guo, Mingxin Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie and Xiaowen Chu},
- year={2025}
  }
- ```
 
+ ---
+ language: en
+ license: apache-2.0
+ library_name: transformers
+ tags:
+ - pytorch
+ - video
+ - retrieval
+ - embedding
+ - multimodal
+ - qwen2.5-vl
+ pipeline_tag: sentence-similarity
+ datasets:
+ - Alibaba-NLP/UVRB
+ - Vividbot/vast-2m-vi
+ - TempoFunk/webvid-10M
+ - OpenGVLab/InternVid
+ metrics:
+ - recall
+ base_model:
+ - Qwen/Qwen2.5-VL-3B-Instruct
+ ---
+
  # 🎯 General Video Embedder (GVE)
 
  > **One Embedder for All Video Retrieval Scenarios**
 
  1. Loading model
 
  ```python
+ model_path = 'Alibaba-NLP/GVE-3B'
+ model = AutoModel.from_pretrained(model_path, trust_remote_code=True, device_map='auto', low_cpu_mem_usage=True, torch_dtype=torch.bfloat16)
+ processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True, use_fast=True)
  processor.tokenizer.padding_side = 'left'
  ```
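
The added lines in this hunk call `AutoModel`, `AutoProcessor`, and `torch.bfloat16`, while the same hunk removes the `from transformers import ...` line and shows no `import torch`. Assuming those imports are not supplied elsewhere in the README, a self-contained version of the loading step might look like the sketch below. The helper name `load_gve` and the constant `MODEL_PATH` are hypothetical and introduced only for illustration; the arguments are copied from the diff.

```python
MODEL_PATH = "Alibaba-NLP/GVE-3B"  # repository id taken from the added lines


def load_gve(model_path: str = MODEL_PATH):
    """Load the model and processor (hypothetical helper, not part of the README)."""
    # Imports are deferred so that defining this helper does not itself
    # require torch or transformers to be installed.
    import torch
    from transformers import AutoModel, AutoProcessor

    model = AutoModel.from_pretrained(
        model_path,
        trust_remote_code=True,  # the repository ships custom model code
        device_map="auto",
        low_cpu_mem_usage=True,
        torch_dtype=torch.bfloat16,
    )
    processor = AutoProcessor.from_pretrained(
        model_path, trust_remote_code=True, use_fast=True
    )
    # Left padding keeps the final real token at position -1 in every sequence.
    processor.tokenizer.padding_side = "left"
    return model, processor


# Usage (downloads weights; needs torch and transformers installed):
# model, processor = load_gve()
```

Note that `torch_dtype=torch.bfloat16` needs `torch` in scope, whereas the removed string form `'bfloat16'` did not.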
  ## 📚 Citation
 
  ```bibtex
+ @misc{guo2025gve,
+ title={Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum},
+ author={Zhuoning Guo and Mingxin Li and Yanzhao Zhang and Dingkun Long and Pengjun Xie and Xiaowen Chu},
+ year={2025},
+ eprint={2510.27571},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2510.27571},
  }
+ ```
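
The header of the citation hunk carries the README line `embedding = F.normalize(outputs['last_hidden_state'][:, -1, :], p=2, dim=1)`. Together with `padding_side = 'left'`, this suggests last-token pooling followed by L2 normalization. The sketch below mirrors that computation in plain Python, assuming nested lists in place of tensors; the function name `last_token_embedding` is hypothetical and this is not the model's own code.

```python
import math


def last_token_embedding(hidden_states, eps=1e-12):
    """Pool the final position of each sequence and L2-normalize it.

    hidden_states: nested list shaped [batch][seq_len][dim], standing in
    for outputs['last_hidden_state'] (illustrative stand-in only).
    """
    embeddings = []
    for sequence in hidden_states:
        # Mirrors [:, -1, :]; with left padding the last slot is a real token.
        last = sequence[-1]
        norm = math.sqrt(sum(x * x for x in last))
        # Same guard against zero-length vectors that F.normalize applies.
        scale = max(norm, eps)
        embeddings.append([x / scale for x in last])
    return embeddings


# Two toy sequences of length 2 with hidden size 2.
batch = [
    [[1.0, 0.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
embeddings = last_token_embedding(batch)
```

Because each resulting vector has unit length, the dot product of two embeddings equals their cosine similarity.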