Update README.md

---
language: en
license: apache-2.0
library_name: transformers
tags:
- pytorch
- video
- retrieval
- embedding
- multimodal
- qwen2.5-vl
pipeline_tag: sentence-similarity
datasets:
- Alibaba-NLP/UVRB
- Vividbot/vast-2m-vi
- TempoFunk/webvid-10M
- OpenGVLab/InternVid
metrics:
- recall
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
---

# 🎯 General Video Embedder (GVE)

> **One Embedder for All Video Retrieval Scenarios**
Built on **Qwen2.5-VL** and trained only with LoRA on **13M** collected and synthesized …

1. Loading model

```python
import torch
from transformers import AutoModel, AutoProcessor

model_path = 'Alibaba-NLP/GVE-3B'
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, device_map='auto', low_cpu_mem_usage=True, torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True, use_fast=True)
processor.tokenizer.padding_side = 'left'
```
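Left padding matters here because the retrieval embedding is read from the last token position; with right padding that slot would hold a pad token. A minimal illustration with plain token-id lists (the ids and pad value are made up for the sketch):

```python
PAD = 0  # hypothetical pad token id

# Two variable-length sequences of made-up token ids.
seqs = [[5, 7], [9, 8, 6]]
max_len = max(len(s) for s in seqs)

# Left padding keeps each sequence's real last token at position -1.
left_padded = [[PAD] * (max_len - len(s)) + s for s in seqs]

# Position -1 now holds the final real token of every sequence,
# which is exactly the position the embedding is pooled from.
last_tokens = [s[-1] for s in left_padded]  # [7, 6]
```

With right padding, the shorter sequence would end in `PAD` and the pooled vector would come from a padding position instead of real content.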
Embeddings are then taken from the model outputs as `F.normalize(outputs['last_hidden_state'][:, -1, :], p=2, dim=1)`, i.e. the L2-normalized hidden state of the final token.
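The embedding is the final token's hidden state, L2-normalized. That pooling and normalization step can be sketched on a toy tensor; the shapes below are illustrative assumptions, not the model's real dimensions:

```python
import torch
import torch.nn.functional as F

# Stand-in for outputs['last_hidden_state']: batch of 2 sequences,
# 5 token positions, hidden size 8 (illustrative shapes only).
last_hidden_state = torch.randn(2, 5, 8)

# Take the final token's hidden state and L2-normalize it, mirroring
# F.normalize(outputs['last_hidden_state'][:, -1, :], p=2, dim=1).
embedding = F.normalize(last_hidden_state[:, -1, :], p=2, dim=1)

# Each row is now a unit-length vector, so a dot product between two
# such embeddings is their cosine similarity.
```

Because the vectors are unit-normalized, retrieval scores reduce to a plain matrix multiply, e.g. `scores = query_emb @ doc_emb.T`.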
## 📚 Citation

```bibtex
@misc{guo2025gve,
  title={Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum},
  author={Zhuoning Guo and Mingxin Li and Yanzhao Zhang and Dingkun Long and Pengjun Xie and Xiaowen Chu},
  year={2025},
  eprint={2510.27571},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.27571},
}
```
|