# CLIP ViT-B/16 Model
This is a CLIP (Contrastive Language-Image Pre-training) model trained on the DataComp-12M dataset.
## Training Details
- Architecture: ViT-B/16
- Dataset: DataComp-12M
- Batch size per GPU: 512
- Number of GPUs: 4
- Total epochs: 20
- Precision: amp (automatic mixed precision)
- Total training samples seen: 203,020,740 (20 epochs × 10,151,037 samples per epoch)
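## Usage

The snippet below is a minimal zero-shot classification sketch using the standard open_clip API. The checkpoint path (`/path/to/checkpoint.pt`), the image file name, and the label prompts are placeholders, not values from this repository.

```python
import torch
from PIL import Image
import open_clip

# Load the ViT-B-16 architecture and the trained weights.
# NOTE: replace the placeholder path with the actual checkpoint produced by training.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16", pretrained="/path/to/checkpoint.pt"
)
tokenizer = open_clip.get_tokenizer("ViT-B-16")
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so the dot product is a cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probabilities:", text_probs)
```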
## Training Command
```bash
torchrun --nproc_per_node 4 -m open_clip_train.main \
    --save-frequency 1 \
    --train-data '/pasteur2/u/yuhuiz/yiming/datacomp_12m/processed_dataset/{00000000..00001023}.tar' \
    --train-num-samples $((203020740 / 20)) \
    --local-loss \
    --gather-with-grad \
    --warmup 1000 \
    --dataset-type webdataset \
    --batch-size 512 \
    --epochs 20 \
    --model ViT-B-16 \
    --precision amp \
    --seed 0 \
    --workers 4
```
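Note that `--train-num-samples $((203020740 / 20))` sets the per-epoch sample count to 203,020,740 / 20 = 10,151,037, so the 20 epochs together cover the total sample count listed above. For reference, the block below is a minimal single-GPU sketch of the symmetric contrastive (InfoNCE) objective that CLIP training optimizes; it is not open_clip's actual loss implementation, and it ignores the distributed details that `--local-loss` and `--gather-with-grad` control (how features are gathered and where logits are formed across the 4 GPUs), which affect memory and communication rather than the per-pair objective.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features: torch.Tensor,
                          text_features: torch.Tensor,
                          logit_scale: torch.Tensor) -> torch.Tensor:
    """Symmetric image-text InfoNCE loss over a single batch.

    Both feature tensors are assumed to be L2-normalized with shape (batch, dim).
    """
    # Cosine-similarity logits, scaled by the learned temperature.
    logits_per_image = logit_scale * image_features @ text_features.T
    logits_per_text = logits_per_image.T
    # The matching image/text pair for each row sits on the diagonal.
    labels = torch.arange(image_features.shape[0], device=image_features.device)
    return (F.cross_entropy(logits_per_image, labels)
            + F.cross_entropy(logits_per_text, labels)) / 2
```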