Clarification on DINOv2 Distillation in Video DC-AE Checkpoint

#2
by jmkim0309 - opened

Hello,

The technical report mentions that the video DC-AE model is further tuned with a DINOv2 distillation loss after the initial training.
However, I noticed that the distillation step is not included in the training code available on the OpenSora GitHub repository.

This leads me to some confusion: is the released checkpoint (F32T4C128_AE.safetensors) the distilled version of the model, or not?

Could you please clarify this?

Thank you in advance!
