Clarification on DINOv2 Distillation in Video DC-AE Checkpoint

by jmkim0309 - opened May 13

Hello,

In the technical report, it is mentioned that the video DC-AE model is further tuned using DINOv2 distillation loss after the initial training.
However, I noticed that the distillation step is not included in the training code available on the OpenSora GitHub repository.

This leaves me unsure: is the released checkpoint (F32T4C128_AE.safetensors) the distilled version of the model or not?

Could you please clarify this?

Thank you in advance!
