ViT Features Visualization |
||||||||||||||||||||||||||||||||
We show feature inversion and PCA self-similarity visualizations for supervised ViT [8] on the same examples included in the paper (Fig. 3-5) for DINO-ViT [4]. As can be seen, the features contain detailed information of the input image, however the singal is noisier in supervised ViT compared to DINO-ViT, as discussed in Sec.3.1.
|
||||||||||||||||||||||||||||||||
Supervised ViT Features Visualization |
||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||
DINO-ViT Feature Inversion w/o Image Prior |
||||||||||||||||||||||||||||||||
As discussed in Sec. 3.2, feature inversion w/o any regularization, i.e., optimizing directly the image pixels, is insufficient for converging into a meaningful result.
|
||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||