Disentangling Structure and Appearance in ViT Feature Space

Supplementary Material

 


We recommend viewing all images in full screen. Click an image to see it at full scale.

SpliceNet Results

We present SpliceNet results of semantic appearance transfer on a variety of structure and appearance image pairs.

Column headers, repeated three times across each row: appearance, structure, SpliceNet (Ours)




Animal Faces


Flowers


SD-Dogs


SD-Horses

Appearance Interpolation

We can control the extent of stylization by feeding our model an interpolation of the [CLS] tokens of the appearance and structure images (see Sec. 4.5).
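The interpolation of the two [CLS] tokens can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a plain linear blend, and the function name `interpolate_cls`, the weight `alpha`, and the toy 4-dimensional vectors are all hypothetical. Real [CLS] tokens would come from a ViT and have many more dimensions.

```python
def interpolate_cls(cls_structure, cls_appearance, alpha):
    """Linearly blend two [CLS] token vectors (assumed interpolation scheme).

    alpha = 0.0 returns the structure image's token (no stylization);
    alpha = 1.0 returns the appearance image's token (full stylization).
    """
    if len(cls_structure) != len(cls_appearance):
        raise ValueError("tokens must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * s + alpha * a
            for s, a in zip(cls_structure, cls_appearance)]


# Toy 4-dimensional vectors standing in for real ViT [CLS] tokens.
cls_structure = [0.0, 1.0, 2.0, 3.0]
cls_appearance = [4.0, 3.0, 2.0, 1.0]

# A sweep of interpolation weights, e.g. one target token per animation frame.
targets = [interpolate_cls(cls_structure, cls_appearance, k / 4) for k in range(5)]
```

Each entry of `targets` would then serve as the appearance target for one stylization run, so that sweeping `alpha` from 0 to 1 moves the result gradually from the structure image's own appearance toward the appearance image.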

Animations appear in the following section.


Detecting and Visualizing Appearance Modes

Appearance modes are detected automatically by clustering the [CLS] tokens of all images in the AFHQ training set. We then transfer each discovered appearance mode to test structure images (see Sec. 4.5).
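The text above does not name the clustering algorithm, so the sketch below assumes k-means purely for illustration. The helper names (`kmeans`, `farthest_point_init`), the deterministic farthest-point initialization, and the toy 2-dimensional "tokens" are all hypothetical; in practice the inputs would be ViT [CLS] tokens extracted from the training images.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def farthest_point_init(tokens, k):
    """Deterministic init: start at tokens[0], then repeatedly add the
    token farthest from all centroids chosen so far."""
    centroids = [list(tokens[0])]
    while len(centroids) < k:
        farthest = max(
            tokens,
            key=lambda t: min(squared_distance(t, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(tokens, k, max_iterations=50):
    """Cluster token vectors with k-means (assumed algorithm).

    Returns (centroids, labels); each centroid is a candidate appearance mode.
    """
    centroids = farthest_point_init(tokens, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach every token to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(t, centroids[c]))
            for t in tokens
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [t for t, lab in zip(tokens, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return centroids, labels


# Toy 2-dimensional "tokens" forming two well-separated groups.
tokens = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
          [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
modes, labels = kmeans(tokens, k=2)
```

Under this assumption, each centroid in `modes` plays the role of one appearance mode, and it could be used as the appearance target when stylizing a test structure image.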


SpliceNet Video Stylization

Given a video and an appearance image as input, we apply SpliceNet to each frame separately to obtain a stylized video.
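Frame-by-frame processing can be sketched as a simple map over the video. The names `stylize_video` and `toy_stylize` are hypothetical, and `toy_stylize` is only a stand-in for the trained model so that the example runs on its own; it does not reflect how SpliceNet works internally.

```python
from typing import Callable, Iterable, Iterator, TypeVar

Frame = TypeVar("Frame")
Image = TypeVar("Image")


def stylize_video(
    frames: Iterable[Frame],
    appearance: Image,
    stylize_frame: Callable[[Frame, Image], Frame],
) -> Iterator[Frame]:
    """Apply stylize_frame to every frame independently.

    No state is carried from one frame to the next, so each output frame
    depends only on its own input frame and the appearance image.
    """
    for frame in frames:
        yield stylize_frame(frame, appearance)


def toy_stylize(frame, appearance):
    """Stand-in for the model: shift every pixel value by a constant."""
    return [pixel + appearance for pixel in frame]


# Three tiny "frames" of two pixels each, and a scalar "appearance".
video = [[0, 1], [2, 3], [4, 5]]
stylized = list(stylize_video(video, 10, toy_stylize))
```

Because the frames are processed independently, the loop is trivially parallelizable, but nothing in this sketch enforces consistency between consecutive frames.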