Disentangling Structure and Appearance in ViT Feature Space

Supplementary Material

SpliceNet Results
SpliceNet Comparisons
- SD-Dogs
- SD-Horses
- Oxford-102
- AFHQ
SpliceNet Ablations

We recommend viewing all images in full screen. Click on an image to see it at full scale.

SpliceNet Results

We present SpliceNet results of semantic appearance transfer on a variety of structure and appearance image pairs.

appearance	structure	SpliceNet (Ours)

Animal Faces

Flowers

SD-Dogs

SD-Horses

Appearance Interpolation

We can control the extent of stylization by feeding our model an interpolation of the [CLS] tokens of the appearance and structure images (see Sec. 4.5).
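As a rough illustration of the interpolation idea, the sketch below blends two [CLS] token vectors with a scalar weight. It assumes plain linear interpolation and represents tokens as lists of floats; the names `interpolate_cls`, `interpolation_schedule`, and `alpha` are hypothetical and are not taken from the SpliceNet code.

```python
from typing import List, Sequence


def interpolate_cls(cls_structure: Sequence[float],
                    cls_appearance: Sequence[float],
                    alpha: float) -> List[float]:
    """Linearly blend two [CLS] token vectors (assumed interpolation scheme).

    alpha = 0 returns the structure image's token (no stylization);
    alpha = 1 returns the appearance image's token (full stylization).
    """
    if len(cls_structure) != len(cls_appearance):
        raise ValueError("tokens must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * s + alpha * a
            for s, a in zip(cls_structure, cls_appearance)]


def interpolation_schedule(num_steps: int) -> List[float]:
    """Evenly spaced blending weights from 0 to 1, one per output image."""
    if num_steps < 2:
        raise ValueError("need at least two steps")
    return [i / (num_steps - 1) for i in range(num_steps)]
```

Each blended token would then be fed to the model in place of the appearance image's [CLS] token, so that sweeping `alpha` from 0 to 1 yields a gradual change in appearance.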

Animations appear in the following section.

Detecting and Visualizing Appearance Modes

Appearance modes are detected automatically by clustering the [CLS] tokens across the entire AFHQ training set. We then transfer each discovered appearance mode to test structure images (see Sec. 4.5).
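The text does not name the clustering algorithm, so the following is only a minimal sketch that assumes k-means over [CLS] token vectors, with a deterministic farthest-point initialization chosen for reproducibility. All function names are hypothetical, and the training image nearest to each centroid is used here as one plausible way to pick a representative for each mode.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(tokens: Sequence[Sequence[float]], k: int) -> List[Vector]:
    """Start from the first token, then repeatedly add the token that is
    farthest from its nearest already-chosen centroid."""
    centroids = [list(tokens[0])]
    while len(centroids) < k:
        farthest = max(
            tokens,
            key=lambda t: min(squared_distance(t, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(tokens: Sequence[Sequence[float]], k: int,
           max_iters: int = 100) -> Tuple[List[Vector], List[int]]:
    """Cluster [CLS] tokens into k appearance modes (assumed: Lloyd's k-means)."""
    if not 1 <= k <= len(tokens):
        raise ValueError("k must be between 1 and the number of tokens")
    centroids = farthest_point_init(tokens, k)
    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each token goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(t, centroids[c]))
            for t in tokens
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [t for t, label in zip(tokens, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def mode_representatives(tokens: Sequence[Sequence[float]],
                         centroids: Sequence[Sequence[float]]) -> List[int]:
    """Index of the training token closest to each centroid."""
    return [
        min(range(len(tokens)), key=lambda i: squared_distance(tokens[i], c))
        for c in centroids
    ]
```

Under these assumptions, the image at each index returned by `mode_representatives` could serve as the appearance input when transferring a mode to a test structure image.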

SpliceNet Video Stylization

Given a video and an appearance image as input, we apply SpliceNet to each frame independently to obtain a stylized video.
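The per-frame procedure can be sketched as a simple map over the frames. Here `stylize_frame` is a hypothetical stand-in for a SpliceNet forward pass that takes one structure frame and the appearance image; the frame and image types are left generic because the actual tensor format is not specified in this page.

```python
from typing import Callable, Iterable, List, TypeVar

Frame = TypeVar("Frame")
Image = TypeVar("Image")


def stylize_video(frames: Iterable[Frame],
                  appearance: Image,
                  stylize_frame: Callable[[Frame, Image], Frame]) -> List[Frame]:
    """Apply the same appearance image to every frame independently.

    No state is carried between frames, so the frame order only affects
    the order of the output list, not the content of any stylized frame.
    """
    return [stylize_frame(frame, appearance) for frame in frames]
```

Because each frame is processed on its own, this sketch contains no explicit temporal-consistency term; any coherence across frames would have to come from the model itself.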