Disentangling Structure and Appearance in ViT Feature Space

Supplementary Material

SpliceNet Ablations


 


Inlier / Outlier Examples

We present examples of inliers and outliers acquired using our pairing method (Sec. 3.5 in the paper).

Dogs

Source Image

Inliers

Outliers

Rejected images:

Horses

Source Image

Inliers

Outliers

Rejected images:


Pairing Ablation

We compare results generated by SpliceNet when trained (i) with dataset distillation and (ii) without dataset distillation. The model transfers appearance across semantic regions more coherently when trained with our distillation (pairing) method.

[Figure: eight examples, each with four panels: Appearance, Structure, SpliceNet w/ pairing, SpliceNet w/o pairing.]


CNN Baselines

We show results generated by SpliceNet when (i) receiving the [CLS] token as input and (ii) receiving the appearance image as input (i.e., the CNN baseline). The model conditioned on the [CLS] token transfers more complex textures (e.g., fur, or different colors in different parts) than the CNN baseline.

[Figure: nine examples, each with four panels: Appearance, Structure, SpliceNet, SpliceNet CNN Baseline.]