CoNeRF: Controllable Neural Radiance Fields

Supplementary Material

Kacper Kania1,2
Kwang Moo Yi1
Marek Kowalski2,4
Tomasz Trzciński2
Andrea Tagliasacchi3,5
University of British Columbia1   Warsaw University of Technology2   University of Toronto3   Microsoft4   Google Research5


CoNeRF is, to the best of our knowledge, the first method that enables explicit attribute control of images generated from a single video. We can easily synchronize metronomes beating at different tempos, stabilize the camera, change facial expressions, and much more!


Generated sequences

We present generated video sequences from models trained on the datasets used in the paper. We directly compare visualizations generated by our method with the baselines Ours-$\mathcal{M}$ and HyperNeRF+$\pi$. We generate each sequence by interpolating each attribute between its extreme values ($-1$ and $+1$) and then between randomly sampled values, while freely orbiting the camera around the central object (a sketch of this schedule follows below). Our method generates the most realistic images while providing the expected controllability of the output.
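As a concrete illustration, here is a minimal Python sketch of such a rendering schedule. The `look_at` helper, the orbit radius, and the `model.render` call are our own hypothetical stand-ins, not the actual CoNeRF API.

import numpy as np

def look_at(eye, target, up):
    """Build a 3x4 camera-to-world pose looking from `eye` toward `target`."""
    fwd = (target - eye) / np.linalg.norm(target - eye)
    right = np.cross(fwd, up)
    right /= np.linalg.norm(right)
    down = np.cross(fwd, right)
    return np.column_stack([right, down, fwd, eye])  # 3x4 [R | t]

def attribute_schedule(num_frames, num_random=4, seed=0):
    """Sweep the attribute from -1 to +1, then through random key values."""
    rng = np.random.default_rng(seed)
    keys = np.concatenate([[-1.0, 1.0], rng.uniform(-1.0, 1.0, num_random)])
    t = np.linspace(0.0, len(keys) - 1, num_frames)
    return np.interp(t, np.arange(len(keys)), keys)  # piecewise-linear path

num_frames = 120
for i, alpha in enumerate(attribute_schedule(num_frames)):
    angle = 2.0 * np.pi * i / num_frames  # one full orbit of the camera
    eye = np.array([4.0 * np.cos(angle), 4.0 * np.sin(angle), 0.5])
    pose = look_at(eye, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0]))
    # frame = model.render(pose, attributes={"attr_0": alpha})  # hypothetical API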

Face Expressions I

Face Expressions II

Face Expressions III

Metronome

Transformer

Two Metronomes

Synthetic


Resynchronization

Our approach also enables resynchronizing metronomes that beat at different rates while keeping the original camera motion; a sketch of the corresponding attribute schedule follows the clips below.

Original Video
0 Beats Per Minute
20 Beats Per Minute
120 Beats Per Minute
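The sketch below shows one way such resynchronization could be driven: the metronome arm is mapped to a single attribute in [-1, 1] that follows a sinusoid at the target beat rate, while the camera poses are taken unchanged from the original capture. The pose placeholder and the `model.render` call are hypothetical, not the actual CoNeRF interface.

import numpy as np

def metronome_attribute(t_seconds, bpm):
    """Pendulum-like attribute in [-1, 1]; bpm = 0 freezes the arm."""
    # One beat at each extreme of the swing, so a full period spans two beats.
    return np.sin(np.pi * bpm / 60.0 * t_seconds)

fps = 30
# Stand-in for the camera trajectory of the original capture.
original_camera_poses = [np.eye(4)[:3] for _ in range(3 * fps)]
for frame_idx, pose in enumerate(original_camera_poses):  # keep original motion
    alpha = metronome_attribute(frame_idx / fps, bpm=120)
    # frame = model.render(pose, attributes={"metronome": alpha})  # hypothetical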

Synthetic Data

We additionally show below two sequences generated with Kubric [13], which we used to evaluate our method and the baselines on novel view and novel attribute synthesis.

Training sequence

Validation sequence

Abstract

We extend neural 3D representations to allow for intuitive and interpretable user control beyond novel view rendering (i.e., camera control). We let the user annotate which part of the scene they wish to control with just a small number of mask annotations in the training images. Our key idea is to treat the attributes as latent variables that are regressed by the neural network given the scene encoding. This leads to a few-shot learning framework, in which attributes are discovered automatically when annotations are not provided. We apply our method to various scenes with different types of controllable attributes (e.g., expression control on human faces, or state control in the movement of inanimate objects). Overall, we demonstrate, to the best of our knowledge, novel view and novel attribute re-rendering of scenes from a single video for the first time.
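To make the key idea concrete, here is a minimal numerical sketch under our own simplifying assumptions (a linear head in place of the paper's network, and made-up embeddings and annotation values): attributes are regressed from per-frame codes, and the attribute loss touches only the few annotated frames.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame scene embeddings and a tiny linear head that
# regresses attribute values from them.
embeddings = rng.normal(size=(100, 8))      # one 8-D code per training frame
head_w, head_b = rng.normal(size=8), 0.0

def regress_attribute(emb):
    """Latent attribute in (-1, 1), predicted from the frame's embedding."""
    return np.tanh(emb @ head_w + head_b)

# Few-shot supervision: the user annotates only a handful of frames.
annotations = {3: -1.0, 40: 0.0, 77: +1.0}  # frame index -> attribute value

pred = regress_attribute(embeddings)
sup_loss = np.mean([(pred[i] - v) ** 2 for i, v in annotations.items()])
# Unannotated frames receive no attribute loss; their values are
# discovered automatically through the shared rendering objective.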

Bibtex

@inproceedings{kania2022conerf,
    title     = {{CoNeRF: Controllable Neural Radiance Fields}},
    author    = {Kania, Kacper and Yi, Kwang Moo and Kowalski, Marek and Trzci{\'n}ski, Tomasz and Tagliasacchi, Andrea},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    year      = {2022}
} 

Acknowledgements

We thank Thabo Beeler, JP Lewis, and Mark J. Matthews for their fruitful discussions, and Daniel Rebain for helping with processing the synthetic dataset. The work was partly supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), Compute Canada, and Microsoft Mixed Reality & AI Lab.