Abstract:
Video prediction models often combine three components: an encoder from pixel space to a small latent space, a latent space prediction model, and a generative model back to pixel space. However, the large and unpredictable pixel space makes training such models difficult, requiring many training examples. We argue that finding a predictive latent variable and using it to evaluate the consistency of a future image enables data-efficient predictions because it precludes the necessity of a generative model training. To demonstrate it, we created sequence completion intelligence tests in which the task is to identify a predictably changing feature in a sequence of images and use this prediction to select the subsequent image. We show that a one-dimensional Markov Contrastive Predictive Coding (M-CPC
1D) model solves these tests efficiently, with only five examples. Finally, we demonstrate the usefulness of M-CPC
1D in solving two tasks without prior training: anomaly detection and stochastic movement video prediction.
Publisher's Version