Dark Web

Don't forget to turn on your firewall!

Video prediction

Action-Conditional Video Predictions article
Unsupervised Learning of Video Representations using LSTMs article
Deep predictive coding networks for video prediction and unsupervised learning article
Hierarchical Long-term Video Prediction without Supervision article

GAN

Generating Videos with Scene Dynamics article(VGAN)
Adversarial Video Generation on Complex Datasets article(DVD-GAN)

Generative Temporal Models

generative temporal models

Variational RNNs

A Recurrent Latent Variable Model for Sequential Data article
Generative Temporal Models with (external) Memory article
Improved Conditional VRNNs for Video Prediction article
- hierarchical

Autoregressive dynamics

feed back predicted images during prediction
- computationally expensive
- strongly tie frame synthesis and temporal models
- distributional shift (during training the input images come from the dataset) article

Autoregressive in space and time:

Video Pixel Networks article(VPN)
Scaling Autoregressive Video Models (with Transformer) article

Autoregressive in time, latent variable for space:
(pixels of each image can be predicted in parallel)

Stochastic Variational Video Prediction article (SV2P)
Stochastic Video Generation with a Learned Prior article (SVG-LP)
Stochastic Adversarial Video Prediction article(SAVP)

Latent dynamics

= State-space models
predict a sequence of latent states forward then decode into the video
learn all dependencies via latent variables without autoregressive conditioning
no need to feed back images -> efficiency
early examples: Kalman filters article video, Markovian models article

![[/assets/latent_dynamics.png]]

Deep Kalman filters article
- counterfactual inference
Deep Variational Bayes filters article
Learning Latent Dynamics for Planning from Pixels article(RSSM)
Stochastic Latent Residual Video Prediction articlecode(SLRVP)
- residual: ODE Euler discretization
- content variable: disentangling dynamics and content
Videoflow: A conditional flow-based model for stochastic video generation article
hierarchical latents:
- learn to separate high-level details from low-level details
temporally abstract latent dynamics models:
- predict learned features at a slower frequency than the input sequence
- VTVAE two levels the fast states decide when the slow states tick
- Temporal Difference VAE jumpy predictions, without a hierarchy
VDSM: Unsupervised Video Disentanglement with State-Space Modeling and Deep Mixtures of Experts link
- uses inductive bias (architecture) to learn disentangled representations
- both a varying and constant variable over the sequence
Clockwork Variational Autoencoder articlecode
- hierarchy of latent sequences, where higher levels tick at slower intervals
- accurate long-term predictions
- automatically adapts to the frame rate of the dataset
- slower levels represent content that changes more slowly in the input, such as the wall colors of a maze, while faster levels represent faster changing concepts, such as the camera position
Dynamic Predictive Coding: A New Model of Hierarchical Sequence Learning and Prediction in the Cortex link
The Sparse Manifold Transform link video
Minimalistic Unsupervised Learning with the Sparse Manifold Transform link
Structured State-space models (S4) video annotated article

Sparse Manifold Transform

sparse coding: unsupervised learning algorithm video
- manifold transform (+ slow feature analysis)

lift the raw signal x into a high-dimensional space by a non-linear sparse feature function f, i.e. α = f(x), where locality and decomposition are respected
- sparse coding/ vector quantization
- variational approach: parameterize a Laplace distribution
linearly embed the sparse feature α into a low-dimensional space

td-vae
- konvolucioval
- gabor filterek?