In our latest paper we show that the GAN loss used by almost all latent diffusion models to train their autoencoders is not required and can instead be replaced with a diffusion loss. Our autoencoder is trained end-to-end and achieves higher compression and better generation quality.
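To make the idea concrete, here is a minimal toy sketch of what "score the reconstruction with a diffusion loss instead of a GAN critic" can look like. Everything below (the linear autoencoder, the `denoiser` stand-in, the noise schedule) is a hypothetical illustration, not the paper's actual architecture or objective:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear autoencoder (illustrative shapes, not the paper's model).
D, Z = 8, 2
W_enc = rng.normal(0, 0.3, (D, Z))
W_dec = rng.normal(0, 0.3, (Z, D))

def denoiser(x_noisy, t):
    # Stand-in for a small learned noise-prediction network eps_theta;
    # here just a fixed map so the sketch runs without training.
    return x_noisy * t

def diffusion_loss(x_hat, rng):
    """DDPM-style objective applied to the reconstruction: noise x_hat
    at a random level and ask the denoiser to predict the added noise.
    Gradients would flow back into the autoencoder through x_hat,
    replacing the adversarial signal a GAN critic would provide."""
    t = rng.uniform(0.01, 1.0)
    alpha_bar = np.exp(-2.0 * t)  # toy noise schedule
    eps = rng.normal(size=x_hat.shape)
    x_t = np.sqrt(alpha_bar) * x_hat + np.sqrt(1.0 - alpha_bar) * eps
    eps_pred = denoiser(x_t, t)
    return np.mean((eps_pred - eps) ** 2)

x = rng.normal(size=(4, D))
x_hat = x @ W_enc @ W_dec
# Total objective: plain reconstruction term plus the diffusion term
# (weight 0.5 chosen arbitrarily for the sketch).
total = np.mean((x_hat - x) ** 2) + 0.5 * diffusion_loss(x_hat, rng)
```

In a real setup the denoiser would be a trained network and both terms would be minimized jointly by backprop; the point of the sketch is only where the diffusion loss plugs in relative to the reconstruction loss.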
I am excited to share it with you. Let me know what you think.
Cheers
It seems to be another autoencoder (autoregressive) + diffusion approach.
The difference is that they use discrete tokens.