Research Webzine of the KAIST College of Engineering since 2014

Fall 2023 Vol. 21

Discriminator Guidance for Diffusion Model in Image Generation

August 23, 2023

Discriminator guidance is proposed to improve the image generation quality of pre-trained diffusion models. The guidance combines the pre-trained diffusion model with the response of a discriminator, and the combination constitutes a new diffusion model.

Figure 1. Three diffusion models: (a) pre-trained diffusion model, (b) diffusion model with classifier guidance, (c) discriminator-guided diffusion model.


Diffusion models have recently drawn attention for their success in image generation, video generation, and text-to-image generation. State-of-the-art models generate samples at a human level, but much remains to be understood about how diffusion models work. The generative modeling community widely reuses well-trained score models in downstream tasks, partly because training a new score model from scratch is computationally expensive. However, even as demand for reusing pre-trained models grows, only a few studies have focused on improving sample quality with a pre-trained score model.


The proposed method, Discriminator Guidance, aims to improve the sample generation of pre-trained diffusion models. The approach introduces a discriminator that explicitly supervises whether a denoising sample path is realistic. Unlike GANs, the approach does not require joint training of the score and discriminator networks. Instead, the discriminator is trained after score training is complete, which makes discriminator training stable and fast to converge. During sample generation, the method adds an auxiliary term to the pre-trained score so that the generated samples deceive the discriminator. At the optimal discriminator, this term corrects the model score to the data score, which implies that the discriminator complements the pre-trained model in estimating the score. With the new algorithm, the proposed model achieves state-of-the-art results on ImageNet 256x256, with an FID of 1.83 and a recall of 0.64, close to the validation data's FID (1.68) and recall (0.66).
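The claim that the auxiliary term recovers the data score at the optimal discriminator can be checked on a toy problem. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the "data" and "model" distributions are hypothetical one-dimensional Gaussians, a single fixed noise level is considered, the optimal discriminator is written in closed form as p_data / (p_data + p_model) instead of being trained, and the auxiliary term is taken to be the gradient of log(d / (1 - d)), approximated by a finite difference. All function names are illustrative.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gaussian_score(x, mean, std):
    """Score of N(mean, std^2): d/dx log p(x)."""
    return -(x - mean) / (std * std)


# Hypothetical toy distributions (mean, std); not taken from the article.
DATA = (0.0, 1.0)    # stands in for the true data distribution
MODEL = (0.5, 1.2)   # stands in for an imperfect pre-trained model


def optimal_discriminator(x):
    """Closed-form optimal discriminator d*(x) = p_data / (p_data + p_model)."""
    p = gaussian_pdf(x, *DATA)
    q = gaussian_pdf(x, *MODEL)
    return p / (p + q)


def log_density_ratio(x):
    """log(d / (1 - d)), which equals log(p_data / p_model) when d is optimal."""
    d = optimal_discriminator(x)
    return math.log(d) - math.log(1.0 - d)


def correction(x, eps=1e-5):
    """Auxiliary term: central finite difference of log(d / (1 - d))."""
    return (log_density_ratio(x + eps) - log_density_ratio(x - eps)) / (2.0 * eps)


def guided_score(x):
    """Model score plus the discriminator-based correction."""
    return gaussian_score(x, *MODEL) + correction(x)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.3, 1.7):
        print(x, guided_score(x), gaussian_score(x, *DATA))
```

In this toy setting the guided score agrees with the data score up to finite-difference error, because the gradient of log(p_data / p_model) is exactly the difference between the two scores. In practice the discriminator is a trained network evaluated on noised samples, so the correction is only as accurate as the discriminator.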


Figure 2. Comparison of the denoising processes. Discriminator Guidance adjusts the score function by estimating the gap cφ between the predicted model score and the true data score. As a result, the sample generated using Discriminator Guidance is indistinguishable from real data according to the discriminator.