AI revolution: image quality in a flash with MIT’s new method

April 8, 2024
Artificial intelligence is revolutionizing the way high-quality images are generated. Thanks to new developments in the field of diffusion models, AI algorithms can now generate images of quality comparable to that of traditional methods, but in a fraction of the time. In this article, we explore a new approach introduced by the Massachusetts Institute of Technology (MIT) that simplifies the image generation process, reducing the time required while maintaining or even improving the quality of the generated images.
The evolution of diffusion models
In the current era of artificial intelligence, computers can generate “art” using diffusion models. These models progressively add structure to a noisy initial state until a crisp image or video emerges. However, traditional diffusion models rely on a complex, time-intensive process that takes many iterations to refine an image.
To address these limitations, MIT researchers introduced a new framework called Distribution Matching Distillation (DMD). This framework simplifies the image generation process, collapsing the many steps required by traditional diffusion models into a single one. The result is a dramatic increase in generation speed, up to 30 times faster, while matching or exceeding the quality of the generated images.
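To make the contrast concrete, here is a minimal toy sketch in NumPy, not the actual DMD or diffusion code: the "teacher" refines a noisy vector through many small denoising passes, while a hypothetical distilled "student" reaches essentially the same result in a single step. `TARGET`, `denoise_step`, and the 0.9 contraction factor are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
TARGET = np.ones(8)  # stand-in for a clean "image"

def denoise_step(x):
    # One pass of a hypothetical teacher denoiser: nudge x toward TARGET.
    return x + 0.1 * (TARGET - x)

def teacher_sample(steps=50):
    x = rng.standard_normal(8)   # start from pure noise
    for _ in range(steps):       # many iterative refinement passes
        x = denoise_step(x)
    return x

def student_sample():
    z = rng.standard_normal(8)
    # A distilled one-step generator maps noise directly to an image.
    # Here we fake that with the closed-form limit of the teacher's updates.
    return TARGET + (z - TARGET) * 0.9 ** 50

teacher = teacher_sample()   # 50 network passes
student = student_sample()   # 1 pass
```

Both routes land on (nearly) the same output, but the student pays for one forward pass instead of fifty, which is the source of DMD's speedup.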
The MIT method and advantages
The DMD method is based on a teacher-student setup: a new, simpler model is taught to mimic the behavior of more complex image-generating models. This is done through two losses: a regression loss, which anchors the coarse structure of the generated images, and a distribution matching loss, which ensures that the probability of the student generating a given image matches that image's frequency of occurrence in the real world.
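The two-loss idea can be sketched as follows, purely for illustration: mean squared error stands in for the paper's regression loss, and a KL divergence between two discrete histograms stands in for the distribution matching loss. The weight `lam` and all the toy inputs are assumptions, not values from the paper.

```python
import numpy as np

def regression_loss(student_img, teacher_img):
    # Keeps the student's output close to the teacher's output for the
    # same noise input, anchoring coarse structure (MSE stands in for
    # the perceptual distance used in practice).
    return np.mean((student_img - teacher_img) ** 2)

def distribution_matching_loss(p_student, p_real):
    # Hypothetical stand-in: KL divergence between the student's image
    # distribution and the real-image distribution, both represented
    # here as discrete histograms for illustration.
    p_student = p_student / p_student.sum()
    p_real = p_real / p_real.sum()
    return np.sum(p_student * np.log(p_student / p_real))

lam = 0.25  # hypothetical weighting between the two terms
student_img = np.array([0.2, 0.8, 0.5])
teacher_img = np.array([0.25, 0.75, 0.5])
p_s = np.array([0.3, 0.7])
p_r = np.array([0.4, 0.6])

total = regression_loss(student_img, teacher_img) \
        + lam * distribution_matching_loss(p_s, p_r)
```

Minimizing the first term alone would only copy the teacher image by image; the second term is what forces the student's overall output distribution to line up with real data.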
The DMD system achieves faster generation by training a new network to minimize the divergence between the distribution of its generated images and that of the training dataset used by traditional diffusion models. It does so by using two diffusion models as guides, which together help the system distinguish between real and generated images and make it possible to train the generator in a single step.
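A heavily simplified sketch of the guiding idea: if the "real" and "fake" distributions are unit-variance Gaussians with known means, the two guide models reduce to closed-form score functions, and their difference yields a gradient that pushes a generated sample toward the real distribution. `MU_REAL`, `MU_FAKE`, and the step size are hypothetical choices for this toy, not anything from the paper.

```python
import numpy as np

# Toy stand-ins for the two guide diffusion models: the score (gradient
# of log density) of a "real" distribution and of a "fake" distribution
# fitted to the generator's own outputs.
MU_REAL = np.array([1.0, 1.0])
MU_FAKE = np.array([0.0, 0.0])

def score_real(x):
    return MU_REAL - x  # score of N(MU_REAL, I)

def score_fake(x):
    return MU_FAKE - x  # score of N(MU_FAKE, I)

def dmd_gradient(x):
    # The difference of the two scores points toward regions where the
    # real density exceeds the fake density.
    return score_real(x) - score_fake(x)

# Ascending this gradient moves a generated sample toward the real mean.
x = np.array([0.0, 0.0])
for _ in range(100):
    x = x + 0.01 * dmd_gradient(x)
```

In the real system both scores come from neural networks, but the mechanism is the same: the gap between the two guides tells the one-step generator how to correct its output distribution.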
The one-step image generation approach offered by the DMD framework could have numerous applications and advantages. For example, it could improve design tools by enabling faster content creation. In addition, it could support advances in drug discovery and 3D modeling, where timeliness and efficiency are critical.
Results and benchmarks
The MIT method was tested on several benchmarks and showed consistent performance. For example, on ImageNet, one of the most popular benchmarks for class-conditional image generation, DMD performed comparably to more complex models, coming within about 0.3 Fréchet inception distance (FID) points of them. FID measures the quality and diversity of generated images, with lower scores indicating better results. In addition, DMD excels at large-scale text-to-image generation and achieves state-of-the-art one-step generation performance.
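For intuition about the metric, here is a minimal FID computation between two Gaussians with diagonal covariances. Real FID is computed on Inception-v3 feature statistics with full covariance matrices; this is a simplified sketch of the same formula, and all the numbers below are illustrative.

```python
import numpy as np

def fid_diagonal(mu1, var1, mu2, var2):
    # Fréchet distance between two Gaussians with diagonal covariances:
    #   FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2))
    # For diagonal covariances the trace term reduces elementwise.
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(var1 + var2 - 2 * np.sqrt(var1 * var2))
    return mean_term + cov_term

# Toy feature statistics: generated images whose mean is slightly off.
mu_real = np.array([0.0, 0.0]); var_real = np.array([1.0, 1.0])
mu_gen  = np.array([0.1, 0.0]); var_gen  = np.array([1.0, 1.0])

score = fid_diagonal(mu_real, var_real, mu_gen, var_gen)
```

Identical statistics give an FID of exactly zero, and the score grows as the generated distribution drifts away from the real one, which is why a small FID gap signals near-teacher quality.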
Limitations and future developments
It is important to note that the quality of images generated by the MIT method depends on the capabilities of the teacher model used during the distillation process. The system currently uses Stable Diffusion v1.5 as the teacher and inherits some of its limitations, such as rendering detailed text and small faces. However, images generated by the DMD system can be further improved by using more advanced teacher models.
The future of image generation
The generation of high-quality images in one step is a major breakthrough in artificial intelligence. Thanks to the DMD framework introduced by MIT, it is possible to generate images with greatly reduced computation time while maintaining or improving image quality. This could pave the way for new applications and possibilities in content design, drug discovery, and many other areas.