Apple launches “MGIE”: a revolution in AI-powered image editing

By auroraoddi

Apple has announced the launch of a new artificial intelligence model called “MGIE,” which stands for MLLM-Guided Image Editing. The model was developed in collaboration with researchers at the University of California, Santa Barbara, and was presented in a paper accepted at the International Conference on Learning Representations (ICLR) 2024, a major event in the field of AI research. MGIE is an open-source model that modifies images based on instructions written in natural language.

How does MGIE work?

MGIE is based on the idea of using multimodal large language models (MLLMs) to improve instruction-based image editing. MLLMs are artificial intelligence models that can process both text and images, and they have demonstrated remarkable capabilities in cross-modal comprehension and in generating visually aware responses. However, they had not been widely applied to image editing tasks.

MGIE integrates MLLMs into the image editing process in two main ways:

  1. It derives expressive instructions from user input using MLLMs. These instructions are concise and clear, providing explicit guidance for the editing process. For example, the user can provide the instruction “make the sky bluer,” and MGIE can produce the instruction “increase the saturation of the sky region by 20%.”

  2. It generates a visual imagination using MLLMs. This is a latent representation of the desired edit that captures its essence and can be used to guide manipulation at the pixel level. MGIE uses an end-to-end training scheme that jointly optimizes the instruction derivation, visual imagination, and image modification modules.
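
To make the “make the sky bluer” example more concrete, here is a minimal sketch of what an explicit instruction such as “increase the saturation of the sky region by 20%” could mean at the pixel level. This is not MGIE's code and does not use its models: the function name boost_saturation, the tiny hand-written image, and the sky mask are all assumptions made up for illustration, and the edit is done with Python's standard colorsys module rather than a learned editing model.

```python
import colorsys


def boost_saturation(pixels, mask, factor=1.2):
    """Scale HSV saturation by `factor` for every pixel whose mask entry is True.

    pixels: list of rows of (r, g, b) tuples with values 0-255
    mask:   list of rows of booleans with the same shape as `pixels`
    (Hypothetical helper for illustration only, not part of MGIE.)
    """
    edited = []
    for row, mask_row in zip(pixels, mask):
        new_row = []
        for (r, g, b), selected in zip(row, mask_row):
            if selected:
                # Convert to HSV, raise saturation, and clamp it to 1.0.
                h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
                s = min(1.0, s * factor)
                r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
                new_row.append((round(r2 * 255), round(g2 * 255), round(b2 * 255)))
            else:
                # Pixels outside the region are left untouched.
                new_row.append((r, g, b))
        edited.append(new_row)
    return edited


# Toy 2x2 image: top row stands in for "sky", bottom row for "grass".
image = [[(120, 160, 200), (120, 160, 200)],
         [(90, 120, 60), (90, 120, 60)]]
sky_mask = [[True, True],
            [False, False]]

edited = boost_saturation(image, sky_mask, factor=1.2)
```

In a real system the region and the amount of change would come from the model rather than from a hand-drawn mask and a fixed factor. The sketch only shows why an explicit instruction is easier to act on than a vague one.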

Functionality of MGIE

MGIE can handle a wide range of editing scenarios, from simple color adjustments to complex object manipulations. The model can perform both global and local edits, depending on the user’s preferences. Some of MGIE’s key features include:

  1. Expressive instruction-based editing: MGIE can produce concise and clear instructions that effectively guide the editing process. This not only improves the quality of edits, but also enhances the overall user experience.

  2. Photoshop-style edits: MGIE can perform common Photoshop-style edits, such as cropping, resizing, rotating, inverting, and applying filters. The model can also apply more advanced edits, such as changing the background, adding or removing objects, and merging images.

  3. Global photo optimization: MGIE can optimize the overall quality of a photo by adjusting brightness, contrast, sharpness and color balance. The model can also apply artistic effects such as sketch, painting and cartoon styles.

  4. Local modifications: MGIE can modify specific regions or objects in an image, such as faces, eyes, hair, clothing and accessories. The model can also modify the attributes of these regions or objects, such as shape, size, color, texture and style.
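
The difference between a global and a local edit can be sketched with a few lines of plain Python. This is only an illustration under simple assumptions, not how MGIE performs edits: the function adjust_brightness and its region parameter (a hypothetical bounding box given as top, left, bottom, right) are invented for this example.

```python
def adjust_brightness(pixels, delta, region=None):
    """Add `delta` to each color channel, clamped to the 0-255 range.

    pixels: list of rows of (r, g, b) tuples
    region: optional (top, left, bottom, right) box, end-exclusive;
            None means the whole image (a global edit).
    (Hypothetical helper for illustration only, not part of MGIE.)
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    top, left, bottom, right = region if region is not None else (0, 0, height, width)

    edited = []
    for y, row in enumerate(pixels):
        new_row = []
        for x, (r, g, b) in enumerate(row):
            if top <= y < bottom and left <= x < right:
                # Inside the target area: brighten and clamp each channel.
                new_row.append(tuple(max(0, min(255, c + delta)) for c in (r, g, b)))
            else:
                # Outside the target area: keep the pixel unchanged.
                new_row.append((r, g, b))
        edited.append(new_row)
    return edited


image = [[(10, 20, 30), (250, 250, 250)],
         [(100, 100, 100), (0, 0, 0)]]

global_edit = adjust_brightness(image, 20)                      # whole photo
local_edit = adjust_brightness(image, 20, region=(0, 0, 1, 1))  # top-left pixel only
```

A global edit touches every pixel, while a local edit first needs to decide which pixels belong to the target region or object. That localization step is the part an instruction-guided model has to infer from the user's text.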

How to use MGIE

MGIE is available as an open-source project on GitHub, where users can find code, data and pre-trained models. The project also provides a demo notebook that illustrates how to use MGIE for various editing tasks. Users can also try MGIE online via a web demo hosted on Hugging Face Spaces, a platform for sharing and collaborating on machine learning (ML) projects.

MGIE is designed to be easy to use and flexible to customize. Users can provide natural language instructions to modify images, and MGIE will generate the modified images along with the derived instructions. Users can also provide feedback to MGIE to refine the changes or request different changes. MGIE can also be integrated with other applications or platforms that require image editing capabilities.

Importance of MGIE

MGIE represents a breakthrough in instruction-based image editing, a major challenge for both AI and human creativity. MGIE demonstrates the potential of using MLLMs to improve image editing and opens up new possibilities for cross-modal interaction and communication.

MGIE is not only a research result, but also a practical and useful tool for various scenarios. MGIE can help users create, edit, and optimize images for personal or professional purposes, such as social media, e-commerce, education, entertainment, and art. MGIE can also provide users with the tools to express their ideas and emotions through images and inspire them to explore their creativity.

The Future of MGIE

For Apple, MGIE also highlights the company’s growing expertise in AI research and development. The consumer technology giant has rapidly expanded its machine learning capabilities in recent years, and MGIE represents perhaps the most impressive demonstration of how AI can enhance everyday creative activities.

Despite MGIE’s success, experts say there is still much work to be done to improve multimodal AI systems. However, the pace of progress in this field is accelerating rapidly. If the enthusiasm around the launch of MGIE is any indication, this type of assistive AI may soon become an indispensable ally for creativity.
