
Community Project Profile

Show-o

A single-Transformer model for unified multimodal understanding and generation

GitHub stars
1.9k+
Papers
ICLR / NeurIPS
Core capability
understanding + generation
Organisation
NUS Show Lab
Group
University / research
Category
Multimodal understanding and generation model
Status
Active research line
Started
2024-08
Language / Form
Python / Models
License
Apache-2.0
GitHub Stars
1,923
Updated
2026-05-04

Show-o is a multimodal foundation-model line from NUS Show Lab: one Transformer handles both image understanding and image generation instead of splitting the two capabilities into separate systems.

What It Is

Show-o aims to unify multimodal understanding and generation in a single Transformer. One model framework covers visual question answering, text-to-image generation, and text-guided editing tasks such as inpainting and extrapolation, reducing the split between "understanding models" and "generation models."
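The unification idea can be sketched at a conceptual level: text tokens and discrete image tokens share one sequence, with text modeled autoregressively and image positions trained by mask prediction (in the spirit of discrete diffusion). The toy sketch below illustrates only this sequence-building step; it is not Show-o's actual code, and the token names (`<mask>`, `<boi>`, `<eoi>`) and helper function are hypothetical.

```python
import random

# Illustrative sketch (not Show-o's implementation): one sequence mixes
# text tokens and discrete image tokens so a single Transformer can be
# trained on both objectives at once. All special tokens are made up.
MASK = "<mask>"              # hypothetical mask token for image positions
BOI, EOI = "<boi>", "<eoi>"  # hypothetical begin/end-of-image markers


def build_unified_sequence(text_tokens, image_tokens, mask_ratio=0.5, rng=None):
    """Interleave text and image tokens into one training sequence.

    Text positions keep their tokens (trained with next-token prediction);
    a random subset of image positions is replaced by MASK and trained
    with mask prediction. Returns the input sequence plus a mapping from
    masked positions to their target image tokens.
    """
    rng = rng or random.Random(0)
    seq = list(text_tokens) + [BOI]
    targets = {}
    for tok in image_tokens:
        pos = len(seq)
        if rng.random() < mask_ratio:
            seq.append(MASK)     # model must reconstruct this image token
            targets[pos] = tok
        else:
            seq.append(tok)      # visible image token (context)
    seq.append(EOI)
    return seq, targets


# With mask_ratio=1.0 every image token is hidden, which corresponds to
# generating an image from scratch given only the text prompt.
seq, targets = build_unified_sequence(
    ["a", "cat"], ["i1", "i2", "i3", "i4"], mask_ratio=1.0
)
```

The point of the sketch is the design choice the profile describes: because both objectives operate on one token sequence, understanding (text prediction conditioned on visible image tokens) and generation (image-token reconstruction conditioned on text) fall out of the same forward pass rather than two separate systems.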

Show Lab later continued this line with Show-o2, extending the approach toward stronger generation and understanding.

AI Relevance

Multimodal models are moving from stitched systems toward unified architectures. Show-o’s question is direct: if one model can both understand and generate images, many interactive design, editing, visual QA, and content-production workflows become more natural.

That makes it an important direction in open multimodal research.

Singapore Relevance

Show-o places NUS Show Lab on the global map of open multimodal research. For Singapore, it is an example of a university lab exporting frontier open models, not a government programme or enterprise application.

Future tracking should cover Show Lab’s model series, paper acceptances, Hugging Face usage, and whether the work turns into production tools.

Milestones

  1. 2024-08
    Show-o repository created
  2. 2025-01
    Show-o accepted to ICLR 2025
  3. 2025-09
    Show-o2 accepted to NeurIPS 2025
