
Project information

NExT-GPT

An any-to-any multimodal LLM across text, image, video, and audio

GitHub stars: 3.6k+
Paper: ICML 2024
Modalities: text / image / video / audio
Institution: NUS NExT++ Research Center
Group: University / research
Category: Any-to-any multimodal model
Status: Research open source
Launch: 2023-08
Language / form: Python
License: BSD-3-Clause
GitHub stars: 3,621
Last updated: 2026-05-04

NExT-GPT is a representative NUS multimodal LLM project that aims to let a single system understand and generate content across text, image, video, and audio.

Description

NExT-GPT uses a large language model as the hub, connecting encoders and generators for different modalities. A user can supply text, images, video, or audio, and the system can respond in another modality, or in several at once.

Its contribution is to push multimodal modeling beyond image-text question answering toward fuller any-to-any conversion.
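To make the dataflow concrete, here is a minimal, hedged Python sketch of an LLM-centric any-to-any hub. All class and function names (AnyToAnyHub, register_encoder, respond, and so on) are illustrative assumptions for this page, not NExT-GPT's real API; the actual system connects multimodal encoders and diffusion-based generators to the LLM through learned projection layers.

```python
# Illustrative sketch of an LLM-centric any-to-any hub.
# All names here are hypothetical; they are not NExT-GPT's real API.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    modality: str    # "text", "image", "video", or "audio"
    payload: object  # raw user input: a string, file path, array, ...


class AnyToAnyHub:
    """Modality encoders project inputs into a shared token space,
    the LLM reasons over them, and a per-modality generator renders
    the requested output."""

    def __init__(self, llm: Callable[[List[object]], object]):
        self.llm = llm                              # central reasoning model
        self.encoders: Dict[str, Callable] = {}     # modality -> input adapter
        self.generators: Dict[str, Callable] = {}   # modality -> output adapter

    def register_encoder(self, modality: str, encode: Callable) -> None:
        self.encoders[modality] = encode

    def register_generator(self, modality: str, generate: Callable) -> None:
        self.generators[modality] = generate

    def respond(self, inputs: List[Message], target_modality: str) -> object:
        # 1) Encode every input into the LLM's shared representation space.
        tokens: List[object] = []
        for msg in inputs:
            tokens.extend(self.encoders[msg.modality](msg.payload))
        # 2) Let the LLM produce a plan conditioned on all inputs.
        plan = self.llm(tokens)
        # 3) Route the plan to the generator for the requested output modality.
        return self.generators[target_modality](plan)


# Toy usage: text in, "image" out (stand-in functions only).
hub = AnyToAnyHub(llm=lambda tokens: {"caption": " ".join(map(str, tokens))})
hub.register_encoder("text", lambda text: text.split())
hub.register_generator("image", lambda plan: f"<image rendered from: {plan['caption']}>")

print(hub.respond([Message("text", "a cat surfing at sunset")], target_modality="image"))
```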

Relationship to AI

Multimodality is one of the core directions for the next stage of large models. NExT-GPT explores the orchestration problem early: how specialized models can coordinate around an LLM instead of retraining one giant model for every input-output pairing.

That path matters for research and gives application builders a reference for composable architecture.
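As a rough illustration of why that composability matters, the following back-of-the-envelope sketch compares the number of dedicated pairwise converters with the number of per-modality adapters needed around a shared hub. The counts are generic reasoning about this architecture class, not figures reported by NExT-GPT.

```python
# Back-of-the-envelope comparison: dedicated pairwise converters vs.
# per-modality adapters around a shared LLM hub. Generic reasoning only,
# not numbers from the NExT-GPT paper.

def pairwise_converters(n_modalities: int) -> int:
    """One dedicated model for every directed input -> output modality pair."""
    return n_modalities * (n_modalities - 1)

def hub_adapters(n_modalities: int) -> int:
    """One input adapter plus one output adapter per modality, sharing one LLM."""
    return 2 * n_modalities

for n in (2, 3, 4, 6):
    print(f"{n} modalities: {pairwise_converters(n):2d} pairwise models vs "
          f"{hub_adapters(n):2d} hub adapters")

# With the four modalities NExT-GPT covers (text, image, video, audio),
# that is 12 pairwise models vs 8 adapters, and the gap widens as modalities grow.
```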

Relationship to Singapore

NExT-GPT shows that NUS has globally visible work in multimodal foundation-model research. It is not a local Singapore application project, but an example of Singapore academia participating in global model-paradigm competition.

This page is a place to keep adding citations, follow-on models, translation into industry, and links to other NUS multimodal teams.

Key milestones

  1. 2023-08: NExT-GPT repository released
  2. 2024: Paper published at ICML 2024

Resources

Other industry-academia-research projects