
Project information

NExT-GPT

An any-to-any multimodal LLM across text, image, video, and audio

GitHub stars: 3.6k+
Paper: ICML 2024
Modalities: text / image / video / audio
Institution: NUS NExT++ Research Center
Group: University / research
Category: Any-to-any multimodal model
Status: Research open source
Launch: 2023-08
Language / form: Python
License: BSD-3-Clause
GitHub stars: 3,621
Last updated: 2026-05-04

NExT-GPT is a representative NUS multimodal LLM project that aims to let a single system understand and generate content across text, image, video, and audio.

Description

NExT-GPT uses a large language model as the hub, connecting encoders and generators for different modalities. A user can supply text, images, video, or audio, and the system can respond in another modality, or in several at once.

Its contribution is to push multimodal modeling beyond image-text question answering toward fuller any-to-any conversion.
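To make the dataflow concrete, here is a minimal, hedged Python sketch of an LLM-centric any-to-any hub. All class and function names (AnyToAnyHub, register_encoder, respond, and so on) are illustrative assumptions for this page, not NExT-GPT's real API; the actual system connects multimodal encoders and diffusion-based generators to the LLM through learned projection layers.

```python
# Illustrative sketch of an LLM-centric any-to-any hub.
# All names here are hypothetical; they are not NExT-GPT's real API.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    modality: str    # "text", "image", "video", or "audio"
    payload: object  # raw user input: a string, file path, array, ...


class AnyToAnyHub:
    """Modality encoders project inputs into a shared token space,
    the LLM reasons over them, and a per-modality generator renders
    the requested output."""

    def __init__(self, llm: Callable[[List[object]], object]):
        self.llm = llm                              # central reasoning model
        self.encoders: Dict[str, Callable] = {}     # modality -> input adapter
        self.generators: Dict[str, Callable] = {}   # modality -> output adapter

    def register_encoder(self, modality: str, encode: Callable) -> None:
        self.encoders[modality] = encode

    def register_generator(self, modality: str, generate: Callable) -> None:
        self.generators[modality] = generate

    def respond(self, inputs: List[Message], target_modality: str) -> object:
        # 1) Encode every input into the LLM's shared representation space.
        tokens: List[object] = []
        for msg in inputs:
            tokens.extend(self.encoders[msg.modality](msg.payload))
        # 2) Let the LLM produce a plan conditioned on all inputs.
        plan = self.llm(tokens)
        # 3) Route the plan to the generator for the requested output modality.
        return self.generators[target_modality](plan)


# Toy usage: text in, "image" out (stand-in functions only).
hub = AnyToAnyHub(llm=lambda tokens: {"caption": " ".join(map(str, tokens))})
hub.register_encoder("text", lambda text: text.split())
hub.register_generator("image", lambda plan: f"<image rendered from: {plan['caption']}>")

print(hub.respond([Message("text", "a cat surfing at sunset")], target_modality="image"))
```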

Relationship to AI

Multimodality is one of the core directions for the next stage of large models. NExT-GPT explores the orchestration problem early: how specialized models can coordinate around an LLM instead of retraining one giant model for every input-output pairing.

That path matters for research and gives application builders a reference for composable architecture.
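As a rough illustration of why that composability matters, the following back-of-the-envelope sketch compares the number of dedicated pairwise converters with the number of per-modality adapters needed around a shared hub. The counts are generic reasoning about this architecture class, not figures reported by NExT-GPT.

```python
# Back-of-the-envelope comparison: dedicated pairwise converters vs.
# per-modality adapters around a shared LLM hub. Generic reasoning only,
# not numbers from the NExT-GPT paper.

def pairwise_converters(n_modalities: int) -> int:
    """One dedicated model for every directed input -> output modality pair."""
    return n_modalities * (n_modalities - 1)

def hub_adapters(n_modalities: int) -> int:
    """One input adapter plus one output adapter per modality, sharing one LLM."""
    return 2 * n_modalities

for n in (2, 3, 4, 6):
    print(f"{n} modalities: {pairwise_converters(n):2d} pairwise models vs "
          f"{hub_adapters(n):2d} hub adapters")

# With the four modalities NExT-GPT covers (text, image, video, audio),
# that is 12 pairwise models vs 8 adapters, and the gap widens as modalities grow.
```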

Relationship to Singapore

NExT-GPT shows that NUS has globally visible work in multimodal foundation-model research. It is not a local Singapore application project, but an example of Singapore academia participating in global model-paradigm competition.

This page is a place to keep adding citations, follow-on models, translation into industry, and links to other NUS multimodal teams.

Key milestones

  1. 2023-08: NExT-GPT repository released
  2. 2024: Paper published at ICML 2024

Resources

Other industry-academia-research projects