
Project information

Colossal-AI

Distributed deep-learning training framework optimised for efficient large-model training

GitHub stars: 41,376
Core use case: Large-model training
Form: Training system
Institution: NUS HPC-AI Lab
Group: University / research
Category: Distributed training framework
Status: Actively maintained
Launched: 2021-10
Language / form: Python
License: Apache-2.0
Information updated: 2026-05-04

Colossal-AI is one of the most globally visible open-source projects from Singapore’s university ecosystem: it tackles memory, parallelism, and cost problems in large-model training.

Description

Colossal-AI is a distributed AI training system. Developers use it for tensor parallelism, pipeline parallelism, ZeRO, heterogeneous memory management, and large-model inference optimization, splitting workloads that would overwhelm a single machine across multi-GPU and multi-node environments.
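To make the idea of splitting a workload across devices concrete, here is a minimal sketch of column-wise tensor parallelism in plain Python. It does not use Colossal-AI's API; the helper names (`matmul`, `split_columns`, `column_parallel_matmul`) are hypothetical, and the "devices" are simulated as list entries. It only illustrates why sharding a weight matrix by columns and concatenating the partial outputs reproduces the single-device result.

```python
# Toy illustration of tensor (column) parallelism in plain Python.
# NOT Colossal-AI's API: every name below is hypothetical, and the
# "devices" are simulated as entries in a Python list.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def split_columns(w, num_shards):
    """Split weight matrix w column-wise into num_shards equal shards."""
    cols = len(w[0])
    assert cols % num_shards == 0, "columns must divide evenly across shards"
    width = cols // num_shards
    return [[row[i * width:(i + 1) * width] for row in w]
            for i in range(num_shards)]

def column_parallel_matmul(x, w, num_shards):
    """Each simulated device multiplies x by its own column shard; the
    partial outputs are then concatenated along the column axis, which
    stands in for an all-gather across devices."""
    partials = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

x = [[1, 2], [3, 4]]                  # batch of 2 inputs, 2 features each
w = [[1, 0, 2, 1], [0, 1, 1, 3]]      # 2 x 4 weight matrix

full = matmul(x, w)                                    # single-device result
sharded = column_parallel_matmul(x, w, num_shards=2)   # two simulated devices
assert sharded == full
```

In this scheme each device only ever holds its own slice of the weight matrix, which is what lets a layer too large for one GPU's memory be spread over several.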

It was incubated by the NUS HPC-AI Lab and later grew into a global open-source engineering project.

Relationship to AI

Large-model competition is not only about model weights; it is also about training systems. Colossal-AI turns "can we afford to train this" into an engineering problem: reduce memory pressure, improve throughput, and bring large-model training closer to research teams and smaller companies.
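The "reduce memory pressure" point can be illustrated with back-of-the-envelope arithmetic. The sketch below assumes the commonly cited ZeRO accounting for mixed-precision Adam training: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer state. It ignores activations, buffers, and fragmentation, and it is not a measurement of Colossal-AI itself; the function name `zero_memory_gb` and the example model size are hypothetical.

```python
# Rough per-GPU memory for model states under ZeRO-style sharding.
# Assumes mixed-precision Adam: 2 + 2 + 12 = 16 bytes per parameter.
# Activations and temporary buffers are deliberately left out.

def zero_memory_gb(num_params, num_gpus, stage):
    """Return estimated model-state memory per GPU in GB (1 GB = 1e9 bytes).

    stage 0: nothing sharded (plain data parallelism)
    stage 1: optimizer states sharded across GPUs
    stage 2: optimizer states and gradients sharded
    stage 3: optimizer states, gradients, and parameters sharded
    """
    params = 2 * num_params    # fp16 weights
    grads = 2 * num_params     # fp16 gradients
    optim = 12 * num_params    # fp32 weights copy, momentum, variance
    if stage >= 1:
        optim /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        params /= num_gpus
    return (params + grads + optim) / 1e9

# Example: a 7.5B-parameter model on 64 GPUs (illustrative numbers only).
for stage in range(4):
    print(f"stage {stage}: {zero_memory_gb(7.5e9, 64, stage):.1f} GB per GPU")
```

Under these assumptions the model states alone would not fit on a single 80 GB accelerator without sharding, while full sharding brings the per-GPU share down to a couple of gigabytes. This is the sense in which affordability becomes an engineering question.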

This kind of infrastructure may not face end users directly, but it affects the cost curve of model development.

Relationship to Singapore

Colossal-AI shows that Singapore’s universities are not limited to applied AI; they can also have a presence in global AI infrastructure. It complements model projects such as SEA-LION: one addresses training systems, the other regional model supply.

For sgai.md, it is a long-running test case for whether Singapore can export general-purpose AI engineering infrastructure.

Key milestones

  1. 2021-10: Colossal-AI repository created
  2. 2023-2024: Moves into the mainstream LLM training-tool conversation
