Community Project Profile
Colossal-AI
Distributed deep-learning training framework optimised for efficient large-model training
- Organisation: NUS HPC-AI Lab
- Group: University / research
- Category: Distributed training framework
- Status: Actively maintained
- Started: 2021-10
- Language / Form: Python
- License: Apache-2.0
- GitHub Stars: 41,376
- Updated: 2026-05-04
Colossal-AI is one of the most globally visible open-source projects from Singapore’s university ecosystem: it tackles memory, parallelism, and cost problems in large-model training.
What It Is
Colossal-AI is a distributed AI training system. Developers use it for tensor parallelism, pipeline parallelism, ZeRO, heterogeneous memory management, and large-model inference optimisation, so that workloads too large for a single machine can be split across multi-GPU and multi-node environments.
It was incubated by the NUS HPC-AI Lab and later grew into a global open-source engineering project.
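As a rough illustration of what "splitting a workload" means for tensor parallelism, the sketch below shards the weight matrix of a single linear layer by columns, lets each hypothetical worker compute its slice of the output, and concatenates the slices. It is plain Python on nested lists, not Colossal-AI's API; the function names (`matmul`, `split_columns`, `column_parallel_matmul`) are made up for this example, and real frameworks run the shards on separate devices and exchange results with collective communication.

```python
def matmul(x, w):
    """Multiply a (rows x inner) matrix by an (inner x cols) matrix (lists of lists)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, num_workers):
    """Cut the weight matrix into equal-width column blocks, one per worker."""
    cols = len(w[0])
    if cols % num_workers != 0:
        raise ValueError("column count must be divisible by num_workers")
    width = cols // num_workers
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(num_workers)]


def column_parallel_matmul(x, w, num_workers):
    """Simulate column-wise tensor parallelism for y = x @ w.

    Each shard holds only 1/num_workers of the weights. Here the shards are
    processed in a loop; on real hardware each would live on its own device.
    """
    shards = split_columns(w, num_workers)
    partial_outputs = [matmul(x, shard) for shard in shards]
    # Stitch the per-worker column blocks back together, row by row.
    return [sum((part[r] for part in partial_outputs), []) for r in range(len(x))]


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 0, 2, 1], [0, 1, 1, 3]]
    print(column_parallel_matmul(x, w, num_workers=2))
    print(matmul(x, w))  # same result, computed without sharding
```

The point of the design is that no single worker ever needs the full weight matrix in memory, at the cost of a gather step to reassemble the output.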
AI Relevance
Large-model competition is not only about model weights; it is also about training systems. Colossal-AI turns "can we afford to train this" into an engineering problem: reduce memory pressure, improve throughput, and bring large-model training closer to research teams and smaller companies.
This kind of infrastructure may not face end users directly, but it affects the cost curve of model development.
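To make the "can we afford to train this" question concrete, here is a back-of-the-envelope sketch of how ZeRO-style partitioning changes per-GPU memory for model states. It assumes the accounting commonly used for mixed-precision Adam training: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for optimizer states (fp32 weight copy, momentum, variance). Activations, buffers, and fragmentation are ignored, and the function name is illustrative rather than part of any library.

```python
BYTES_FP16_PARAMS = 2   # fp16 copy of the weights
BYTES_FP16_GRADS = 2    # fp16 gradients
BYTES_OPTIMIZER = 12    # fp32 weights + Adam momentum + Adam variance (assumed)


def model_state_bytes_per_gpu(num_params, num_gpus, stage):
    """Estimate per-GPU bytes for model states under a given ZeRO stage.

    stage 0: plain data parallelism, everything replicated
    stage 1: optimizer states partitioned across GPUs
    stage 2: optimizer states and gradients partitioned
    stage 3: optimizer states, gradients, and parameters partitioned
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    params = BYTES_FP16_PARAMS * num_params
    grads = BYTES_FP16_GRADS * num_params
    optim = BYTES_OPTIMIZER * num_params
    if stage >= 1:
        optim /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        params /= num_gpus
    return params + grads + optim


if __name__ == "__main__":
    # Hypothetical setup: a 7.5-billion-parameter model on 64 GPUs.
    for stage in range(4):
        gb = model_state_bytes_per_gpu(7.5e9, 64, stage) / 1e9
        print(f"ZeRO stage {stage}: {gb:.1f} GB per GPU")
```

Under these assumptions, the same model goes from about 120 GB of model states per GPU with plain replication to under 2 GB per GPU with full partitioning, which is the kind of shift that moves a training run from out of reach to feasible on commodity accelerators.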
Singapore Relevance
Colossal-AI shows that Singapore’s universities are not limited to applied AI; they can also have a presence in global AI infrastructure. It complements model projects such as SEA-LION: one addresses training systems, the other regional model supply.
For sgai.md, it is a long-running test case for whether Singapore can export general-purpose AI engineering infrastructure.
Milestones
- 2021-10: Colossal-AI repository created
- 2023-2024: Moves into the mainstream LLM training-tool conversation