
Community Project Profile

LAVIS / BLIP

Vision-language foundation models and a one-stop library; a cornerstone of global image-text AI

GitHub stars: 11,214
Flagship models: BLIP / BLIP-2
Direction: image-text AI
Organisation: Salesforce AI Research Singapore
Group: International corporate lab
Category: Vision-language foundation models
Status: Classic open-source asset
Started: 2022-08
Language / Form: Python / Jupyter Notebook
License: BSD-3-Clause
Updated: 2026-05-04

LAVIS / BLIP is a major contribution from Salesforce’s Singapore research team to global vision-language AI, making image-text understanding, captioning, VQA, and multimodal pretraining more reusable through open source.

What It Is

LAVIS stands for Library for Language-Vision Intelligence, a unified library for vision-language research and applications. BLIP (Bootstrapping Language-Image Pre-training) and its successor BLIP-2 are the most influential model lines within that family.

Developers can use it to load pretrained models for captioning, visual question answering, image-text retrieval, multimodal alignment, and related tasks.
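For instance, captioning an image takes only a few lines through the library's unified loader. A minimal sketch, assuming the blip_caption / base_coco names from the LAVIS model zoo (these may differ across releases) and a placeholder image path:

    # Minimal captioning sketch with LAVIS; assumes `pip install salesforce-lavis`
    # and a local image at example.jpg (placeholder path).
    import torch
    from PIL import Image
    from lavis.models import load_model_and_preprocess

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # One call returns the model plus its matching image/text preprocessors.
    model, vis_processors, txt_processors = load_model_and_preprocess(
        name="blip_caption", model_type="base_coco", is_eval=True, device=device
    )

    raw_image = Image.open("example.jpg").convert("RGB")
    image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

    # Generate a caption for the preprocessed image.
    print(model.generate({"image": image}))

The same load_model_and_preprocess entry point serves the other tasks; swapping the model name switches between captioning, VQA, retrieval, and so on.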

AI Relevance

The BLIP family is one of the foundational building blocks of multimodal AI. Many later vision-language models, data-generation pipelines, and image-text alignment studies are directly or indirectly influenced by it.

Its value lies not only in high citation counts but in reusable code and model weights that lower the entry barrier for subsequent research.
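One concrete instance of that reusability: the same loader exposes BLIP as an off-the-shelf feature extractor for image-text alignment. A hedged sketch, assuming the blip_feature_extractor / base checkpoint from the model zoo and placeholder image and caption inputs:

    # Sketch: scoring image-text alignment with a pretrained BLIP feature extractor.
    import torch
    from PIL import Image
    from lavis.models import load_model_and_preprocess

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model, vis_processors, txt_processors = load_model_and_preprocess(
        name="blip_feature_extractor", model_type="base", is_eval=True, device=device
    )

    raw_image = Image.open("example.jpg").convert("RGB")  # placeholder path
    image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
    text = txt_processors["eval"]("a dog playing in the park")  # placeholder caption

    # Project image and text into the shared embedding space, then take the
    # dot product of the [CLS] projections as an alignment score.
    image_feats = model.extract_features({"image": image}, mode="image")
    text_feats = model.extract_features({"text_input": [text]}, mode="text")
    score = image_feats.image_embeds_proj[:, 0, :] @ text_feats.text_embeds_proj[:, 0, :].t()
    print(score)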

Singapore Relevance

Salesforce’s Singapore lab shows that international corporate research teams in Singapore are not merely sales or regional offices; they can produce globally influential foundational research.

This is an important but often underestimated layer of Singapore’s AI ecosystem: multinational labs connect local talent, global research networks, and open-source influence.

Milestones

  1. 2022: BLIP paper published at ICML 2022
  2. 2023: BLIP-2 paper published at ICML 2023
