image-captioning

Here are 1,131 public repositories matching this topic...

salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

deep-learning salesforce image-captioning deep-learning-library vision-framework vision-and-language multimodal-deep-learning multimodal-datasets vision-language-transformer vision-language-pretraining visual-question-anwsering

Updated Nov 18, 2024
Jupyter Notebook

salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

image-captioning visual-reasoning visual-question-answering vision-language vision-language-transformer image-text-retrieval vision-and-language-pre-training

Updated Mar 3, 2026
Jupyter Notebook

OpenGVLab / InternGPT

InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, GPT-4-style multimodal chat, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).

Updated Aug 20, 2024
Python

sgrvinod / a-PyTorch-Tutorial-to-Image-Captioning

Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning

computer-vision pytorch image-captioning show-attend-and-tell attention-mechanism encoder-decoder pytorch-tutorial mscoco

Updated Jul 28, 2022
Python

OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

prompt chinese image-captioning pretrained-models visual-question-answering multimodal text-to-image-synthesis vision-language pretraining referring-expression-comprehension prompt-tuning

Updated Apr 24, 2024
Python

ttengwang / Caption-Anything

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything

image-captioning controllable-image-captioning controllable-generation chatgpt segment-anything

Updated Aug 29, 2023
Python

peteanderson80 / bottom-up-attention

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

caffe vqa faster-rcnn image-captioning captioning-images mscoco mscoco-dataset visual-question-answering

Updated Feb 3, 2023
Jupyter Notebook

imaginary-cloud / CameraManager

Simple Swift class that provides all the configuration you need to create a custom camera view in your app

swift ios camera cocoapods carthage swift-package-manager video-recording custom-camera image-captioning qrcode-reader

Updated Jul 19, 2024
Swift

jhc13 / taggui

Tag manager and captioner for image datasets

image-captioning image-tagging tag-manager pyside6 stable-diffusion llava cogvlm florence-2

Updated Oct 11, 2025
Python

NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

vqa image-captioning language-model multi-task-learning vision-and-language multi-modal-learning vision-language-model

Updated Jan 17, 2024
Python

microsoft / Oscar

Oscar and VinVL

vqa image-captioning oscar vision-and-language pre-training image-text-search vinvl

Updated Aug 28, 2023
Python

ruotianluo / self-critical.pytorch

Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and others.

image-captioning

Updated Oct 5, 2023
Python

YehLi / xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

image-captioning video-captioning visual-question-answering vision-and-language cross-modal-retrieval pretraining tden

Updated Feb 27, 2023
Python

cuixing158 / Awesome-CV-MasterHub

🔥 🔥 🔥 A paper list of some recent Computer Vision (CV) works

Updated Apr 30, 2026

yunjey / show-attend-and-tell

TensorFlow Implementation of "Show, Attend and Tell"

tensorflow image-captioning show-attend-and-tell attention-mechanism mscoco-image-dataset

Updated Jul 28, 2018
Jupyter Notebook

AkagawaTsurunaki / ZerolanLiveRobot

AI VTuber with LLM, ASR, TTS, OCR, CV and more technologies to live stream or play Minecraft with you.

minecraft ocr ai cv tts image-captioning bilibili asr video-captioning llm ai-vtuber

Updated Apr 14, 2026
Python

SkalskiP / awesome-foundation-and-multimodal-models

👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]

nlp computer-vision image-captioning clip blip multimodal zero-shot-detection foundational-models llava segment-anything open-vocabulary-detection open-vocabulary-segmentation grounding-dino

Updated Feb 29, 2024
Python

kuanghuei / SCAN

PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)

computer-vision deep-learning neural-network pytorch image-captioning cross-modal visual-semantic

Updated May 18, 2023
Python

gokayfem / ComfyUI_VLM_nodes

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

image-captioning nodes vlm custom-nodes img2text llm mllm llava comfyui siglip phi15 joytag img2sfx

Updated Jan 11, 2026
Python

kdexd / virtex

[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations

model-zoo image-captioning pretrained-models coco-dataset cvpr2021

Updated Aug 22, 2025
Python