LAVIS - A One-stop Library for Language-Vision Intelligence
Updated Nov 18, 2024 - Jupyter Notebook
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Simple Swift class that provides all the configuration you need to create a custom camera view in your app
Tag manager and captioner for image datasets
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Oscar and VinVL
Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and others.
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
🔥 🔥 🔥 A paper list of some recent Computer Vision (CV) works
TensorFlow Implementation of "Show, Attend and Tell"
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)
[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations