Vision Encoder/Decoder Image to Text

Silicon Valley is raising billions to develop wearable AI products. These 15-year-olds built theirs for under $100.

Akhil Nagori, Evann Sun, and Lucas Shengwen Yen spent about five months creating a pair of 3D-printed smart glasses that can ...

GitHub

Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages

Abstract. An old-school recipe for training a classifier is to (i) learn a good feature extractor and (ii) optimize a linear layer atop. When only a handful of samples are available per category, as ...
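The two-step recipe named in this abstract (a fixed feature extractor with a linear layer optimized on top) can be sketched roughly as follows. This is only an illustration of that classic recipe, not the paper's method: `frozen_features` is a hypothetical stand-in for a pretrained extractor, and the four-sample dataset is invented for the example.

```python
import math

def frozen_features(x):
    # Hypothetical stand-in for a pretrained feature extractor; it is never updated.
    return [x[0] + x[1], x[0] - x[1]]

def train_linear_probe(samples, labels, lr=0.5, epochs=200):
    # Step (ii): fit only a linear layer (weights w, bias b) on the frozen features,
    # using plain stochastic gradient descent on the logistic loss.
    feats = [frozen_features(x) for x in samples]
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # derivative of the logistic loss with respect to z
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b

def predict(w, b, x):
    f = frozen_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

# Toy few-shot setting (assumed): two labeled samples per class.
samples = [(0.0, 0.1), (0.2, 0.0), (1.0, 0.9), (0.8, 1.0)]
labels = [0, 0, 1, 1]
w, b = train_linear_probe(samples, labels)
predictions = [predict(w, b, x) for x in samples]
```

Only `w` and `b` change during training, which is what makes the recipe cheap when few samples per category are available.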

IEEE

TJCMNet: An Efficient Vision-Text Joint Identity Clues Mining Network for Visible-Infrared Person Re-Identification

Abstract: Retrieving images for the Visible-Infrared Person Re-identification task is challenging because of the huge modality discrepancy caused by the different imaging principles of RGB and infrared ...

21d

Z.ai debuts open source GLM-4.6V, a native tool-calling vision model for multimodal reasoning

Chinese AI startup Zhipu AI, also known as Z.ai, has released its GLM-4.6V series, a new generation of open-source vision-language models (VLMs) optimized for multimodal reasoning, frontend automation, and ...

GitHub

Variational Autoencoder (VAE) for Image Generation

This project implements a Variational Autoencoder (VAE) for image generation. Unlike standard autoencoders, a VAE learns a probabilistic latent space by encoding images to a distribution and sampling ...
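The idea of encoding to a distribution and then sampling can be illustrated with the standard reparameterization step and the closed-form KL term for a diagonal Gaussian. The project's own code is not shown here, so this is a generic sketch: the function names and the example `mu` and `log_var` values are assumptions standing in for an encoder's output.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    # Sample z = mu + sigma * eps with eps ~ N(0, 1), where sigma = exp(0.5 * log_var).
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions.
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

rng = random.Random(0)   # seeded only so the sketch is repeatable
mu = [0.5, -1.0]         # assumed encoder mean for one image
log_var = [0.0, -2.0]    # assumed encoder log-variance for the same image
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

In a full model, `z` would be passed to a decoder and `kl` added to the reconstruction loss; both parts are omitted here.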

IEEE

TALIU: A Novel Decoder and Augmentation Strategy for Boosting Tampered Document Image Detection

Abstract: In modern information exchange, document images are vital, often embedding sensitive data. The emergence of advanced image editing tools and generative AI models has elevated the risks ...
