
Mini-Gemini:

Mining the Potential of Multi-modality Vision Language Models

The Chinese University of Hong Kong

Updates: Mini-Gemini is coming! We release the paper, code, data, models, and demo for Mini-Gemini.

Abstract

In this work, we introduce Mini-Gemini, a simple and effective framework enhancing multi-modality Vision Language Models (VLMs). Despite the advancements in VLMs facilitating basic visual dialog and reasoning, a performance gap persists compared to advanced models like GPT-4 and Gemini. We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation. To enhance visual tokens, we propose to utilize an additional visual encoder for high-resolution refinement without increasing the visual token count. We further construct a high-quality dataset that promotes precise image comprehension and reasoning-based generation, expanding the operational scope of current VLMs. In general, Mini-Gemini further mines the potential of VLMs and empowers current frameworks with image understanding, reasoning, and generation simultaneously. Mini-Gemini supports a series of dense and MoE Large Language Models (LLMs) from 2B to 34B. It achieves leading performance on several zero-shot benchmarks and even surpasses well-developed private models.



Model

The framework of Mini-Gemini is conceptually simple: dual vision encoders provide a low-resolution visual embedding and high-resolution candidates; patch info mining performs patch-level mining between high-resolution regions and low-resolution visual queries; and an LLM marries text with images for both comprehension and generation at the same time.
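As a rough illustration of the patch-level mining idea, the sketch below lets each low-resolution visual query attend only to the high-resolution patches covering its own region, so the number of output tokens equals the number of queries. It is a minimal NumPy sketch under stated assumptions, not the released implementation: the function name `patch_info_mining`, the `scale` parameter, the plain scaled dot-product attention without learned projections, and the residual addition are all illustrative choices that this page does not specify.

```python
import numpy as np

def patch_info_mining(queries, hr_features, scale):
    """Hypothetical sketch of patch-level mining (not the official code).

    queries:     (H, W, C) low-resolution visual embeddings, used as queries.
    hr_features: (H*scale, W*scale, C) high-resolution candidate features.
    Returns (H*W, C) refined tokens: the token count stays H*W.
    """
    H, W, C = queries.shape
    # Group the high-res map so each query owns a (scale x scale) region.
    regions = hr_features.reshape(H, scale, W, scale, C)
    regions = regions.transpose(0, 2, 1, 3, 4).reshape(H * W, scale * scale, C)

    q = queries.reshape(H * W, 1, C)
    # Scaled dot-product scores between each query and its own region only.
    scores = (q @ regions.transpose(0, 2, 1)) / np.sqrt(C)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Weighted sum of high-res patches, one mined vector per query.
    mined = (weights @ regions).reshape(H * W, C)
    # Assumption: combine mined detail with the original query by addition.
    return queries.reshape(H * W, C) + mined

# Usage: a 2x3 low-res grid with a 2x finer high-res map.
low = np.zeros((2, 3, 4))
high = np.ones((4, 6, 4))
tokens = patch_info_mining(low, high, scale=2)
print(tokens.shape)
```

Here the output has shape `(6, 4)`, one token per low-resolution query, which mirrors the stated goal of refining with high-resolution detail without increasing the visual token count.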

BibTeX


@article{li2024minigemini,
  title={Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models},
  author={Li, Yanwei and Zhang, Yuechen and Wang, Chengyao and Zhong, Zhisheng and Chen, Yixin and Chu, Ruihang and Liu, Shaoteng and Jia, Jiaya},
  journal={arXiv preprint arXiv:2403.18814},
  year={2024}
}
  

Acknowledgement

This website is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
