
Mini-Gemini:

Mining the Potential of Multi-modality Vision Language Models

The Chinese University of Hong Kong

Updates: Mini-Gemini is coming! We release the paper, code, data, models, and demo for Mini-Gemini.

Abstract

In this work, we introduce Mini-Gemini, a simple and effective framework for enhancing multi-modality Vision Language Models (VLMs). Although advances in VLMs have enabled basic visual dialog and reasoning, a performance gap persists compared to advanced models like GPT-4 and Gemini. We try to narrow this gap by mining the potential of VLMs for better performance and an any-to-any workflow from three aspects: high-resolution visual tokens, high-quality data, and VLM-guided generation. To enhance visual tokens, we propose using an additional visual encoder for high-resolution refinement without increasing the visual token count. We further construct a high-quality dataset that promotes precise image comprehension and reasoning-based generation, expanding the operational scope of current VLMs. In general, Mini-Gemini further mines the potential of VLMs and empowers current frameworks with image understanding, reasoning, and generation simultaneously. Mini-Gemini supports a series of dense and MoE Large Language Models (LLMs) from 2B to 34B. It achieves leading performance on several zero-shot benchmarks and even surpasses developed private models.



Model

The framework of Mini-Gemini is conceptually simple. Dual vision encoders provide a low-resolution visual embedding and high-resolution candidates. Patch info mining then performs patch-level mining between the high-resolution regions and the low-resolution visual queries. Finally, an LLM marries text with images for both comprehension and generation at the same time.
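
The patch-level mining step can be pictured as a cross-attention-style lookup. The following is a minimal sketch, not the released implementation: it assumes that each low-resolution query attends only to the high-resolution candidates from its own image region, that scaled dot-product attention is used, and that the mined features are added back to the query. The function name `mine_patch_info`, the omission of learned projections, and the toy inputs are all illustrative assumptions. It uses plain Python lists to stay dependency-free.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mine_patch_info(queries, candidates):
    """Refine each low-resolution query with its own high-resolution region.

    queries:    N vectors of length C (low-resolution visual embedding).
    candidates: N groups of M vectors of length C (high-resolution features,
                grouped by the region each low-resolution token covers).
    Returns N vectors of length C, so the visual token count is unchanged.
    Assumption: no learned projections; a real model would apply them.
    """
    refined = []
    for query, region in zip(queries, candidates):
        scale = math.sqrt(len(query))
        # Similarity between the query and every candidate in its region.
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in region]
        weights = softmax(scores)
        # Attention-weighted sum of the high-resolution candidates.
        mined = [
            sum(w * key[c] for w, key in zip(weights, region))
            for c in range(len(query))
        ]
        # Residual connection: keep the original query, add the mined detail.
        refined.append([a + b for a, b in zip(query, mined)])
    return refined


# Toy example: 2 low-resolution tokens, 4 high-resolution candidates each.
queries = [[1.0, 0.0], [0.0, 1.0]]
candidates = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]],
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
]
refined = mine_patch_info(queries, candidates)
print(len(refined), len(refined[0]))
```

The point of the sketch is the shape behaviour: the output has as many tokens as the low-resolution input, however many high-resolution candidates each region contributes. This matches the stated goal of high-resolution refinement without increasing the visual token count.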

BibTeX


@article{li2024minigemini,
  title={Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models},
  author={Li, Yanwei and Zhang, Yuechen and Wang, Chengyao and Zhong, Zhisheng and Chen, Yixin and Chu, Ruihang and Liu, Shaoteng and Jia, Jiaya},
  journal={arXiv preprint arXiv:2403.18814},
  year={2024}
}
  

Acknowledgement

This website is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
