MiniCPM

MiniCPM is a series of on-device (edge-side) large language models (LLMs) jointly developed by ModelBest (面壁智能) and the Natural Language Processing Lab at Tsinghua University (THUNLP).

MiniCPM Series Model Versions

MiniCPM-2B

  • Parameters: 2.4B (excluding embedding parameters)
  • Features: Despite its small size, it performs strongly on Chinese, math, and coding tasks, with overall performance surpassing much larger models such as Llama2-13B.
  • Use cases: Natural language processing tasks such as text generation, translation, and summarization.
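
As a rough illustration of how such a model is typically driven for text generation, the sketch below loads MiniCPM-2B through Hugging Face transformers. This is a minimal sketch rather than official usage: the checkpoint name openbmb/MiniCPM-2B-sft-bf16, the need for trust_remote_code, the sampling settings, and the prompt are all assumptions.

# Minimal text-generation sketch for MiniCPM-2B (checkpoint name assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/MiniCPM-2B-sft-bf16"  # assumption: instruction-tuned bf16 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,   # the repo ships custom modeling code
)

prompt = "Summarize the advantages of small on-device language models in three sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=256, do_sample=True, top_p=0.8, temperature=0.7
)
# Strip the prompt tokens and decode only the newly generated text.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

The same tokenize-generate-decode pattern covers the translation and summarization use cases mentioned above; only the prompt changes.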

MiniCPM-V 2.6

  • Parameters: 8B
  • Features: The latest and strongest model in the series; it supports multi-image dialogue and reasoning, handles images of arbitrary aspect ratio, and performs strongly on OCR (optical character recognition).
  • Use cases: Multimodal understanding, including description, dialogue, and reasoning over images and video.
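
The sketch below shows what multi-image dialogue with MiniCPM-V 2.6 can look like through Hugging Face transformers. It follows the chat-style interface published for this model family, but treat it as an assumption-laden sketch: the checkpoint name openbmb/MiniCPM-V-2_6, the model.chat() signature, and the sample image files are all assumed.

# Multi-image dialogue sketch for MiniCPM-V 2.6 (checkpoint name and chat() signature assumed).
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-2_6"
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Multi-image input: the content list mixes PIL images and text in one user turn.
img_a = Image.open("receipt_a.jpg").convert("RGB")  # hypothetical sample images
img_b = Image.open("receipt_b.jpg").convert("RGB")
msgs = [{"role": "user",
         "content": [img_a, img_b, "Compare the two receipts and read out the totals."]}]

# chat() helper comes from the model's custom code; its signature is assumed here.
answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)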

MiniCPM-2B-128k

  • Parameters: 2.4B (excluding embedding parameters)
  • Features: Supports a 128k context length and achieves the best score among sub-7B models on the InfiniteBench long-text benchmark, though performance degrades somewhat for contexts under 4k.
  • Use cases: Long-text tasks such as generating and analyzing long-form documents.
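
Because the interesting part here is the context window rather than the API, the sketch below simply places a long document into a single prompt and asks for a summary. It is a minimal sketch assuming the openbmb/MiniCPM-2B-128k checkpoint name, a local text file, and enough GPU memory for a very long prompt; in practice a prompt approaching 128k tokens is memory- and latency-intensive.

# Long-context summarization sketch for MiniCPM-2B-128k (checkpoint name assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/MiniCPM-2B-128k"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

with open("long_report.txt", encoding="utf-8") as f:  # hypothetical long document
    document = f.read()

prompt = f"Read the following report and write a one-paragraph summary:\n\n{document}\n\nSummary:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(f"prompt length: {inputs['input_ids'].shape[1]} tokens")  # should stay under 128k

outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))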

MiniCPM-1B-SFT

  • Parameters: 1B
  • Features: A lighter-weight, instruction-tuned variant intended for text and multimodal inference on phones.
  • Use cases: Natural language processing and multimodal tasks on mobile devices.
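
To give a feel for squeezing the 1B variant into a small memory budget, the sketch below loads it in 4-bit with bitsandbytes. This only illustrates the memory-reduction idea: the checkpoint name openbmb/MiniCPM-1B-sft-bf16 is an assumption, bitsandbytes requires a CUDA GPU, and real phone deployment would normally go through a dedicated on-device inference runtime instead.

# 4-bit loading sketch for MiniCPM-1B-SFT (checkpoint name assumed; CUDA GPU required).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "openbmb/MiniCPM-1B-sft-bf16"
quant_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_cfg, device_map="auto", trust_remote_code=True
)

inputs = tokenizer("Explain edge computing in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
print(f"approx. weight memory: {model.get_memory_footprint() / 1e9:.2f} GB")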

Application Scenarios

1. Natural Language Processing

  • Text generation: MiniCPM can produce high-quality text such as news reports and creative stories.
  • Translation: Supports multilingual translation with improved accuracy and fluency.
  • Summarization: Extracts the key points of long documents into concise summaries.

2. Multimodal Understanding

  • Image recognition: MiniCPM-V 2.6 excels at OCR (optical character recognition) and can read text in complex scenes.
  • Video analysis: Supports understanding and analysis of multiple images and video, useful for surveillance and content moderation (see the frame-sampling sketch at the end of this section).
  • Image-text dialogue: Handles mixed image and text input for multimodal dialogue and reasoning.

3. Mobile Applications

  • Smart assistants: MiniCPM can be deployed on smartphones and tablets to power conversational assistants and information lookup.
  • Real-time translation: Enables on-device real-time translation so that users can communicate across language barriers.

4. Education

  • Smart classrooms: With MiniCPM, students can retrieve study materials and get their questions answered more conveniently, improving learning efficiency and quality.
  • Intelligent tutoring: Provides personalized study suggestions and tutoring to help students understand and master the material.

5. Business Applications

  • Invoice recognition: MiniCPM can be applied to OCR-heavy business tasks such as invoice recognition and contract review.
  • Customer service: Intelligent dialogue systems improve the efficiency of customer service and customer satisfaction.

6. Cultural Heritage Preservation

  • Ancient-script recognition: MiniCPM's strong OCR capability allows it to recognize and help interpret ancient scripts, supporting the preservation and study of cultural heritage.
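
For the video-analysis scenario above, a common pattern with multi-image models is to sample a handful of frames and pass them as a single multi-image turn. The sketch below does that with OpenCV and the MiniCPM-V 2.6 interface assumed earlier; the video filename, frame count, and chat() signature are all illustrative assumptions.

# Video analysis via uniform frame sampling (OpenCV) fed to an assumed multi-image chat() call.
import cv2
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-2_6"  # assumed checkpoint name, as in the earlier sketch
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

def sample_frames(video_path, num_frames=8):
    """Uniformly sample RGB frames from a video file as PIL images."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i * total / num_frames))
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    cap.release()
    return frames

frames = sample_frames("warehouse_clip.mp4", num_frames=8)  # hypothetical video file
msgs = [{"role": "user",
         "content": frames + ["Describe what happens in this clip."]}]
# chat() signature assumed from the published usage of this model family.
print(model.chat(image=None, msgs=msgs, tokenizer=tokenizer))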

Open-Source Versions

MiniCPM3-4B

  • Parameters: 4B
  • Features: The third generation of the MiniCPM series. Its overall performance exceeds Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125 and is comparable to several 7B-9B models. It supports function calling and a code interpreter, making it suitable for a broader range of uses.
  • Open-source availability: Code and model weights are available on GitHub.
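
Since function calling is the headline feature here, the sketch below shows one plausible way to expose a tool schema to MiniCPM3-4B via transformers' chat template. The checkpoint name openbmb/MiniCPM3-4B, the assumption that its bundled chat template accepts an OpenAI-style tools list, and the get_weather tool itself are all illustrative, not confirmed usage.

# Function-calling sketch for MiniCPM3-4B (checkpoint name and tools-template support assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/MiniCPM3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration only
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What's the weather like in Beijing today?"}]

# The chat template decides how tool schemas are injected into the prompt.
prompt_ids = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(prompt_ids, max_new_tokens=256)
# The reply is expected to contain a structured tool call (e.g. get_weather with city="Beijing"),
# which the host application parses, executes, and feeds back for the final answer.
print(tokenizer.decode(outputs[0][prompt_ids.shape[1]:], skip_special_tokens=True))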

MiniCPM-V 2.6

  • Parameters: 8B
  • Features: The latest and strongest model in the series, supporting multi-image dialogue and reasoning, images of arbitrary aspect ratio, and strong OCR (optical character recognition), with state-of-the-art results on several multimodal benchmarks.
  • Open-source availability: Code and model weights are available on Hugging Face and GitHub.

MiniCPM-2B

  • Parameters: 2.4B
  • Features: Despite its small size, it performs strongly on Chinese, math, and coding tasks, surpassing larger models such as Llama2-13B in overall performance.
  • Open-source availability: Code and model weights are available on GitHub.

MiniCPM-1B-SFT

  • Parameters: 1B
  • Features: A lighter-weight, instruction-tuned variant for text and multimodal inference on phones.
  • Open-source availability: Code and model weights are available on GitHub.

Closed-Source Versions

MiniCPM-Llama3-V 2.5

  • Parameters: 8B
  • Features: Surpasses commercial closed-source models such as GPT-4V-1106, Gemini Pro, Claude 3, and Qwen-VL-Max in overall multimodal performance. Its OCR and instruction-following abilities are further improved, it supports more than 30 languages, and it is the first to bring GPT-4V-level multimodal capability to on-device deployment.
  • Use cases: Multimodal tasks such as image recognition, video analysis, and multilingual translation.

