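60-Day AI Learning Plan Kickoff | Day 46: Unified Model for Multimodal Input (Text / Image / File / Structured)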

Day 46: Unified Model for Multimodal Input (Text / Image / File / Structured)

Learning Goals

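Build one unified abstraction over text, images, files, structured data and other input types
Understand how a multimodal payload travels from the frontend to the backend (independent of any specific business)
Be able to write a generic InputItem model plus basic management logic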

Key Concepts

1. A unified abstraction for multimodal input

export type InputKind = 'text' | 'image' | 'file' | 'json'

export interface BaseInputItem {
  id: string
  kind: InputKind
  createdAt: number
  meta?: Record<string, any>
}

export interface TextInputItem extends BaseInputItem {
  kind: 'text'
  text: string
}

export interface ImageInputItem extends BaseInputItem {
  kind: 'image'
  file: File          // or url: string
  previewUrl?: string
}

export interface FileInputItem extends BaseInputItem {
  kind: 'file'
  file: File
}

export interface JsonInputItem extends BaseInputItem {
  kind: 'json'
  data: unknown
}

export type InputItem =
  | TextInputItem
  | ImageInputItem
  | FileInputItem
  | JsonInputItem
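Because `InputItem` is a discriminated union on `kind`, a `switch` over it can be made exhaustive, so that adding a new kind later forces every handler to be updated at compile time. A minimal sketch, assuming TypeScript with `strict` enabled; `assertNever` and `describeItem` are hypothetical helper names, and `File` is swapped for a structural `FileLike` type so the snippet also runs outside the browser:

```typescript
// Minimal copy of the union above so this snippet compiles on its own.
// FileLike is a structural stand-in for File (a real File satisfies it).
type FileLike = { name: string; type: string; size: number }

type InputItem =
  | { id: string; kind: 'text'; createdAt: number; text: string }
  | { id: string; kind: 'image'; createdAt: number; file: FileLike; previewUrl?: string }
  | { id: string; kind: 'file'; createdAt: number; file: FileLike }
  | { id: string; kind: 'json'; createdAt: number; data: unknown }

// If a new kind is added to the union but not handled below, the call to
// assertNever stops compiling because `x` is no longer of type `never`.
function assertNever(x: never): never {
  throw new Error(`Unhandled input kind: ${JSON.stringify(x)}`)
}

export function describeItem(item: InputItem): string {
  switch (item.kind) {
    case 'text':
      return `text(${item.text.length} chars)`
    case 'image':
    case 'file':
      // both kinds carry a file, so item.file is accessible here
      return `${item.kind}(${item.file.name})`
    case 'json':
      return 'json'
    default:
      return assertNever(item)
  }
}
```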

2. Converting items on the frontend into a unified backend payload

// Shape the backend can consume (without the File object itself)
export interface InputPayload {
  kind: InputKind
  text?: string
  url?: string
  fileName?: string
  contentType?: string
  data?: unknown
}

export function toPayload(items: InputItem[]): InputPayload[] {
  // Annotating the callback's return type keeps `kind` as a literal type
  // instead of letting it widen to string during inference.
  return items.map((it): InputPayload => {
    switch (it.kind) {
      case 'text':
        return { kind: 'text', text: it.text }
      case 'image':
        return {
          kind: 'image',
          url: it.meta?.uploadedUrl, // returned by the upload API
          fileName: it.file.name,    // `it` is already narrowed to ImageInputItem, no cast needed
          contentType: it.file.type
        }
      case 'file':
        return {
          kind: 'file',
          url: it.meta?.uploadedUrl,
          fileName: it.file.name,
          contentType: it.file.type
        }
      case 'json':
        return { kind: 'json', data: it.data }
    }
  })
}
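On the receiving end the payload arrives as plain JSON, so the TypeScript types above are not enforced at runtime. A minimal validation sketch, assuming the backend is also written in TypeScript; `isInputPayload` and `parsePayloadList` are hypothetical names, and the rule that image/file items must carry a non-empty `url` is an assumption based on the `uploadedUrl` convention used in `toPayload`:

```typescript
type InputKind = 'text' | 'image' | 'file' | 'json'

interface InputPayload {
  kind: InputKind
  text?: string
  url?: string
  fileName?: string
  contentType?: string
  data?: unknown
}

const KINDS: readonly string[] = ['text', 'image', 'file', 'json']

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v)
}

// Minimal runtime check for a single payload item.
export function isInputPayload(v: unknown): v is InputPayload {
  if (!isRecord(v)) return false
  const { kind, text, url } = v
  if (typeof kind !== 'string' || !KINDS.includes(kind)) return false
  if (kind === 'text') return typeof text === 'string'
  // attachments are only usable once the upload step has produced a URL
  if (kind === 'image' || kind === 'file') return typeof url === 'string' && url.length > 0
  return true // 'json': data can be any JSON value
}

// Validate a whole request body (expected to be an array of payload items).
export function parsePayloadList(body: unknown): InputPayload[] {
  if (!Array.isArray(body) || !body.every(isInputPayload)) {
    throw new Error('invalid multimodal payload')
  }
  return body
}
```

A schema library such as zod could replace the hand-written checks; the manual version only keeps the snippet dependency-free.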

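3. Basic approach to integrating with chat
- Text: used directly as the message content
- Images/files: upload first → get a URL/ID → attach it as an InputPayload to the current question
- Structured JSON: can serve as context (e.g. the currently selected row or the report's filter conditions)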
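The three bullets above can be sketched as two steps: upload the file-backed items first, then fold everything into one request. This is a minimal sketch under assumptions: the `ChatRequest` shape, the injected `upload` function and the helper names `attachUploadedUrls` / `buildChatRequest` are hypothetical, and storing the result in `meta.uploadedUrl` follows the convention used by `toPayload`:

```typescript
type FileLike = { name: string; type: string }
type Meta = Record<string, unknown>

type InputItem =
  | { id: string; kind: 'text'; createdAt: number; text: string; meta?: Meta }
  | { id: string; kind: 'image' | 'file'; createdAt: number; file: FileLike; meta?: Meta }
  | { id: string; kind: 'json'; createdAt: number; data: unknown; meta?: Meta }

// Hypothetical request shape; adapt the field names to your chat API.
interface ChatRequest {
  message: string
  attachments: { kind: 'image' | 'file'; url: string; fileName: string; contentType: string }[]
  context: unknown[]
}

// Step 1: upload every file-backed item and store the returned URL in meta.uploadedUrl.
// `upload` is injected (e.g. a fetch to your own upload endpoint).
export async function attachUploadedUrls(
  items: InputItem[],
  upload: (file: FileLike) => Promise<string>
): Promise<InputItem[]> {
  return Promise.all(
    items.map(async (it): Promise<InputItem> => {
      if (it.kind !== 'image' && it.kind !== 'file') return it
      if (typeof it.meta?.uploadedUrl === 'string') return it // already uploaded
      const url = await upload(it.file)
      return { ...it, meta: { ...it.meta, uploadedUrl: url } }
    })
  )
}

// Step 2: text -> message body, images/files -> attachments, json -> context.
export function buildChatRequest(items: InputItem[]): ChatRequest {
  const texts: string[] = []
  const req: ChatRequest = { message: '', attachments: [], context: [] }
  for (const it of items) {
    if (it.kind === 'text') {
      texts.push(it.text)
    } else if (it.kind === 'json') {
      req.context.push(it.data)
    } else {
      const url = it.meta?.uploadedUrl
      if (typeof url !== 'string') throw new Error(`${it.file.name} has not been uploaded`)
      req.attachments.push({ kind: it.kind, url, fileName: it.file.name, contentType: it.file.type })
    }
  }
  req.message = texts.join('\n')
  return req
}
```

Keeping `upload` as a parameter keeps the helpers independent of any particular HTTP client and easy to test with a stub.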

Simple hands-on code examples (ready to use)

A hook for managing multimodal input

import { useState, useCallback } from 'react'
import type { InputItem, TextInputItem, ImageInputItem, FileInputItem } from './inputTypes'

export function useInputItems() {
  const [items, setItems] = useState<InputItem[]>([])

  const addText = useCallback((text: string) => {
    const item: TextInputItem = {
      id: crypto.randomUUID(),
      kind: 'text',
      text,
      createdAt: Date.now()
    }
    setItems(prev => [...prev, item])
  }, [])

  const addImageFile = useCallback((file: File) => {
    const item: ImageInputItem = {
      id: crypto.randomUUID(),
      kind: 'image',
      file,
      previewUrl: URL.createObjectURL(file),
      createdAt: Date.now()
    }
    setItems(prev => [...prev, item])
  }, [])

  const addFile = useCallback((file: File) => {
    const item: FileInputItem = {
      id: crypto.randomUUID(),
      kind: 'file',
      file,
      createdAt: Date.now()
    }
    setItems(prev => [...prev, item])
  }, [])

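  // Revoke the object URL created by addImageFile so the browser can free the blob.
  // revokeObjectURL on an already-revoked URL is a no-op, so a repeated
  // updater call (React StrictMode) is harmless.
  const removeItem = useCallback((id: string) => {
    setItems(prev => {
      const target = prev.find(i => i.id === id)
      if (target?.kind === 'image' && target.previewUrl) {
        URL.revokeObjectURL(target.previewUrl)
      }
      return prev.filter(i => i.id !== id)
    })
  }, [])

  const clear = useCallback(() => {
    setItems(prev => {
      for (const i of prev) {
        if (i.kind === 'image' && i.previewUrl) URL.revokeObjectURL(i.previewUrl)
      }
      return []
    })
  }, [])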

  return { items, addText, addImageFile, addFile, removeItem, clear }
}
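The hook exposes separate `addImageFile` and `addFile` methods, so whatever handles a file input, drag-and-drop or paste has to pick one. A minimal routing sketch, assuming the MIME type reported in `file.type` is a good enough signal; `routeFile`, `FileRoute` and the 10 MB limit are hypothetical:

```typescript
// Structural stand-in for File so the helper is testable outside the browser
type FileLike = { name: string; type: string; size: number }

export type FileRoute =
  | { target: 'image'; file: FileLike }
  | { target: 'file'; file: FileLike }
  | { target: 'rejected'; file: FileLike; reason: string }

// Hypothetical default limit; choose one that matches your upload endpoint.
const DEFAULT_MAX_BYTES = 10 * 1024 * 1024

// Decide whether a dropped/pasted/selected file goes to addImageFile or addFile.
export function routeFile(file: FileLike, maxBytes = DEFAULT_MAX_BYTES): FileRoute {
  if (file.size > maxBytes) {
    return { target: 'rejected', file, reason: `larger than ${maxBytes} bytes` }
  }
  // file.type is the browser's guess from the extension and may be an empty string
  if (file.type.startsWith('image/')) return { target: 'image', file }
  return { target: 'file', file }
}
```

In a file input's `onChange`, each selected file could then be dispatched with `routeFile(f)`: call `addImageFile(f)` for `'image'`, `addFile(f)` for `'file'`, and show `reason` to the user for `'rejected'`.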

A simple preview component (shows the multimodal inputs currently attached)

import React from 'react'
import type { InputItem } from './inputTypes'

interface Props {
  items: InputItem[]
  onRemove: (id: string) => void
}

export const InputPreviewList: React.FC<Props> = ({ items, onRemove }) => {
  if (!items.length) return null
  return (
    <div style={{ border: '1px solid #eee', padding: 8, marginBottom: 8 }}>
      {items.map(it => (
        <div key={it.id} style={{ display: 'flex', alignItems: 'center', marginBottom: 4 }}>
          <span>[{it.kind}]</span>
          {/* checking it.kind narrows the discriminated union, so no `as any` is needed */}
          {it.kind === 'text' && (
            <span style={{ marginLeft: 4 }}>
              {it.text.length > 30 ? `${it.text.slice(0, 30)}...` : it.text}
            </span>
          )}
          {it.kind === 'image' && it.previewUrl && (
            <img
              src={it.previewUrl}
              alt={it.file.name}
              style={{ width: 40, height: 40, objectFit: 'cover', marginLeft: 4 }}
            />
          )}
          {it.kind === 'file' && (
            <span style={{ marginLeft: 4 }}>{it.file.name}</span>
          )}
          {it.kind === 'json' && (
            <span style={{ marginLeft: 4 }}>structured data</span>
          )}
          {/* type="button" keeps the click from submitting an enclosing chat form */}
          <button type="button" style={{ marginLeft: 'auto' }} onClick={() => onRemove(it.id)}>
            Remove
          </button>
        </div>
      ))}
    </div>
  )
}
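The preview trims text with `slice(0, 30)`, which counts UTF-16 code units; on text containing emoji it can cut a surrogate pair in half and render a broken character. A small sketch of a safer helper (the name `truncate` is hypothetical) that counts code points and only appends the ellipsis when something was actually cut:

```typescript
// Truncate by Unicode code points instead of UTF-16 code units, so an emoji
// (a surrogate pair) at the cut position is not split in half.
export function truncate(text: string, max: number, ellipsis = '…'): string {
  const chars = Array.from(text) // iterates by code point
  if (chars.length <= max) return text
  return chars.slice(0, max).join('') + ellipsis
}
```

`Array.from` still separates multi-code-point graphemes such as ZWJ emoji sequences or flags; if that matters, `Intl.Segmenter` with `granularity: 'grapheme'` can be used instead.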

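Preview of tomorrow's plan (Day 47)

Topic: Knowledge base version management and change visualization (frontend perspective)
Focus:
- Design a version / lastUpdated / diff model for documents and knowledge entries
- Show in the UI "which document version the current answer is based on", with support for comparing historical versions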

posted @ 2025-12-17 11:19 XiaoZhengTou
