从设备向量到知识文档:SonnetDB Measurement KNN 与文档关联检索

前一篇讲的是 document collection 内部的 Hybrid Search。这一篇更接近工业现场:设备事件本身是时序 measurement,知识库是 JSON 文档,设备和租户信息在关系表里。SonnetDB 的 hybrid_search(...) 可以把这些模型放到一个查询里。

先写入设备事件向量:

using SonnetDB.Engine;
using SonnetDB.Sql;
using SonnetDB.Sql.Execution;

using var db = Tsdb.Open(new TsdbOptions
{
    RootDirectory = "data/measurement-knowledge-demo"
});

SqlExecutor.Execute(db, """
    CREATE MEASUREMENT incidents (
        device_id TAG,
        embedding FIELD VECTOR(3),
        severity FIELD FLOAT
    )
    """);

SqlExecutor.Execute(db, """
    INSERT INTO incidents (device_id, embedding, severity, time)
    VALUES
      ('pump-1', [1, 0, 0], 9.0, 1000),
      ('pump-2', [0.8, 0.2, 0], 7.5, 2000),
      ('fan-1',  [0, 1, 0], 3.0, 3000)
    """);

再写知识库文档:

SqlExecutor.Execute(db, "CREATE DOCUMENT COLLECTION knowledge");

SqlExecutor.Execute(db, """
    INSERT INTO knowledge (id, document)
    VALUES
      ('kb-pump-1', '{"device_id":"pump-1","title":"Pump overheating guide","body":"pump alarm overheating recovery","category":"fault","embedding":[1,0,0]}'),
      ('kb-pump-2', '{"device_id":"pump-2","title":"Pump pressure guide","body":"pump alarm pressure inspection","category":"fault","embedding":[0.8,0.2,0]}'),
      ('kb-fan-1',  '{"device_id":"fan-1","title":"Fan maintenance note","body":"fan maintenance normal","category":"normal","embedding":[0,1,0]}')
    """);

SqlExecutor.Execute(db, "CREATE JSON INDEX idx_knowledge_device ON knowledge ('$.device_id')");
SqlExecutor.Execute(db, "CREATE FULLTEXT INDEX ft_knowledge_body ON knowledge ('$.body') USING unicode");

现在可以把 measurement KNN、知识文档 JSON path、全文 BM25 融合起来:

var result = (SelectExecutionResult)SqlExecutor.Execute(db, """
    SELECT measurement.device_id AS device,
           document_id,
           json_value(document, '$.title') AS title,
           measurement_distance() AS m_distance,
           bm25_score() AS text_score,
           hybrid_score() AS score
    FROM hybrid_search(
        source => incidents,
        documents => knowledge,
        vector_field => embedding,
        vector => [1, 0, 0],
        k => 5,
        measurement_join_tag => device_id,
        document_join_path => '$.device_id',
        document_join_index => idx_knowledge_device,
        text_index => ft_knowledge_body,
        text_field => '$.body',
        text => 'pump alarm overheating',
        measurement_weight => 0.7,
        text_weight => 0.3)
    WHERE time >= 1000 AND category = 'fault'
    ORDER BY score DESC
    """);

这个查询做了几件事:

  • 先在 incidents.embedding 上按向量召回相近事件。
  • device_id 关联 knowledge 文档里的 $.device_id
  • 对知识正文 $.body 做全文 BM25 评分。
  • 只保留 category = 'fault' 的知识文档。
  • 最后按融合分数排序。

加租户维表过滤

如果你还维护设备维表,可以继续 JOIN:

SqlExecutor.Execute(db, """
    CREATE TABLE devices (
        id STRING,
        tenant STRING,
        site STRING,
        PRIMARY KEY (id)
    )
    """);

SqlExecutor.Execute(db, "CREATE INDEX idx_devices_tenant ON devices (tenant)");

SqlExecutor.Execute(db, """
    INSERT INTO devices (id, tenant, site)
    VALUES
      ('pump-1', 'tenant-1', 'north'),
      ('pump-2', 'tenant-2', 'south'),
      ('fan-1',  'tenant-1', 'east')
    """);

然后:

var tenantScoped = (SelectExecutionResult)SqlExecutor.Execute(db, """
    SELECT measurement.device_id AS device,
           d.site AS site,
           document_id,
           hybrid_score() AS score
    FROM hybrid_search(
        source => incidents,
        documents => knowledge,
        vector_field => embedding,
        vector => [1, 0, 0],
        measurement_join_tag => device_id,
        document_join_path => '$.device_id',
        text => 'pump alarm')
    JOIN devices d ON measurement.device_id = d.id
    WHERE d.tenant = 'tenant-1'
    ORDER BY score DESC
    """);

这就是 SonnetDB 多模型能力最有意思的地方:向量不是孤立的,全文不是孤立的,关系维表也不是孤立的。它们可以一起回答一个业务问题。


官网地址:https://sonnetdb.com

技术文章站:https://iotpaper.net

开源仓库:https://github.com/IoTSharp/SonnetDB

posted @ 2026-06-14 23:04  IoTSharp  阅读(1)  评论(0)    收藏  举报