从设备向量到知识文档:SonnetDB Measurement KNN 与文档关联检索
前一篇讲的是 document collection 内部的 Hybrid Search。这一篇更接近工业现场:设备事件本身是时序 measurement,知识库是 JSON 文档,设备和租户信息在关系表里。SonnetDB 的 hybrid_search(...) 可以把这些模型放到一个查询里。
先写入设备事件向量:
using SonnetDB.Engine;
using SonnetDB.Sql;
using SonnetDB.Sql.Execution;
using var db = Tsdb.Open(new TsdbOptions
{
RootDirectory = "data/measurement-knowledge-demo"
});
SqlExecutor.Execute(db, """
CREATE MEASUREMENT incidents (
device_id TAG,
embedding FIELD VECTOR(3),
severity FIELD FLOAT
)
""");
SqlExecutor.Execute(db, """
INSERT INTO incidents (device_id, embedding, severity, time)
VALUES
('pump-1', [1, 0, 0], 9.0, 1000),
('pump-2', [0.8, 0.2, 0], 7.5, 2000),
('fan-1', [0, 1, 0], 3.0, 3000)
""");
再写知识库文档:
SqlExecutor.Execute(db, "CREATE DOCUMENT COLLECTION knowledge");
SqlExecutor.Execute(db, """
INSERT INTO knowledge (id, document)
VALUES
('kb-pump-1', '{"device_id":"pump-1","title":"Pump overheating guide","body":"pump alarm overheating recovery","category":"fault","embedding":[1,0,0]}'),
('kb-pump-2', '{"device_id":"pump-2","title":"Pump pressure guide","body":"pump alarm pressure inspection","category":"fault","embedding":[0.8,0.2,0]}'),
('kb-fan-1', '{"device_id":"fan-1","title":"Fan maintenance note","body":"fan maintenance normal","category":"normal","embedding":[0,1,0]}')
""");
SqlExecutor.Execute(db, "CREATE JSON INDEX idx_knowledge_device ON knowledge ('$.device_id')");
SqlExecutor.Execute(db, "CREATE FULLTEXT INDEX ft_knowledge_body ON knowledge ('$.body') USING unicode");
现在可以把 measurement KNN、知识文档 JSON path、全文 BM25 融合起来:
var result = (SelectExecutionResult)SqlExecutor.Execute(db, """
SELECT measurement.device_id AS device,
document_id,
json_value(document, '$.title') AS title,
measurement_distance() AS m_distance,
bm25_score() AS text_score,
hybrid_score() AS score
FROM hybrid_search(
source => incidents,
documents => knowledge,
vector_field => embedding,
vector => [1, 0, 0],
k => 5,
measurement_join_tag => device_id,
document_join_path => '$.device_id',
document_join_index => idx_knowledge_device,
text_index => ft_knowledge_body,
text_field => '$.body',
text => 'pump alarm overheating',
measurement_weight => 0.7,
text_weight => 0.3)
WHERE time >= 1000 AND category = 'fault'
ORDER BY score DESC
""");
这个查询做了几件事:
- 先在
incidents.embedding上按向量召回相近事件。 - 用
device_id关联knowledge文档里的$.device_id。 - 对知识正文
$.body做全文 BM25 评分。 - 只保留
category = 'fault'的知识文档。 - 最后按融合分数排序。
加租户维表过滤
如果你还维护设备维表,可以继续 JOIN:
SqlExecutor.Execute(db, """
CREATE TABLE devices (
id STRING,
tenant STRING,
site STRING,
PRIMARY KEY (id)
)
""");
SqlExecutor.Execute(db, "CREATE INDEX idx_devices_tenant ON devices (tenant)");
SqlExecutor.Execute(db, """
INSERT INTO devices (id, tenant, site)
VALUES
('pump-1', 'tenant-1', 'north'),
('pump-2', 'tenant-2', 'south'),
('fan-1', 'tenant-1', 'east')
""");
然后:
var tenantScoped = (SelectExecutionResult)SqlExecutor.Execute(db, """
SELECT measurement.device_id AS device,
d.site AS site,
document_id,
hybrid_score() AS score
FROM hybrid_search(
source => incidents,
documents => knowledge,
vector_field => embedding,
vector => [1, 0, 0],
measurement_join_tag => device_id,
document_join_path => '$.device_id',
text => 'pump alarm')
JOIN devices d ON measurement.device_id = d.id
WHERE d.tenant = 'tenant-1'
ORDER BY score DESC
""");
这就是 SonnetDB 多模型能力最有意思的地方:向量不是孤立的,全文不是孤立的,关系维表也不是孤立的。它们可以一起回答一个业务问题。
官网地址:https://sonnetdb.com
技术文章站:https://iotpaper.net

浙公网安备 33010602011771号