Querying real-time production data in a local MES database with natural language

To query real-time production data in a local MES database using natural language, you can use the following architecture:

```mermaid
graph TD
A[User input] --> B{NLP parsing}
B --> C[SQL generation]
C --> D[Safety validation]
D --> E[Database query]
E --> F[Result processing]
F --> G[Natural-language output]
```

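The flow in the diagram can be tried end to end without a GPU or a MES server. The toy sketch below is only an illustration: an in-memory SQLite table stands in for `work_orders`, and a hypothetical keyword-matching function `toy_generate_sql` replaces the language model. The validation and summary steps mirror the ones used later in simplified form.

```python
import re
import sqlite3

ALLOWED_TABLES = {"work_orders"}
MAX_ROWS = 1000


def toy_generate_sql(question: str) -> str:
    """Hypothetical stand-in for the LLM: maps one phrase to a fixed query."""
    if "in production" in question.lower():
        return "SELECT work_order_id, progress FROM work_orders WHERE status = 'in_production'"
    raise ValueError("question not understood")


def validate(sql: str) -> str:
    """Safety check: single SELECT, allow-listed tables, bounded result size."""
    if not sql.lstrip().upper().startswith("SELECT") or ";" in sql:
        raise PermissionError("only a single SELECT is allowed")
    tables = set(re.findall(r"\b(?:FROM|JOIN)\s+(\w+)", sql, re.IGNORECASE))
    if not tables <= ALLOWED_TABLES:
        raise PermissionError(f"forbidden tables: {tables - ALLOWED_TABLES}")
    if not re.search(r"\bLIMIT\b", sql, re.IGNORECASE):
        sql += f" LIMIT {MAX_ROWS}"
    return sql


def answer(conn: sqlite3.Connection, question: str) -> str:
    sql = validate(toy_generate_sql(question))   # SQL generation + safety validation
    cur = conn.execute(sql)                       # database query
    rows = cur.fetchall()
    columns = [d[0] for d in cur.description]     # result processing
    lines = [f"Found {len(rows)} records:", ", ".join(columns)]
    lines += [" | ".join(str(v) for v in row) for row in rows[:3]]
    return "\n".join(lines)                       # natural-language output


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work_orders (work_order_id TEXT, status TEXT, progress TEXT)")
conn.executemany(
    "INSERT INTO work_orders VALUES (?, ?, ?)",
    [("WO-001", "in_production", "75%"), ("WO-002", "done", "100%")],
)
print(answer(conn, "List all work orders in production"))
```

Swapping `toy_generate_sql` for the model call and SQLite for the MES database gives the overall structure of the real implementation.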
The concrete implementation is as follows:

1. System architecture setup

```yaml
# config.yaml
database:
  type: mysql
  host: 192.168.1.100
  port: 3306
  user: mes_user
  password: secure_password
  dbname: production_db
  allowed_tables: ["work_orders", "production_lines"]
  max_rows: 1000

model:
  local_path: /models/deepseek-chat-7b
  device: cuda:0
  max_length: 512
```

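Every later step reads keys from this file, so failing fast on an incomplete config saves debugging time. A minimal sketch, assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`); `check_config` and the rules it enforces are illustrative assumptions, not part of the original design:

```python
REQUIRED_DB_KEYS = {"type", "host", "port", "user", "password", "dbname",
                    "allowed_tables", "max_rows"}
REQUIRED_MODEL_KEYS = {"local_path", "device", "max_length"}


def check_config(config: dict) -> list[str]:
    """Return a list of problems in a parsed config.yaml (empty list means OK)."""
    problems = []
    db = config.get("database", {})
    model = config.get("model", {})
    for key in sorted(REQUIRED_DB_KEYS - set(db)):
        problems.append(f"database.{key} is missing")
    for key in sorted(REQUIRED_MODEL_KEYS - set(model)):
        problems.append(f"model.{key} is missing")
    if not db.get("allowed_tables"):
        problems.append("database.allowed_tables must list at least one table")
    max_rows = db.get("max_rows")
    if not isinstance(max_rows, int) or max_rows <= 0:
        problems.append("database.max_rows must be a positive integer")
    return problems
```

Calling `check_config` right after loading the file, and refusing to start if it returns anything, keeps misconfiguration out of the query path.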
2. Core processing module
```python
import re

import torch
import yaml
from sqlalchemy import MetaData, create_engine, text
from sqlalchemy.engine import URL
from transformers import AutoModelForCausalLM, AutoTokenizer


class SecurityError(Exception):
    """Raised when generated SQL fails a safety check."""


class MESQueryProcessor:
    def __init__(self, config_path):
        # Load the configuration file
        with open(config_path, encoding="utf-8") as f:
            self.config = yaml.safe_load(f)
        db = self.config['database']

        # Initialize the database connection (assumes the PyMySQL driver for MySQL)
        self.db_engine = create_engine(URL.create(
            drivername=f"{db['type']}+pymysql",
            username=db['user'],
            password=db['password'],
            host=db['host'],
            port=db['port'],
            database=db['dbname'],
        ))
        # Reflect only the whitelisted tables
        self.metadata = MetaData()
        self.metadata.reflect(bind=self.db_engine, only=db['allowed_tables'])

        # Load the local model
        model_cfg = self.config['model']
        self.tokenizer = AutoTokenizer.from_pretrained(model_cfg['local_path'])
        self.model = AutoModelForCausalLM.from_pretrained(
            model_cfg['local_path'],
            device_map=model_cfg['device'],
            torch_dtype=torch.float16,
        )

        # Build the schema description used in the prompt
        self.schema_prompt = self._build_schema_prompt()

    def _build_schema_prompt(self):
        """Describe the structure of the whitelisted tables."""
        allowed = self.config['database']['allowed_tables']
        schema_info = []
        for table in self.metadata.tables.values():
            if table.name in allowed:
                columns = [f"{col.name} ({str(col.type)})" for col in table.columns]
                schema_info.append(f"Table {table.name}: {', '.join(columns)}")
        return "\n".join(schema_info)

    def generate_sql(self, question):
        """Generate a SELECT query for the question and sanitize it."""
        prompt = (
            f"Database schema:\n{self.schema_prompt}\n\n"
            "Convert the following question into a safe, standard SQL query:\n"
            f"Q: {question}\n"
            "A: SELECT"
        )
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        outputs = self.model.generate(
            **inputs,
            max_new_tokens=self.config['model']['max_length'],
            do_sample=False,  # greedy decoding: deterministic SQL
        )
        # Decode only the new tokens; the prompt already ends with "SELECT"
        new_tokens = outputs[0][inputs['input_ids'].shape[1]:]
        completion = self.tokenizer.decode(new_tokens, skip_special_tokens=True)
        # Keep the first statement and drop any trailing explanation
        body = completion.strip().split(";")[0].split("\n\n")[0].strip()
        return self._sanitize_sql("SELECT " + body)

    def _sanitize_sql(self, sql):
        """Basic SQL safety filter."""
        # Reject write/DDL keywords as whole words (so columns like update_time pass)
        forbidden_keywords = ['INSERT', 'UPDATE', 'DELETE', 'DROP', 'TRUNCATE']
        for keyword in forbidden_keywords:
            if re.search(rf"\b{keyword}\b", sql, re.IGNORECASE):
                raise SecurityError(f"Forbidden operation detected: {keyword}")
        # Cap the number of returned rows
        sql = sql.strip().rstrip(';')
        if not re.search(r"\bLIMIT\b", sql, re.IGNORECASE):
            sql += f" LIMIT {self.config['database']['max_rows']}"
        return sql

    def execute_query(self, sql):
        """Execute the query and return column names and rows."""
        with self.db_engine.connect() as conn:
            result = conn.execute(text(sql))
            columns = list(result.keys())
            data = result.fetchall()
        return {"columns": columns, "data": data}

    def process_query(self, question):
        """Full pipeline: question -> SQL -> rows -> answer."""
        try:
            generated_sql = self.generate_sql(question)
            query_result = self.execute_query(generated_sql)
            return self.format_response(query_result)
        except Exception as e:
            return f"Query failed: {e}"

    def format_response(self, result):
        """Turn the result into a short natural-language summary."""
        columns = result['columns']
        data = result['data']

        # Summary line and column header
        summary = f"Found {len(data)} records:\n"
        summary += ", ".join(columns) + "\n"

        # Show the first few rows as examples
        for row in data[:3]:
            summary += " | ".join(str(x) for x in row) + "\n"
        if len(data) > 3:
            summary += f"...({len(data)} in total)"
        return summary


# Initialize the processor
processor = MESQueryProcessor("config.yaml")
```

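With a causal language model, `generate` returns the prompt tokens followed by the new ones, so only the tokens after the prompt should be decoded. And because the prompt in `generate_sql` ends with `A: SELECT`, the model continues mid-statement and may trail off into an explanation, a markdown fence or a second query. The hypothetical helper `extract_sql` below sketches that post-processing in isolation; its stop rules (first `;`, first blank line) are assumptions, and a `;` inside a string literal would also cut the statement.

```python
import re


def extract_sql(completion: str) -> str:
    """Rebuild one SELECT statement from text generated after 'A: SELECT'."""
    body = completion.replace("```sql", "").replace("```", "")  # drop markdown fences
    body = body.split(";", 1)[0]               # keep only the first statement (naive)
    body = body.strip().split("\n\n", 1)[0]    # drop explanation after a blank line
    # Some models repeat the keyword; avoid producing "SELECT SELECT ..."
    body = re.sub(r"^\s*SELECT\b", "", body, flags=re.IGNORECASE)
    return "SELECT " + body.strip()


print(extract_sql(" work_order_id FROM work_orders;\nThis query lists all orders."))
```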
3. Security layer implementation
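```python
from sqlparse import parse
from sqlparse import tokens as T
from sqlparse.sql import Comment, Identifier, IdentifierList


class SecurityError(Exception):
    """Raised when SQL fails validation."""


class SQLValidator:
    @staticmethod
    def validate(sql, allowed_tables):
        """Multi-layer SQL validation (allowed_tables: config['database']['allowed_tables'])."""
        # Layer 1: exactly one non-empty statement
        statements = [s for s in parse(sql) if str(s).strip()]
        if len(statements) != 1:
            raise SecurityError("Exactly one SQL statement is allowed")
        parsed = statements[0]

        # Layer 2: statement type
        if parsed.get_type() != 'SELECT':
            raise SecurityError("Only SELECT queries are allowed")

        # Layer 3: table whitelist. Only identifiers right after FROM/JOIN are table
        # names; SELECT-list columns are not checked. Subqueries are not inspected.
        expect_table = False
        for token in parsed.tokens:
            if token.is_whitespace or isinstance(token, Comment) or token.ttype in T.Comment:
                continue
            if token.ttype is T.Keyword and token.normalized.split()[-1] in ('FROM', 'JOIN'):
                expect_table = True
                continue
            if expect_table:
                if isinstance(token, IdentifierList):
                    identifiers = token.get_identifiers()
                else:
                    identifiers = [token] if isinstance(token, Identifier) else []
                for identifier in identifiers:
                    name = identifier.get_real_name()
                    if name not in allowed_tables:
                        raise SecurityError(f"Access to table {name} is forbidden")
                expect_table = False
```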

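Matching identifiers in SQL text is brittle (subqueries, comments, quoting and dialect quirks can slip past a parser-based check), so it is safer to have the database enforce the allow-list as well, as a read-only MySQL account with table-level grants would. As a self-contained stand-in for MySQL grants, the sketch below uses the authorizer hook of Python's built-in `sqlite3` module; the table names come from `config.yaml`, while `readonly_authorizer` and the `users` table are hypothetical.

```python
import sqlite3

ALLOWED_TABLES = {"work_orders", "production_lines"}
# Actions a read-only query legitimately needs
ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def readonly_authorizer(action, arg1, arg2, db_name, trigger):
    """Called by SQLite while compiling each statement; deny anything not read-only."""
    if action not in ALLOWED_ACTIONS:
        return sqlite3.SQLITE_DENY
    # For SQLITE_READ, arg1 is the table name and arg2 the column name
    if action == sqlite3.SQLITE_READ and arg1 not in ALLOWED_TABLES:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work_orders (work_order_id TEXT, status TEXT);
    CREATE TABLE users (name TEXT, password_hash TEXT);
    INSERT INTO work_orders VALUES ('WO-001', 'in_production');
""")
conn.set_authorizer(readonly_authorizer)  # install only after the schema is set up

print(conn.execute("SELECT work_order_id FROM work_orders").fetchall())
for blocked in ("SELECT name FROM users", "DELETE FROM work_orders"):
    try:
        conn.execute(blocked)
    except sqlite3.DatabaseError as exc:
        print(f"blocked: {blocked!r} ({exc})")
```

Any write, and any read of a table outside the list, fails when SQLite compiles the statement, before it runs, regardless of how the SQL text was phrased.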
4. Usage example
```python
if __name__ == "__main__":
    question = "List all work orders that are in production, with work order number, product model and current progress"

    response = processor.process_query(question)
    print("Natural-language answer:")
    print(response)

# Example output:
# Found 15 records:
# work_order_id, product_model, progress
# WO-202307-001 | Model-X1 | 75%
# WO-202307-002 | Model-X2 | 63%
# WO-202307-003 | Model-X3 | 89%
# ...(15 in total)
```

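5. Performance optimizations
- Query caching (with a short TTL, because the data is real-time):
```python
import time

class CachedProcessor(MESQueryProcessor):
    def __init__(self, config_path, ttl_seconds=60, maxsize=100):
        super().__init__(config_path)
        self.ttl, self.maxsize, self._cache = ttl_seconds, maxsize, {}  # short TTL: live data

    def process_query(self, question):
        hit = self._cache.get(question)  # (timestamp, answer) or None
        if hit and time.monotonic() - hit[0] < self.ttl:
            return hit[1]
        answer = super().process_query(question)
        if question not in self._cache and len(self._cache) >= self.maxsize:
            self._cache.pop(next(iter(self._cache)))  # evict the earliest-inserted entry
        self._cache[question] = (time.monotonic(), answer)
        return answer
```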

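- Real-time data refresh:
```python
import threading

from sqlalchemy import text


class RealTimeUpdater:
    def __init__(self, processor, allowed_tables, interval_seconds=300):
        self.processor = processor
        self.allowed_tables = allowed_tables
        self.interval = interval_seconds
        self._stop = threading.Event()
        # Daemon thread, so it does not block interpreter shutdown
        threading.Thread(target=self._update_loop, daemon=True).start()

    def stop(self):
        self._stop.set()

    def _update_loop(self):
        while not self._stop.is_set():
            # Every 5 minutes (by default), touch each key table to keep the
            # connection alive and detect database outages early
            try:
                with self.processor.db_engine.connect() as conn:
                    for table in self.allowed_tables:
                        conn.execute(text(f"SELECT 1 FROM {table} LIMIT 1"))
            except Exception as e:
                print(f"Refresh failed: {e}")
            # Wait even after a failure; returns early once stop() is called
            self._stop.wait(self.interval)
```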

6. Production deployment recommendations

- gRPC service interface definition:

```protobuf
syntax = "proto3";

service MESQuery {
    rpc Query (QueryRequest) returns (QueryResponse);
}

message QueryRequest {
    string question = 1;
}

message QueryResponse {
    string answer = 1;
    string sql = 2;
    int32 result_count = 3;
}
```

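- Containerized deployment configuration (Dockerfile):

```dockerfile
FROM nvidia/cuda:12.1.1-base-ubuntu22.04

RUN apt-get update && \
    apt-get install -y --no-install-recommends python3.10 python3-pip && \
    update-alternatives --install /usr/bin/python python /usr/bin/python3.10 1 && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code (grpc_server.py, config.yaml, ...)
COPY . .

ENV PYTHONUNBUFFERED=1
EXPOSE 50051

CMD ["python", "grpc_server.py"]
```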

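7. Permission management integration
```python
from ldap3 import Connection, Server


class AuthManager:
    def __init__(self, server_url='ldap://company.com'):
        self.ldap_server = Server(server_url)

    def check_permission(self, user, question):
        # Reject empty passwords: they would turn into an unauthenticated bind
        if not user.password:
            return False
        # Bind as the user to verify the credentials (question is not used here)
        conn = Connection(self.ldap_server, user=user.uid, password=user.password)
        if not conn.bind():
            return False
        conn.unbind()
        # Check that the user's groups include the MES read-only role
        # (assumes user.groups has been populated by the caller)
        return 'mes_viewer' in user.groups
```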

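Key security measures:
1. SQL injection protection
- Generated SQL must pass multiple layers of validation
- Strictly limit query privileges (use a read-only database account)
- Use a parameterized query interface

2. Data leakage protection
- Mask sensitive values in results
- Automatically filter out sensitive fields
- Audit query logs

3. Performance safeguards
- Automatic index suggestions
- Query timeout control (30 seconds)
- Resource usage monitoring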

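The result-masking and sensitive-field items above can be applied to the `{"columns": ..., "data": ...}` dict returned by `execute_query`, before the answer is formatted. A minimal sketch; the column names in `SENSITIVE_COLUMNS` and `DROPPED_COLUMNS` and the keep-the-last-4-characters rule are assumptions to adapt to the real MES schema:

```python
SENSITIVE_COLUMNS = {"operator_phone", "operator_id_card"}  # hypothetical: masked
DROPPED_COLUMNS = {"password", "salary"}                    # hypothetical: never returned


def mask(value) -> str:
    """Keep only the last 4 characters; fully mask short values."""
    text = str(value)
    if len(text) <= 4:
        return "*" * len(text)
    return "*" * (len(text) - 4) + text[-4:]


def protect(result: dict) -> dict:
    """Drop forbidden columns and mask sensitive ones in a {'columns', 'data'} result."""
    names = result["columns"]
    keep = [i for i, name in enumerate(names) if name.lower() not in DROPPED_COLUMNS]
    data = [
        tuple(mask(row[i]) if names[i].lower() in SENSITIVE_COLUMNS else row[i]
              for i in keep)
        for row in result["data"]
    ]
    return {"columns": [names[i] for i in keep], "data": data}


print(protect({"columns": ["work_order_id", "operator_phone", "salary"],
               "data": [("WO-001", "13812345678", 9000)]}))
```

Filtering on the returned column names is a last line of defense; excluding the columns from `allowed_tables` or from the read-only account's grants keeps them out of the prompt and the query in the first place.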
Typical query example:
```
User input: "Show production lines whose yield rate is below 95% today"
Generated SQL:
SELECT line_id, product_type, yield_rate
FROM production_lines
WHERE DATE(record_time) = CURDATE()
  AND yield_rate < 0.95
LIMIT 1000
```

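System advantages:
1. Natural-language interaction: no SQL knowledge required
2. Real-time data: queries the production database directly
3. Secure and controllable: multiple layers of protection
4. Performance: GPU-accelerated model inference
5. Extensible: supports multiple database types through SQLAlchemy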

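Suggested upgrades:
1. Add conversational context (e.g. "the work orders from my last query")
2. Automatic report generation
3. Alerts for abnormal data
4. Voice input and output
5. A mobile access interface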

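Suggested monitoring metrics:
- Query response time (P99 < 3 s)
- SQL generation accuracy of the model (> 95%)
- Database load (CPU < 60%)
- Error rate (< 0.5%)
- Concurrent throughput (> 50 QPS)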

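The latency and error-rate targets above have to be computed from per-query records. A minimal in-process sketch using only the standard library; `QueryMetrics` is a hypothetical helper (a real deployment would more likely export these numbers to an existing monitoring system), and P99 here uses the nearest-rank method.

```python
import time
from contextlib import contextmanager


class QueryMetrics:
    """Collects per-query latency and failures (hypothetical helper)."""

    def __init__(self):
        self.latencies = []  # seconds, one entry per tracked query
        self.errors = 0

    @contextmanager
    def track(self):
        """Time the wrapped block; count it as an error if it raises."""
        start = time.perf_counter()
        try:
            yield
        except Exception:
            self.errors += 1
            raise
        finally:
            self.latencies.append(time.perf_counter() - start)

    def p99(self) -> float:
        """Nearest-rank 99th percentile latency in seconds (0.0 if no data)."""
        if not self.latencies:
            return 0.0
        ordered = sorted(self.latencies)
        rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n) in integer arithmetic
        return ordered[rank - 1]

    def error_rate(self) -> float:
        return self.errors / len(self.latencies) if self.latencies else 0.0


metrics = QueryMetrics()
with metrics.track():
    time.sleep(0.01)  # stand-in for one query
print(f"P99 = {metrics.p99():.3f} s, error rate = {metrics.error_rate():.1%}")
```

Because `process_query` catches exceptions and returns an error string, wrapping it in `track()` would never count a failure; wrapping `generate_sql` and `execute_query` instead gives a meaningful error rate.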
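With this approach you can achieve:
1. Instant natural-language queries over production data
2. Automatically generated, policy-compliant SQL
3. Multiple layers of security protection
4. Production-grade reliability
5. Integration with existing enterprise systems (LDAP, gRPC, containers)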

posted on 2025-03-05 16:49  不悔梦归处
