
书生 (InternLM) Open-Source LLM Training Camp - Lecture 5 Notes

1. Background: deploying large models

a. Challenges of large-model deployment

  • Device requirements: a large storage/memory footprint
  • Inference: token generation speed, dynamic shapes, memory management
  • Serving: throughput under high concurrency, per-user response time

b. Technical approaches:

  • Model parallelism, quantization
  • Transformer optimizations, inference optimizations

2. Introduction to LMDeploy

a. A full-pipeline solution on NVIDIA GPUs covering lightweight compression (quantization), inference, and serving; mobile devices are not yet covered. Its core is the TurboMind inference engine.

b. Why quantize?

  • Lower GPU memory requirements: weights and the KV Cache (a back-of-envelope sketch follows this list)
  • Faster inference
  • At inference time, decoder-only models are bottlenecked by memory access; measurements on an A100 show the bottleneck shifting from memory access to compute at around batch_size=128
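A rough illustration of these memory numbers (my own sketch, assuming InternLM-7B-like dimensions of 7B parameters, 32 layers, hidden size 4096, and an fp16 KV cache; not from the course material):

# Back-of-envelope GPU memory estimate for a 7B decoder-only model (assumed dimensions).
n_params = 7e9
for fmt, bytes_per_weight in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{fmt} weights: {n_params * bytes_per_weight / 2**30:.1f} GiB")

# KV cache per token = 2 (K and V) * n_layers * hidden_size * bytes per value.
n_layers, hidden, fp16_bytes = 32, 4096, 2
kv_per_token = 2 * n_layers * hidden * fp16_bytes
batch, seq_len = 128, 2048
print(f"fp16 KV cache at batch={batch}, seq_len={seq_len}: "
      f"{kv_per_token * batch * seq_len / 2**30:.0f} GiB")

Quantizing weights to 4 bits roughly quarters the first number, and quantizing or paging the KV cache attacks the second, which is why both show up in LMDeploy.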

c. The TurboMind inference engine

  • Continuous batch inference
  • Stateful inference
  • Blocked K/V cache
  • High-performance CUDA kernel optimizations

d. Inference serving: API server

3. Hands-on

3.1 Environment setup

Check the GPU resources:
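For example (nvidia-smi is the standard tool; the exact command in the course environment may differ):

nvidia-smi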

Watch GPU usage in real time:
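For example:

watch -n 1 nvidia-smi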

Create the conda virtual environment:

/root/share/install_conda_env_internlm_base.sh lmdeploy

List the conda environments:
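For example:

conda env list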

Activate the lmdeploy virtual environment:
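That is:

conda activate lmdeploy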

Check PyTorch from Python:
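One way to do this (a quick check, assuming PyTorch with CUDA is installed in the environment):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"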

Install the latest stable lmdeploy:
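For example (the course may pin a specific lmdeploy version; shown unpinned here):

pip install 'lmdeploy[all]'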

 

 

Verify that lmdeploy is installed:
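For example:

pip show lmdeploy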

 

 

3.2 Serving deployment

a. Convert the model to TurboMind format:

Online conversion (loading and converting a HuggingFace model on the fly):
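A sketch of what this step can look like, assuming the lmdeploy 0.1.x CLI (the model ID and flag names here are my assumptions and may differ by version):

lmdeploy chat turbomind internlm/internlm-chat-7b --model-name internlm-chat-7b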

 

Offline conversion:

(lmdeploy) root@intern-studio-069640:~/ch5# lmdeploy convert internlm-chat-7b /root/share/model_repos/internlm-chat-7b
create workspace in directory ./workspace
copy triton model templates from "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/turbomind/triton_models" to "./workspace/triton_models"
copy service_docker_up.sh from "/root/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/turbomind/service_docker_up.sh" to "./workspace"
model_name             internlm-chat-7b
model_format           None
inferred_model_format  hf
model_path             /root/share/model_repos/internlm-chat-7b
tokenizer_path         /root/share/model_repos/internlm-chat-7b/tokenizer.model
output_format          fp16
WARNING: Can not find tokenizer.json. It may take long time to initialize the tokenizer.
*** splitting layers.0.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                       
*** splitting layers.0.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                            
*** splitting layers.0.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                            
### copying layers.0.attention.wo.bias, shape=torch.Size([4096])                                                                                         
*** splitting layers.0.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.0.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.0.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                        
*** splitting layers.1.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                       
*** splitting layers.1.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                            
*** splitting layers.1.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                            
### copying layers.1.attention.wo.bias, shape=torch.Size([4096])                                                                                         
*** splitting layers.1.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.1.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.1.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                        
*** splitting layers.2.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                       
*** splitting layers.2.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                            
*** splitting layers.2.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                            
### copying layers.2.attention.wo.bias, shape=torch.Size([4096])                                                                                         
*** splitting layers.2.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.2.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.2.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                        
*** splitting layers.3.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                       
*** splitting layers.3.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                            
*** splitting layers.3.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                            
### copying layers.3.attention.wo.bias, shape=torch.Size([4096])                                                                                         
*** splitting layers.3.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.3.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.3.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                        
*** splitting layers.4.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                       
*** splitting layers.4.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                            
*** splitting layers.4.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                            
### copying layers.4.attention.wo.bias, shape=torch.Size([4096])                                                                                         
*** splitting layers.4.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.4.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.4.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                        
*** splitting layers.5.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                       
*** splitting layers.5.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                            
*** splitting layers.5.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                            
### copying layers.5.attention.wo.bias, shape=torch.Size([4096])                                                                                         
*** splitting layers.5.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.5.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.5.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                        
*** splitting layers.6.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                       
*** splitting layers.6.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                            
*** splitting layers.6.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                            
### copying layers.6.attention.wo.bias, shape=torch.Size([4096])                                                                                         
*** splitting layers.6.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.6.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.6.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                        
*** splitting layers.7.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                       
*** splitting layers.7.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                            
*** splitting layers.7.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                            
### copying layers.7.attention.wo.bias, shape=torch.Size([4096])                                                                                         
*** splitting layers.7.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.7.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.7.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                        
*** splitting layers.8.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                       
*** splitting layers.8.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                            
*** splitting layers.8.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                            
### copying layers.8.attention.wo.bias, shape=torch.Size([4096])                                                                                         
*** splitting layers.8.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.8.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.8.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                        
*** splitting layers.9.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                       
*** splitting layers.9.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                            
*** splitting layers.9.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                            
### copying layers.9.attention.wo.bias, shape=torch.Size([4096])                                                                                         
*** splitting layers.9.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.9.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                       
*** splitting layers.9.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                        
*** splitting layers.10.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.10.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.10.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.10.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.10.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.10.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.10.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.11.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.11.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.11.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.11.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.11.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.11.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.11.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.12.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.12.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.12.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.12.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.12.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.12.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.12.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.13.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.13.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.13.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.13.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.13.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.13.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.13.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.14.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.14.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.14.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.14.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.14.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.14.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.14.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.15.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.15.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.15.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.15.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.15.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.15.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.15.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.16.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.16.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.16.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.16.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.16.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.16.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.16.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.17.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.17.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.17.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.17.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.17.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.17.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.17.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.18.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.18.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.18.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.18.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.18.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.18.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.18.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.19.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.19.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.19.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.19.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.19.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.19.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.19.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.20.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.20.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.20.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.20.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.20.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.20.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.20.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.21.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.21.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.21.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.21.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.21.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.21.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.21.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.22.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.22.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.22.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.22.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.22.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.22.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.22.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.23.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.23.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.23.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.23.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.23.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.23.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.23.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.24.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.24.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.24.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.24.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.24.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.24.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.24.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.25.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.25.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.25.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.25.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.25.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.25.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.25.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.26.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.26.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.26.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.26.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.26.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.26.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.26.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.27.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.27.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.27.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.27.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.27.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.27.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.27.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.28.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.28.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.28.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.28.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.28.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.28.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.28.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.29.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.29.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.29.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.29.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.29.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.29.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.29.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.30.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.30.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.30.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.30.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.30.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.30.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.30.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
*** splitting layers.31.attention.w_qkv.weight, shape=torch.Size([4096, 12288]), split_dim=-1, tp=1                                                      
*** splitting layers.31.attention.wo.weight, shape=torch.Size([4096, 4096]), split_dim=0, tp=1                                                           
*** splitting layers.31.attention.w_qkv.bias, shape=torch.Size([1, 12288]), split_dim=-1, tp=1                                                           
### copying layers.31.attention.wo.bias, shape=torch.Size([4096])                                                                                        
*** splitting layers.31.feed_forward.w1.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.31.feed_forward.w3.weight, shape=torch.Size([4096, 11008]), split_dim=-1, tp=1                                                      
*** splitting layers.31.feed_forward.w2.weight, shape=torch.Size([11008, 4096]), split_dim=0, tp=1                                                       
Convert to turbomind format: 100%|███████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:40<00:00,  1.28s/it]
(lmdeploy) root@intern-studio-069640:~/ch5# 

Local command-line inference:

First, try a local chat session:
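With the converted ./workspace, the local chat command looks roughly like this (assuming the lmdeploy 0.1.x CLI):

lmdeploy chat turbomind ./workspace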

Serve the model with lmdeploy:
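A sketch of the serving command, assuming the lmdeploy 0.1.x api_server interface (flag names may vary by version):

lmdeploy serve api_server ./workspace --server_name 0.0.0.0 --server_port 23333 --tp 1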

 

Set up a local port forward, after which the API information can be viewed on the local MacBook:
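Assuming the InternStudio SSH gateway and the 23333 port above (the gateway host and SSH port are placeholders for your own):

ssh -CNg -L 23333:127.0.0.1:23333 root@ssh.intern-ai.org.cn -p <your ssh port>

The FastAPI interface should then be reachable from a local browser at http://localhost:23333 (or its /docs page).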

 

Web demo:
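The Gradio demo can be launched against the workspace (or the running API server); a sketch, assuming the lmdeploy 0.1.x CLI:

lmdeploy serve gradio ./workspace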

Set up the local port forward:
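Same pattern as before, forwarding the Gradio port (6006 assumed here):

ssh -CNg -L 6006:127.0.0.1:6006 root@ssh.intern-ai.org.cn -p <your ssh port>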

Then it is ready to try out!

 

 

3.3 Model quantization

posted on 2024-02-18 20:50 by littlesuccess