Ollama Commands
This lab is based on the small model smollm2:135m: the download is only 270 MB, it occupies a few hundred MB of memory, and it runs very fast, which makes it convenient for getting familiar with Ollama. Large models are not recommended for testing.
Download the model: ollama pull smollm2:135m
PS D:\ollama源码-go> ollama pull smollm2:135m
pulling manifest
pulling f535f83ec568: 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏ 270 MB
pulling fbacade46b4d: 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏ 68 B
pulling d502d55c1d60: 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏ 675 B
pulling 58d1e17ffe51: 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏ 11 KB
pulling f02dd72bb242: 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏ 59 B
pulling b0f58c4c1a3c: 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏ 561 B
verifying sha256 digest
writing manifest
success
List local models with ollama list (or the alias ollama ls):
PS D:\ollama源码-go> ollama list
NAME                        ID              SIZE      MODIFIED
smollm2:135m-2k-ctx-test    57be70d27195    270 MB    45 minutes ago
smollm2:135m                9077fe9d2ae1    270 MB    About an hour ago
nomic-embed-text:latest     0a109f422b47    274 MB    2 hours ago
mymodel:latest              b00fc97c00a3    1.9 GB    17 hours ago
qwen3-vl:2b                 0635d9d857d4    1.9 GB    17 hours ago
gemma3:latest               a2af6cc3eb7f    3.3 GB    17 hours ago
llama3:latest               365c0bd3c000    4.7 GB    25 hours ago
deepseek-r1:8b              6995872bfe4c    5.2 GB    38 hours ago
gemma3:4b                   a2af6cc3eb7f    3.3 GB    3 weeks ago
I. Ollama offers a number of subcommands for different tasks. To see which subcommands are available, run:
1) ollama --help
PS D:\ollama源码-go> ollama --help
Large language model runner

Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama            # starts the REST API server
  create      Create a model
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  signin      Sign in to ollama.com
  signout     Sign out from ollama.com
  list        List models             # lists locally downloaded models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.
Many of these commands mirror Docker's CLI.
For the details of a specific subcommand, run:
2) ollama run --help
PS D:\ollama源码-go> ollama run --help
Run a model

Usage:
  ollama run MODEL [PROMPT] [flags]

Flags:
      --dimensions int          Truncate output embeddings to specified dimension (embedding models only)
      --format string           Response format (e.g. json)
  -h, --help                    help for run
      --hidethinking            Hide thinking output (if provided)
      --insecure                Use an insecure registry
      --keepalive string        Duration to keep a model loaded (e.g. 5m)
      --nowordwrap              Don't wrap words to the next line automatically
      --think string[="true"]   Enable thinking mode: true/false or high/medium/low for supported models
      --truncate                For embedding models: truncate inputs exceeding context length (default: true). Set --truncate=false to error instead
      --verbose                 Show timings for response

Environment Variables:
      OLLAMA_HOST                IP Address for the ollama server (default 127.0.0.1:11434)
      OLLAMA_NOHISTORY           Do not preserve readline history
2) The ollama run command
Ollama runs a model with ollama run MODEL. If the model is not present locally, the command downloads it first. Once the download completes, the model loads and you can interact with it in the terminal, e.g. ollama run smollm2:135m. If you only want to download a model without running it, use ollama pull smollm2:135m.
3) The ollama serve command
It starts a Gin HTTP server (Ollama is written in Go) that can be configured through many environment variables, e.g. OLLAMA_DEBUG to enable or disable debug output, OLLAMA_HOST to set the listen address, and OLLAMA_MAX_QUEUE to cap the number of queued requests.
PS D:\ollama源码-go> ollama serve --help
Start ollama

Usage:
  ollama serve [flags]

Aliases:
  serve, start

Flags:
  -h, --help   help for serve

Environment Variables:
      OLLAMA_DEBUG               Show additional debug information (e.g. OLLAMA_DEBUG=1)
      OLLAMA_HOST                IP Address for the ollama server (default 127.0.0.1:11434)
      OLLAMA_CONTEXT_LENGTH      Context length to use unless otherwise specified (default: 4096)
      OLLAMA_KEEP_ALIVE          The duration that models stay loaded in memory (default "5m")
      OLLAMA_MAX_LOADED_MODELS   Maximum number of loaded models per GPU
      OLLAMA_MAX_QUEUE           Maximum number of queued requests
      OLLAMA_MODELS              The path to the models directory
      OLLAMA_NUM_PARALLEL        Maximum number of parallel requests
      OLLAMA_NOPRUNE             Do not prune model blobs on startup
      OLLAMA_ORIGINS             A comma separated list of allowed origins
      OLLAMA_SCHED_SPREAD        Always schedule model across all GPUs
      OLLAMA_FLASH_ATTENTION     Enabled flash attention
      OLLAMA_KV_CACHE_TYPE       Quantization type for the K/V cache (default: f16)
      OLLAMA_LLM_LIBRARY         Set LLM library to bypass autodetection
      OLLAMA_GPU_OVERHEAD        Reserve a portion of VRAM per GPU (bytes)
      OLLAMA_LOAD_TIMEOUT        How long to allow model loads to stall before giving up (default "5m")
The Gin server is Ollama's underlying HTTP layer; it adds an API on top of the pulled models. Whether you use the CLI or any other client to run inference, the requests go through this server, which listens on port 11434 by default.
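Because everything goes through this HTTP server, you can talk to it directly. Below is a minimal sketch against the POST /api/generate endpoint using only the Python standard library; the endpoint and default port come from the Ollama API docs, and the model name is the one used in this lab. The request is assembled separately from the send so the payload can be inspected even without a running server:

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # default OLLAMA_HOST

def build_generate_request(model, prompt):
    """Build a non-streaming request for POST /api/generate."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON object instead of a stream of chunks
    }).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("smollm2:135m", "what is python")
# Uncomment with `ollama serve` running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["response"])
```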
4) The ollama create command
PS D:\ollama源码-go> ollama create --help
Create a model

Usage:
  ollama create MODEL [flags]

Flags:
  -f, --file string       Name of the Modelfile (default "Modelfile")
  -h, --help              help for create
  -q, --quantize string   Quantize model to this level (e.g. q4_K_M)

Environment Variables:
      OLLAMA_HOST   IP Address for the ollama server (default 127.0.0.1:11434)
With ollama create you can build a new model from a downloaded one, overriding some of its parameters. For example, you can set a parameter with PARAMETER temperature 0.1 and specify the base model with FROM smollm2:135m.
1) A Modelfile for this looks like the snippet below. The Modelfile's filename is arbitrary; here the new model is named smollm2:135m-2k-ctx-test and the Modelfile is named modelfile-s.
It is best to keep the file free of non-ASCII characters, or you may run into problems.
FROM smollm2:135m
PARAMETER temperature 0.1
PARAMETER num_ctx 2048
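If you script the creation of several model variants, a Modelfile like the one above can be generated programmatically. A minimal sketch using only the standard library; FROM and PARAMETER are the real Modelfile keywords, and the file and model names are the ones used in this walkthrough:

```python
from pathlib import Path

def write_modelfile(path, base, params):
    """Write a minimal Modelfile: a FROM line plus one PARAMETER line per entry."""
    lines = [f"FROM {base}"]
    lines += [f"PARAMETER {name} {value}" for name, value in params.items()]
    # keep the file ASCII-only, as recommended above
    Path(path).write_text("\n".join(lines) + "\n", encoding="ascii")

write_modelfile("modelfile-s", "smollm2:135m",
                {"temperature": 0.1, "num_ctx": 2048})
print(Path("modelfile-s").read_text())
# then: ollama create smollm2:135m-2k-ctx-test -f modelfile-s
```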
2) Create the model
With the Modelfile above in place, run a command like:
ollama create smollm2:135m-2k-ctx-test -f modelfile-s
The process looks like this:
PS D:\ollama源码-go> ollama create smollm2:135m-2k-ctx-test -f ./modelfile-s
gathering model components
using existing layer sha256:f535f83ec568d040f88ddc04a199fa6da90923bbb41d4dcaed02caa924d6ef57
using existing layer sha256:fbacade46b4da804e0398c339c64b944d4b954452adf77cf050b49420116129e
using existing layer sha256:d502d55c1d609104ae6127aee92eb940e51e15c56dfb26dbd067e2771ee746f1
using existing layer sha256:58d1e17ffe5109a7ae296caafcadfdbe6a7d176f0bc4ab01e12a689b0499d8bd
creating new layer sha256:d2054dded080375cb7e97e9bf7938bcd54e591897508760466879a25a0a179fc
writing manifest
success
PS D:\ollama源码-go> ollama list
NAME ID SIZE MODIFIED
smollm2:135m-2k-ctx-test 57be70d27195 270 MB 29 seconds ago
smollm2:135m 9077fe9d2ae1 270 MB 37 minutes ago
PS D:\ollama源码-go> ollama run smollm2:135m-2k-ctx-test
>>> /show info
Model
architecture llama
parameters 134.52M
context length 8192
embedding length 576
quantization F16
Capabilities
completion
Parameters
num_ctx 2048
stop "<|im_start|>"
stop "<|im_end|>"
temperature 0.2
Compare with the original model:
Use """ to begin a multi-line message.
>>> /show info
Model
architecture llama
parameters 134.52M
context length 8192
embedding length 576
quantization F16
Capabilities
completion
Parameters
stop "<|im_start|>"
stop "<|im_end|>"
You can of course also do this through the CLI API:
# run in a Cygwin terminal
curl http://localhost:11434/api/create -d '{
  "model": "mymodel",
  "from": "qwen3-vl:2b",
  "system": "我的模型"
}'

PS D:\ollama源码-go> ollama show mymodel:latest
  Model
    architecture        qwen3vl
    parameters          2.1B
    context length      262144
    embedding length    2048
    quantization        Q4_K_M

  Capabilities
    completion
    vision
    tools
    thinking

  Parameters
    top_k          20
    top_p          0.95
    temperature    1

  System
    我的模型
3) Copy a model (ollama cp)
PS D:\ollama源码-go> ollama cp smollm2:135m-2k-ctx-test smollm2:135m-2k-ctx-test-cp
copied 'smollm2:135m-2k-ctx-test' to 'smollm2:135m-2k-ctx-test-cp'
PS D:\ollama源码-go> ollama ls
NAME                           ID              SIZE      MODIFIED
smollm2:135m-2k-ctx-test-cp    57be70d27195    270 MB    8 seconds ago
smollm2:135m-2k-ctx-test       57be70d27195    270 MB    59 minutes ago
smollm2:135m                   9077fe9d2ae1    270 MB    2 hours ago
nomic-embed-text:latest        0a109f422b47    274 MB    2 hours ago
mymodel:latest                 b00fc97c00a3    1.9 GB    17 hours ago
qwen3-vl:2b                    0635d9d857d4    1.9 GB    17 hours ago
gemma3:latest                  a2af6cc3eb7f    3.3 GB    17 hours ago
llama3:latest                  365c0bd3c000    4.7 GB    25 hours ago
deepseek-r1:8b                 6995872bfe4c    5.2 GB    38 hours ago
gemma3:4b                      a2af6cc3eb7f    3.3 GB    3 weeks ago
4) Remove a model (ollama rm)
PS D:\ollama源码-go> ollama rm smollm2:135m-2k-ctx-test-cp
deleted 'smollm2:135m-2k-ctx-test-cp'
PS D:\ollama源码-go> ollama ls
NAME                        ID              SIZE      MODIFIED
smollm2:135m-2k-ctx-test    57be70d27195    270 MB    About an hour ago
smollm2:135m                9077fe9d2ae1    270 MB    2 hours ago
nomic-embed-text:latest     0a109f422b47    274 MB    2 hours ago
mymodel:latest              b00fc97c00a3    1.9 GB    17 hours ago
qwen3-vl:2b                 0635d9d857d4    1.9 GB    17 hours ago
gemma3:latest               a2af6cc3eb7f    3.3 GB    17 hours ago
llama3:latest               365c0bd3c000    4.7 GB    25 hours ago
deepseek-r1:8b              6995872bfe4c    5.2 GB    38 hours ago
gemma3:4b                   a2af6cc3eb7f    3.3 GB    3 weeks ago
The smollm2:135m-2k-ctx-test-cp model is no longer in the list.
II. Running smollm2:135m
1. This model is relatively small, uses little memory, and is easy to test: ollama run smollm2:135m
PS D:\ollama源码-go> ollama run smollm2:135m
>>> /show info
  Model
    architecture        llama
    parameters          134.52M
    context length      8192
    embedding length    576
    quantization        F16

  Capabilities
    completion

  Parameters
    stop    "<|im_start|>"
    stop    "<|im_end|>"

  System
    You are a helpful AI assistant named SmolLM, trained by Hugging Face

  License
    Apache License
    Version 2.0, January 2004
    ...

The result is equivalent to ollama show smollm2:135m.
2. The /? command
PS D:\ollama源码-go> ollama run smollm2:135m-2k-ctx-test
>>> /?
Available Commands:
  /set            Set session variables       # set a session variable, e.g. /set parameter num_ctx 2048
  /show           Show model information
  /load <model>   Load a session or model
  /save <model>   Save your current session
  /clear          Clear session context
  /bye            Exit
  /?, /help       Help for a command
  /? shortcuts    Help for keyboard shortcuts

Use """ to begin a multi-line message.
Note: /set has many more subcommands, for example:
/set think
3. Exit the model
Use /bye or the Ctrl+D shortcut.
PS D:\ollama源码-go> ollama run smollm2:135m
>>> /bye
PS D:\ollama源码-go>
4. View the currently running models. Note that this means a model that is actively interacting, i.e. in its response phase, not one merely started with ollama run smollm2:135m. For example:
PS D:\ollama源码-go> ollama run smollm2:135m
>>> what is python ?
(after typing the question, immediately open a second console window and run ollama ps there; once the model has finished responding, ollama ps will no longer show the runtime data)
Python is an incredibly powerful and versatile programming language. It's widely used for web development, data analysis, game development, artificial intelligence (AI), scientific computing, automation tasks, mobile app development, and more.
1. What is Python?
2. Its origins and development:
3. Basic syntax and features:
4. Popular libraries and frameworks:
5. Applications you can write in Python:
6. Best practices for writing good code:
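ollama ps also has a REST counterpart, GET /api/ps, which per the Ollama API docs returns a JSON object with a models array. A small sketch that extracts the loaded model names; the parsing is a pure function, so it can be exercised on a sample response without a running server (the sample values below are illustrative):

```python
import json
import urllib.request

def running_models(ps_response):
    """Extract model names from an /api/ps response dict."""
    return [m["name"] for m in ps_response.get("models", [])]

# With `ollama serve` running and a model mid-response:
# with urllib.request.urlopen("http://127.0.0.1:11434/api/ps") as resp:
#     print(running_models(json.load(resp)))

sample = {"models": [{"name": "smollm2:135m", "size": 284866560}]}
print(running_models(sample))  # → ['smollm2:135m']
```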
5. Input can span multiple lines; wrap it in triple quotes: """ your content """
PS D:\ollama源码-go> ollama run smollm2:135m-2k-ctx-test
>>> """ what
... is
... python"""
Python is an interpreted programming language that was developed in the late 1980s and early 1990s. It has been widely used for a variety of purposes including web development, data analysis, machine learning, artificial intelligence, and more.
6. Multiple ways to call a model
1. One-shot directly from the command line:
PS D:\ollama源码-go> ollama run smollm2:135m-2k-ctx-test "what is python"
2. Interactive session:
PS D:\ollama源码-go> ollama run smollm2:135m-2k-ctx-test
>>> what is python
3. Calling the CLI API:
curl http://localhost:11434/api/create -d '{
  "model": "mymodel",
  "from": "qwen3-vl:2b",
  "system": "我的模型"
}'
4. The ollama Python library:
import ollama

# tested with version 0.21 of the ollama package
content = "请给我设计一个卡片。"  # prompt: "Please design a card for me."
# 'qwen3-vl:2b' is a model available locally
response = ollama.chat(model='qwen3-vl:2b',
                       messages=[{'role': 'user', 'content': content}])
result = response['message']['content']
print(result)
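The Modelfile parameters from part I can also be overridden per request: ollama.chat accepts an options dict (e.g. num_ctx, temperature). A hedged sketch, assuming the ollama package and a local server; the helper only assembles the arguments, so the actual chat call is left commented out:

```python
def build_chat_args(model, prompt, num_ctx=2048, temperature=0.1):
    """Arguments for ollama.chat with per-request option overrides."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "options": {"num_ctx": num_ctx, "temperature": temperature},
    }

args = build_chat_args("smollm2:135m", "what is python")
# import ollama  # requires the ollama package and a running server
# print(ollama.chat(**args)["message"]["content"])
```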
7. To see the remaining options of a specific interactive command:
PS D:\ollama源码-go> ollama run smollm2:135m-2k-ctx-test
>>> /?
Available Commands:
  /set            Set session variables
  /show           Show model information
  /load <model>   Load a session or model
  /save <model>   Save your current session
  /clear          Clear session context
  /bye            Exit
  /?, /help       Help for a command
  /? shortcuts    Help for keyboard shortcuts

>>> /show
Available Commands:
  /show info         Show details for this model
  /show license      Show model license
  /show modelfile    Show Modelfile for this model
  /show parameters   Show parameters for this model
  /show system       Show system message
  /show template     Show prompt template
>>> /set
Available Commands:
/set parameter ... Set a parameter
/set system <string> Set system message
/set history Enable history
/set nohistory Disable history
/set wordwrap Enable wordwrap
/set nowordwrap Disable wordwrap
/set format json Enable JSON mode
/set noformat Disable formatting
/set verbose Show LLM stats
/set quiet Disable LLM stats
/set think Enable thinking
/set nothink Disable thinking
>>> /? shortcuts
Available keyboard shortcuts:
Ctrl + a Move to the beginning of the line (Home)
Ctrl + e Move to the end of the line (End)
Alt + b Move back (left) one word
Alt + f Move forward (right) one word
Ctrl + k Delete the sentence after the cursor
Ctrl + u Delete the sentence before the cursor
Ctrl + w Delete the word before the cursor
Ctrl + l Clear the screen
Ctrl + c Stop the model from responding
Ctrl + d Exit ollama (/bye)
References:
https://docs.ollama.com/modelfile
Ollama API guide: https://github.com/ollama/ollama/blob/main/docs/api.md