Ollama Commands
This lab is based on the small model smollm2:135m: the download is only 270 MB, it occupies a few hundred MB of memory, and it runs very fast, which makes it convenient for getting familiar with Ollama. Large models are not recommended for testing.
Download the model: ollama pull smollm2:135m
PS D:\ollama源码-go> ollama pull smollm2:135m
pulling manifest
pulling f535f83ec568: 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏ 270 MB
pulling fbacade46b4d: 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏ 68 B
pulling d502d55c1d60: 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏ 675 B
pulling 58d1e17ffe51: 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏ 11 KB
pulling f02dd72bb242: 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏ 59 B
pulling b0f58c4c1a3c: 100% ▕██████████████████████████████████████████████████████████████████████████████████████▏ 561 B
verifying sha256 digest
writing manifest
success
List local models with ollama list (or the alias ollama ls):
PS D:\ollama源码-go> ollama list
NAME                        ID              SIZE      MODIFIED
smollm2:135m-2k-ctx-test    57be70d27195    270 MB    45 minutes ago
smollm2:135m                9077fe9d2ae1    270 MB    About an hour ago
nomic-embed-text:latest     0a109f422b47    274 MB    2 hours ago
mymodel:latest              b00fc97c00a3    1.9 GB    17 hours ago
qwen3-vl:2b                 0635d9d857d4    1.9 GB    17 hours ago
gemma3:latest               a2af6cc3eb7f    3.3 GB    17 hours ago
llama3:latest               365c0bd3c000    4.7 GB    25 hours ago
deepseek-r1:8b              6995872bfe4c    5.2 GB    38 hours ago
gemma3:4b                   a2af6cc3eb7f    3.3 GB    3 weeks ago
I. Ollama offers a number of subcommands for different tasks. To see which subcommands are available, run:
1) ollama --help
PS D:\ollama源码-go> ollama --help
Large language model runner

Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama            # starts the REST API server
  create      Create a model
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  signin      Sign in to ollama.com
  signout     Sign out from ollama.com
  list        List models             # lists locally downloaded models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.
Many of these commands mirror Docker's CLI.
For the details of a specific subcommand, run:
2) ollama run --help
PS D:\ollama源码-go> ollama run --help
Run a model

Usage:
  ollama run MODEL [PROMPT] [flags]

Flags:
      --dimensions int          Truncate output embeddings to specified dimension (embedding models only)
      --format string           Response format (e.g. json)
  -h, --help                    help for run
      --hidethinking            Hide thinking output (if provided)
      --insecure                Use an insecure registry
      --keepalive string        Duration to keep a model loaded (e.g. 5m)
      --nowordwrap              Don't wrap words to the next line automatically
      --think string[="true"]   Enable thinking mode: true/false or high/medium/low for supported models
      --truncate                For embedding models: truncate inputs exceeding context length (default: true). Set --truncate=false to error instead
      --verbose                 Show timings for response

Environment Variables:
      OLLAMA_HOST                IP Address for the ollama server (default 127.0.0.1:11434)
      OLLAMA_NOHISTORY           Do not preserve readline history
2) The ollama run command
Ollama runs a model with ollama run MODEL. If the model is not present locally, the command downloads it first. Once the download completes, the model loads and you can interact with it in the terminal, e.g. ollama run smollm2:135m. If you only want to download a model without running it, use ollama pull smollm2:135m.
3) The ollama serve command
It starts a Gin HTTP server (Ollama is written in Go) that can be configured through many environment variables, e.g. OLLAMA_DEBUG to enable or disable debug output, OLLAMA_HOST to set the listen address, and OLLAMA_MAX_QUEUE to cap the number of queued requests.
PS D:\ollama源码-go> ollama serve --help
Start ollama

Usage:
  ollama serve [flags]

Aliases:
  serve, start

Flags:
  -h, --help   help for serve

Environment Variables:
      OLLAMA_DEBUG               Show additional debug information (e.g. OLLAMA_DEBUG=1)
      OLLAMA_HOST                IP Address for the ollama server (default 127.0.0.1:11434)
      OLLAMA_CONTEXT_LENGTH      Context length to use unless otherwise specified (default: 4096)
      OLLAMA_KEEP_ALIVE          The duration that models stay loaded in memory (default "5m")
      OLLAMA_MAX_LOADED_MODELS   Maximum number of loaded models per GPU
      OLLAMA_MAX_QUEUE           Maximum number of queued requests
      OLLAMA_MODELS              The path to the models directory
      OLLAMA_NUM_PARALLEL        Maximum number of parallel requests
      OLLAMA_NOPRUNE             Do not prune model blobs on startup
      OLLAMA_ORIGINS             A comma separated list of allowed origins
      OLLAMA_SCHED_SPREAD        Always schedule model across all GPUs
      OLLAMA_FLASH_ATTENTION     Enabled flash attention
      OLLAMA_KV_CACHE_TYPE       Quantization type for the K/V cache (default: f16)
      OLLAMA_LLM_LIBRARY         Set LLM library to bypass autodetection
      OLLAMA_GPU_OVERHEAD        Reserve a portion of VRAM per GPU (bytes)
      OLLAMA_LOAD_TIMEOUT        How long to allow model loads to stall before giving up (default "5m")
The Gin server is Ollama's underlying HTTP layer; it adds an API on top of the pulled models. Whether you use the CLI or any other client to run inference, the requests go through this server, which listens on port 11434 by default.
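Because everything goes through this HTTP server, you can talk to it directly. Below is a minimal sketch against the POST /api/generate endpoint using only the Python standard library; the endpoint and default port come from the Ollama API docs, and the model name is the one used in this lab. The request is assembled separately from the send so the payload can be inspected even without a running server:

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # default OLLAMA_HOST

def build_generate_request(model, prompt):
    """Build a non-streaming request for POST /api/generate."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON object instead of a stream of chunks
    }).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("smollm2:135m", "what is python")
# Uncomment with `ollama serve` running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["response"])
```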
4) The ollama create command
PS D:\ollama源码-go> ollama create --help
Create a model

Usage:
  ollama create MODEL [flags]

Flags:
  -f, --file string       Name of the Modelfile (default "Modelfile")
  -h, --help              help for create
  -q, --quantize string   Quantize model to this level (e.g. q4_K_M)

Environment Variables:
      OLLAMA_HOST   IP Address for the ollama server (default 127.0.0.1:11434)
With ollama create you can build a new model from a downloaded one, overriding some of its parameters. For example, you can set a parameter with PARAMETER temperature 0.1 and specify the base model with FROM smollm2:135m.
1) A Modelfile for this looks like the snippet below. The Modelfile's filename is arbitrary; here the new model is named smollm2:135m-2k-ctx-test and the Modelfile is named modelfile-s.
It is best to keep the file free of non-ASCII characters, or you may run into problems.
FROM smollm2:135m
PARAMETER temperature 0.1
PARAMETER num_ctx 2048
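If you script the creation of several model variants, a Modelfile like the one above can be generated programmatically. A minimal sketch using only the standard library; FROM and PARAMETER are the real Modelfile keywords, and the file and model names are the ones used in this walkthrough:

```python
from pathlib import Path

def write_modelfile(path, base, params):
    """Write a minimal Modelfile: a FROM line plus one PARAMETER line per entry."""
    lines = [f"FROM {base}"]
    lines += [f"PARAMETER {name} {value}" for name, value in params.items()]
    # keep the file ASCII-only, as recommended above
    Path(path).write_text("\n".join(lines) + "\n", encoding="ascii")

write_modelfile("modelfile-s", "smollm2:135m",
                {"temperature": 0.1, "num_ctx": 2048})
print(Path("modelfile-s").read_text())
# then: ollama create smollm2:135m-2k-ctx-test -f modelfile-s
```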
2) Create the model
With the Modelfile above in place, run a command like:
ollama create smollm2:135m-2k-ctx-test -f modelfile-s
The process looks like this:
PS D:\ollama源码-go> ollama create smollm2:135m-2k-ctx-test -f ./modelfile-s
gathering model components
using existing layer sha256:f535f83ec568d040f88ddc04a199fa6da90923bbb41d4dcaed02caa924d6ef57
using existing layer sha256:fbacade46b4da804e0398c339c64b944d4b954452adf77cf050b49420116129e
using existing layer sha256:d502d55c1d609104ae6127aee92eb940e51e15c56dfb26dbd067e2771ee746f1
using existing layer sha256:58d1e17ffe5109a7ae296caafcadfdbe6a7d176f0bc4ab01e12a689b0499d8bd
creating new layer sha256:d2054dded080375cb7e97e9bf7938bcd54e591897508760466879a25a0a179fc
writing manifest
success
PS D:\ollama源码-go> ollama list
NAME ID SIZE MODIFIED
smollm2:135m-2k-ctx-test 57be70d27195 270 MB 29 seconds ago
smollm2:135m 9077fe9d2ae1 270 MB 37 minutes ago
PS D:\ollama源码-go> ollama run smollm2:135m-2k-ctx-test
>>> /show info
Model
architecture llama
parameters 134.52M
context length 8192
embedding length 576
quantization F16
Capabilities
completion
Parameters
num_ctx 2048
stop "<|im_start|>"
stop "<|im_end|>"
temperature 0.2
Compare with the original model:
Use """ to begin a multi-line message.
>>> /show info
Model
architecture llama
parameters 134.52M
context length 8192
embedding length 576
quantization F16
Capabilities
completion
Parameters
stop "<|im_start|>"
stop "<|im_end|>"
You can of course also do this through the CLI API:
# run in a Cygwin terminal
curl http://localhost:11434/api/create -d '{
  "model": "mymodel",
  "from": "qwen3-vl:2b",
  "system": "我的模型"
}'

PS D:\ollama源码-go> ollama show mymodel:latest
  Model
    architecture        qwen3vl
    parameters          2.1B
    context length      262144
    embedding length    2048
    quantization        Q4_K_M

  Capabilities
    completion
    vision
    tools
    thinking

  Parameters
    top_k          20
    top_p          0.95
    temperature    1

  System
    我的模型
3) Copy a model (ollama cp)
PS D:\ollama源码-go> ollama cp smollm2:135m-2k-ctx-test smollm2:135m-2k-ctx-test-cp
copied 'smollm2:135m-2k-ctx-test' to 'smollm2:135m-2k-ctx-test-cp'
PS D:\ollama源码-go> ollama ls
NAME                           ID              SIZE      MODIFIED
smollm2:135m-2k-ctx-test-cp    57be70d27195    270 MB    8 seconds ago
smollm2:135m-2k-ctx-test       57be70d27195    270 MB    59 minutes ago
smollm2:135m                   9077fe9d2ae1    270 MB    2 hours ago
nomic-embed-text:latest        0a109f422b47    274 MB    2 hours ago
mymodel:latest                 b00fc97c00a3    1.9 GB    17 hours ago
qwen3-vl:2b                    0635d9d857d4    1.9 GB    17 hours ago
gemma3:latest                  a2af6cc3eb7f    3.3 GB    17 hours ago
llama3:latest                  365c0bd3c000    4.7 GB    25 hours ago
deepseek-r1:8b                 6995872bfe4c    5.2 GB    38 hours ago
gemma3:4b                      a2af6cc3eb7f    3.3 GB    3 weeks ago
4) Remove a model (ollama rm)
PS D:\ollama源码-go> ollama rm smollm2:135m-2k-ctx-test-cp
deleted 'smollm2:135m-2k-ctx-test-cp'
PS D:\ollama源码-go> ollama ls
NAME                        ID              SIZE      MODIFIED
smollm2:135m-2k-ctx-test    57be70d27195    270 MB    About an hour ago
smollm2:135m                9077fe9d2ae1    270 MB    2 hours ago
nomic-embed-text:latest     0a109f422b47    274 MB    2 hours ago
mymodel:latest              b00fc97c00a3    1.9 GB    17 hours ago
qwen3-vl:2b                 0635d9d857d4    1.9 GB    17 hours ago
gemma3:latest               a2af6cc3eb7f    3.3 GB    17 hours ago
llama3:latest               365c0bd3c000    4.7 GB    25 hours ago
deepseek-r1:8b              6995872bfe4c    5.2 GB    38 hours ago
gemma3:4b                   a2af6cc3eb7f    3.3 GB    3 weeks ago
The smollm2:135m-2k-ctx-test-cp model is no longer in the list.
II. Running smollm2:135m
1. This model is relatively small, uses little memory, and is easy to test: ollama run smollm2:135m
PS D:\ollama源码-go> ollama run smollm2:135m
>>> /show info
  Model
    architecture        llama
    parameters          134.52M
    context length      8192
    embedding length    576
    quantization        F16

  Capabilities
    completion

  Parameters
    stop    "<|im_start|>"
    stop    "<|im_end|>"

  System
    You are a helpful AI assistant named SmolLM, trained by Hugging Face

  License
    Apache License
    Version 2.0, January 2004
    ...

The result is equivalent to ollama show smollm2:135m.
2. The /? command
PS D:\ollama源码-go> ollama run smollm2:135m-2k-ctx-test
>>> /?
Available Commands:
  /set            Set session variables       # set a session variable, e.g. /set parameter num_ctx 2048
  /show           Show model information
  /load <model>   Load a session or model
  /save <model>   Save your current session
  /clear          Clear session context
  /bye            Exit
  /?, /help       Help for a command
  /? shortcuts    Help for keyboard shortcuts

Use """ to begin a multi-line message.
Note: /set has many more subcommands, for example:
/set think
3. Exit the model
Use /bye or the Ctrl+D shortcut.
PS D:\ollama源码-go> ollama run smollm2:135m
>>> /bye
PS D:\ollama源码-go>
4. View the currently running models. Note that this means a model that is actively interacting, i.e. in its response phase, not one merely started with ollama run smollm2:135m. For example:
PS D:\ollama源码-go> ollama run smollm2:135m
>>> what is python ?
(after typing the question, immediately open a second console window and run ollama ps there; once the model has finished responding, ollama ps will no longer show the runtime data)
Python is an incredibly powerful and versatile programming language. It's widely used for web development, data analysis, game development, artificial intelligence (AI), scientific computing, automation tasks, mobile app development, and more.
1. What is Python?
2. Its origins and development:
3. Basic syntax and features:
4. Popular libraries and frameworks:
5. Applications you can write in Python:
6. Best practices for writing good code:
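ollama ps also has a REST counterpart, GET /api/ps, which per the Ollama API docs returns a JSON object with a models array. A small sketch that extracts the loaded model names; the parsing is a pure function, so it can be exercised on a sample response without a running server (the sample values below are illustrative):

```python
import json
import urllib.request

def running_models(ps_response):
    """Extract model names from an /api/ps response dict."""
    return [m["name"] for m in ps_response.get("models", [])]

# With `ollama serve` running and a model mid-response:
# with urllib.request.urlopen("http://127.0.0.1:11434/api/ps") as resp:
#     print(running_models(json.load(resp)))

sample = {"models": [{"name": "smollm2:135m", "size": 284866560}]}
print(running_models(sample))  # → ['smollm2:135m']
```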
5. Input can span multiple lines; wrap it in triple quotes: """ your content """
PS D:\ollama源码-go> ollama run smollm2:135m-2k-ctx-test
>>> """ what
... is
... python"""
Python is an interpreted programming language that was developed in the late 1980s and early 1990s. It has been widely used for a variety of purposes including web development, data analysis, machine learning, artificial intelligence, and more.
6. Multiple ways to call a model
1. One-shot directly from the command line:
PS D:\ollama源码-go> ollama run smollm2:135m-2k-ctx-test "what is python"
2. Interactive session:
PS D:\ollama源码-go> ollama run smollm2:135m-2k-ctx-test
>>> what is python
3. Calling the CLI API:
curl http://localhost:11434/api/create -d '{
  "model": "mymodel",
  "from": "qwen3-vl:2b",
  "system": "我的模型"
}'
4. The ollama Python library:
import ollama

# tested with version 0.21 of the ollama package
content = "请给我设计一个卡片。"  # prompt: "Please design a card for me."
# 'qwen3-vl:2b' is a model available locally
response = ollama.chat(model='qwen3-vl:2b',
                       messages=[{'role': 'user', 'content': content}])
result = response['message']['content']
print(result)
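The Modelfile parameters from part I can also be overridden per request: ollama.chat accepts an options dict (e.g. num_ctx, temperature). A hedged sketch, assuming the ollama package and a local server; the helper only assembles the arguments, so the actual chat call is left commented out:

```python
def build_chat_args(model, prompt, num_ctx=2048, temperature=0.1):
    """Arguments for ollama.chat with per-request option overrides."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "options": {"num_ctx": num_ctx, "temperature": temperature},
    }

args = build_chat_args("smollm2:135m", "what is python")
# import ollama  # requires the ollama package and a running server
# print(ollama.chat(**args)["message"]["content"])
```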
7. To see the remaining options of a specific interactive command:
PS D:\ollama源码-go> ollama run smollm2:135m-2k-ctx-test
>>> /?
Available Commands:
  /set            Set session variables
  /show           Show model information
  /load <model>   Load a session or model
  /save <model>   Save your current session
  /clear          Clear session context
  /bye            Exit
  /?, /help       Help for a command
  /? shortcuts    Help for keyboard shortcuts

>>> /show
Available Commands:
  /show info         Show details for this model
  /show license      Show model license
  /show modelfile    Show Modelfile for this model
  /show parameters   Show parameters for this model
  /show system       Show system message
  /show template     Show prompt template
>>> /set
Available Commands:
/set parameter ... Set a parameter
/set system <string> Set system message
/set history Enable history
/set nohistory Disable history
/set wordwrap Enable wordwrap
/set nowordwrap Disable wordwrap
/set format json Enable JSON mode
/set noformat Disable formatting
/set verbose Show LLM stats
/set quiet Disable LLM stats
/set think Enable thinking
/set nothink Disable thinking
>>> /? shortcuts
Available keyboard shortcuts:
Ctrl + a Move to the beginning of the line (Home)
Ctrl + e Move to the end of the line (End)
Alt + b Move back (left) one word
Alt + f Move forward (right) one word
Ctrl + k Delete the sentence after the cursor
Ctrl + u Delete the sentence before the cursor
Ctrl + w Delete the word before the cursor
Ctrl + l Clear the screen
Ctrl + c Stop the model from responding
Ctrl + d Exit ollama (/bye)
References:
https://docs.ollama.com/modelfile
Ollama API guide: https://github.com/ollama/ollama/blob/main/docs/api.md