
6 Ways to Run a Local LLM


https://semaphoreci.com/blog/local-llm

1. Hugging Face and Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch


tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium", padding_side='left')
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# source: https://huggingface.co/microsoft/DialoGPT-medium

# Let's chat for 5 lines
for step in range(5):
    # encode the new user input, add the eos_token and return a tensor in Pytorch
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response while limiting the total chat history to 1000 tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

    # pretty print last output tokens from bot
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

 

2. LangChain

from langchain.llms.huggingface_pipeline import HuggingFacePipeline

hf = HuggingFacePipeline.from_model_id(
    model_id="microsoft/DialoGPT-medium", task="text-generation", pipeline_kwargs={"max_new_tokens": 200, "pad_token_id": 50256},
)

from langchain.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = prompt | hf

question = "What is electroencephalography?"

print(chain.invoke({"question": question})

 

3. Llama.cpp

Llama.cpp is a C/C++ inference engine for LLMs. It is optimized for Apple silicon and runs Meta's Llama 2 models (in GGUF format).

Once we clone the repository and build the project, we can run a model with:

$ ./main -m /path/to/model-file.gguf -p "Hi there!"

 

4. Llamafile

Llamafile, developed by Mozilla, offers a user-friendly way to run LLMs. It is known for its portability and for packaging a model and its runtime into a single executable file.

Once we download llamafile and any GGUF-formatted model, we can start a local browser session with:

$ ./llamafile -m /path/to/model.gguf
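
Besides the browser UI, a running llamafile also serves a local, OpenAI-compatible HTTP API. The sketch below assumes the default address http://localhost:8080 from the llamafile docs; adjust the port if you changed it:

import requests

# query the local OpenAI-compatible endpoint exposed by llamafile
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local",  # placeholder name; a single local model is served
        "messages": [{"role": "user", "content": "Hi there!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])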

 

5. Ollama

Ollama is a more user-friendly alternative to Llama.cpp and Llamafile. You download an executable that installs a service on your machine. Once installed, you open a terminal and run:

$ ollama run llama2

Ollama will download the model and start an interactive session.
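
The installed service also listens on a local REST API, which makes it easy to script. A minimal sketch, assuming Ollama's default address http://localhost:11434 and the llama2 model pulled above:

import requests

# non-streaming generation request against the local Ollama service
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])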

 

6. GPT4ALL

GPT4ALL is an easy-to-use desktop application with an intuitive GUI. It supports running models locally and can also connect to OpenAI with an API key. It stands out for its ability to use local documents as context, keeping your data private.

Example of GPT4All in use: the user asked for a definition of Continuous Delivery based on a few indexed books, and the model responded with a definition plus its sources (book name and page). You can browse the cited fragments for more context.
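
GPT4ALL also ships a Python package, so the same local models can be used outside the GUI. A minimal sketch, assuming pip install gpt4all; the model filename is illustrative and is downloaded on first use if missing:

from gpt4all import GPT4All

# load a local model (downloaded automatically if not present)
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")

# run a short chat session and print the reply
with model.chat_session():
    print(model.generate("What is Continuous Delivery?", max_tokens=200))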

 
