Run Llama 2 with an API


 

Llama 2 is a language model from Meta AI. It’s the first open source language model of the same caliber as OpenAI’s models.

With Replicate, you can run Llama 2 in the cloud with one line of code.


Running Llama 2 with JavaScript

You can run Llama 2 with our official JavaScript client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

const output = await replicate.run(
  "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
  {
    input: {
      prompt:
        "Write a poem about open source machine learning in the style of Mary Oliver.",
    },
  }
);

console.log(output);
 

Running Llama 2 with Python

You can run Llama 2 with our official Python client:

import replicate
output = replicate.run(
    "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
    input={"prompt": ...}
)
# The replicate/llama-2-70b-chat model can stream output as it's running.
# replicate.run returns an iterator, so you can print each chunk as it arrives.
for item in output:
    print(item)
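
Because the output arrives as chunks, you often want to assemble them into one string once streaming finishes. A minimal sketch of that pattern (the simulated list below is a hypothetical stand-in for the iterator `replicate.run` returns):

```python
def collect_stream(chunks):
    """Join streamed output chunks into the full completion text."""
    return "".join(chunks)

# Hypothetical stand-in for the iterator replicate.run returns:
simulated_output = iter(["Open ", "source ", "models ", "sing."])
full_text = collect_stream(simulated_output)
print(full_text)  # -> Open source models sing.
```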
 

Running Llama 2 with cURL

You can call the HTTP API directly with tools like cURL:

curl -s -X POST \
  -d '{"version": "2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1", "input": {"prompt": "Write a poem..."}}' \
  -H "Authorization: Token $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  "https://api.replicate.com/v1/predictions"
 

You can also run Llama using other Replicate client libraries for Golang, Swift, Elixir, and others.

Choosing which model to use

There are four Llama 2 model variants on Replicate, each with its own strengths:

  • replicate/llama-2-70b-chat: 70 billion parameter model fine-tuned on chat completions. If you want to build a chat bot with the best accuracy, this is the one to use.
  • replicate/llama-2-70b: 70 billion parameter base model. Use this if you want to do other kinds of language completions, like completing a user’s writing.
  • a16z-infra/llama-2-13b-chat: 13 billion parameter model fine-tuned on chat completions. Use this if you’re building a chat bot and would prefer it to be faster and cheaper at the expense of accuracy.
  • a16z-infra/llama-2-7b-chat: 7 billion parameter model fine-tuned on chat completions. This is an even smaller, faster model.
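
One way to keep this trade-off explicit in application code is a small lookup helper. This is an illustrative sketch, not part of the `replicate` library; the model slugs are the four listed above:

```python
# Illustrative helper (not part of the replicate client library): map a
# task ("chat" or "base") and model size to one of the four slugs above.
MODELS = {
    ("chat", "70b"): "replicate/llama-2-70b-chat",
    ("base", "70b"): "replicate/llama-2-70b",
    ("chat", "13b"): "a16z-infra/llama-2-13b-chat",
    ("chat", "7b"): "a16z-infra/llama-2-7b-chat",
}

def choose_llama_model(task="chat", size="70b"):
    """Return the Replicate model slug for the given task and size."""
    try:
        return MODELS[(task, size)]
    except KeyError:
        raise ValueError(f"no Llama 2 variant for task={task!r}, size={size!r}")
```

For a chat bot you would default to the 70b chat model and drop to 13b or 7b when latency or cost matters more than accuracy.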

What's the difference between these? Learn more in our blog post comparing 7b, 13b, and 70b.

Example chat app

If you want a place to start, we’ve built a demo chat app in Next.js that can be deployed on Vercel:

Take a look at the GitHub README to learn how to customize and deploy it.

Fine-tune Llama 2

Because Llama 2 is open source, you can train it on more data to teach it new things or a particular style.

Replicate makes this easy. Take a look at our guide to fine-tune Llama 2.

Run Llama 2 locally

You can also run Llama 2 without an internet connection. We wrote a comprehensive guide to running Llama on your M1/M2 Mac, on Windows, on Linux, or even your phone.

Keep up to speed

Happy hacking! 🦙

 


Ref:https://replicate.com/blog/run-llama-2-with-an-api

posted on 2023-08-20 01:44 by ercom