Ollama API - Deep Learning Language Models

Ollama is, at its core, a web server. When it is installed as a binary, launching the app starts the server automatically. When you use the Docker image, the container's default command runs the server.

You can interact with the Ollama server through its REST API.^[1]

Once Ollama has started, open http://localhost:11434 in a web browser to confirm that the server is running. If it is, the page shows the message "Ollama is running".
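
The browser check above can also be scripted, which helps when the server is still starting up (for example, right after a container launches). The following is a minimal sketch that uses only the Python standard library. The names `probe` and `wait_for_server` and the retry parameters are illustrative assumptions, not part of Ollama.

```python
import time
import urllib.request


def probe(base_url, timeout=2.0):
    """Return True if the root URL answers 200 with the 'Ollama is running' banner."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200 and b"Ollama is running" in resp.read()
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses
        return False


def wait_for_server(base_url="http://localhost:11434", attempts=5, delay=1.0,
                    check=probe, sleep=time.sleep):
    """Call check(base_url) up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check(base_url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_for_server()` with no arguments polls the default address. The `check` and `sleep` parameters exist only so the retry logic can be exercised without a live server.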

REST API

GET /api/tags - list models
POST /api/show - show model information
POST /api/generate - generate text
POST /api/chat - chat
POST /api/embeddings - generate embeddings
POST /api/create - create a custom model
POST /api/copy - copy a model
DELETE /api/delete - delete a model
POST /api/pull - download a model
POST /api/push - upload a model

Ollama API Reference
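
To make the method and path pairing concrete, here is a small sketch that builds (but does not send) requests for a few of the endpoints listed above. Note that /api/show is a POST request carrying a JSON body. The `ENDPOINTS` table and the `build_request` helper are hypothetical names for illustration, and the `model` field in the /api/show body is an assumption based on the current API reference; older server versions may expect `name`.

```python
import json
import urllib.request

# HTTP method and path for a few of the endpoints
ENDPOINTS = {
    "tags": ("GET", "/api/tags"),
    "show": ("POST", "/api/show"),
    "generate": ("POST", "/api/generate"),
    "chat": ("POST", "/api/chat"),
    "delete": ("DELETE", "/api/delete"),
}


def build_request(base_url, name, payload=None):
    """Build a urllib Request for the named endpoint; nothing is sent here."""
    method, path = ENDPOINTS[name]
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )
```

Passing the returned object to `urllib.request.urlopen` would send it, for example `build_request(base_url, "show", {"model": "llama2"})`.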

Configuring the REST API Host

By default, the Ollama server runs at http://localhost:11434.

When the client runs on the same machine as the server, it can reach the server through localhost.

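Accessing from a Docker container

When code runs inside a Docker container, localhost refers to the container itself, not the host machine. To reach an Ollama server on the host, you must specify the host's IP address explicitly (for example, 172.17.0.1).

With Docker Desktop, the DNS name host.docker.internal is set up automatically to resolve to the host's IP address.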

docker run -e OLLAMA_HOST=http://host.docker.internal:11434 -p 8888:8888 <image-name>

Program 1: Accessing Ollama from a Docker container with Docker Desktop

With Docker Engine, host.docker.internal is not set up automatically, so you must map it to the host gateway explicitly.

docker run --add-host=host.docker.internal:host-gateway -e OLLAMA_HOST=http://host.docker.internal:11434 -p 8888:8888 <image-name>

Program 2: Accessing Ollama from a Docker container with Docker Engine

import os
import requests

# Configure the Ollama REST API URL
# os.environ["OLLAMA_HOST"] = "http://host.docker.internal:11434"  # reach the host from a Docker container
OLLAMA_BASE_URL = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
print(f"OLLAMA_BASE_URL: {OLLAMA_BASE_URL}")

response = requests.get(OLLAMA_BASE_URL)
if response.status_code == 200:
    print(response.text)
else:
    print(f"Error: {response.status_code}")

OLLAMA_BASE_URL: http://host.docker.internal:11434
Ollama is running
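
One practical wrinkle: the same OLLAMA_HOST variable is often set without a scheme (for example `host.docker.internal:11434`), and an HTTP client needs a full URL. The sketch below normalizes such values. The helper name `resolve_base_url` is hypothetical, and the assumption that plain HTTP is intended when no scheme is given is mine, not something stated in the Ollama documentation.

```python
import os

DEFAULT_BASE_URL = "http://localhost:11434"


def resolve_base_url(env=None):
    """Return a usable base URL from OLLAMA_HOST, falling back to the default."""
    env = os.environ if env is None else env
    raw = (env.get("OLLAMA_HOST") or "").strip()
    if not raw:
        return DEFAULT_BASE_URL
    if "://" not in raw:
        # Assumption: a bare host[:port] value means plain HTTP
        raw = "http://" + raw
    # Drop a trailing slash so paths like /api/tags can be appended directly
    return raw.rstrip("/")
```

With this helper, `OLLAMA_BASE_URL = resolve_base_url()` could stand in for the `os.environ.get(...)` lookup shown earlier.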

/api/tags

GET /api/tags lists the models installed on the Ollama server. The response is JSON; its models array holds metadata for each model, such as its name, size, and modification time.

import requests
import pandas as pd
from IPython.display import display  # available by default inside Jupyter

url = f"{OLLAMA_BASE_URL}/api/tags"

response = requests.get(url)
if response.status_code != 200:
    print('Error:', response.status_code)
else:
    data = response.json()
    # print(data)
    models = pd.DataFrame(data['models'])
    print('Columns:', ', '.join(models.columns.tolist()))
    display(
        models
            .sort_values(by='size')
            # Convert the size from bytes to a human-readable GB string
            .assign(**{"size (GB)": lambda df: df['size'].apply(lambda b: f"{b / (1024**3):.2f} GB")})
            [["model", "size (GB)", 'modified_at']]
            .set_index('model')
    )

Columns: name, model, modified_at, size, digest, details
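
The `modified_at` values are ISO 8601 timestamps that can carry nanosecond precision, which `datetime.fromisoformat` does not accept on every Python version. A minimal sketch of a parser follows, assuming the timestamp shape seen in Ollama responses (up to nine fractional digits, a `Z` suffix or a numeric offset). The helper name `parse_ollama_timestamp` is hypothetical.

```python
import re
from datetime import datetime

_FRACTION = re.compile(r"\.(\d+)")


def parse_ollama_timestamp(value):
    """Parse an ISO 8601 timestamp, truncating fractional seconds to microseconds."""
    # Older Python versions do not understand the 'Z' suffix
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    # Keep exactly six fractional digits (pad short fractions with zeros)
    value = _FRACTION.sub(
        lambda m: "." + m.group(1)[:6].ljust(6, "0"), value, count=1
    )
    return datetime.fromisoformat(value)
```

The result is a timezone-aware `datetime`, so values can be sorted or compared directly, for instance to find the most recently modified model.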

/api/generate - Text Generation

Use the generate API to produce text from a model.

import requests

# Uses the OLLAMA_BASE_URL configured earlier
url = f"{OLLAMA_BASE_URL}/api/generate"
request_parameters = {
    "model": "gpt-oss",  # model name
    "prompt": "자기 소개",  # prompt (Korean for "introduce yourself")
    "stream": False  # return the whole response at once
}

response = requests.post(url, json=request_parameters)
if response.status_code == 200:
    data = response.json()
    if 'thinking' in data:
        print(f'<think>{data["thinking"]}</think>')
    print(data['response'])
else:
    print('Error:', response.status_code)
    print(response.text)  # the error body is not guaranteed to be JSON

<think>The user writes in Korean: "자기 소개". This means "self introduction". They are likely asking ChatGPT to introduce itself. So we need to respond in Korean, giving a self introduction. Should be friendly. Also maybe mention it's ChatGPT, language model, trained by OpenAI, can answer many questions. Possibly mention capabilities. Also keep it concise. They might want just the introduction. Let's produce a Korean self-introduction.</think>
안녕하세요! 저는 OpenAI에서 만든 인공지능 언어 모델, ChatGPT(Generative Pre-trained Transformer)입니다.  
- **언어 이해·생성**: 한국어를 포함한 다양한 언어로 자연스러운 문장을 읽고, 쓰고, 번역할 수 있어요.  
- **지식**: 2024년 6월까지 학습된 방대한 정보 기반을 갖추고 있어, 일반 상식, 과학, 역사, 문화 등 여러 주제에 대해 답변할 수 있습니다.  
- **대화**: 질문에 대한 설명, 요약, 예시, 창작물 작성 등 여러분이 필요로 하는 형태로 도움을 드릴 수 있어요.  
- **안전**: 사용자의 프라이버시와 편안한 대화를 위해 설계되었습니다.  
필요한 것이 있으면 언제든 말씀해 주세요!
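
The request above sets "stream": False to get one JSON object back. When streaming is enabled instead, the server sends newline-delimited JSON, where each line carries a fragment in `response` and the last line has `done` set to true. The sketch below reassembles such a stream; `collect_stream` is a hypothetical helper, and it accepts any iterable of lines so it can be tried without a server.

```python
import json


def collect_stream(lines):
    """Join the 'response' fragments from a newline-delimited JSON stream."""
    parts = []
    for raw in lines:
        if not raw:
            continue  # skip keep-alive blank lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

With requests, the lines could come from `requests.post(url, json={..., "stream": True}, stream=True).iter_lines()`, which yields one bytes object per line.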

/api/chat - Chat Interface

Use the chat API to implement multi-turn conversations. Including a system message lets you customize the model's behavior.

import requests

# Uses the OLLAMA_BASE_URL configured earlier
url = f"{OLLAMA_BASE_URL}/api/chat"
request_parameters = {
    "model": "qwen3:latest",
    "stream": False
}

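request_parameters['messages'] = [
    # System prompt, roughly: "You are a model that tells corny dad jokes."
    {'role': 'system', 'content': '너는 허탕 아재 개그를 하는 모델이다.'},
    # User prompt: "What is RAG?"
    {'role': 'user', 'content': 'RAG가 뭐야?'}
]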

response = requests.post(url, json=request_parameters)
print(response.json())

{'model': 'qwen3:latest', 'created_at': '2026-01-08T06:08:18.774596595Z', 'message': {'role': 'assistant', 'content': '아! RAG는 "Randomly Aggressive Guy"를 뜻해!  \n이건 말도 안 되는 정보를 무작정 끌어다 쓰는 놈이야.  \n"와! 이거 진짜 최신 정보야!"  \n하면서 랜덤하게 인터넷에서 떠들어대는 놈이지.  \n\n예를 들어,  \n"RAG는 오늘 아침에 5000개의 랜덤한 데이터를 먹고 살아"  \n하면서 허세를 부리는데,  \n실제로는 그냥 "아, 이거 봤다"고 말하고는  \n실제 내용은 전혀 모르는 놈이야.  \n\n또한,  \nRAG는 질문에 대한 답변을 만들 때  \n"아, 이건 내가 랜덤하게 생각해낸 거야!"  \n하면서  \n실제로는 그냥 "아, 이거 봤다"고 말하고는  \n정말 별거 없는 내용을 끌어다 쓰는 놈이야.  \n\n결론적으로,  \nRAG는  \n"정보를 끌어다 쓰는 놈"  \n이지만,  \n실제로는  \n"정보를 전혀 모르는 놈"  \n이야! 😂  \n\n(아, 그리고 이 놈은 항상 "와! 이건 최신 정보야!"  \n하면서  \n실제로는  \n"아, 이거 봤다"고 말하고는  \n정말 별거 없는 내용을 끌어다 쓰는 놈이야!)', 'thinking': 'Okay, the user asked "RAG가 뭐야?" which means "What is RAG?" in Korean. I need to explain RAG in a humorous and exaggerated way, typical of a "허탕 아재 개그" (meaning a joke about a guy who does things in a silly way). Let me break down the components of RAG first.\n\nRAG stands for Retrieval-Augmented Generation. It\'s a technique where a model uses external data sources to enhance its responses. But since this is a joke, I should make it sound more absurd. Maybe compare it to a guy who uses a lot of random stuff without a clear plan. \n\nI should start by defining RAG in a funny way, maybe using a metaphor. Then, add some exaggerated examples of how someone might misuse it. Maybe mention things like using random information from the internet, making up data, or even using a "RAG" as a nickname for a guy who\'s always trying to be smart but ends up making mistakes. \n\nAlso, include some humor about the consequences of using RAG, like getting confused or making up facts. Maybe add a joke about how it\'s like a guy who can\'t decide what to do, so he just throws everything at the wall. \n\nMake sure the tone is light-hearted and not too technical. Use colloquial language and maybe some Korean slang to keep it authentic. 
End with a funny punchline that ties it all together.\n'}, 'done': True, 'done_reason': 'stop', 'total_duration': 12702728498, 'load_duration': 93549784, 'prompt_eval_count': 38, 'prompt_eval_duration': 241166467, 'eval_count': 660, 'eval_duration': 12134122497}
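
The duration fields in this response are reported in nanoseconds, so they need converting before they are readable. The sketch below turns them into seconds and a generation throughput; `summarize_timing` is a hypothetical helper, and the input values are copied from the output above.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def summarize_timing(result):
    """Convert Ollama's nanosecond timing fields into seconds and tokens per second."""
    eval_seconds = result["eval_duration"] / NANOSECONDS_PER_SECOND
    return {
        "total_seconds": result["total_duration"] / NANOSECONDS_PER_SECOND,
        "load_seconds": result["load_duration"] / NANOSECONDS_PER_SECOND,
        "tokens_per_second": (
            result["eval_count"] / eval_seconds if eval_seconds else 0.0
        ),
    }


# Timing fields taken from the response shown above
stats = summarize_timing({
    "total_duration": 12702728498,
    "load_duration": 93549784,
    "eval_count": 660,
    "eval_duration": 12134122497,
})
print(f"{stats['total_seconds']:.2f} s total, "
      f"{stats['tokens_per_second']:.2f} tokens/s")
```

For this response the request took about 12.7 seconds overall and generated roughly 54 tokens per second.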

Ollama Python Library

Ollama provides official libraries for individual programming languages.^[2] Python and JavaScript are currently supported. With these libraries, applications built on the REST API can interact with the Ollama server more easily.

Python: Ollama Python Library

JavaScript: Ollama JavaScript Library

import requests

# Uses the OLLAMA_BASE_URL configured earlier
url = f"{OLLAMA_BASE_URL}/api/tags"

response = requests.get(url)
if response.ok:
    data = response.json()
    models = data.get('models', [])
    if not models:
        print("No models are installed.")
    else:
        print(f"Installed models: {len(models)}\n")
        for m in models:
            name = m.get('name') or 'Unknown'
            size_bytes = m.get('size') or 0
            size_gb = size_bytes / (1024**3)
            modified = m.get('modified_at') or ''
            digest = m.get('digest') or m.get('id') or ''
            print(f"- {name}")
            print(f"  Size: {size_gb:.2f} GB")
            if modified:
                print(f"  Modified: {modified}")
            if digest:
                print(f"  ID: {digest}")
            print()
else:
    print(f"Request failed: {response.status_code} {response.text}")

Connecting to the Ollama Server

Connect to the Ollama server using the Python client.

from ollama import Client

# Uses the OLLAMA_BASE_URL configured earlier
client = Client(host=OLLAMA_BASE_URL)

# Verify the server connection
try:
    response = client.list()
    print("✓ Successfully connected to the Ollama server.")
    print(f"\nInstalled models: {len(response.get('models', []))}")
except Exception as e:
    print(f"✗ Connection failed: {e}")
    print("Check that the Ollama server is running. (default port: 11434)")

Listing Models

List the currently installed models.

# List installed models (Python library)
response = client.list()
models = response.get('models', [])

if models:
    print(f"Installed models: {len(models)}\n")
    for m in models:
        # Newer library versions expose the name as 'model'; older ones also return 'name'
        name = m.get('model') or m.get('name') or 'Unknown'
        size_gb = (m.get('size') or 0) / (1024**3)
        modified = m.get('modified_at') or ''
        digest = m.get('digest') or m.get('id') or ''
        print(f"- {name}")
        print(f"  Size: {size_gb:.2f} GB")
        if modified:
            print(f"  Modified: {modified}")
        if digest:
            print(f"  ID: {digest}")
        print()
else:
    print("No models are installed.")

Talking to a Model (Generate)

Use the Python client to talk to a model.

# Example: assumes a usable model is installed
# Replace the name below with a model you actually have

model_name = "llama2"  # change to the model you are using
user_message = "Hello. Could you introduce yourself?"

try:
    response = client.generate(
        model=model_name,
        prompt=user_message,
        stream=False
    )

    print(f"User: {user_message}")
    print(f"\nResponse from {model_name}:")
    print(response.get('response', ''))

except Exception as e:
    print(f"Error: {e}")
    print(f"Check that the '{model_name}' model is installed.")

Streaming Responses

You can receive the model's response as a real-time stream.

# Streaming response example
model_name = "llama2"
user_message = "What is a list comprehension in Python?"

try:
    print(f"User: {user_message}")
    print(f"\nResponse from {model_name}:")
    print("-" * 50)

    # Set stream=True to receive the response in chunks
    response = client.generate(
        model=model_name,
        prompt=user_message,
        stream=True
    )

    for chunk in response:
        print(chunk.get('response', ''), end='', flush=True)

    print("\n" + "-" * 50)

except Exception as e:
    print(f"Error: {e}")

Showing Model Information

Retrieve detailed information about a specific model.

# Show model information
model_name = "llama2"

try:
    response = client.show(model=model_name)

    print(f"Model: {model_name}")
    print()

    if 'modelfile' in response:
        print("Modelfile:")
        print(response['modelfile'])
        print()

    if 'details' in response:
        # dict() copes with a plain dict as well as the typed object
        # that newer library versions may return here
        details = dict(response['details'])
        print("Model details:")
        for key, value in details.items():
            print(f"  {key}: {value}")

except Exception as e:
    print(f"Error: {e}")

Chat Interface

Use the Chat API to implement multi-turn conversations.

# Multi-turn conversation example
model_name = "llama2"

messages = [
    {
        "role": "system",
        "content": "You are an expert in Korean history."
    },
    {
        "role": "user",
        "content": "How many years did the Joseon dynasty last?"
    }
]

try:
    response = client.chat(
        model=model_name,
        messages=messages,
        stream=False
    )

    print(f"System: {messages[0]['content']}")
    print(f"User: {messages[1]['content']}")
    print(f"\nResponse from {model_name}:")
    print(response['message']['content'])

except Exception as e:
    print(f"Error: {e}")
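
The example above sends a single user turn. The chat endpoint keeps no conversation state on the server side, so a multi-turn exchange works by resending the full message history, including the assistant's earlier replies. The sketch below shows that bookkeeping; `chat_turn` and its `send` parameter are hypothetical names, with `send` standing in for whatever actually calls the model.

```python
def chat_turn(send, history, user_text):
    """Append a user message, call the model, and return the extended history.

    `send` takes the list of messages and returns a response whose
    ['message'] entry has 'role' and 'content'.
    """
    messages = history + [{"role": "user", "content": user_text}]
    reply = send(messages)["message"]
    # Store the reply so the next turn carries the whole conversation
    return messages + [{"role": reply["role"], "content": reply["content"]}]
```

With the client from this section, `send` could be `lambda msgs: client.chat(model=model_name, messages=msgs)`, and each call to `chat_turn` would take the history returned by the previous one.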

Generating Embeddings

Generate an embedding vector for a piece of text.

import numpy as np

# Embedding model (a dedicated embedding model is required)
embedding_model = "nomic-embed-text"  # or another embedding model

text = "Deep learning is a branch of artificial intelligence."

try:
    response = client.embeddings(
        model=embedding_model,
        prompt=text
    )

    embedding = response.get('embedding', [])

    print(f"Text: {text}")
    print(f"\nEmbedding dimension: {len(embedding)}")
    print(f"Embedding vector (first 10 values): {embedding[:10]}")
    print(f"\nVector norm (L2): {np.linalg.norm(embedding):.4f}")

except Exception as e:
    print(f"Error: {e}")
    print(f"Check that the '{embedding_model}' model is installed.")
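
Embedding vectors are usually compared rather than inspected on their own. A common measure is cosine similarity, sketched below in plain Python so it runs without extra packages. The helper name `cosine_similarity` is an illustrative choice, and the inputs are assumed to be two lists of floats such as the `embedding` values returned above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Values near 1 mean the two texts point in nearly the same direction in embedding space, and values near 0 mean they are unrelated. With NumPy, the same quantity is `np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))`.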

Footnotes

1. Official documentation for the Ollama REST API: Ollama API Reference
2. Release of the Ollama libraries (2024): Ollama blog