Ollama

Ollama는 로컬 컴퓨터나 온프레미스 환경에서 대형 언어 모델(LLM)을 쉽게 실행하고 관리할 수 있도록 도와주는 플랫폼입니다.

주요 기능¶

간편한 사용: 코드 작성 없이 공개 언어모델을 로컬 환경에서 실행하고 사용할 수 있습니다.
다양한 모델 지원: 여러 인기 있는 대형 언어 모델 (LLM)을 지원하며, 필요에 따라 모델을 선택하여 사용할 수 있습니다.
REST API 제공: 프로그래밍을 통해 다양한 애플리케이션에서 언어 모델을 활용할 수 있는 REST API를 제공합니다.
데이터 보안: 로컬 또는 내부망에서 모든 처리가 이루어지므로 민감한 정보가 외부로 전송되지 않아 데이터 보안을 유지할 수 있습니다.

설치¶

바이너리 설치¶

Ollama의 최신 버전을 ollama.com에서 다운로드하여 설치할 수 있습니다.

Windows와 Linux는 NVIDIA CUDA 및 AMD ROCm GPU를 지원합니다.^[1]^[2] macOS에서는 Apple Silicon에서 MPS(Metal Performance Shaders)를 통한 GPU 가속이 지원됩니다.^[3]

Ollama Docker¶

Ollama는 공식 Docker 이미지를 제공합니다. ollama/ollama (Docker Hub)

이미지는 다음과 같이 가져올 수 있습니다:

docker pull ollama/ollama:latest

Program 1:Ollama Docker 이미지 다운로드

Ollama를 Docker를 통해 설치하고 실행하는 방법은 다음과 같습니다.

기본 실행:

docker run --name ollama -p 11434:11434 -d ollama/ollama

Program 2:Ollama Docker 컨테이너 기본 실행

가장 간단한 방법으로, 컨테이너 이름을 ollama로 지정하고 포트 11434를 호스트와 연결하여 백그라운드(-d)로 실행합니다.

권장 설정:

docker run --name ollama \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  --gpus all \
  -d ollama/ollama

Program 3:Ollama Docker 컨테이너 권장 실행 (볼륨 마운트 및 GPU 지원)

프로덕션 환경이나 지속적으로 사용할 경우 권장하는 방법입니다:

-v ollama:/root/.ollama: 모델 데이터를 Docker 볼륨에 저장하여 컨테이너를 재시작해도 모델이 유지됩니다.
--gpus all: 컨테이너가 호스트 GPU를 사용할 수 있도록 합니다.^[4]

이후, Ollama 서버가 실행되면 API를 통해 모델을 호출할 수 있습니다.

인터페이스¶

명령줄 인터페이스 (CLI)¶

Ollama는 명령줄(CLI) 도구를 제공합니다. 명령줄 명령어로 모델을 관리하고 사용할 수 있습니다. 설치 후, ollama 명령어를 통해 다양한 작업을 수행할 수 있습니다.^[5]

명령줄 활용은 Windows에서는 PowerShell 또는 명령 프롬프트(Command Prompt), macOS와 Linux에서는 기본 터미널 앱을 사용하면 됩니다.^[6]

$ ollama --help
Large language model runner

Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  signin      Sign in to ollama.com
  signout     Sign out from ollama.com
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.

Program 4:ollama 명령어 도움말 (ollama 0.13)

Program 4에서 볼 수 있듯이, ollama 명령어는 다양한 하위 명령어를 제공합니다.

주요 CLI 명령어¶

serve (start) - 서버 시작¶

앱이나 Docker 컨테이너를 시작할 때 자동으로 서버가 시작되지만, 수동으로 서버를 시작하려면 ollama serve 명령어를 사용할 수 있습니다. start는 serve의 별칭입니다.

$ ollama serve --help
Start ollama

Usage:
  ollama serve [flags]

Aliases:
  serve, start

Flags:
  -h, --help   help for serve

Environment Variables:
      OLLAMA_DEBUG               Show additional debug information (e.g. OLLAMA_DEBUG=1)
      OLLAMA_HOST                IP Address for the ollama server (default 127.0.0.1:11434)
      OLLAMA_CONTEXT_LENGTH      Context length to use unless otherwise specified (default: 4096)
      OLLAMA_KEEP_ALIVE          The duration that models stay loaded in memory (default "5m")
      OLLAMA_MAX_LOADED_MODELS   Maximum number of loaded models per GPU
      OLLAMA_MAX_QUEUE           Maximum number of queued requests
      OLLAMA_MODELS              The path to the models directory
      OLLAMA_NUM_PARALLEL        Maximum number of parallel requests
      OLLAMA_NOPRUNE             Do not prune model blobs on startup
      OLLAMA_ORIGINS             A comma separated list of allowed origins
      OLLAMA_SCHED_SPREAD        Always schedule model across all GPUs
      OLLAMA_FLASH_ATTENTION     Enabled flash attention
      OLLAMA_KV_CACHE_TYPE       Quantization type for the K/V cache (default: f16)
      OLLAMA_LLM_LIBRARY         Set LLM library to bypass autodetection
      OLLAMA_GPU_OVERHEAD        Reserve a portion of VRAM per GPU (bytes)
      OLLAMA_LOAD_TIMEOUT        How long to allow model loads to stall before giving up (default "5m")

Program 5:ollama serve 명령어 도움말 (ollama 0.13)

Program 5 출력을 통해 시작 시점의 환경 변수를 설정할 수 있음을 알 수 있습니다. 다양한 환경 변수를 통해 Ollama 서버의 초기값을 세밀하게 조정할 수 있습니다. 기본값이 아닌 값을 사용하려면, 서버 시작 전에 해당 환경 변수를 설정하면 됩니다.

OLLAMA_HOST 변수를 설정하여 서버가 바인딩될 IP 주소와 포트를 지정할 수 있습니다. 기본적으로 11434 포트의 로컬호스트(127.0.0.1)에 바인딩됩니다.

OLLAMA_KEEP_ALIVE 변수를 사용하여 모델이 메모리에 유지되는 시간을 조정할 수 있습니다. 기본적으로 5분으로 설정되어 있지만, 필요에 따라 이 값을 늘리거나 줄일 수 있습니다.

OLLAMA_MODELS 변수를 사용하여 모델이 저장되는 디렉토리를 지정할 수 있습니다. 기본값 대신, 다른 경로를 지정할 수도 있습니다. 예를 들어, Docker 컨테이너에서 Ollama를 실행하는 경우, 기본적으로 모델들은 /root/.ollama/models에 저장되는데, 이 경로를 /ollama/models로 설정할 수 있습니다.

OLLAMA_CONTEXT_LENGTH 변수를 사용하여 모델의 컨텍스트 길이 기본값을 설정할 수 있습니다.^[7]

list - 모델 목록 확인¶

ollama list 명령어를 사용하여 설치된 모델들을 확인할 수 있습니다. 모델들의 이름, ID, 크기, 수정된 날짜를 확인할 수 있습니다.

$ ollama list
NAME                     ID              SIZE      MODIFIED     
qwen3:latest             da1d68f9edac    5.2 GB    3 hours ago     
qwen3-coder:latest       ad67f85ca250    18 GB     3 months ago    
llama3.1:latest          46e0c10c039e    4.9 GB    3 months ago    
embeddinggemma:latest    85462619ee72    621 MB    3 months ago    
gemma3:latest            a2af6cc3eb7f    3.3 GB    3 months ago    
deepseek-r1:latest       6995872bfe4c    5.2 GB    3 months ago    
gpt-oss:latest           aa4295ac10c3    13 GB     3 months ago

Program 6:ollama list 명령어 출력 예시 (ollama 0.13)

그런데 Program 6 예시 출력과 달리, 처음 Ollama를 설치한 후에는 기본적으로 설치된 모델이 없을 수 있습니다. 모델들은 생성하거나 다운로드 받아야 합니다.

pull - 모델 다운로드¶

ollama pull 명령어를 사용하여 사전 훈련되어 등록된 모델을 다운로드하여 사용할 수 있게 Ollama 서버에 추가할 수 있습니다. pull 명령어는 두 가지 역할을 수행합니다: 첫째, 모델을 다운로드하고, 둘째, Ollama 서버에 모델을 등록합니다.

$ ollama pull gpt-oss

Program 7:ollama pull 명령어 예시 (ollama 0.13)

위 명령어는 ollama 모델을 다운로드합니다. 가능한 모델들은 ollama.com/library에서 확인할 수 있습니다. 다음과 같은 다양한 모델들이 제공됩니다:

점차 더 많은 모델들이 추가되고 있으니, 웹사이트에서 최신 정보를 확인하는 것이 좋습니다.

show - 모델 정보 조회¶

ollama show 명령어를 사용하여 특정 모델의 정보를 확인할 수 있습니다.

$ ollama show gpt-oss
  Model
    architecture        gptoss
    parameters          20.9B
    context length      131072
    embedding length    2880
    quantization        MXFP4

  Capabilities
    completion    
    tools
    thinking

  Parameters
    temperature    1

  License
    Apache License
    Version 2.0, January 2004
    ...

Program 8:ollama show 명령 예시 (ollama 0.13)

run - 모델 실행¶

ollama run 명령어를 사용하여 다운로드한 모델을 대화형으로 실행할 수 있습니다.

$ ollama run gpt-oss
>>> 자기 소개     
Thinking...
The user says "자기 소개" which means "self introduction" in Korean. They likely want the assistant to
introduce itself. So we should respond in Korean, introducing ChatGPT. The user didn't specify any further
instructions. We just give a self-introduction. Possibly also mention the assistant's capabilities, etc. Let's       
respond in Korean.
...done thinking.

안녕하세요! 저는 ChatGPT라고 합니다. OpenAI에서 개발한 대규모 언어 모델이라서, 텍스트 기반으로 다양한 주제에 대해    
 대화하고 정보를 제공해 드릴 수 있어요.

- **언어 이해·생성**: 한국어, 영어를 포함해 여러 언어를 이해하고 자연스러운 문장을 만들어 드립니다.
- **다양한 주제 대응**: 과학, 기술, 역사, 문화, 예술, 일상생활 등 거의 모든 분야의 질문에 답변할 수 있습니다.
- **문서 작성**: 에세이, 기사, 보고서, 이메일, 편지, 스크립트 등 다양한 문서 형식을 작성하거나 편집해 드립니다.
- **프로그래밍 도움**: 코드 예시 제공, 디버깅 팁, 개념 설명 등 프로그래밍 관련 도움을 드립니다.
- **학습 지원**: 복습, 문제 풀이, 개념 설명 등 학습에 필요한 자료와 팁을 제공해 드립니다.
- **창작**: 시, 소설, 스토리텔링, 아이디어 브레인스토밍 등 창작 활동을 도와드릴 수 있습니다.

궁금한 것이 있거나 도움이 필요하면 언제든 말씀해 주세요!
>>>

Program 9:ollama run 명령어 예시 (ollama 0.13)

대화형 모드에서는 /? 명령어를 통해 사용 가능한 명령어 목록을 확인할 수 있습니다.

>>> /?
Available Commands:
  /set            Set session variables
  /show           Show model information
  /load <model>   Load a session or model
  /save <model>   Save your current session
  /clear          Clear session context
  /bye            Exit
  /?, /help       Help for a command
  /? shortcuts    Help for keyboard shortcuts

Use """ to begin a multi-line message.

Program 10:ollama run 대화형 모드 도움말 (ollama 0.13)

/show 명령어는 현재 선택된 모델에 대한 서브 명령어 목록을 안내합니다.

>>> /show
Available Commands:
  /show info         Show details for this model
  /show license      Show model license
  /show modelfile    Show Modelfile for this model
  /show parameters   Show parameters for this model
  /show system       Show system message
  /show template     Show prompt template

Program 11:ollama run /show 명령어 목록 (ollama 0.13)

/show info는 모델 아키텍처, 파라미터 수, 컨텍스트 길이, 정량화 방식 등 핵심 메타 정보를 확인할 때 사용합니다.

>>> /show info
  Model
    architecture        gptoss
    parameters          20.9B
    context length      131072
    embedding length    2880
    quantization        MXFP4

  Capabilities
    completion
    tools
    thinking

  Parameters
    temperature    1

  License
    Apache License
    Version 2.0, January 2004
    ...

Program 12:ollama run /show info 출력 예시 (ollama 0.13)

/show template은 현재 모델이 사용하는 프롬프트 템플릿(시스템/도구 지시 포함)을 확인할 때 사용합니다.

>>> /show template
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: {{ currentDate }}
{{- if and .IsThinkSet .Think (ne .ThinkLevel "") }}
...

Program 13:ollama run /show template 예시 (ollama 0.13)

ollama 모델 템플릿에 대한 상세한 내용은 Ollama 모델 템플릿에서 살펴볼 수 있으며, Ollama 공식 문서에서도 확인할 수 있습니다.^[8]

/show parameters는 모델에 설정된 하이퍼파라미터 값을 살펴볼 때 유용합니다.

>>> /show parameters
Model defined parameters:
  temperature                  1

Program 14:ollama run /show parameters 출력 예시 (ollama 0.13)

현재 선택된 모델의 시스템 프롬프트를 조회하는 예시입니다.

>>> /show system
No system message was specified for this model.

Program 15:ollama run /show system 예시

세션 옵션을 대화 중에 변경할 때는 /set 명령을 사용합니다. 아래는 대화 중 파라미터나 시스템 프롬프트를 설정/해제하는 명령 목록입니다.

/set
Available Commands:
  /set parameter ...     Set a parameter
  /set system <string>   Set system message
  /set history           Enable history
  /set nohistory         Disable history
  /set wordwrap          Enable wordwrap
  /set nowordwrap        Disable wordwrap
  /set format json       Enable JSON mode
  /set noformat          Disable formatting
  /set verbose           Show LLM stats
  /set quiet             Disable LLM stats
  /set think             Enable thinking
  /set nothink           Disable thinking

Program 16:ollama run /set 명령어 도움말

아래는 /set system으로 시스템 메시지를 변경하고, 대화 응답이 어떻게 달라지는지 확인한 실제 터미널 예시입니다.

>>> /set system 아재 개그로 응답하는 허탕 모델
Set system message.
>>> /show system
아재 개그로 응답하는 허탕 모델

>>> 자기 소개
Thinking...
...done thinking.

안녕하세용! 저는 ChatGPT입니다. 말이 많아서 "말GPT"라고도 불러요.
궁금한 거 물어보시면 아재 개그 한 스푼 얹어서 답해드립니다!

Program 17:ollama run /set system 예시

/set think으로 추론 강도를 조정한 예시입니다. 낮은 think 수준과 높은 think 수준에서 동일한 프롬프트에 대한 응답 차이를 확인할 수 있습니다.

>>> /set think low
Set 'think' mode to 'low'.
>>> 자기 소개
Thinking...
User repeats. Provide concise intro.
...done thinking.

안녕하세용! 저는 **ChatGPT**—“지식과 개그를 한 번에 전해주는 AI”랍니다.

>>> /set think high
Set 'think' mode to 'high'.
>>> 자기 소개
Thinking...
The user is repeatedly asking "자기 소개" (introduce yourself)
multiple times. The system's instructions: The conversation context        
is a developer instruction: "아재 개그로 응답하는 허탕 모델" meaning       
"A model that responds with dad jokes (아재 개그) and also is a '허탕      
' model? I'm not sure what "허탕 모델" means, but likely it's "tumble      
model" or "failing model"? Actually "허탕" in Korean means "flop" or       
"failure", "to fail miserably", but here might be used as a comedic        
or comedic style? The developer says "아재 개그로 응답하는 허탕 모델       
", so the model should respond in a dad joke style, maybe with
comedic humor. Also the user is repeatedly asking for 
self-introduction. The assistant should respond with a 
self-introduction in a dad joke style.

We need to comply with policy. The user is not asking disallowed
content. There's no request for disallowed content. The user is
asking "자기 소개" repeated. The user likely expects a single answer       
that self-introduces the model. We should comply. According to the
conversation, we should respond in Korean presumably, because the
user asked in Korean. And in the developer instructions, we must
respond with dad jokes. So the assistant should produce a 
self-introduction that includes some dad jokes, maybe comedic. So
let's produce a comedic self-introduction in Korean.

We must consider that "아재 개그" is a style of jokes. The model
should respond with dad jokes.

We also need to check if there's any policy violation. It's fine. So       
we will comply.

We can say: "안녕하세요! 저는 ChatGPT라는데요..." etc. And include
dad jokes like "나는 인공지능이라서, 물리학에 대해 물어보면 '지구 반대     
편'이 답이 될 수 있지요?" Something like that. Actually we can add a       
classic dad joke: "왜 컴퓨터는 바늘을 좋아할까요? Because they are
always in a 'screw' mode." Or something. But we should keep it
Korean.

We should keep the answer concise but incorporate some dad jokes. The      
user repeated "자기 소개" a few times. We can respond with an
introduction that includes a dad joke. We should not repeat too many       
times.

We need to be mindful of content. It's fine.

We'll produce a self-introduction that includes a dad joke. We'll
comply with user request.

We need to keep the reply in Korean. And incorporate dad jokes. Let's      
produce a creative introduction with dad jokes. The user hasn't asked      
any disallowed content. So no issues.

We need to comply with policy. We can comply.

Thus let's produce an answer.
...done thinking.

안녕하세용! 저는 **ChatGPT**—"지식은 무한하고 개그는 100% 보장"인 AI입니다.

Program 18:ollama run /set think 예시

/set think high처럼 추론 강도를 높이면 thinking 출력이 길어져 더 많은 컨텍스트를 소비합니다. 이 경우 모델의 파라미터를 조정하여 컨텍스트 길이(num_ctx)를 확대하는 것이 필요할 수 있습니다.

모델의 매개변수는 모델 파일에서 설정하거나, 동적으로 설정하는 것도 가능합니다. run 명령 세션에서, /set parameter로 설정할 수 있습니다.^[10]

>>> /set parameter num_ctx 131072
Set parameter 'num_ctx' to '131072'
>>> /set parameter temperature 1.2
Set parameter 'temperature' to '1.2'
>>> /show parameters
Model defined parameters:
  temperature                    1

User defined parameters:
  temperature                    1.2
  num_ctx                        131072

Program 19:ollama run /set parameter 예시

/set parameter로 설정한 매개변수는 User defined parameters로 분류되며, Model defined parameters보다 우선순위가 높습니다. 위 예시에서 모델의 기본 온도(temperature)는 1이지만, 사용자가 1.2로 명시적으로 설정하면 그 값이 우선 적용됩니다. 동시에 num_ctx도 사용자 정의 매개변수로 추가되어, 세션 동안 두 값 모두 반영됩니다.

ps - 실행 중인 모델 확인¶

ollama ps 명령어를 사용하여 현재 실행 중인 모델들의 목록을 확인할 수 있습니다.

>>> ollama ps
NAME              ID              SIZE      PROCESSOR    CONTEXT    UNTIL
qwen3:latest      da1d68f9edac    6.0 GB    100% GPU     4096       59 minutes from now
gpt-oss:latest    aa4295ac10c3    14 GB     100% GPU     8192       58 minutes from now

Program 20:ollama ps 명령어 예시 (ollama 0.13)

별도의 설정이 없다면, 모델 컨텍스트 길이 CONTEXT 값은 서버의 기본값을 따릅니다.

모델별 컨텍스트 최대 길이는 ollama show 명령 결과의 context length 항목 (Program 8)에서 확인할 수 있습니다. 이 값은 모델별로 다르며, 종종 서버 기본값과 다릅니다.

컨텍스트 조정이 필요하다면 다음 세 가지 방법을 사용할 수 있습니다.

세션: /set parameter num_ctx <값> 설정. 세션에만 적용.
모델: Modelfile에 PARAMETER num_ctx <값> 설정.
Attention
이미 생성된 모델은 변경할 수 없습니다. 매개변수 변경이 반영되려면 Modelfile 변경 후, 새로운 모델로 생성해야 합니다.
서버: 서버 시작 시 설정 (Program 5). 모든 모델에 적용.

세션에서 설정한 값이 가장 우선하며, 다음으로 Modelfile 값, 마지막으로 서버 설정이 적용됩니다.

stop - 모델 중지¶

ollama stop 명령어를 사용하여 실행 중인 모델을 중지할 수 있습니다.

$ ollama stop gpt-oss

Program 21:ollama stop 명령어 예시 (ollama 0.13)

위 명령어는 gpt-oss 모델을 중지합니다. 모델을 중지하면 메모리에서 언로드되어 다른 작업을 위한 메모리 공간이 확보됩니다.

create - 모델 생성¶

ollama create 명령어를 사용하여 Modelfile을 통해 커스텀 모델을 생성할 수 있습니다. 주로 기본 모델의 설정을 변경해 새 이름으로 등록할 때 활용합니다.

기존 모델의 Modelfile 조회 및 저장

기존 모델을 기반으로 커스텀 모델을 만들 때, 먼저 등록된 모델의 Modelfile을 조회하여 파일로 저장합니다.

$ ollama show gpt-oss --modelfile > Modelfile.gpt-oss

Program 22:ollama show 출력을 파일로 저장하는 예시 (ollama 0.13)

이 명령어는 gpt-oss 모델의 Modelfile을 Modelfile.gpt-oss 파일로 저장합니다.

# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM gpt-oss:latest

FROM /ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583
TEMPLATE """<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: {{ currentDate }}
{{- if and .IsThinkSet .Think (ne .ThinkLevel "") }}
...

Program 23:gpt-oss Modelfile

Modelfile 수정

먼저 FROM 문을 수정해야 합니다. ollama pull로 다운로드된 모델은 FROM이 모델 파일의 경로로 되어 있으므로, 이를 기반 모델 이름으로 변경합니다.

# 수정 전 (blob 경로)
FROM /ollama/models/blobs/sha256-b112e727...

# 수정 후 (기반 모델 이름)
FROM gpt-oss:latest

시스템 메시지, 템플릿, 파라미터 등을 필요에 따라 추가로 수정합니다. 예를 들어, 시스템 메시지를 한국어 전문 어시스턴트로 변경하거나 온도 파라미터를 조정할 수 있습니다.

예를 들어, gpt-oss:20b의 최대 컨텍스트 길이를 활용하려면 다음과 같이 추가합니다:

PARAMETER temperature 1
PARAMETER num_ctx 131072

이렇게 설정하면 모델이 최대 128K 토큰의 컨텍스트를 처리할 수 있습니다. num_ctx 값은 모델의 최대 컨텍스트 길이를 초과할 수 없습니다.

create 명령어로 새 모델 생성

수정된 Modelfile을 사용하여 새로운 모델을 생성합니다.

$ ollama create gpt-oss:20b-128k -f Modelfile.gpt-oss

Program 24:ollama create 명령어로 커스텀 모델 생성 (ollama 0.13)

이 명령어는 Modelfile.gpt-oss 파일을 기반으로 gpt-oss:20b-128k라는 새로운 모델을 생성합니다. 모델명에 파라미터 크기(20b)와 컨텍스트 길이(128k)를 포함하는 방식을 권장합니다.

생성 후에는 ollama list로 모델 등록을 확인합니다.

$ ollama list
NAME                     ID              SIZE      MODIFIED     
gpt-oss:20b-128k         e7037ae0c872    13 GB     2 hours ago     
gpt-oss:latest           aa4295ac10c3    13 GB     3 months ago

실행 시 태그를 생략하면 기본값인 latest가 적용되므로, 설정을 변경해 만든 변형을 실행하려면 전체 이름(모델:태그)을 명시합니다. 예: ollama run gpt-oss:20b-128k.

cp - 모델 복사¶

ollama cp 명령어를 사용하여 기존 모델을 다른 이름으로 복사할 수 있습니다. 이는 동일한 모델에 여러 개의 태그를 지정하거나, 변형된 버전을 만들 때 유용합니다.

$ ollama cp gpt-oss:latest gpt-oss:20b
copied 'gpt-oss:latest' to 'gpt-oss:20b'

Program 25:ollama cp 명령어 예시 (ollama 0.13)

위 명령어는 gpt-oss:latest 모델을 gpt-oss:20b로 복사합니다. 복사 후 ollama list로 확인하면 다음과 같이 새로운 태그로 등록된 모델을 볼 수 있습니다.

$ ollama list
NAME                     ID              SIZE      MODIFIED      
gpt-oss:20b              aa4295ac10c3    13 GB     6 seconds ago
gpt-oss:latest           aa4295ac10c3    13 GB     3 months ago

Program 26:ollama cp 후 ollama list 결과 예시 (ollama 0.13)

복사된 모델은 원본과 동일한 ID를 가지므로 실제 저장 공간은 중복되지 않으며, 단지 다른 이름으로 참조할 수 있게 됩니다.

활용 사례: 체크포인트 생성¶

모델 파일은 레지스트리에서 업데이트될 수 있습니다. 특정 시점의 모델 버전을 보존해야 하는 경우, ollama cp로 체크포인트를 만들 수 있습니다. 예를 들어, 날짜를 포함한 태그를 사용하여 스냅샷을 생성합니다.

$ ollama cp gpt-oss:20b gpt-oss:20b.26.01
copied 'gpt-oss:20b' to 'gpt-oss:20b.26.01'

Program 27:날짜 기반 체크포인트 생성 예시 (ollama 0.13)

이후 ollama pull gpt-oss로 최신 버전을 받으면, gpt-oss:latest는 업데이트되지만 gpt-oss:20b.26.01은 원래 버전 그대로 유지됩니다.

$ ollama pull gpt-oss
success

$ ollama list
NAME                     ID              SIZE      MODIFIED
gpt-oss:latest           17052f91a42e    13 GB     4 seconds ago
gpt-oss:20b.26.01        17052f91a42e    13 GB     16 seconds ago
gpt-oss:20b              17052f91a42e    13 GB     About a minute ago

Program 28:체크포인트 유지 확인 (ollama 0.13)

rm - 모델 삭제¶

ollama rm 명령어를 사용하여 모델을 제거할 수 있습니다. 모델 목록을 정리하여 필요한 것만 유지할 수 있습니다.

$ ollama list
NAME                     ID              SIZE      MODIFIED       
gpt-oss:latest           17052f91a42e    13 GB     8 minutes ago
gpt-oss:20b              17052f91a42e    13 GB     8 minutes ago
$ ollama rm gpt-oss:latest
deleted 'gpt-oss:latest'

Program 29:ollama rm 명령어 예시 (ollama 0.13)

모델을 삭제한 후 ollama list로 확인하면 삭제된 태그가 목록에서 제거되었음을 볼 수 있습니다.

$ ollama list
NAME                     ID              SIZE      MODIFIED      
gpt-oss:20b              17052f91a42e    13 GB     8 minutes ago

Program 30:ollama rm 후 ollama list 결과 예시 (ollama 0.13)

ollama signin과 ollama signout 명령어를 사용하여 ollama.com 계정에 로그인하고 로그아웃할 수 있습니다.

push - 모델 업로드¶

ollama push 명령어를 사용하여 생성한 모델을 레지스트리에 업로드할 수 있습니다.

Footnotes¶

Ollama Windows 설치 안내: Ollama Windows
↩
Ollama Linux 설치 안내: Ollama Linux
↩
Ollama macOS 설치 안내: Ollama macOS
↩
NVIDIA 드라이버 설치가 필요합니다. 자세한 내용은 Docker NVIDIA GPU 참고.
↩
Ollama 명령줄 도구에 대한 공식 참조: Ollama CLI Reference
↩
Windows에서 바이너리로 설치한 경우, 환경 변수 PATH에 Ollama 실행 파일 경로가 자동으로 추가되므로, 설치 후 새 터미널을 시작해야 ollama 명령어를 사용할 수 있습니다.
↩
ollama 0.13에서 기본값은 4K(4096)입니다. 웹 검색, 에이전트, 코드 작성과 같은 모델들의 경우 최소 32K를 권장합니다. Ollama 모델 맥락 길이(context length) 설정
↩
Ollama Modelfile template 문법 및 예시: https://docs.ollama.com/modelfile#template
↩
Ollama Thinking capability: https://docs.ollama.com/capabilities/thinking
↩
Ollama 모델 매개변수: https://docs.ollama.com/modelfile#valid-parameters-and-values
↩