unsloth - Deep learning language models

pip install unsloth

import unsloth

print('Unsloth', unsloth.__version__)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
Unsloth 2025.9.6

Model selection¶

from unsloth import FastLanguageModel

model_name = 'unsloth/Qwen3-4B-unsloth-bnb-4bit'
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name, max_seq_length=2048, dtype=None, load_in_4bit=True)

==((====))==  Unsloth 2025.9.6: Fast Qwen3 patching. Transformers: 4.55.4.
   \\   /|    NVIDIA GeForce RTX 3090. Num GPUs = 1. Max memory: 23.999 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 8.6. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

Fine-tuning configuration¶

model = FastLanguageModel.get_peft_model(
    model,
    r = 8, # Larger values can improve quality but also use more memory.
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                  "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 8, # Equal to the rank or twice the rank is recommended.
    lora_dropout = 0, # 0 is generally recommended.
    bias = "none",    # Other values are accepted, but "none" is recommended.
    use_gradient_checkpointing = "unsloth", # "unsloth" is recommended; True/False also work.
    random_state = 2025, # Random seed for reproducibility (non-negative integer).
)

Unsloth 2025.9.6 patched 36 layers with 36 QKV layers, 36 O layers and 36 MLP layers.
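
The trainable-parameter count that the training log reports further down (16,515,072) can be reproduced by hand: a LoRA adapter on a linear layer with `n_in` inputs and `n_out` outputs adds `r * (n_in + n_out)` weights. The layer shapes below are assumptions about Qwen3-4B (hidden size 2560, 32 query heads and 8 key/value heads of dimension 128, MLP width 9728). They are not stated anywhere in this notebook, so treat the sketch as a consistency check rather than a specification; `lora_parameter_count` is a hypothetical helper.

```python
# Assumed Qwen3-4B projection shapes as (in_features, out_features).
# These numbers are NOT taken from this notebook.
HIDDEN, Q_OUT, KV_OUT, MLP = 2560, 32 * 128, 8 * 128, 9728
PROJECTION_SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_parameter_count(rank, num_layers, shapes=PROJECTION_SHAPES):
    """LoRA adds A (rank x in) and B (out x rank) to every adapted linear layer."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


trainable = lora_parameter_count(rank=8, num_layers=36)
print(f"{trainable:,}")                    # 16,515,072
print(f"{trainable / 4_038_983_168:.2%}")  # 0.41%
```

Because the count is linear in `r`, doubling the rank to 16 doubles the number of trainable weights. With `lora_alpha = 8` and `r = 8`, the conventional LoRA scaling factor `lora_alpha / r` is 1.0.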

Dataset¶

from datasets import load_dataset

dataset = load_dataset('vicgalle/alpaca-gpt4', split='train')

sample = dataset[0]
for key, value in sample.items():
    print(f'{key}: {value}\n')

instruction: Give three tips for staying healthy.

input: 

output: 1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.

2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.

3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.

text: Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Give three tips for staying healthy.

### Response:
1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.

2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.

3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.
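
The `text` column looks like the other three columns rendered into the Alpaca prompt layout. A minimal sketch for rows whose `input` is empty, as in the sample above; `build_alpaca_text` is a hypothetical helper, and rows with a non-empty `input` probably use a layout with an extra input section that this sketch does not cover.

```python
ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{output}"
)


def build_alpaca_text(row):
    """Render one row in the layout of the sample's `text` field (empty `input` only)."""
    if row.get("input"):
        raise ValueError("this sketch only covers rows with an empty input")
    return ALPACA_NO_INPUT.format(instruction=row["instruction"], output=row["output"])


# Shortened stand-in for the first dataset row.
row = {
    "instruction": "Give three tips for staying healthy.",
    "input": "",
    "output": "1. Eat a balanced and nutritious diet.",
}
print(build_alpaca_text(row))
```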

import pandas as pd

samples = dataset.take(10)
pd.DataFrame(samples)

ChatGPT-style conversation format¶

from unsloth import to_sharegpt

merged_dataset = to_sharegpt(
    dataset,
    merged_prompt='{instruction}[[\nYour input is:\n{input}]]',
    output_column_name='output',
    conversation_extension=3, # Number of conversations to merge (chosen at random)
)

sample = merged_dataset[0]
sample['conversations']

[{'from': 'human', 'value': 'Give three tips for staying healthy.'},
 {'from': 'gpt',
  'value': '1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.\n\n2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.\n\n3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.'},
 {'from': 'human', 'value': 'Describe what a monotheistic religion is.'},
 {'from': 'gpt',
  'value': 'A monotheistic religion is a type of religion that believes in the existence of only one supreme and all-powerful deity, who is considered the creator and ruler of the universe. This deity is worshiped as the ultimate and only divine being, and followers of such religions often see their deity as omniscient, omnipotent, and omnibenevolent. Some of the most widely practiced monotheistic religions in the world today include Christianity, Islam, and Judaism, among others. The concept of monotheism differs from polytheism, which believes in the existence of multiple gods, and from atheism, which denies the existence of any deity.'},
 {'from': 'human', 'value': 'How does one add a chart to a document?'},
 {'from': 'gpt',
  'value': 'To add a chart to a document, follow these steps:\n\n1. Open the document where you want to insert the chart.\n2. Click the location where you want to insert the chart.\n3. In most word processors, you can go to the **Insert** tab, where you can find a **Chart** option. Click on it.\n4. A new window will appear, and you\'ll be able to select the chart type that you want to use (column, pie, line, bar, area, scatter, etc.).\n5. Once you\'ve selected your chart type, you’ll be prompted to enter your data into a spreadsheet. You can either type your data in manually or copy it from an existing data source.\n6. Edit your chart data and customize its appearance to fit your document\'s style or branding.\n7. When you’re happy with how the chart looks, click "OK" or "Insert" to add the chart to your document.\n8. Optionally, you can add a chart title or labels to the axes to make the chart easier to understand.\n\nThese instructions may vary depending on the type of word processor you are using.'}]
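
In `merged_prompt`, the `[[...]]` markers appear to delimit an optional section that is dropped when the column inside it is empty, which is why the first human turn above contains no "Your input is:" line. The sketch below imitates that behavior in plain Python. `render_merged_prompt` is a hypothetical helper written for illustration, not Unsloth's implementation.

```python
import re


def render_merged_prompt(template, row):
    """Fill {column} placeholders; drop any [[...]] block whose columns are empty."""

    def keep_or_drop(match):
        block = match.group(1)
        columns = re.findall(r"\{(\w+)\}", block)
        # Keep the optional block only if every column it mentions has content.
        if all(str(row.get(name, "")).strip() for name in columns):
            return block
        return ""

    without_markers = re.sub(r"\[\[(.*?)\]\]", keep_or_drop, template, flags=re.DOTALL)
    return without_markers.format(**row)


template = "{instruction}[[\nYour input is:\n{input}]]"
print(render_merged_prompt(template, {"instruction": "Give three tips for staying healthy.", "input": ""}))
print(render_merged_prompt(template, {"instruction": "Continue the fibonacci sequence!", "input": "1, 1, 2, 3, 5, 8"}))
```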

Multi-turn conversation format¶

from unsloth import standardize_sharegpt

standardized_dataset = standardize_sharegpt(merged_dataset)

sample = standardized_dataset[0]
for conversation in sample['conversations']:
    role = conversation['role']
    content = conversation['content']
    print(f'{role}: {content}\n')

user: Give three tips for staying healthy.

assistant: 1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.

2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.

3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.

user: Describe what a monotheistic religion is.

assistant: A monotheistic religion is a type of religion that believes in the existence of only one supreme and all-powerful deity, who is considered the creator and ruler of the universe. This deity is worshiped as the ultimate and only divine being, and followers of such religions often see their deity as omniscient, omnipotent, and omnibenevolent. Some of the most widely practiced monotheistic religions in the world today include Christianity, Islam, and Judaism, among others. The concept of monotheism differs from polytheism, which believes in the existence of multiple gods, and from atheism, which denies the existence of any deity.

user: How does one add a chart to a document?

assistant: To add a chart to a document, follow these steps:

1. Open the document where you want to insert the chart.
2. Click the location where you want to insert the chart.
3. In most word processors, you can go to the **Insert** tab, where you can find a **Chart** option. Click on it.
4. A new window will appear, and you'll be able to select the chart type that you want to use (column, pie, line, bar, area, scatter, etc.).
5. Once you've selected your chart type, you’ll be prompted to enter your data into a spreadsheet. You can either type your data in manually or copy it from an existing data source.
6. Edit your chart data and customize its appearance to fit your document's style or branding.
7. When you’re happy with how the chart looks, click "OK" or "Insert" to add the chart to your document.
8. Optionally, you can add a chart title or labels to the axes to make the chart easier to understand.

These instructions may vary depending on the type of word processor you are using.
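
Comparing the two outputs, standardization amounts to renaming keys and roles: `from`/`value` become `role`/`content`, `human` becomes `user`, and `gpt` becomes `assistant`. A minimal sketch of that mapping follows; `standardize_turns` is a hypothetical helper, and the `system` entry in the role table is an assumption, since no system turn appears in this notebook.

```python
# Role names seen above, plus an assumed pass-through for system turns.
ROLE_NAMES = {"human": "user", "gpt": "assistant", "system": "system"}


def standardize_turns(conversation):
    """Convert ShareGPT-style turns into role/content dictionaries."""
    return [
        {"role": ROLE_NAMES[turn["from"]], "content": turn["value"]}
        for turn in conversation
    ]


sharegpt_turns = [
    {"from": "human", "value": "Describe what a monotheistic religion is."},
    {"from": "gpt", "value": "A monotheistic religion believes in only one deity."},
]
for turn in standardize_turns(sharegpt_turns):
    print(f"{turn['role']}: {turn['content']}")
```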

Chat Template¶

from unsloth import apply_chat_template

chat_template = """The following conversation describes the tasks to perform. Generate an appropriate response to each request.

### Task description:
{INPUT}

### Response:
{OUTPUT}"""

train_dataset = apply_chat_template(
    standardized_dataset,
    tokenizer=tokenizer,
    chat_template=chat_template,
    # default_system_message = "You are a helpful assistant", << [OPTIONAL]
)

Unsloth: We automatically added an EOS token to stop endless generations.
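
To make the placeholders concrete, the sketch below fills `{INPUT}` and `{OUTPUT}` for a single user/assistant pair and appends an end-of-sequence marker, which is what the log message above refers to. It is a hand-rolled approximation: the template is a shortened stand-in, `render_pair` is hypothetical, and `<|im_end|>` is assumed to be the tokenizer's EOS token because it terminates the generations shown later in this notebook. How Unsloth lays out additional turns is not reproduced here.

```python
# Shortened stand-in for the notebook's chat template.
TEMPLATE = "### Instruction:\n{INPUT}\n\n### Response:\n{OUTPUT}"


def render_pair(template, user_text, assistant_text, eos_token="<|im_end|>"):
    """Fill one user/assistant pair and terminate it with the EOS token."""
    # Split on the placeholders so that braces inside the messages are left alone.
    before, rest = template.split("{INPUT}")
    middle, after = rest.split("{OUTPUT}")
    return before + user_text + middle + assistant_text + after + eos_token


print(render_pair(TEMPLATE, "Give three tips for staying healthy.", "1. Eat a balanced diet."))
```

Without the trailing EOS token the model never sees an example of where a response ends, so generation after fine-tuning may run on until `max_new_tokens` is reached.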

Model training¶

from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    packing = False, # Can make training 5x faster for short sequences.
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
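        max_steps = 60, # Takes precedence over num_train_epochs when positive.
        num_train_epochs = 1, # Has no effect here because max_steps is set.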
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "qwen3-finetuned",
        report_to = "none", # Disables experiment tracking; set to "wandb" etc. to enable it.
    ),
)
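
With `warmup_steps = 5`, `max_steps = 60` and `lr_scheduler_type = "linear"`, the learning rate should ramp up linearly for five steps and then decay linearly to zero. The sketch below assumes the usual linear-warmup, linear-decay shape; `linear_schedule_lr` is a hypothetical helper, and the values that the trainer logs may be offset by one step.

```python
def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=60):
    """Learning rate at 0-based optimizer step `step` under linear warmup and decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 1, 5, 30, 59, 60):
    print(step, f"{linear_schedule_lr(step):.2e}")
```

The peak of `2e-4` is reached exactly at step 5, so a run this short spends most of its steps in the decay phase.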

A C compiler is required.

Install the compiler on a Debian-based distribution such as Ubuntu:

sudo apt update
sudo apt install -y build-essential

import torch

gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA GeForce RTX 3090. Max memory = 23.999 GB.
5.441 GB of memory reserved.

trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 52,002 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 16,515,072 of 4,038,983,168 (0.41% trained)

Unsloth: Will smartly offload gradients to save VRAM!
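
The banner's numbers also show how little of the dataset this run covers. A small arithmetic sketch, using only values from the log above (`training_coverage` is a hypothetical helper):

```python
def training_coverage(per_device_batch, grad_accum, num_gpus, max_steps, num_examples):
    """Return (effective batch size, examples seen, fraction of the dataset seen)."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    examples_seen = effective_batch * max_steps
    return effective_batch, examples_seen, examples_seen / num_examples


batch, seen, fraction = training_coverage(2, 4, 1, 60, 52_002)
print(batch, seen, f"{fraction:.2%}")  # 8 480 0.92%
```

Sixty steps at an effective batch size of 8 touch 480 of the 52,002 examples, so this is a short demonstration run rather than a full epoch.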

import torch

used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

191.4002 seconds used for training.
3.19 minutes used for training.
Peak reserved memory = 5.441 GB.
Peak reserved memory for training = 0.0 GB.
Peak reserved memory % of max memory = 22.672 %.
Peak reserved memory for training % of max memory = 0.0 %.
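
The figures above can be re-derived from the printed values. Note that `torch.cuda.max_memory_reserved()` reports a high-water mark, so the 0.0 GB training delta only says that the peak did not rise above the baseline reading. One possible explanation (an assumption, since the notebook does not say) is that the baseline cell was executed after memory had already peaked in the same process; calling `torch.cuda.reset_peak_memory_stats()` before training is one way to get a clean baseline. `summarize_run` is a hypothetical helper.

```python
def summarize_run(runtime_s, steps, peak_gb, baseline_gb, total_gb):
    """Recompute the per-step time and memory figures from logged values."""
    return {
        "seconds_per_step": round(runtime_s / steps, 2),
        "training_gb": round(peak_gb - baseline_gb, 3),
        "peak_percent": round(peak_gb / total_gb * 100, 3),
    }


summary = summarize_run(191.4002, 60, 5.441, 5.441, 23.999)
print(summary)
```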

Saving the model¶

model_path = "qwen3-lora-finetuned"
model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

('qwen3-lora-finetuned/tokenizer_config.json',
 'qwen3-lora-finetuned/special_tokens_map.json',
 'qwen3-lora-finetuned/chat_template.jinja',
 'qwen3-lora-finetuned/vocab.json',
 'qwen3-lora-finetuned/merges.txt',
 'qwen3-lora-finetuned/added_tokens.json',
 'qwen3-lora-finetuned/tokenizer.json')

Using the model¶

Loading the model¶

from pathlib import Path
from unsloth import FastLanguageModel

model_path = Path("qwen3-lora-finetuned")
if model_path.exists():
    model_path = str(model_path)
    print(f"Loading model: {model_path}")
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_path, max_seq_length=2048, dtype=None, load_in_4bit=True)
    FastLanguageModel.for_inference(model) # 2x faster inference

Loading model: qwen3-lora-finetuned
==((====))==  Unsloth 2025.9.6: Fast Qwen3 patching. Transformers: 4.55.4.
   \\   /|    NVIDIA GeForce RTX 3090. Num GPUs = 1. Max memory: 23.999 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 8.6. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

FastLanguageModel.for_inference(model)
messages = [
    {
        "role": "user", 
        "content": "Continue the fibonacci sequence! Your input is 1, 1, 2, 3, 5, 8,"
    },
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(
    input_ids, streamer=text_streamer, max_new_tokens=128, pad_token_id=tokenizer.eos_token_id)

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.

13, 21, 34, 55, 89, 144, 233, 377, 610, 987.<|im_end|>
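
The generated continuation is easy to verify independently. A short sketch, where `continue_fibonacci` is a hypothetical helper:

```python
def continue_fibonacci(prefix, count):
    """Return the next `count` Fibonacci terms after the given prefix."""
    a, b = prefix[-2], prefix[-1]
    terms = []
    for _ in range(count):
        a, b = b, a + b
        terms.append(b)
    return terms


print(continue_fibonacci([1, 1, 2, 3, 5, 8], 10))
# [13, 21, 34, 55, 89, 144, 233, 377, 610, 987]
```

All ten terms match the model's output above.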

FastLanguageModel.for_inference(model) # Enable native 2x faster inference
messages = [
    {"role": "user",      "content": "Continue the fibonacci sequence! Your input is 1, 1, 2, 3, 5, 8"},
    {"role": "assistant", "content": "The fibonacci sequence continues as 13, 21, 34, 55 and 89."},
    {"role": "user",      "content": "What is France's tallest tower called?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids, streamer=text_streamer, max_new_tokens=128, pad_token_id=tokenizer.eos_token_id)

The tallest tower in France is called the Tour de la Défense. It is located in Paris and is part of the city's new administrative center. The tower is 210 meters tall and has 52 floors.<|im_end|>