
Implementation:PacktPublishing LLM Engineers Handbook FastLanguageModel From Pretrained

From Leeroopedia


Implementation Name: FastLanguageModel From Pretrained
Type: Wrapper Doc (Unsloth external API)
Source File: llm_engineering/model/finetuning/finetune.py:L29-43 (within load_model())
Workflow: LLM_Finetuning
Repo: PacktPublishing/LLM-Engineers-Handbook
Implements: Principle:PacktPublishing_LLM_Engineers_Handbook_Quantized_Model_Loading

Function Signature

FastLanguageModel.from_pretrained(
    model_name: str,
    max_seq_length: int,
    load_in_4bit: bool,
) -> tuple[model, tokenizer]

Import

from unsloth import FastLanguageModel

Description

FastLanguageModel.from_pretrained() is an Unsloth library method that loads a pre-trained language model and its corresponding tokenizer from a HuggingFace model identifier. It wraps the standard HuggingFace AutoModelForCausalLM.from_pretrained() with additional optimizations including fused attention kernels, memory-efficient loading, and optional 4-bit quantization via bitsandbytes.

This method is called within the load_model() function in the repository's fine-tuning pipeline.

Parameters

  • model_name (str): The HuggingFace Hub identifier or local path for the pre-trained model. Value in repo: a HuggingFace model ID (e.g., "meta-llama/Meta-Llama-3.1-8B").
  • max_seq_length (int): Maximum sequence length the model will handle during fine-tuning; determines positional-encoding size and memory allocation. Value in repo: 2048.
  • load_in_4bit (bool): Whether to load model weights in 4-bit NF4 quantization via bitsandbytes. Value in repo: False.
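
To see why load_in_4bit=False is workable on a 24 GB GPU, a rough weight-memory estimate helps. This is back-of-the-envelope arithmetic only: the 8B parameter count is assumed from the example model ID above, NF4 is approximated at 0.5 bytes per weight, and overheads such as activations, the KV cache, and optimizer state are ignored.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate GPU memory needed just to hold the model weights."""
    return n_params * bytes_per_param / 1024**3

n_params = 8e9  # assumed from the example model ID (Meta-Llama-3.1-8B)

bf16 = weight_memory_gb(n_params, 2.0)   # BF16/FP16: 2 bytes per weight
nf4 = weight_memory_gb(n_params, 0.5)    # 4-bit NF4: ~0.5 bytes per weight

print(f"BF16 weights: ~{bf16:.1f} GB")   # ~14.9 GB, fits in 24 GB VRAM
print(f"NF4 weights:  ~{nf4:.1f} GB")    # ~3.7 GB
```

Under these assumptions, full-precision weights alone leave roughly 9 GB of headroom on the target instance, which is consistent with the repository's choice to skip 4-bit loading.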

Returns

A tuple of (model, tokenizer):

  • model: The loaded language model (with Unsloth optimizations applied), ready for LoRA adapter injection.
  • tokenizer: The corresponding tokenizer configured for the model.
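
Since the returned model is destined for LoRA adapter injection, a quick parameter count shows why adapters are cheap relative to the base weights. The layer dimensions and rank below are illustrative assumptions, not values read from the repository.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """A rank-r LoRA adapter on a d_in x d_out linear layer adds two
    small trainable matrices: A (d_in x r) and B (r x d_out)."""
    return r * (d_in + d_out)

# Hypothetical example: one 4096x4096 attention projection, rank 16
full = 4096 * 4096                      # 16,777,216 frozen base weights
adapter = lora_params(4096, 4096, 16)   # 131,072 trainable weights

print(f"adapter / full = {adapter / full:.4f}")  # ~0.0078, under 1%
```

This ratio is why the base model can stay frozen (and even quantized) while only the small adapter matrices are trained.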

Key Code in Repository

# From llm_engineering/model/finetuning/finetune.py (within load_model())

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id,
    max_seq_length=max_seq_length,
    load_in_4bit=False,
)

Notes on Repository Usage

  • load_in_4bit=False: In this repository, 4-bit quantization is not enabled at load time, so the model is loaded in its default half precision (typically BF16 or FP16). This suggests the target instance (a SageMaker ml.g5.2xlarge, whose single A10G GPU has 24 GB of VRAM) has sufficient memory for the chosen model size.
  • max_seq_length=2048: The sequence length is set to 2048 tokens, which is a reasonable default for fine-tuning instruction-following models.
  • model_id: The model identifier is configured externally and passed into the load_model() function.
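
The max_seq_length=2048 setting also bounds per-sequence KV-cache memory. A rough estimate, assuming Llama-3.1-8B's published geometry (32 layers, 8 KV heads via grouped-query attention, head dimension 128) and BF16 cache entries; these model figures are assumptions, not taken from the repository:

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """The KV cache stores one key and one value vector per layer,
    per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Assumed Llama-3.1-8B geometry: 32 layers, 8 KV heads (GQA), head_dim 128
cache = kv_cache_bytes(seq_len=2048, n_layers=32, n_kv_heads=8, head_dim=128)
print(f"KV cache per sequence: ~{cache / 1024**2:.0f} MiB")  # ~256 MiB
```

At this scale the cache is small next to the ~15 GB of BF16 weights, so sequence length is not the dominant memory cost here.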

External Dependencies

Package Purpose
unsloth Optimized model loading and inference
transformers Underlying HuggingFace model/tokenizer classes
bitsandbytes 4-bit quantization support (used when load_in_4bit=True)
