Implementation:PacktPublishing LLM Engineers Handbook FastLanguageModel From Pretrained
| Field | Value |
|---|---|
| Implementation Name | FastLanguageModel From Pretrained |
| Type | Wrapper Doc (Unsloth external API) |
| Source File | llm_engineering/model/finetuning/finetune.py:L29-43 (within load_model()) |
| Workflow | LLM_Finetuning |
| Repo | PacktPublishing/LLM-Engineers-Handbook |
| Implements | Principle:PacktPublishing_LLM_Engineers_Handbook_Quantized_Model_Loading |
Function Signature
```python
FastLanguageModel.from_pretrained(
    model_name: str,
    max_seq_length: int,
    load_in_4bit: bool,
) -> tuple[model, tokenizer]
```
Import
```python
from unsloth import FastLanguageModel
```
Description
FastLanguageModel.from_pretrained() is an Unsloth library method that loads a pre-trained language model and its corresponding tokenizer from a HuggingFace model identifier. It wraps the standard HuggingFace AutoModelForCausalLM.from_pretrained() with additional optimizations including fused attention kernels, memory-efficient loading, and optional 4-bit quantization via bitsandbytes.
This method is called within the load_model() function in the repository's fine-tuning pipeline.
Parameters
| Parameter | Type | Value in Repo | Description |
|---|---|---|---|
| `model_name` | str | HuggingFace model ID (e.g., `"meta-llama/Meta-Llama-3.1-8B"`) | The HuggingFace Hub identifier or local path for the pre-trained model. |
| `max_seq_length` | int | 2048 | Maximum sequence length the model will handle during fine-tuning. Determines positional encoding size and memory allocation. |
| `load_in_4bit` | bool | False | Whether to load model weights in 4-bit NF4 quantization. Set to False in this repository. |
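To make the memory impact of `max_seq_length` concrete, here is a rough back-of-the-envelope estimate of the KV-cache footprint at a given sequence length. This is an illustrative sketch only; the layer, head, and dimension values below are Llama-3.1-8B's published configuration, assumed here purely for the arithmetic.

```python
# Rough KV-cache size estimate: why max_seq_length drives memory allocation.
# Model-shape constants are Llama-3.1-8B's published config (assumed for illustration).
N_LAYERS = 32
N_KV_HEADS = 8        # grouped-query attention: fewer KV heads than query heads
HEAD_DIM = 128
BYTES_FP16 = 2        # bytes per element in FP16/BF16

def kv_cache_bytes(seq_len: int) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # 2 tensors (K and V) per layer, each of shape [n_kv_heads, seq_len, head_dim].
    return 2 * N_LAYERS * N_KV_HEADS * seq_len * HEAD_DIM * BYTES_FP16

print(kv_cache_bytes(2048) / 2**20)  # → 256.0 (MiB at max_seq_length=2048)
```

Doubling `max_seq_length` doubles this figure linearly, which is why the parameter is fixed at load time and factored into Unsloth's memory planning.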
Returns
A tuple of (model, tokenizer):
- model: The loaded language model (with Unsloth optimizations applied), ready for LoRA adapter injection.
- tokenizer: The corresponding tokenizer configured for the model.
Key Code in Repository
```python
# From llm_engineering/model/finetuning/finetune.py (within load_model())
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id,
    max_seq_length=max_seq_length,
    load_in_4bit=False,
)
```
Notes on Repository Usage
- `load_in_4bit=False`: 4-bit quantization is not enabled at load time, so the model is loaded in its default precision (typically BF16/FP16). This suggests the target instance (a SageMaker ml.g5.2xlarge with 24 GB of VRAM) has sufficient memory for the chosen model size.
- `max_seq_length=2048`: The sequence length is set to 2048 tokens, a reasonable default for fine-tuning instruction-following models.
- `model_id`: The model identifier is configured externally and passed into the `load_model()` function.
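The precision trade-off behind `load_in_4bit` can be sketched with simple arithmetic. The figures below assume an ~8B-parameter model and are illustrative estimates, not measurements; NF4 is treated as roughly 4.5 bits per parameter to account for quantization metadata overhead.

```python
# Back-of-the-envelope weight memory: why load_in_4bit=False can still fit
# on a 24 GB GPU for an ~8B-parameter model (illustrative arithmetic only).
PARAMS = 8e9  # assumed parameter count

def weight_gb(bits_per_param: float) -> float:
    """Approximate weight storage in GB at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

bf16 = weight_gb(16)   # ~16 GB: fits under 24 GB, leaving headroom for LoRA states
nf4 = weight_gb(4.5)   # ~4.5 GB: NF4 at ~4 bits/param plus quantization overhead
print(f"BF16 weights: {bf16:.1f} GB, NF4 weights: {nf4:.1f} GB")
```

Under these assumptions, full-precision loading consumes most of the 24 GB budget; `load_in_4bit=True` would trade some fidelity for roughly a 3-4x reduction in weight memory, which matters on smaller GPUs.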
External Dependencies
| Package | Purpose |
|---|---|
| `unsloth` | Optimized model loading and inference |
| `transformers` | Underlying HuggingFace model/tokenizer classes |
| `bitsandbytes` | 4-bit quantization support (used when `load_in_4bit=True`) |
External References
- Unsloth GitHub Repository
- QLoRA Paper (Dettmers et al., 2023)
- HuggingFace Transformers Model Documentation