Implementation:Intel Ipex llm vLLM Generate
| Knowledge Sources | |
|---|---|
| Domains | NLP, Inference |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Wrapper for vLLM's SamplingParams and LLM.generate() for batch text generation on Intel XPU.
Description
This is a Wrapper Doc for the standard vLLM SamplingParams and LLM.generate() API as used through the IPEX-LLM XPU engine. SamplingParams configures generation behavior, such as temperature and top_p. The generate() method accepts a list of prompts and returns a list of RequestOutput objects, one per prompt. The underlying engine uses PagedAttention optimized for Intel XPU.
External Reference
Usage
Call generate() after the IPEXLLMClass engine has been initialized to produce text completions for a batch of prompts.
Code Reference
Source Location
- Repository: IPEX-LLM
- File: python/llm/example/GPU/vLLM-Serving/offline_inference.py
- Lines: 34-63
Signature
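from vllm import SamplingParams

# Signature notation, not runnable code. The values shown are the ones used
# in the example script; vLLM's own defaults are temperature=1.0 and top_p=1.0.
sampling_params = SamplingParams(
    temperature: float = 0.8,
    top_p: float = 0.95,
)

outputs = llm.generate(
    prompts: List[str],
    sampling_params: SamplingParams,
) -> List[RequestOutput]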
Import
from vllm import SamplingParams
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| prompts | List[str] | Yes | Batch of input prompt strings |
| temperature | float | No | Sampling temperature (0.8 in the example; vLLM's own default is 1.0; 0 selects greedy decoding) |
| top_p | float | No | Nucleus sampling threshold (0.95 in the example; vLLM's own default is 1.0) |
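To make the two sampling inputs concrete, the following is a minimal pure-Python sketch of how temperature scaling and nucleus (top_p) filtering reshape a next-token distribution. It is not vLLM's sampler implementation. The function name `apply_temperature_top_p` and the toy logits are hypothetical and chosen only for illustration, and the default argument values mirror the example script rather than any library default.

```python
import math
from typing import List


def apply_temperature_top_p(
    logits: List[float],
    temperature: float = 0.8,  # mirrors the example value, not a library default
    top_p: float = 0.95,       # mirrors the example value, not a library default
) -> List[float]:
    """Return the renormalized distribution a sampler would draw from.

    Hypothetical helper for illustration only.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")

    if temperature == 0:
        # Greedy decoding: all probability mass goes to the highest logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose cumulative
    # probability reaches top_p, then renormalize over that set.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set()
    cumulative = 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    return [probs[i] / kept_mass if i in kept else 0.0 for i in range(len(probs))]
```

In this sketch, lowering temperature sharpens the distribution toward the most likely token, while lowering top_p removes the low-probability tail before sampling.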
Outputs
| Name | Type | Description |
|---|---|---|
| outputs | List[RequestOutput] | One entry per prompt; each exposes .prompt and the first completion's text at .outputs[0].text |
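The output contract can be consumed with a small helper such as the sketch below, which pairs each prompt with its first completion. The name `collect_completions` is hypothetical, and the `SimpleNamespace` objects are stand-ins that only imitate the `.prompt` and `.outputs[0].text` attributes listed in the table, so the snippet runs without an XPU or a loaded model.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def collect_completions(outputs: Iterable[Any]) -> List[Tuple[str, str]]:
    """Pair each prompt with the text of its first completion.

    Hypothetical helper; relies only on the .prompt and .outputs[0].text
    attributes described in the Outputs table.
    """
    pairs = []
    for output in outputs:
        # Fall back to an empty string if no completion is present.
        text = output.outputs[0].text if output.outputs else ""
        pairs.append((output.prompt, text))
    return pairs


# Stand-in objects shaped like the documented output (not real RequestOutput).
fake_outputs = [
    SimpleNamespace(
        prompt="The capital of France is",
        outputs=[SimpleNamespace(text=" Paris.")],
    ),
    SimpleNamespace(prompt="Hello, my name is", outputs=[]),
]

completions = collect_completions(fake_outputs)
```

With a real engine, the list returned by llm.generate() would be passed in place of `fake_outputs`.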
Usage Examples
from vllm import SamplingParams
from ipex_llm.vllm.xpu.engine import IPEXLLMClass as LLM

# Initialize engine
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf",
          device="xpu", load_in_low_bit="fp8",
          max_model_len=2000, max_num_batched_tokens=2000)

# Configure sampling
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Batch generation
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
outputs = llm.generate(prompts, sampling_params)

# Print each prompt alongside its first completion
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
Related Pages
Implements Principle
Requires Environment