Implementation:Ggml org Ggml Sycl copy
| Knowledge Sources | |
|---|---|
| Domains | ML_Infrastructure, GPU_Compute |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
SYCL tensor copy kernels supporting format conversion across a wide range of type combinations including f32, f16, i16, i32, and quantized formats.
Description
cpy.cpp implements the GGML_OP_CPY (copy) and GGML_OP_DUP (duplicate) operations for the SYCL backend. The module provides:
- Per-element copy functions: Small functions (cpy_1_f32_f32, cpy_1_f32_f16, cpy_1_f16_f16, cpy_1_f16_f32, cpy_1_i16_i16, cpy_1_i32_i32) that copy a single element, converting between formats with SYCL vector conversion intrinsics where the source and destination types differ.
- Strided copy kernel: Despite its name, the cpy_f32_f16 kernel is a generic template parameterized by the per-element copy function (cpy_1). It maps a flat element index back to 4D tensor coordinates (i00-i03 for the source, i10-i13 for the destination), computes byte offsets from the per-dimension strides (nb00-nb03 and nb10-nb13), and invokes cpy_1 on the resulting addresses. This lets it handle non-contiguous memory layouts.
- Block-wise quantized copy: cpy_blck_q_q for same-type quantized block copies, and cpy_blck_q8_0_f32 for dequantizing q8_0 blocks to f32 during copy.
- Dispatch function: ggml_sycl_cpy selects the correct kernel based on the source and destination type combination, supporting f32->f32, f32->f16, f16->f16, f16->f32, f32->q8_0, f32->q4_0, f32->q4_1, and quantized-to-quantized copies.
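The index arithmetic of the strided copy kernel can be sketched on the host in plain C++. This is a minimal illustration, not the code in cpy.cpp: the names Coord4, unflatten, byte_offset, and copy_strided_f32 are hypothetical, the loop stands in for the SYCL work-items, and only the f32->f32 case is shown.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper type: 4D coordinates of one tensor element.
struct Coord4 { int64_t i0, i1, i2, i3; };

// Map a flat element index to 4D coordinates, given the extents of the
// three innermost dimensions (the outermost extent is not needed).
inline Coord4 unflatten(int64_t i, int64_t ne0, int64_t ne1, int64_t ne2) {
    assert(ne0 > 0 && ne1 > 0 && ne2 > 0);
    Coord4 c{};
    c.i3 = i / (ne0 * ne1 * ne2);
    int64_t rem = i - c.i3 * ne0 * ne1 * ne2;
    c.i2 = rem / (ne0 * ne1);
    rem -= c.i2 * ne0 * ne1;
    c.i1 = rem / ne0;
    c.i0 = rem - c.i1 * ne0;
    return c;
}

// Byte offset of an element, using per-dimension strides in bytes.
inline int64_t byte_offset(const Coord4 & c, const int64_t nb[4]) {
    return c.i0 * nb[0] + c.i1 * nb[1] + c.i2 * nb[2] + c.i3 * nb[3];
}

// Host stand-in for the kernel: each loop iteration does the work of one
// work-item. Source and destination are indexed independently, so either
// side may be non-contiguous.
inline void copy_strided_f32(const char * src, char * dst, int64_t ne,
                             const int64_t ne_src[3], const int64_t nb_src[4],
                             const int64_t ne_dst[3], const int64_t nb_dst[4]) {
    for (int64_t i = 0; i < ne; ++i) {
        const Coord4 s = unflatten(i, ne_src[0], ne_src[1], ne_src[2]);
        const Coord4 d = unflatten(i, ne_dst[0], ne_dst[1], ne_dst[2]);
        // Per-element copy; a converting variant would change this line only.
        std::memcpy(dst + byte_offset(d, nb_dst),
                    src + byte_offset(s, nb_src), sizeof(float));
    }
}
```

For example, reading a 3x2 transposed view (strides of 8 and 4 bytes) of the six floats 0..5 and writing to a contiguous destination (strides of 4 and 12 bytes) yields 0, 2, 4, 1, 3, 5. Separating the per-element function from the index math is what allows a single kernel template to serve every type pair.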
Usage
The main SYCL backend calls these functions when the compute graph contains GGML_OP_CPY or GGML_OP_DUP nodes. The copy function handles both same-type copies and format conversions, including quantization during the copy.
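To illustrate what quantization during a copy involves, here is a host-side sketch of a q8_0 block round trip. It assumes the usual q8_0 layout of 32 signed 8-bit values sharing one scale. The names BlockQ8, quantize_block_q8, and dequantize_block_q8 are hypothetical, and the scale is kept as a float for simplicity where ggml stores it in half precision.

```cpp
#include <cmath>
#include <cstdint>

constexpr int kBlockSize = 32;  // elements per q8_0 block

// Hypothetical stand-in for a q8_0 block.
struct BlockQ8 {
    float  d;               // per-block scale (f16 in ggml; float here for simplicity)
    int8_t qs[kBlockSize];  // quantized values in [-127, 127]
};

// Quantize 32 floats: choose the scale so the largest magnitude maps to 127.
inline BlockQ8 quantize_block_q8(const float * x) {
    float amax = 0.0f;
    for (int j = 0; j < kBlockSize; ++j) {
        amax = std::fmax(amax, std::fabs(x[j]));
    }
    BlockQ8 b{};
    b.d = amax / 127.0f;
    const float inv_d = (b.d != 0.0f) ? 1.0f / b.d : 0.0f;  // guard all-zero blocks
    for (int j = 0; j < kBlockSize; ++j) {
        b.qs[j] = static_cast<int8_t>(std::round(x[j] * inv_d));
    }
    return b;
}

// Dequantize back to floats: y = d * q.
inline void dequantize_block_q8(const BlockQ8 & b, float * y) {
    for (int j = 0; j < kBlockSize; ++j) {
        y[j] = b.d * static_cast<float>(b.qs[j]);
    }
}
```

Because quantized formats work on whole blocks, a quantizing copy processes 32 source elements per step rather than one, and the round-trip error per element is bounded by roughly half the block scale.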
Code Reference
Source Location
- Repository: GGML
- File: src/ggml-sycl/cpy.cpp
- Lines: 602
Signatures
// Per-element copy functions
static void cpy_1_f32_f32(const char * cxi, char * cdsti);
static void cpy_1_f32_f16(const char * cxi, char * cdsti);
static void cpy_1_f16_f16(const char * cxi, char * cdsti);
static void cpy_1_f16_f32(const char * cxi, char * cdsti);
// Strided copy kernel template
template <cpy_kernel_t cpy_1>
static void cpy_f32_f16(const char * cx, char * cdst, const int ne,
const int ne00, const int ne01, const int ne02,
const int nb00, const int nb01, const int nb02, const int nb03,
const int ne10, const int ne11, const int ne12,
const int nb10, const int nb11, const int nb12, const int nb13,
const sycl::nd_item<3> & item_ct1);
// Public entry points
void ggml_sycl_cpy(ggml_backend_sycl_context & ctx, const ggml_tensor * src0, const ggml_tensor * src1);
void ggml_sycl_dup(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ctx | ggml_backend_sycl_context & | Yes | SYCL backend context providing the device queue |
| src0 | const ggml_tensor * | Yes | Source tensor to copy from |
| src1 | const ggml_tensor * | Yes | Destination tensor (for cpy) defining target shape and type |
Outputs
| Name | Type | Description |
|---|---|---|
| src1->data | void * | Destination buffer with copied and optionally type-converted data |
Usage Examples
// Copy with format conversion (f32 -> f16):
// The backend dispatcher calls this when encountering a CPY node
ggml_sycl_cpy(sycl_ctx, f32_tensor, f16_tensor);
// Duplicate a tensor (same-type copy); the source is read from dst_tensor->src[0]:
ggml_sycl_dup(sycl_ctx, dst_tensor);
Related Pages
Implements Principle