Implementation:Mlc ai Mlc llm GPT BigCode Model

From Leeroopedia


Knowledge Sources
Domains Model_Architecture, LLM
Last Updated 2026-02-09 19:00 GMT

Overview

Implements the GPTBigCode architecture for causal language modeling within the MLC LLM framework, featuring multi-query attention and learned positional embeddings.

Description

This module provides the TVM Relax-based implementation of the GPTBigCode model architecture (used in models like StarCoder). Unlike most modern transformer architectures in MLC LLM, GPTBigCode has several distinctive characteristics:

  • Multi-query attention (MQA): Uses a single key-value head (num_kv_heads = 1) shared across all query heads, which shrinks the KV cache by a factor of n_head relative to standard multi-head attention.
  • Learned positional embeddings: Uses a separate nn.Embedding layer (wpe) for position embeddings instead of rotary position embeddings (RoPE). The KV cache is created with RopeMode.NONE.
  • LayerNorm: Uses standard LayerNorm instead of the RMSNorm common in newer architectures.
  • Biased projections: All linear layers (attention and MLP) include bias terms.
  • GeLU activation: The MLP uses standard GeLU activation without gating.
  • No tensor parallelism for attention: The attention module explicitly asserts that tensor_parallel_shards == 1, because of the single KV head design.
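To make the memory effect of multi-query attention concrete, the following back-of-the-envelope sketch compares per-token KV cache size for multi-head and multi-query attention. It assumes 2-byte (float16) cache entries and ignores any paging overhead; the helper name kv_cache_bytes_per_token is hypothetical and not part of MLC LLM. The sizes are taken from the usage example on this page.

```python
def kv_cache_bytes_per_token(
    n_layer: int, num_kv_heads: int, head_dim: int, bytes_per_elem: int = 2
) -> int:
    """Bytes of KV cache per token: one key and one value vector
    per KV head, for every layer (assumes float16 by default)."""
    return 2 * n_layer * num_kv_heads * head_dim * bytes_per_elem


# Sizes taken from the usage example on this page.
n_layer, n_head, n_embd = 24, 16, 2048
head_dim = n_embd // n_head

# Multi-head attention would keep one KV head per query head.
mha_bytes = kv_cache_bytes_per_token(n_layer, n_head, head_dim)
# Multi-query attention keeps a single shared KV head.
mqa_bytes = kv_cache_bytes_per_token(n_layer, 1, head_dim)

print(mha_bytes, mqa_bytes, mha_bytes // mqa_bytes)
```

Under these assumptions the cache shrinks by exactly the number of query heads, since only the KV head count differs between the two cases.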

The model consists of GPTBigCodeModel (embedding + transformer blocks + final LayerNorm), wrapped by GPTBigCodeForCausalLM which adds the language modeling head and inference methods.
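The embedding stage of GPTBigCodeModel can be pictured as two table lookups whose results are added: one indexed by token id (wte) and one indexed by position (wpe). The sketch below is a plain-Python illustration, not the TVM Relax code; the function name and the tiny tables are made up for the example, and it assumes positions are absolute indices into the sequence.

```python
from typing import List

Vector = List[float]


def embed_with_positions(
    token_ids: List[int],
    positions: List[int],
    wte: List[Vector],
    wpe: List[Vector],
) -> List[Vector]:
    """Add a learned position vector to each token vector."""
    if len(token_ids) != len(positions):
        raise ValueError("token_ids and positions must have the same length")
    hidden = []
    for tok, pos in zip(token_ids, positions):
        # A learned table has a fixed number of rows (n_positions),
        # so positions beyond it cannot be represented.
        if pos >= len(wpe):
            raise ValueError(f"position {pos} exceeds the table size {len(wpe)}")
        hidden.append([t + p for t, p in zip(wte[tok], wpe[pos])])
    return hidden


# Toy tables: vocabulary of 3 tokens, 2 positions, embedding width 2.
wte = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
wpe = [[0.5, 0.5], [0.25, 0.25]]

hidden = embed_with_positions([2, 0], [0, 1], wte, wpe)
```

Because the position table has a fixed size, the sequence length is bounded by n_positions, which is one practical difference from rotary embeddings.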

Usage

Use this module when compiling GPTBigCode-family models (e.g., StarCoder, SantaCoder) for deployment with MLC LLM. The model is identified by the gpt_bigcode model type in configuration files.
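As a rough illustration of the model-type check, the sketch below reads a configuration payload and tests whether it declares gpt_bigcode. It assumes a Hugging Face style config.json with a top-level model_type key; the helper is_gpt_bigcode is hypothetical and is not an MLC LLM function.

```python
import json


def is_gpt_bigcode(config_text: str) -> bool:
    """Return True when a JSON config declares the gpt_bigcode model type."""
    config = json.loads(config_text)
    return config.get("model_type") == "gpt_bigcode"


# Minimal made-up payload; a real config.json carries many more fields.
sample = '{"model_type": "gpt_bigcode", "n_embd": 2048, "n_head": 16}'
print(is_gpt_bigcode(sample))
```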

Code Reference

Source Location

Signature

@dataclasses.dataclass
class GPTBigCodeConfig(ConfigBase):
    n_embd: int
    n_inner: int
    n_head: int
    n_layer: int
    n_positions: int
    layer_norm_epsilon: float
    vocab_size: int
    context_window_size: int = 0
    prefill_chunk_size: int = 0
    tensor_parallel_shards: int = 1
    max_batch_size: int = 1
    ...

class GPTBigCodeForCausalLM(nn.Module):
    def __init__(self, config: GPTBigCodeConfig): ...
    def embed(self, input_ids: Tensor): ...
    def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache): ...
    def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache): ...
    def batch_prefill(self, input_embeds, logit_positions, paged_kv_cache): ...
    def batch_decode(self, input_embeds, paged_kv_cache): ...
    def batch_verify(self, input_embeds, paged_kv_cache): ...
    def create_paged_kv_cache(self, ...): ...
    def get_default_spec(self): ...

Import

from mlc_llm.model.gpt_bigcode.gpt_bigcode_model import GPTBigCodeConfig, GPTBigCodeForCausalLM

I/O Contract

Primary Classes

| Class | Role | Key Characteristics |
|---|---|---|
| GPTBigCodeConfig | Model configuration | Uses GPT-2 style naming (n_embd, n_inner, n_head, n_layer, n_positions) |
| GPTBigCodeMLP | Feed-forward network | Standard GeLU activation, c_fc/c_proj with bias |
| GPTBigCodeAttention | Multi-query attention | Single KV head, fused QKV projection (c_attn), biased projections |
| GPTBigCodeBlock | Transformer block | Pre-norm with LayerNorm (ln_1, ln_2) |
| GPTBigCodeModel | Core model | Token embeddings (wte) + position embeddings (wpe) + blocks + final LayerNorm |
| GPTBigCodeForCausalLM | Top-level model | Adds lm_head for causal LM |
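With a single KV head, the fused c_attn projection produces n_head query heads plus one key head and one value head per token. The sketch below works through that arithmetic and splits one fused output vector. It assumes a Q, then K, then V layout along the last axis, which should be checked against the source before relying on it; split_fused_qkv is a hypothetical helper.

```python
from typing import List, Tuple


def split_fused_qkv(
    fused: List[float], n_head: int, head_dim: int, num_kv_heads: int = 1
) -> Tuple[List[float], List[float], List[float]]:
    """Split one token's fused projection into query, key and value parts.

    Assumes the layout [Q | K | V] along the feature axis.
    """
    q_size = n_head * head_dim
    kv_size = num_kv_heads * head_dim
    if len(fused) != q_size + 2 * kv_size:
        raise ValueError("fused vector does not match the expected width")
    q = fused[:q_size]
    k = fused[q_size:q_size + kv_size]
    v = fused[q_size + kv_size:]
    return q, k, v


n_head, n_embd = 16, 2048
head_dim = n_embd // n_head
# (n_head + 2 * num_kv_heads) heads of width head_dim.
fused_width = (n_head + 2) * head_dim

q, k, v = split_fused_qkv([0.0] * fused_width, n_head, head_dim)
print(fused_width, len(q), len(k), len(v))
```

For comparison, a multi-head layout with the same sizes would need a fused width of 3 * n_embd, so the projection itself is also smaller under MQA.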

Forward Methods

| Method | Input | Output |
|---|---|---|
| embed | Tensor[seq_len] (int32) | Tensor[1, seq_len, n_embd] |
| prefill | Tensor[1, seq_len, n_embd], PagedKVCache | (Tensor[1, 1, vocab_size], PagedKVCache) |
| decode | Tensor[1, 1, n_embd], PagedKVCache | (Tensor[1, 1, vocab_size], PagedKVCache) |
| batch_prefill | Tensor[1, seq_len, n_embd], Tensor[batch_size], PagedKVCache | (Tensor, PagedKVCache) |
| batch_decode | Tensor[batch_size, 1, n_embd], PagedKVCache | (Tensor, PagedKVCache) |

Configuration Mapping

| GPTBigCode Field | Standard Equivalent | Description |
|---|---|---|
| n_embd | hidden_size | Embedding dimension |
| n_inner | intermediate_size | MLP hidden dimension |
| n_head | num_attention_heads | Number of query heads |
| n_layer | num_hidden_layers | Number of transformer blocks |
| n_positions | max_position_embeddings | Maximum sequence length for learned position embeddings |
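When comparing a GPTBigCode configuration against models that use the more common field names, the mapping above can be applied mechanically. The sketch below is an illustrative helper only; the dictionary and function names are hypothetical and nothing here is part of MLC LLM.

```python
from typing import Any, Dict

# GPT-2 style field name -> commonly used equivalent, per the table above.
GPT2_STYLE_TO_STANDARD = {
    "n_embd": "hidden_size",
    "n_inner": "intermediate_size",
    "n_head": "num_attention_heads",
    "n_layer": "num_hidden_layers",
    "n_positions": "max_position_embeddings",
}


def to_standard_names(config: Dict[str, Any]) -> Dict[str, Any]:
    """Rename GPT-2 style keys; keys without a mapping pass through unchanged."""
    return {GPT2_STYLE_TO_STANDARD.get(key, key): value for key, value in config.items()}


renamed = to_standard_names({"n_embd": 2048, "n_head": 16, "vocab_size": 49280})
print(renamed)
```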

Usage Examples

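from mlc_llm.model.gpt_bigcode.gpt_bigcode_model import (
    GPTBigCodeConfig,
    GPTBigCodeForCausalLM,
)

# Create a GPTBigCode config; fields without defaults must be given explicitly.
config = GPTBigCodeConfig(
    n_embd=2048,              # embedding dimension
    n_inner=8192,             # MLP hidden dimension
    n_head=16,                # number of query heads
    n_layer=24,               # number of transformer blocks
    n_positions=2048,         # size of the learned position table
    layer_norm_epsilon=1e-5,
    vocab_size=49280,
)
# Build the top-level causal LM module from the config.
model = GPTBigCodeForCausalLM(config)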
