Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Haifengl Smile PerfectHash

From Leeroopedia


Knowledge Sources
Domains Hashing, Data Structures, Algorithms
Last Updated 2026-02-08 22:00 GMT

Overview

PerfectHash is a class that constructs a collision-free hash function for a fixed set of string keywords, mapping each keyword to its index in the input array with constant worst-case lookup time.

Description

PerfectHash implements a perfect hash function for a set of strings. A perfect hash function maps distinct elements in a set to a set of integers with no collisions, making it an injective function. The implementation uses the Fredman, Komlos, and Szemeredi (1984) construction approach: it computes character-based hash parameters (kvals) that produce unique hash values for all keywords. The algorithm iteratively resolves collisions by adjusting the hash parameters for distinguishing characters. An optional character position selection array allows hashing based on specific character positions within the keywords for optimization.

The class is Serializable and stores the original keyword array, the hash table mapping hash codes to indices, character hash parameters, and optionally the selected character positions.

Usage

Use PerfectHash when you have a fixed, known set of string keywords and need O(1) worst-case lookup time. Common applications include lexers, parsers, compilers, keyword recognition in natural language processing, and any scenario where a static set of strings must be mapped to integer indices.

Code Reference

Source Location

Signature

public class PerfectHash implements Serializable {
    // Constructors
    public PerfectHash(String... keywords);
    public PerfectHash(int[] select, String... keywords);

    // Lookup
    public int get(String key);
}

Import

import smile.hash.PerfectHash;

I/O Contract

Inputs

Name Type Required Description
keywords String[] Yes The fixed set of unique string keywords. Must not be empty and must not contain duplicates.
select int[] No Character positions within keywords to use for hash computation. If null, all characters are used.
key String Yes (for get()) The string to look up in the hash.

Outputs

Name Type Description
get(key) int The index of the keyword in the original array, or -1 if the key is not in the set.

Usage Examples

Basic Usage

import smile.hash.PerfectHash;

// Create a perfect hash for a set of keywords
String[] keywords = {"apple", "banana", "cherry", "date", "elderberry"};
PerfectHash hash = new PerfectHash(keywords);

// Lookup - returns the original index
int idx1 = hash.get("cherry");      // 2
int idx2 = hash.get("apple");       // 0
int idx3 = hash.get("unknown");     // -1 (not in set)

With Character Position Selection

import smile.hash.PerfectHash;

// Use only the first and third character positions for hashing
String[] keywords = {"int", "long", "float", "double", "char"};
int[] select = {0, 2};
PerfectHash hash = new PerfectHash(select, keywords);

int idx = hash.get("float"); // 2

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment