Implementation:Haifengl Smile PerfectHash
| Knowledge Sources | |
|---|---|
| Domains | Hashing, Data Structures, Algorithms |
| Last Updated | 2026-02-08 22:00 GMT |
Overview
PerfectHash is a class that constructs a collision-free hash function for a fixed set of string keywords, mapping each keyword to its index in the input array with constant worst-case lookup time.
Description
PerfectHash implements a perfect hash function for a set of strings. A perfect hash function maps distinct elements in a set to a set of integers with no collisions, making it an injective function. The implementation uses the Fredman, Komlos, and Szemeredi (1984) construction approach: it computes character-based hash parameters (kvals) that produce unique hash values for all keywords. The algorithm iteratively resolves collisions by adjusting the hash parameters for distinguishing characters. An optional character position selection array allows hashing based on specific character positions within the keywords for optimization.
The class is Serializable and stores the original keyword array, the hash table mapping hash codes to indices, character hash parameters, and optionally the selected character positions.
Usage
Use PerfectHash when you have a fixed, known set of string keywords and need O(1) worst-case lookup time. Common applications include lexers, parsers, compilers, keyword recognition in natural language processing, and any scenario where a static set of strings must be mapped to integer indices.
Code Reference
Source Location
- Repository: Haifengl_Smile
- File: base/src/main/java/smile/hash/PerfectHash.java
- Lines: 1-309
Signature
public class PerfectHash implements Serializable {
// Constructors
public PerfectHash(String... keywords);
public PerfectHash(int[] select, String... keywords);
// Lookup
public int get(String key);
}
Import
import smile.hash.PerfectHash;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| keywords | String[] | Yes | The fixed set of unique string keywords. Must not be empty and must not contain duplicates. |
| select | int[] | No | Character positions within keywords to use for hash computation. If null, all characters are used. |
| key | String | Yes (for get()) | The string to look up in the hash. |
Outputs
| Name | Type | Description |
|---|---|---|
| get(key) | int | The index of the keyword in the original array, or -1 if the key is not in the set. |
Usage Examples
Basic Usage
import smile.hash.PerfectHash;
// Create a perfect hash for a set of keywords
String[] keywords = {"apple", "banana", "cherry", "date", "elderberry"};
PerfectHash hash = new PerfectHash(keywords);
// Lookup - returns the original index
int idx1 = hash.get("cherry"); // 2
int idx2 = hash.get("apple"); // 0
int idx3 = hash.get("unknown"); // -1 (not in set)
With Character Position Selection
import smile.hash.PerfectHash;
// Use only the first and third character positions for hashing
String[] keywords = {"int", "long", "float", "double", "char"};
int[] select = {0, 2};
PerfectHash hash = new PerfectHash(select, keywords);
int idx = hash.get("float"); // 2