Implementation:Neuml Txtai Named Entity Recognition

From Leeroopedia


Knowledge Sources
Domains Machine Learning, NLP, Named Entity Recognition, Transformers
Last Updated 2026-02-10 01:00 GMT

Overview

Concrete tool, provided by txtai, for applying token classification models to extract named entities from text.

Description

Entity extends HFPipeline and applies a token classifier to extract entity/label combinations from text. It supports two backends: the standard Hugging Face Transformers token-classification pipeline and GLiNER models for zero-shot entity extraction. The pipeline auto-detects GLiNER models by checking for a gliner_config.json file. Results can be returned as (entity, entity_type, score) tuples or flattened to a list of entity strings, optionally joined into a single string. Results can be filtered by entity label, and flattened output can additionally be restricted to entities at or above a minimum score.
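The auto-detection step can be pictured as a simple file-presence check. The sketch below is an illustration under stated assumptions, not txtai's code: `looks_like_gliner` and `choose_backend` are hypothetical helpers, the file listing is passed in as a plain list of names, and how txtai actually obtains that listing is not covered here.

```python
def looks_like_gliner(filenames):
    """Return True when a model's file listing contains gliner_config.json."""
    return "gliner_config.json" in filenames


def choose_backend(filenames):
    """Pick a backend name from the file listing (the names are illustrative)."""
    return "gliner" if looks_like_gliner(filenames) else "token-classification"


# Hypothetical file listings for two model repositories
print(choose_backend(["config.json", "model.safetensors"]))
print(choose_backend(["gliner_config.json", "pytorch_model.bin"]))
```

Keying the decision on a marker file means the caller never has to say which backend a model needs; the same constructor call works for both kinds of model.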

Usage

Use Entity when you need to extract named entities (persons, organizations, locations, etc.) from text. It is suitable for both standard NER with pre-trained token classification models and zero-shot NER using GLiNER models with custom entity type labels.

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/pipeline/text/entity.py

Signature

class Entity(HFPipeline):
    def __init__(self, path=None, quantize=False, gpu=True, model=None, **kwargs)
    def __call__(self, text, labels=None, aggregate="simple", flatten=None, join=False, workers=0)
    def isgliner(self, path)
    def execute(self, text, labels, aggregate, workers)
    def accept(self, etype, labels)

Import

from txtai.pipeline.text.entity import Entity

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| text | str or list | Yes | Input text or list of texts to extract entities from. |
| labels | list | No | List of entity type labels to accept. Defaults to None (all accepted). For GLiNER without labels, defaults to ["person", "organization", "location"]. |
| aggregate | str | No | Method to combine multi-token entities: "simple" (default), "first", "average", or "max". |
| flatten | bool or float | No | If set, flattens output to a list of entity strings. If a float, only entities with scores >= that value are kept. |
| join | bool | No | If True and flatten is set, joins flattened entity strings with spaces. Defaults to False. |
| workers | int | No | Number of concurrent workers for data processing. Defaults to 0. |
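To make the `aggregate` options more concrete, here is a simplified sketch of how a single word split into several sub-tokens might be resolved to one label. It follows the strategy names as they are described for the Hugging Face token-classification pipeline, but it is an assumption-based illustration rather than that pipeline's code: `resolve_word` is a hypothetical helper, the scores are made up, and the "simple" strategy (which groups adjacent tokens by tag instead of resolving per word) is deliberately left out.

```python
def resolve_word(token_scores, strategy):
    """Resolve one word's label from its sub-token scores.

    token_scores: list of dicts mapping label -> score, one dict per sub-token.
    strategy: "first", "average" or "max" (hypothetical simplification).
    """
    if strategy == "first":
        # Use only the first sub-token of the word
        scores = token_scores[0]
    elif strategy == "average":
        # Average each label's score across all sub-tokens
        labels = token_scores[0].keys()
        scores = {l: sum(t[l] for t in token_scores) / len(token_scores) for l in labels}
    elif strategy == "max":
        # Use the sub-token holding the single highest score
        scores = max(token_scores, key=lambda t: max(t.values()))
    else:
        raise ValueError(f"unsupported strategy in this sketch: {strategy}")

    label = max(scores, key=scores.get)
    return label, scores[label]


# Made-up scores for a word split into two sub-tokens
tokens = [{"ORG": 0.625, "O": 0.375}, {"ORG": 0.25, "O": 0.75}]
for strategy in ("first", "average", "max"):
    print(strategy, resolve_word(tokens, strategy))
```

With these made-up scores the strategies disagree: "first" follows the first sub-token, while "average" and "max" are swayed by the second. That is the kind of difference the parameter controls.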

Outputs

| Name | Type | Description |
|------|------|-------------|
| result | list or str | When flatten is not set: list of (entity, entity_type, score) tuples. When flatten is set: list of entity strings. When flatten and join are set: a single joined string. For list input, returns one such result per input text. |
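The output forms listed above can be sketched with plain Python. This is a minimal illustration of the contract, not the library's implementation: `postprocess` is a hypothetical function, the dictionary keys "word", "type" and "score" are invented for the example, and "flatten is set" is assumed to mean a truthy value.

```python
def postprocess(batch, labels=None, flatten=None, join=False, single=False):
    """Shape raw entity dicts into tuples, flat lists or joined strings.

    batch: one list of entity dicts per input text (hypothetical keys).
    single: True when the caller passed one string rather than a list.
    """
    # flatten=True keeps every entity; a float acts as a minimum score
    minscore = 0.0 if flatten is None or isinstance(flatten, bool) else flatten

    outputs = []
    for entities in batch:
        # No labels means every entity type is accepted
        kept = [e for e in entities if not labels or e["type"] in labels]
        if flatten:
            words = [e["word"] for e in kept if e["score"] >= minscore]
            outputs.append(" ".join(words) if join else words)
        else:
            outputs.append([(e["word"], e["type"], e["score"]) for e in kept])

    # A single input text yields one result; a list yields one result per text
    return outputs[0] if single else outputs


# Made-up raw results for one input text
raw = [[
    {"word": "John Smith", "type": "PER", "score": 0.99},
    {"word": "Google", "type": "ORG", "score": 0.6},
]]

print(postprocess(raw, single=True))
print(postprocess(raw, flatten=True, single=True))
print(postprocess(raw, flatten=0.9, single=True))
print(postprocess(raw, flatten=True, join=True, single=True))
print(postprocess(raw, labels=["ORG"]))
```

Passing a float to `flatten` both enables flattening and sets the score cutoff, so a low-confidence entity such as the 0.6 "Google" entry above is dropped at `flatten=0.9`.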

Usage Examples

from txtai.pipeline.text.entity import Entity

# Standard NER pipeline
ner = Entity("dslim/bert-base-NER", gpu=True)

# Extract entities as tuples
result = ner("John Smith works at Google in New York")
# Returns something like (scores are illustrative):
# [("John Smith", "PER", 0.99), ("Google", "ORG", 0.97), ("New York", "LOC", 0.95)]

# Flatten to entity names only
result = ner("John Smith works at Google", flatten=True)
# Returns: ["John Smith", "Google"]

# Flatten and join into a single string
result = ner("John Smith works at Google", flatten=True, join=True)
# Returns: "John Smith Google"

# Filter by entity type
result = ner("John Smith works at Google in New York", labels=["PER"])
# Returns something like (score is illustrative): [("John Smith", "PER", 0.99)]

# GLiNER zero-shot NER
ner = Entity("urchade/gliner_medium-v2.1")
result = ner("Apple released the iPhone in Cupertino", labels=["company", "product", "city"])
