Implementation:Scikit learn Scikit learn DictVectorizer

From Leeroopedia


Knowledge Sources
Domains: Feature Extraction, Data Preprocessing
Last Updated: 2026-02-08 15:00 GMT

Overview

A concrete tool, provided by scikit-learn, for transforming lists of feature-value mappings into vectors.

Description

DictVectorizer turns lists of mappings (dict-like objects) of feature names to feature values into NumPy arrays or scipy.sparse matrices for use with scikit-learn estimators. When feature values are strings, it performs binary one-hot encoding. When feature values are numeric, they are used directly. Features that do not occur in a sample will have a zero value in the resulting array or matrix.
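The three behaviors described above (one-hot encoding of string values, pass-through of numeric values, and zero-filling of absent features) can be seen in a small sketch. The records and the `city` and `temperature` feature names are made up for illustration.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records: "city" is a string feature, "temperature" is numeric.
records = [
    {"city": "Dubai", "temperature": 33.0},
    {"city": "London", "temperature": 12.0},
    {"city": "London"},  # "temperature" is absent in this sample
]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(records)

# String values become one binary column per (name, value) pair;
# the numeric feature keeps its own single column.
feature_names = list(vec.feature_names_)
print(feature_names)
print(X)
```

The last row has a zero in the `temperature` column because that key is missing from the third record.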

Usage

Use DictVectorizer when your input data is in the form of dictionaries mapping feature names to values, such as data extracted from JSON or database records. It is especially useful when working with heterogeneous feature types and sparse feature spaces.
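As a rough illustration of the JSON case, the sketch below parses a small JSON string and vectorizes the resulting dictionaries with the default sparse output. The payload and its `word`, `pos`, and `length` keys are invented for this example.

```python
import json

from scipy.sparse import issparse
from sklearn.feature_extraction import DictVectorizer

# Hypothetical JSON payload mixing string and numeric feature values.
raw = (
    '[{"word": "the", "pos": "DT", "length": 3},'
    ' {"word": "cat", "pos": "NN", "length": 3}]'
)
records = json.loads(raw)

vec = DictVectorizer()  # sparse=True by default
X = vec.fit_transform(records)

print(issparse(X))         # the result is a scipy.sparse matrix
print(X.shape)             # one column per distinct feature name
print(vec.feature_names_)
```

Each record here contributes only three non-zero entries, however many distinct string values appear across the data set, which is why the sparse default suits wide feature spaces.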

Code Reference

Source Location

Signature

class DictVectorizer(TransformerMixin, BaseEstimator):
    def __init__(self, *, dtype=np.float64, separator="=", sparse=True, sort=True):

Import

from sklearn.feature_extraction import DictVectorizer

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| dtype | dtype | No | The type of feature values. Passed to NumPy array or scipy.sparse matrix constructors. Default is np.float64. |
| separator | str | No | Separator string used when constructing new features for one-hot coding. Default is "=". |
| sparse | bool | No | Whether transform should produce scipy.sparse matrices. Default is True. |
| sort | bool | No | Whether feature_names_ and vocabulary_ should be sorted when fitting. Default is True. |
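A minimal sketch of how the constructor parameters change the result, comparing the defaults with a customized instance. The sample data and the `"::"` separator are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer

# Hypothetical samples with one string and one numeric feature.
samples = [{"pet": "dog", "age": 4}, {"pet": "cat", "age": 2}]

# Defaults: "=" separator, feature names sorted, float64 values.
default_vec = DictVectorizer(sparse=False).fit(samples)
default_names = default_vec.feature_names_

# Customized: integer dtype, "::" separator, dense output,
# and features kept in the order they are first encountered.
custom_vec = DictVectorizer(
    dtype=np.int32, separator="::", sparse=False, sort=False
).fit(samples)
custom_names = custom_vec.feature_names_
X_custom = custom_vec.transform(samples)

print(default_names)
print(custom_names)
print(X_custom.dtype, X_custom.tolist())
```

With `sort=False` the column order follows first appearance in the data, so it depends on the key order of the input dictionaries.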

Outputs

| Name | Type | Description |
|---|---|---|
| X_transformed | ndarray or sparse matrix of shape (n_samples, n_features) | The vectorized feature matrix. |
| vocabulary_ | dict | A dictionary mapping feature names to feature indices. |
| feature_names_ | list | A list of feature names (e.g., "f=ham" and "f=spam"). |
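The fitted attributes can be inspected directly, as in the sketch below. It reuses the "f=ham" and "f=spam" names from the table; the numeric feature `n` is an added assumption. The call to `inverse_transform` shows that the mapping back to dictionaries uses the one-hot feature names, not the original string values.

```python
from sklearn.feature_extraction import DictVectorizer

# "f" is a string feature; "n" is a hypothetical numeric feature.
samples = [{"f": "ham", "n": 2}, {"f": "spam"}]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(samples)

print(vec.feature_names_)  # column order of X
print(vec.vocabulary_)     # feature name -> column index

# Map rows back to dicts; zero-valued entries are omitted.
restored = vec.inverse_transform(X)
print(restored)
```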

Usage Examples

Basic Usage

from sklearn.feature_extraction import DictVectorizer

v = DictVectorizer(sparse=False)
D = [{"foo": 1, "bar": 2}, {"foo": 3, "baz": 1}]
X = v.fit_transform(D)
print(X)
# [[2. 0. 1.]
#  [0. 1. 3.]]
print(v.feature_names_)
# ['bar', 'baz', 'foo']
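
Transforming New Data

A short sketch of applying an already fitted vectorizer to new samples, reusing the data from the basic example. Feature names that were not seen during fitting are silently ignored; the key `unseen_feature` is made up for illustration.

```python
from sklearn.feature_extraction import DictVectorizer

v = DictVectorizer(sparse=False)
D = [{"foo": 1, "bar": 2}, {"foo": 3, "baz": 1}]
v.fit(D)

# "unseen_feature" was not present during fit, so it gets no column.
X_new = v.transform([{"foo": 4, "unseen_feature": 3}])
print(X_new)
# [[0. 0. 4.]]
```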
