Implementation:Scikit learn Scikit learn DictVectorizer

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Feature Extraction, Data Preprocessing
Last Updated	2026-02-08 15:00 GMT

Overview

Concrete tool, provided by scikit-learn, for transforming lists of feature-value mappings into numeric feature vectors.

Description

DictVectorizer turns lists of mappings (dict-like objects) of feature names to feature values into NumPy arrays or scipy.sparse matrices for use with scikit-learn estimators. When feature values are strings, it performs binary one-hot encoding. When feature values are numeric, they are used directly. Features that do not occur in a sample will have a zero value in the resulting array or matrix.
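The encoding rules described above can be illustrated with a minimal sketch. The records and the field names "city" and "temperature" are made up for illustration; the expected values in the comments assume the default sort=True ordering of feature names.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records: "city" holds strings, "temperature" holds numbers.
records = [
    {"city": "Dubai", "temperature": 33.0},
    {"city": "London", "temperature": 12.0},
    {"city": "London"},  # "temperature" does not occur in this sample
]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(records)

# String values become one column per (name, value) pair;
# numeric values keep a single column under the original name.
feature_names = vec.feature_names_
# ['city=Dubai', 'city=London', 'temperature']

# The missing "temperature" in the last sample is encoded as 0.
# X:
# [[ 1.  0. 33.]
#  [ 0.  1. 12.]
#  [ 0.  1.  0.]]
```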

Usage

Use DictVectorizer when your input data is in the form of dictionaries mapping feature names to values, such as data extracted from JSON or database records. It is especially useful when working with heterogeneous feature types and sparse feature spaces.
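As a sketch of the JSON use case, the snippet below parses a small, invented JSON payload and vectorizes it with the default sparse=True. The payload and its keys ("user", "clicks", "country") are assumptions made for illustration only.

```python
import json

from scipy import sparse
from sklearn.feature_extraction import DictVectorizer

# Hypothetical JSON payload with heterogeneous, partly missing fields.
raw = '[{"user": "a", "clicks": 3}, {"user": "b", "country": "DE"}]'
records = json.loads(raw)

vec = DictVectorizer()  # sparse=True by default
X = vec.fit_transform(records)

# The result is a scipy.sparse matrix with one row per record
# and one column per feature seen during fitting.
is_sparse = sparse.issparse(X)
n_samples, n_features = X.shape

# Convert to a dense array only for inspection on small data.
dense = X.toarray()
```

Keeping the output sparse is usually the better choice when most records contain only a small fraction of all feature names.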

Code Reference

Source Location

Repository: scikit-learn
File: sklearn/feature_extraction/_dict_vectorizer.py

Signature

class DictVectorizer(TransformerMixin, BaseEstimator):
    def __init__(self, *, dtype=np.float64, separator="=", sparse=True, sort=True):

Import

from sklearn.feature_extraction import DictVectorizer

I/O Contract

Inputs

Name	Type	Required	Description
dtype	dtype	No	The type of feature values. Passed to NumPy array or scipy.sparse matrix constructors. Default is np.float64.
separator	str	No	Separator string used when constructing new features for one-hot coding. Default is "=".
sparse	bool	No	Whether transform should produce scipy.sparse matrices. Default is True.
sort	bool	No	Whether feature_names_ and vocabulary_ should be sorted when fitting. Default is True.
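The following sketch shows how the constructor parameters in the table interact. The sample dictionaries and the "::" separator are arbitrary choices for illustration; with sort=False, feature names are expected to keep the order in which they are first encountered.

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer

# Hypothetical samples with one string feature and one numeric feature.
samples = [{"tag": "spam", "len": 5}, {"tag": "ham", "len": 2}]

vec = DictVectorizer(
    dtype=np.int32,   # element type of the output
    separator="::",   # joins name and value for one-hot columns
    sparse=False,     # return a dense NumPy array
    sort=False,       # keep first-seen order instead of sorting names
)
X = vec.fit_transform(samples)

names = vec.feature_names_
# ['tag::spam', 'len', 'tag::ham']
out_dtype = X.dtype
```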

Outputs

Name	Type	Description
X_transformed	ndarray or sparse matrix of shape (n_samples, n_features)	The vectorized feature matrix.
vocabulary_	dict	A dictionary mapping feature names to feature indices.
feature_names_	list	A list of feature names (e.g., "f=ham" and "f=spam").
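A short sketch of inspecting the fitted attributes listed above, using invented samples. It also calls inverse_transform, which maps vectors back to dictionaries keyed by the constructed feature names (such as "f=ham") rather than by the original name-value pairs.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical samples: "f" is categorical, "n" is numeric.
samples = [{"f": "ham", "n": 2}, {"f": "spam"}]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(samples)

names = vec.feature_names_   # ['f=ham', 'f=spam', 'n']
vocab = vec.vocabulary_      # {'f=ham': 0, 'f=spam': 1, 'n': 2}

# Zero entries are dropped; one-hot columns come back under their
# constructed names, not as the original {"f": "ham"} mapping.
restored = vec.inverse_transform(X)
# [{'f=ham': 1.0, 'n': 2.0}, {'f=spam': 1.0}]
```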

Usage Examples

Basic Usage

from sklearn.feature_extraction import DictVectorizer

v = DictVectorizer(sparse=False)
D = [{"foo": 1, "bar": 2}, {"foo": 3, "baz": 1}]
X = v.fit_transform(D)
print(X)
# [[2. 0. 1.]
#  [0. 1. 3.]]
print(v.feature_names_)
# ['bar', 'baz', 'foo']
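
Transforming New Data

A fitted vectorizer can be reused on new samples. The sketch below fits on the same dictionaries as the basic example and then transforms a sample containing a feature name ("unseen", chosen for illustration) that was not present during fitting; such features are silently ignored.

```python
from sklearn.feature_extraction import DictVectorizer

v = DictVectorizer(sparse=False)
v.fit([{"foo": 1, "bar": 2}, {"foo": 3, "baz": 1}])

# "unseen" was not in the training data, so it gets no column.
X_new = v.transform([{"foo": 4, "unseen": 9}])
# Columns follow v.feature_names_: ['bar', 'baz', 'foo']
# X_new: [[0. 0. 4.]]
```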

Related Pages

Principle:Scikit_learn_Scikit_learn_Text_Feature_Extraction
