Implementation: scikit-learn DictVectorizer
| Knowledge Sources | |
|---|---|
| Domains | Feature Extraction, Data Preprocessing |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
A concrete scikit-learn tool for transforming lists of feature-value mappings into vectors.
Description
DictVectorizer turns lists of mappings (dict-like objects) of feature names to feature values into NumPy arrays or scipy.sparse matrices for use with scikit-learn estimators. When a feature value is a string, it performs binary one-hot encoding: one boolean-valued feature is constructed for each string value the feature takes, named by joining the feature name and the value with the separator (for example, "f=ham"). When a feature value is numeric, it is used directly. A feature that does not occur in a sample has a zero value in the resulting array or matrix.
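The encoding rules described above can be illustrated with a minimal sketch. The records and the field names `city` and `temperature` are hypothetical, chosen only for illustration; the third sample omits `city` to show the zero fill for absent features.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records: "city" is a string feature, "temperature" is numeric.
records = [
    {"city": "Dubai", "temperature": 33.0},
    {"city": "London", "temperature": 12.0},
    {"temperature": 18.0},  # "city" does not occur in this sample
]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(records)

# String values become one column per (name, value) pair;
# the numeric value is copied through unchanged.
print(vec.feature_names_)
# ['city=Dubai', 'city=London', 'temperature']
print(X.tolist())
# [[1.0, 0.0, 33.0], [0.0, 1.0, 12.0], [0.0, 0.0, 18.0]]
```

The last row has zeros in both `city=...` columns because that sample carries no `city` entry.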
Usage
Use DictVectorizer when your input data is in the form of dictionaries mapping feature names to values, such as data extracted from JSON or database records. It is especially useful when working with heterogeneous feature types and sparse feature spaces.
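As a rough sketch of the JSON use case, the snippet below parses a small payload into dictionaries and vectorizes it. The payload and its field names (`plan`, `logins`, `seats`, `region`) are invented for illustration. It also shows that a feature first seen at transform time is silently ignored, which matters when records arrive with varying fields.

```python
import json

from sklearn.feature_extraction import DictVectorizer

# Hypothetical JSON payload; the field names are illustrative only.
raw = '[{"plan": "free", "logins": 3}, {"plan": "pro", "logins": 10, "seats": 5}]'
rows = json.loads(raw)

vec = DictVectorizer(sparse=False)
X_train = vec.fit_transform(rows)
print(vec.feature_names_)
# ['logins', 'plan=free', 'plan=pro', 'seats']

# "region" was never seen during fit, so transform drops it.
new_row = {"plan": "pro", "logins": 7, "region": "eu"}
X_new = vec.transform([new_row])
print(X_new.tolist())
# [[7.0, 0.0, 1.0, 0.0]]
```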
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/feature_extraction/_dict_vectorizer.py
Signature
class DictVectorizer(TransformerMixin, BaseEstimator):
    def __init__(self, *, dtype=np.float64, separator="=", sparse=True, sort=True):
Import
from sklearn.feature_extraction import DictVectorizer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| dtype | dtype | No | The type of feature values. Passed to NumPy array or scipy.sparse matrix constructors. Default is np.float64. |
| separator | str | No | Separator string used when constructing new features for one-hot coding. Default is "=". |
| sparse | bool | No | Whether transform should produce scipy.sparse matrices. Default is True. |
| sort | bool | No | Whether feature_names_ and vocabulary_ should be sorted when fitting. Default is True. |
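The following sketch shows how the constructor parameters in the table might affect the result. The sample data and the separator value `"::"` are arbitrary choices made for illustration, not recommended settings.

```python
import numpy as np
from scipy import sparse as sp
from sklearn.feature_extraction import DictVectorizer

# Illustrative samples: "f" is a string feature, "n" is numeric.
samples = [{"f": "ham", "n": 2}, {"f": "spam", "n": 5}]

# Non-default dtype and separator; sparse=True is the default, shown explicitly.
vec = DictVectorizer(dtype=np.int32, separator="::", sparse=True)
X = vec.fit_transform(samples)

print(sp.issparse(X))        # True: a scipy.sparse matrix is returned
print(X.dtype)               # int32
print(vec.feature_names_)    # ['f::ham', 'f::spam', 'n']
print(X.toarray().tolist())  # [[1, 0, 2], [0, 1, 5]]
```

Keeping `sparse=True` is usually the safer choice when one-hot encoding produces many columns, since most entries in each row are zero.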
Outputs
| Name | Type | Description |
|---|---|---|
| X_transformed | ndarray or sparse matrix of shape (n_samples, n_features) | The vectorized feature matrix. |
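| vocabulary_ | dict | Fitted attribute. A dictionary mapping feature names to feature indices (column positions in the output). |
| feature_names_ | list | Fitted attribute. A list of length n_features containing the feature names (e.g., "f=ham" and "f=spam"). |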
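To make the relationship between the outputs concrete, here is a small sketch (the input data is made up) that checks that `vocabulary_` maps each name in `feature_names_` to its column index in the transformed matrix.

```python
from sklearn.feature_extraction import DictVectorizer

vec = DictVectorizer(sparse=False)
X = vec.fit_transform([{"f": "ham", "count": 2}, {"f": "spam"}])

print(X.shape)             # (2, 3): n_samples x n_features
print(vec.feature_names_)  # ['count', 'f=ham', 'f=spam']

# vocabulary_ gives the column index for each feature name.
col = vec.vocabulary_["f=spam"]
print(col)                 # 2
print(X[:, col].tolist())  # [0.0, 1.0]

# Each name's position in feature_names_ matches its vocabulary_ index.
consistent = all(vec.vocabulary_[name] == i for i, name in enumerate(vec.feature_names_))
print(consistent)          # True
```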
Usage Examples
Basic Usage
from sklearn.feature_extraction import DictVectorizer
v = DictVectorizer(sparse=False)
D = [{"foo": 1, "bar": 2}, {"foo": 3, "baz": 1}]
X = v.fit_transform(D)
print(X)
# [[2. 0. 1.]
#  [0. 1. 3.]]
print(v.feature_names_)
# ['bar', 'baz', 'foo']