Feature Engine

A feature engineering package with scikit-learn-like functionality.

Installation

From PyPI using pip:

pip install feature_engine

From Anaconda:

conda install -c conda-forge feature_engine

Or simply clone it:

git clone https://github.com/feature-engine/feature_engine.git

Example Usage

This package collects the feature engineering utility functions that tend to get rewritten again and again in Python ML code. For example, the code below gathers the labels that occur less often than a given threshold and redefines them as a single separate label. With the RareLabelEncoder from the feature_engine package, this takes only a few lines of code.

>>> import pandas as pd
>>> from feature_engine.encoding import RareLabelEncoder

>>> # Toy data: 'C' and 'D' each make up less than 10% of the 23 rows
>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
A    10
B    10
C     2
D     1
Name: var_A, dtype: int64
>>> # Group labels with a relative frequency below tol into a single 'Rare' label
>>> rare_encoder = RareLabelEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
A       10
B       10
Rare     3
Name: var_A, dtype: int64
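
For comparison, the kind of hand-written logic that the encoder replaces can be sketched in plain Python. This is only an illustration and not how feature_engine is implemented: the helper name group_rare_labels is hypothetical, and the handling of the n_categories threshold (grouping only when the column has more distinct labels than n_categories) is an assumption about the parameter's meaning.

```python
from collections import Counter


def group_rare_labels(values, tol=0.10, n_categories=3, replace_with="Rare"):
    """Replace labels whose relative frequency is below `tol`.

    Hypothetical helper for illustration; not part of feature_engine.
    """
    counts = Counter(values)
    # Assumption: a column with too few distinct labels is left untouched.
    if len(counts) <= n_categories:
        return list(values)
    total = len(values)
    # Labels that reach the frequency threshold are kept as they are.
    frequent = {label for label, n in counts.items() if n / total >= tol}
    return [v if v in frequent else replace_with for v in values]


var_a = ["A"] * 10 + ["B"] * 10 + ["C"] * 2 + ["D"] * 1
encoded = group_rare_labels(var_a)
print(Counter(encoded))  # Counter({'A': 10, 'B': 10, 'Rare': 3})
```

On the same toy data this gives the same counts as the encoder output above: 'C' (2 of 23 rows) and 'D' (1 of 23 rows) both fall below the 10% threshold and are merged into 'Rare'.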

Find more examples in our Jupyter Notebook Gallery or in the documentation.
