Auto-PyTorch

AutoDL

Auto-PyTorch is an open-source AutoML framework developed by the AutoML groups at the universities of Freiburg and Hannover. It is built on SMAC and jointly optimizes the network architecture and the training hyperparameters. The framework is designed to support tabular data (classification, regression) and time series data (forecasting). The latest version of Auto-PyTorch includes new features that further improve usability, robustness, and efficiency. The Auto-PyTorch workflow consists of the following steps:

  1. Validate the input data

  2. Create the dataset

  3. Evaluate the baselines

  4. Search with SMAC

  5. Build the best ensemble for the provided dataset
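To make the final workflow step more concrete, here is a minimal sketch of greedy ensemble selection with replacement, one common way to build an ensemble from already-trained models. It is an illustration only: the function names, the toy probabilities, and the use of validation accuracy as the selection score are assumptions, and Auto-PyTorch's own ensemble builder may work differently.

```python
from typing import Dict, List, Sequence

Probs = List[List[float]]  # one row of class probabilities per sample


def accuracy(pred_labels: Sequence[int], y_true: Sequence[int]) -> float:
    """Fraction of predicted labels that match the true labels."""
    return sum(p == t for p, t in zip(pred_labels, y_true)) / len(y_true)


def average_probs(members: List[Probs]) -> Probs:
    """Average the class probabilities of all ensemble members, sample by sample."""
    n = len(members)
    return [[sum(col) / n for col in zip(*rows)] for rows in zip(*members)]


def argmax_labels(probs: Probs) -> List[int]:
    """Pick the most probable class per sample (first class wins ties)."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def greedy_ensemble_selection(
    val_probs: Dict[str, Probs], y_val: Sequence[int], ensemble_size: int = 2
) -> List[str]:
    """Repeatedly add the model that maximizes the ensemble's validation accuracy.

    A model may be picked more than once, which gives it a larger weight.
    """
    chosen: List[str] = []
    for _ in range(ensemble_size):
        best_name, best_score = None, -1.0
        for name in sorted(val_probs):  # sorted for a deterministic tie-break
            candidate = [val_probs[m] for m in chosen + [name]]
            score = accuracy(argmax_labels(average_probs(candidate)), y_val)
            if score > best_score:
                best_name, best_score = name, score
        chosen.append(best_name)
    return chosen


# Hypothetical validation predictions of three models on four samples.
y_val = [0, 1, 1, 0]
val_probs = {
    "model_a": [[0.9, 0.1], [0.4, 0.6], [0.6, 0.4], [0.8, 0.2]],
    "model_b": [[0.6, 0.4], [0.7, 0.3], [0.2, 0.8], [0.7, 0.3]],
    "model_c": [[0.3, 0.7], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]],
}
selected = greedy_ensemble_selection(val_probs, y_val, ensemble_size=2)
print("Selected ensemble members:", selected)
```

In this toy setup a weak model can still be selected when it corrects the mistakes of a stronger one, which is the main reason to select on ensemble performance rather than on individual scores.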

The framework can be installed from PyPI or manually. Auto-PyTorch for time series forecasting requires additional dependencies. Examples are provided for tabular data and time series forecasting tasks, and the framework can be customized by changing the search space and by parallelizing the code. The program is distributed under the terms of the Apache License 2.0.

Examples

In a nutshell:

from autoPyTorch.api.tabular_classification import TabularClassificationTask

# data and metric imports
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

# initialise Auto-PyTorch api
api = TabularClassificationTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    optimize_metric='accuracy',
    total_walltime_limit=300,
    func_eval_time_limit_secs=50
)

# Calculate test accuracy
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print("Accuracy score", score)
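The two time limits passed to `search` interact, and a quick back-of-the-envelope check can help when choosing them. The sketch below assumes that `total_walltime_limit` bounds the whole search in seconds and that `func_eval_time_limit_secs` bounds a single pipeline evaluation; the helper name is hypothetical, and the estimate ignores overhead such as baseline evaluation, ensemble building, and parallel workers.

```python
def max_full_length_evaluations(total_walltime_limit: int,
                                func_eval_time_limit_secs: int) -> int:
    """Worst case: how many evaluations fit back to back if each one
    uses its entire per-evaluation time limit."""
    if total_walltime_limit < 0 or func_eval_time_limit_secs <= 0:
        raise ValueError("limits must be positive")
    return total_walltime_limit // func_eval_time_limit_secs


# Settings from the tabular example above.
print(max_full_length_evaluations(300, 50))

# Settings from the time series forecasting example.
print(max_full_length_evaluations(60, 50))
```

Under these assumptions, a search budget that is only slightly larger than the per-evaluation limit leaves room for very few slow configurations, so both values are worth raising together for real workloads.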

For Time Series Forecasting Tasks

from autoPyTorch.api.time_series_forecasting import TimeSeriesForecastingTask

# data and metric imports
from sktime.datasets import load_longley
targets, features = load_longley()

# define the forecasting horizon
forecasting_horizon = 3

# The data passed to APT-TS can be a list of np.ndarray / pd.DataFrame, where each list element is one series,
# or a single pd.DataFrame that holds all series together with series index information, i.e. which series each
# time step belongs to. This id can be stored as the DataFrame's index or in a separate column.
# Within each series, the last forecasting_horizon values are the test targets and everything before them
# is the training targets. Normally, the values to be forecast directly follow the training set.
y_train = [targets[: -forecasting_horizon]]
y_test = [targets[-forecasting_horizon:]]

# The same applies to the features. For univariate models, X_train and X_test can be omitted (set to None)
X_train = [features[: -forecasting_horizon]]
# Here X_test holds the 'known future features': features whose values are known in advance. Features that are
# not known in advance can be replaced with NaN or zeros (they will not be used by the networks). If no feature
# is known beforehand, X_test can be omitted as well
known_future_features = list(features.columns)
X_test = [features[-forecasting_horizon:]]

start_times = [targets.index.to_timestamp()[0]]
freq = '1Y'

# initialise Auto-PyTorch api
api = TimeSeriesForecastingTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test, 
    optimize_metric='mean_MAPE_forecasting',
    n_prediction_steps=forecasting_horizon,
    memory_limit=16 * 1024,  # forecasting models currently need much more memory
    freq=freq,
    start_times=start_times,
    func_eval_time_limit_secs=50,
    total_walltime_limit=60,
    min_num_test_instances=1000,  # proxy validation sets; this only takes effect for tasks with more than 1000 series
    known_future_features=known_future_features,
)

# The dataset object can directly generate the test sequences
test_sets = api.dataset.generate_test_seqs()

# Calculate the forecasting score on the held-out targets
y_pred = api.predict(test_sets)
score = api.score(y_pred, y_test)
print("Forecasting score", score)
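The forecasting example relies on two ideas that are easy to check in isolation: holding out the last `forecasting_horizon` values of a series as test targets, and scoring forecasts with a MAPE-style metric. The sketch below uses only the standard library; the helper names are hypothetical, and `mape` is the textbook definition expressed as a fraction, so the exact value and the aggregation across series used by `mean_MAPE_forecasting` may differ.

```python
from typing import List, Sequence, Tuple


def split_by_horizon(series: Sequence[float],
                     horizon: int) -> Tuple[List[float], List[float]]:
    """Keep the last `horizon` values as test targets, the rest as training targets."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return list(series[:-horizon]), list(series[-horizon:])


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (true values must be non-zero)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy series; a naive forecast repeats the last training value over the horizon.
series = [10.0, 20.0, 30.0, 40.0, 50.0]
train, test = split_by_horizon(series, horizon=2)
naive_forecast = [train[-1]] * len(test)
print("train:", train, "test:", test)
print("naive MAPE:", mape(test, naive_forecast))
```

A naive last-value forecast like this one is a useful sanity baseline: a searched model that does not beat it on the held-out horizon probably needs a larger time budget or better features.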
