Auto-PyTorch
AutoDL
Auto-PyTorch is an open-source AutoML framework developed by the AutoML groups at the universities of Freiburg and Hannover. Built on SMAC, it jointly optimizes the network architecture and the training hyperparameters. The framework is designed to support tabular data (classification, regression) and time series data (forecasting). The latest version of Auto-PyTorch includes new features that further improve usability, robustness, and efficiency. The Auto-PyTorch workflow consists of the following steps:
Input data validation
Dataset creation
Baseline evaluation
Search by SMAC
Building the best ensemble for the provided dataset
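The text above does not say how the final ensemble is built. As an illustration only, the sketch below shows greedy ensemble selection with replacement, a common way to combine already-trained models by repeatedly adding whichever model most improves validation accuracy. The function names `greedy_ensemble_selection` and `ensemble_weights` and the toy probabilities are hypothetical and are not part of the Auto-PyTorch API.

```python
import numpy as np


def greedy_ensemble_selection(val_probs, y_val, n_rounds=3):
    """Greedily pick models (with replacement) by validation accuracy.

    val_probs: array of shape (n_models, n_samples, n_classes)
    y_val:     array of shape (n_samples,) with integer class labels
    Returns the list of chosen model indices, one per round.
    """
    val_probs = np.asarray(val_probs, dtype=float)
    y_val = np.asarray(y_val)
    chosen = []
    running_sum = np.zeros_like(val_probs[0])
    for _ in range(n_rounds):
        best_idx, best_acc = None, -1.0
        for idx, probs in enumerate(val_probs):
            # Accuracy of the ensemble if this model were added next
            avg = (running_sum + probs) / (len(chosen) + 1)
            acc = float(np.mean(avg.argmax(axis=1) == y_val))
            if acc > best_acc:  # ties keep the earlier model
                best_idx, best_acc = idx, acc
        chosen.append(best_idx)
        running_sum += val_probs[best_idx]
    return chosen


def ensemble_weights(chosen, n_models):
    """Turn selection counts into normalized ensemble weights."""
    counts = np.bincount(chosen, minlength=n_models)
    return counts / counts.sum()


# Toy validation predictions from three hypothetical models (made-up numbers)
y_val = np.array([0, 1, 1, 0])
val_probs = np.array([
    [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]],  # model 0: wrong on sample 2
    [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9], [0.9, 0.1]],  # model 1: wrong on sample 1
    [[0.4, 0.6], [0.4, 0.6], [0.4, 0.6], [0.4, 0.6]],  # model 2: always predicts class 1
])

chosen = greedy_ensemble_selection(val_probs, y_val, n_rounds=3)
weights = ensemble_weights(chosen, n_models=len(val_probs))
print("chosen models:", chosen)
print("ensemble weights:", weights)
```

Selecting with replacement lets a strong model appear more than once, so the selection counts act as ensemble weights. Models that never help end up with weight zero.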
The framework can be installed from PyPI or manually. Auto-PyTorch for time series forecasting requires additional dependencies. Examples are provided for tabular data and time series forecasting tasks, and the framework can be customized by changing the search space and the parallelization of the code. The program is distributed under the terms of the Apache License 2.0.
Examples
In a nutshell:
from autoPyTorch.api.tabular_classification import TabularClassificationTask
# data and metric imports
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)
# initialise Auto-PyTorch api
api = TabularClassificationTask()
# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    optimize_metric='accuracy',
    total_walltime_limit=300,
    func_eval_time_limit_secs=50
)
# Calculate test accuracy
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print("Accuracy score", score)
For Time Series Forecasting Tasks
from autoPyTorch.api.time_series_forecasting import TimeSeriesForecastingTask
# data and metric imports
from sktime.datasets import load_longley
targets, features = load_longley()
# define the forecasting horizon
forecasting_horizon = 3
# Datasets handled by APT-TS can be a list of np.ndarray / pd.DataFrame, where each element of the list is one series,
# or a single pd.DataFrame that records which series each time step belongs to. This series id can be stored as the
# DataFrame's index or in a separate column.
# Within each series, the last forecasting_horizon values are the test targets and the values before them are the
# training targets. Normally the values to be forecast directly follow the training set.
y_train = [targets[: -forecasting_horizon]]
y_test = [targets[-forecasting_horizon:]]
# The same applies to the features. For univariate models, X_train and X_test can be omitted or set to None
X_train = [features[: -forecasting_horizon]]
# Here X_test holds the 'known future features': features whose future values are known in advance. Features that are
# unknown could be replaced with NaN or zeros (which will not be used by our networks). If no feature is known
# beforehand, X_test can also be omitted
known_future_features = list(features.columns)
X_test = [features[-forecasting_horizon:]]
start_times = [targets.index.to_timestamp()[0]]
freq = '1Y'
# initialise Auto-PyTorch api
api = TimeSeriesForecastingTask()
# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    optimize_metric='mean_MAPE_forecasting',
    n_prediction_steps=forecasting_horizon,
    memory_limit=16 * 1024,  # Currently, forecasting models use much more memory
    freq=freq,
    start_times=start_times,
    func_eval_time_limit_secs=50,
    total_walltime_limit=60,
    min_num_test_instances=1000,  # proxy validation sets. This only works for tasks with more than 1000 series
    known_future_features=known_future_features,
)
# The dataset can directly generate the test sequences
test_sets = api.dataset.generate_test_seqs()
# Calculate the test score
y_pred = api.predict(test_sets)
score = api.score(y_pred, y_test)
print("Forecasting score", score)
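The forecasting example holds out the last `forecasting_horizon` steps of each series and optimizes a metric named `mean_MAPE_forecasting`. The following minimal sketch, independent of Auto-PyTorch, shows that hold-out split and a plain mean absolute percentage error averaged over series. The helper names `split_by_horizon`, `mape`, and `mean_mape` are hypothetical, and the library's own metric may differ in details such as scaling or the handling of zero targets.

```python
import numpy as np


def split_by_horizon(series, horizon):
    """Hold out the last `horizon` steps of one series as test targets."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be positive and shorter than the series")
    return series[:-horizon], series[-horizon:]


def mape(y_true, y_pred):
    """Mean absolute percentage error for one series (targets must be non-zero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


def mean_mape(y_true_list, y_pred_list):
    """Average the per-series MAPE over all series."""
    return float(np.mean([mape(t, p) for t, p in zip(y_true_list, y_pred_list)]))


# Made-up series; the last three values become the test targets
series = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]
train, test = split_by_horizon(series, horizon=3)

# Naive baseline forecast: repeat the last training value
naive_forecast = [train[-1]] * len(test)
score = mean_mape([test], [naive_forecast])
print("train:", train)
print("test:", test)
print("mean MAPE of naive forecast:", score)
```

A naive last-value forecast like this is a useful sanity check: a tuned forecasting model should normally score lower on the same held-out window.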