Auto-PyTorch

AutoDL

GitHub - automl/Auto-PyTorch: Automatic architecture search and hyperparameter optimization for PyTorch

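Auto-PyTorch is an open-source AutoML framework developed by the AutoML groups at the universities of Freiburg and Hannover. Built on SMAC, it jointly optimizes the network architecture and the training hyperparameters. The framework is designed to support tabular data (classification, regression) and time series data (forecasting). The latest version of Auto-PyTorch includes new features that further improve usability, robustness, and efficiency. The Auto-PyTorch workflow consists of the following steps: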

Input data validation
Dataset creation
Baseline evaluation
Search by SMAC
Building the best ensemble for the provided dataset

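The framework supports installation from PyPI as well as manual installation. Auto-PyTorch for time series forecasting requires additional dependencies. Examples are provided for tabular data and time series forecasting tasks, and the framework can be customized by changing the search space and the code parallelization. The program is distributed under the terms of the Apache License 2.0.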

Examples

In a nutshell:

from autoPyTorch.api.tabular_classification import TabularClassificationTask

# data and metric imports
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

# initialise Auto-PyTorch api
api = TabularClassificationTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    optimize_metric='accuracy',
    total_walltime_limit=300,
    func_eval_time_limit_secs=50
)

# Calculate test accuracy
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print("Accuracy score", score)

For Time Series Forecasting Tasks

from autoPyTorch.api.time_series_forecasting import TimeSeriesForecastingTask

# data and metric imports
from sktime.datasets import load_longley
targets, features = load_longley()

# define the forecasting horizon
forecasting_horizon = 3

# The dataset passed to APT-TS can be a list of np.ndarray / pd.DataFrame, where each element of the list is one
# series, or a single pd.DataFrame that also records the series index information, i.e. which series each
# timestep belongs to. This id can be stored as the DataFrame's index or in a separate column.
# Within each series, the last forecasting_horizon items are the test targets and the items before them are the
# training targets. Normally the values to be forecast directly follow the training set.
y_train = [targets[: -forecasting_horizon]]
y_test = [targets[-forecasting_horizon:]]

# The same applies to the features. For univariate models, X_train and X_test can be omitted and set to None
X_train = [features[: -forecasting_horizon]]
# Here X_test holds the 'known future features': features whose future values are known in advance. Features that
# are unknown could be replaced with NaN or zeros (which will not be used by our networks). If no feature is known
# beforehand, we could also omit X_test
known_future_features = list(features.columns)
X_test = [features[-forecasting_horizon:]]

start_times = [targets.index.to_timestamp()[0]]
freq = '1Y'

# initialise Auto-PyTorch api
api = TimeSeriesForecastingTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    optimize_metric='mean_MAPE_forecasting',
    n_prediction_steps=forecasting_horizon,
    memory_limit=16 * 1024,  # Currently, forecasting models use much more memory
    freq=freq,
    start_times=start_times,
    func_eval_time_limit_secs=50,
    total_walltime_limit=60,
    min_num_test_instances=1000,  # proxy validation sets. This only works for the tasks with more than 1000 series
    known_future_features=known_future_features,
)

# The dataset can directly generate the input sequences needed for prediction
test_sets = api.dataset.generate_test_seqs()

# Calculate the test score
y_pred = api.predict(test_sets)
score = api.score(y_pred, y_test)
print("Forecasting score", score)
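
To make the hold-out split and the optimized metric in the forecasting example more concrete, here is a minimal sketch in plain Python that does not use Auto-PyTorch. It holds out the last `horizon` values of each series as test targets and scores a naive forecast with the MAPE averaged over series. The helper names (`split_last_horizon`, `mape`, `mean_mape`), the toy data, and the naive "repeat the last value" forecast are all hypothetical and for illustration only. The code assumes the textbook definition of MAPE, and the library's `mean_MAPE_forecasting` metric may differ in its details.

```python
from typing import List, Sequence, Tuple


def split_last_horizon(series: Sequence[float], horizon: int) -> Tuple[List[float], List[float]]:
    """Hold out the last `horizon` observations of one series as test targets."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be positive and shorter than the series")
    return list(series[:-horizon]), list(series[-horizon:])


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error of one series, as a fraction (0.1 means 10%)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_mape(y_true_all: Sequence[Sequence[float]], y_pred_all: Sequence[Sequence[float]]) -> float:
    """Average the per-series MAPE over all series."""
    scores = [mape(t, p) for t, p in zip(y_true_all, y_pred_all)]
    return sum(scores) / len(scores)


# Hypothetical toy data: a list with one entry per series.
all_series = [
    [100.0, 110.0, 120.0, 130.0, 140.0],
    [10.0, 20.0, 40.0, 50.0, 80.0],
]
horizon = 2

# Same layout as y_train / y_test above: one list element per series.
splits = [split_last_horizon(s, horizon) for s in all_series]
y_train = [train for train, _ in splits]
y_test = [test for _, test in splits]

# Naive baseline (an assumption for this sketch): repeat the last training value.
y_pred = [[train[-1]] * horizon for train in y_train]

score = mean_mape(y_test, y_pred)
print("Naive forecast mean MAPE:", round(score, 4))
```

A lower value is better. Comparing the score of a searched ensemble against a naive baseline like this one is a quick sanity check that the search found something useful.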

