Doby's Lab

Code about AI/etc

ValueError: ctypes objects containing pointers cannot be pickled

Doby (도비) 2023. 8. 10. 17:01

🤔 Problem

I built a multi-output regression model by combining XGBRegressor with MultiOutputRegressor. This post documents an error that came up when I started tuning the model's hyperparameters away from their default values.

For XGB, I used the xgboost library.

from xgboost import XGBRegressor

To combine the two models, I originally wrote the following code.

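import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor

# DMatrix is the data container of the native (Python wrapper) API
dtrain = xgb.DMatrix(data=train_X, label=train_Y)
dtest = xgb.DMatrix(data=test_X, label=test_Y)

params = {
    'max_depth': 3,
    'eta': 0.1,
    'objective': 'reg:squarederror',
    'eval_metric': 'rmse',
}
num_rounds = 400

wlist = [(dtrain, 'train'), (dtest, 'eval')]

# Arguments in the style of the native xgb.train() call,
# passed here to the scikit-learn wrapper class
xgb_model = XGBRegressor(params=params,
                         num_boost_round=num_rounds,
                         early_stopping_rounds=100,
                         evals=wlist)

multioutput_model = MultiOutputRegressor(xgb_model)

# This call raises the ValueError described in this post
multioutput_model.fit(train_X.values, train_Y.values)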

However, this code raises the following error.

ValueError: ctypes objects containing pointers cannot be pickled

์ด ์—๋Ÿฌ๊ฐ€ ๋งํ•˜๋Š” ๋ฐ”๋Š” ์ง์—ญํ–ˆ์„ ๋•Œ, 'ํฌ์ธํ„ฐ๋ฅผ ํฌํ•จํ•œ C์–ธ์–ด ๊ฐ์ฒด๋ฅผ ์ง๋ ฌํ™”(pickling)์‹œํ‚ฌ ์ˆ˜ ์—†๋‹ค'๋Š” ๋œป์ž…๋‹ˆ๋‹ค.

 

์—ฌ๊ธฐ์„œ ์บ์น˜ํ•ด์•ผ ํ•  ๋ถ€๋ถ„์€ C์–ธ์–ด ๊ฐ์ฒด์™€ ์ง๋ ฌํ™”๋ผ๊ณ  ์ƒ๊ฐํ–ˆ์Šต๋‹ˆ๋‹ค.
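The same message can be reproduced with the standard library alone, which suggests it comes from ctypes rather than from XGBoost-specific logic. This is a minimal sketch, assuming only Python's built-in `ctypes`, `pickle`, and `copy` modules. The helper name `error_message` is hypothetical, and the raw pointer value is just a stand-in for a handle returned by a C/C++ library.

```python
import copy
import ctypes
import pickle


def error_message(action, obj):
    """Run action(obj) and return the ValueError text, or None if it succeeds."""
    try:
        action(obj)
    except ValueError as exc:
        return str(exc)
    return None


# A plain C integer contains no pointer, so pickling it works.
plain_pickle = error_message(pickle.dumps, ctypes.c_int(3))

# A void pointer stands in for a native library handle.
pointer_pickle = error_message(pickle.dumps, ctypes.c_void_p(1))

# Deep-copying goes through the same reduce machinery as pickling.
pointer_copy = error_message(copy.deepcopy, ctypes.c_void_p(1))

print(plain_pickle)
print(pointer_pickle)
print(pointer_copy)
```

Note that the deep copy fails in the same way as the explicit pickle call, so the error can appear even when no code calls `pickle` directly.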

✅ Serialization (pickling)

Serialization means saving a model's state and parameters so that the model can be reused or shared later.

Once a trained model has been saved, it can make predictions in another environment without any further training, which is convenient.

In Python, this is done with the pickle or joblib library.

(Because the library is named pickle, the word "pickling" is also used to mean serialization.)

Because the model is stored as a binary file, the file has to be opened with the 'wb' mode argument when saving, as shown below, and with 'rb' when loading.

import pickle

# Save Model
with open('model.pkl', 'wb') as file:
    pickle.dump(model, file)

# Load Model
with open('model.pkl', 'rb') as file:
    loaded_model = pickle.load(file)

😀 Solution

The reason I singled out "C-language object" is the concept of a wrapper.

✅ Wrapper

In programming, a wrapper is code or an interface that sits in between and wraps another library or module to make it easier to use or extend. A wrapper is used to make existing functionality more convenient, or to add new functionality, without changing the original behavior.

XGB provides two such wrappers.

1๏ธโƒฃ ์‚ฌ์ดํ‚ท๋Ÿฐ ๋ž˜ํผ (Scikit-Learn Wrapper)

์‚ฌ์ดํ‚ท๋Ÿฐ ๋ž˜ํผ๋Š” ์‚ฌ์ดํ‚ท๋Ÿฐ์˜ ์ผ๊ด€๋œ ์ธํ„ฐํŽ˜์ด์Šค์™€ ๊ธฐ๋Šฅ์„ ํ™œ์šฉํ•˜์—ฌ ๋‹ค์–‘ํ•œ ๋จธ์‹ ๋Ÿฌ๋‹ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ฌ์šฉํ•˜๊ณ  ์กฐํ•ฉํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•ด์ค๋‹ˆ๋‹ค. ์‚ฌ์ดํ‚ท๋Ÿฐ ๋ž˜ํผ๋Š” ํ†ต์ผ๋œ ๋ฉ”์„œ๋“œ์™€ API๋ฅผ ์ œ๊ณตํ•˜์—ฌ ๋ชจ๋ธ์„ ์ดˆ๊ธฐํ™”, ํ›ˆ๋ จ, ์˜ˆ์ธก, ํ‰๊ฐ€ ๋“ฑ์˜ ์ž‘์—…์„ ๊ฐ„ํŽธํ•˜๊ฒŒ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋„์™€์ค๋‹ˆ๋‹ค.

2๏ธโƒฃ ํŒŒ์ด์ฌ ๋ž˜ํผ (Python Wrapper)

ํŒŒ์ด์ฌ ๋ž˜ํผ๋Š” ๋‹ค๋ฅธ ์–ธ์–ด๋กœ ์ž‘์„ฑ๋œ ์ฝ”๋“œ๋‚˜ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํŒŒ์ด์ฌ์—์„œ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ๊ฐ์‹ธ์ฃผ๋Š” ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ํŒŒ์ด์ฌ ๋ž˜ํผ๋Š” C๋‚˜ C++๋กœ ์ž‘์„ฑ๋œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํŒŒ์ด์ฌ ์ฝ”๋“œ์—์„œ ํ˜ธ์ถœํ•˜๊ณ  ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ์ธํ„ฐํŽ˜์ด์Šค๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.

2๏ธโƒฃ ๋ž˜ํผ์˜ ์ฐจ์ด๊ฐ€ ๋“œ๋Ÿฌ๋‚˜๋Š” ๊ณณ์€?

๊ทธ๋ž˜์„œ ๋‘˜์˜ ์ฐจ์ด๋Š” ์–ด๋–ป๊ฒŒ ๋“œ๋Ÿฌ๋‚˜๋Š”๊ฐ€๊ฐ€ ๊ถ๊ธˆํ•ด์งˆ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์™œ๋ƒํ•˜๋ฉด, ๊ฐ™์€ XGB๋ฅผ ์“ฐ๋Š”๋ฐ '์ด๊ฒŒ ๋ฌด์Šจ ์†Œ๋ฆฌ์ธ๊ฐ€' ์‹ถ์—ˆ์Šต๋‹ˆ๋‹ค.

๋ž˜ํผ๋ผ๋Š” ๊ฒƒ์€ ๋ง ๊ทธ๋Œ€๋กœ ์ธํ„ฐํŽ˜์ด์Šค์™€ ๊ธฐ๋Šฅ์„ ๋œปํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ด๋Ÿฐ ์ฐจ์ด๊ฐ€ ๋‚˜๋Š” ๋ถ€๋ถ„์„ ๋ณผ ์ˆ˜ ์žˆ๋Š” ๊ณณ์€ ํ•จ์ˆ˜, ํด๋ž˜์Šค, ๋ฉ”์„œ๋“œ, ํŒŒ๋ผ๋ฏธํ„ฐ์—์„œ ๋“œ๋Ÿฌ๋‚ฉ๋‹ˆ๋‹ค.

Ref. https://jaaamj.tistory.com/39
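As a concrete illustration of the parameter differences, some settings go by different names in the two interfaces. The sketch below uses a hypothetical helper, `to_sklearn_kwargs`, and a hypothetical table, `NATIVE_TO_SKLEARN` (neither is part of xgboost), to rename a few native-style settings to their scikit-learn wrapper counterparts. The table covers only a handful of well-known pairs and is not exhaustive.

```python
# Native parameter name -> scikit-learn wrapper keyword (partial, illustrative)
NATIVE_TO_SKLEARN = {
    "eta": "learning_rate",
    "lambda": "reg_lambda",
    "alpha": "reg_alpha",
}


def to_sklearn_kwargs(native_params, num_boost_round=None):
    """Rename native-style settings to scikit-learn wrapper keyword arguments."""
    kwargs = {NATIVE_TO_SKLEARN.get(name, name): value
              for name, value in native_params.items()}
    # The number of boosting rounds is a separate argument in the native API
    # and becomes n_estimators in the scikit-learn wrapper.
    if num_boost_round is not None:
        kwargs["n_estimators"] = num_boost_round
    return kwargs


sklearn_kwargs = to_sklearn_kwargs(
    {"max_depth": 3, "eta": 0.1, "objective": "reg:squarederror"},
    num_boost_round=400,
)
print(sklearn_kwargs)
```

The resulting dictionary could then be unpacked into the constructor, for example `XGBRegressor(**sklearn_kwargs)`.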

 


✅ Solve

Since XGB is used through these two wrappers, I narrowed the pickling problem down to two possible causes.

  1. Hyperparameters set the wrong way, in the style of the Python wrapper (`params`, `num_boost_round`, and `evals` are arguments of the Python wrapper's `xgb.train`)
  2. A conflict between XGB used the Python-wrapper way and MultiOutputRegressor, which expects the scikit-learn wrapper interface
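The second cause can be sketched without xgboost. scikit-learn meta-estimators such as MultiOutputRegressor clone the base estimator for each target, and cloning deep-copies the constructor arguments. The classes below are hypothetical stand-ins: `FakeDMatrix` for an object that holds a C handle, and `TinyEstimator` for a scikit-learn style wrapper. `clone_like_sklearn` is a simplified imitation of cloning, not scikit-learn's actual code.

```python
import copy
import ctypes


class FakeDMatrix:
    """Stand-in for a native data container that keeps a C pointer as its handle."""

    def __init__(self):
        self.handle = ctypes.c_void_p(1)


class TinyEstimator:
    """Stand-in for a scikit-learn style estimator that stores its constructor kwargs."""

    def __init__(self, **params):
        self.params = params

    def get_params(self):
        return dict(self.params)


def clone_like_sklearn(estimator):
    """Build a fresh estimator from deep copies of the constructor arguments."""
    copied = {name: copy.deepcopy(value)
              for name, value in estimator.get_params().items()}
    return type(estimator)(**copied)


# Plain hyperparameters copy without trouble.
ok_clone = clone_like_sklearn(TinyEstimator(max_depth=3, learning_rate=0.1))

# A handle-bearing object among the constructor arguments breaks the copy.
try:
    clone_like_sklearn(TinyEstimator(max_depth=3, evals=[(FakeDMatrix(), "train")]))
    clone_error = None
except ValueError as exc:
    clone_error = str(exc)

print(ok_clone.get_params())
print(clone_error)
```

Under these assumptions, the failure comes from the handle-bearing objects passed to the constructor, not from the numeric hyperparameters themselves.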

Based on these two points, I resolved the error by using XGB the scikit-learn wrapper way instead of the Python wrapper way.

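# Scikit-learn wrapper arguments only:
# n_estimators replaces num_boost_round, learning_rate replaces eta
xgb_model = XGBRegressor(n_estimators=400,
                         learning_rate=0.1,
                         max_depth=3)

# No DMatrix objects are passed, so the base estimator can be cloned per target
multioutput_model = MultiOutputRegressor(xgb_model).fit(train_X.values, train_Y.values)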

Unlike the first snippet, this code does not pass a validation set. While fixing the error, I learned that MultiOutputRegressor does not support eval_set, and that passing one would require modifying the module directly, so I left it out.
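One possible way to keep a validation set without modifying MultiOutputRegressor is to skip it and fit one model per target column by hand. The sketch below uses a hypothetical helper, `fit_per_target`, and assumes the estimator's `fit` accepts an `eval_set` keyword, as XGBRegressor's scikit-learn interface does. `RecordingEstimator` is a dummy used only so the example runs without xgboost installed.

```python
def column(rows, j):
    """Return column j of a list-of-rows table."""
    return [row[j] for row in rows]


def fit_per_target(make_estimator, X, Y, X_val, Y_val):
    """Fit one estimator per target column, each with its own validation pair."""
    n_targets = len(Y[0])
    models = []
    for j in range(n_targets):
        model = make_estimator()
        model.fit(X, column(Y, j), eval_set=[(X_val, column(Y_val, j))])
        models.append(model)
    return models


class RecordingEstimator:
    """Dummy estimator that only records what fit() received."""

    def fit(self, X, y, eval_set=None):
        self.y_ = list(y)
        self.eval_y_ = list(eval_set[0][1])
        return self


X = [[0.0], [1.0], [2.0]]
Y = [[0.0, 10.0], [1.0, 11.0], [2.0, 12.0]]
X_val = [[3.0]]
Y_val = [[3.0, 13.0]]

models = fit_per_target(RecordingEstimator, X, Y, X_val, Y_val)
print([m.y_ for m in models])
```

With xgboost installed, `make_estimator` could be something like `lambda: XGBRegressor(n_estimators=400, learning_rate=0.1, max_depth=3)`, and the list helper could be replaced by slicing NumPy arrays by column.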
