Doby's Lab

Why You Need to Shuffle Your Dataset

Doby 2022. 12. 25. 22:41

🤔 Problem

While training a model I ran into some strange behavior, which I decided to treat as a problem to investigate.

The issue was that Accuracy on train_set stayed at 100% while Loss barely moved and then suddenly shot up once or twice.

The same kind of anomaly showed up on validation_set: Accuracy stayed at 100% and Loss stayed at 0.

Such a perfect model cannot exist in reality, and if it really were perfect it should have done well on test_set too, but there the Accuracy was about 50% and the Loss about 84.4.


😀 Solution

The problem arose because the dataset had not been shuffled at all.

A guess at the cause

If the model had been training only on dogs and then cats suddenly came in, that explains why the Loss abruptly jumped.

Also, because validation_set is carved out of train_set, a model that had only been learning dogs was being validated only on dogs, so 100% was the only possible outcome.
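
The guess above can be illustrated with a toy label list. This is only a minimal sketch, assuming the dataset is sorted by class (all dogs first, then all cats). The label encoding (0 = dog, 1 = cat), the sizes, and the helper classes_in are made up for illustration and do not come from the actual experiment.

```python
# Toy labels, assumed to be sorted by class: 0 = dog, 1 = cat (hypothetical encoding).
labels = [0] * 8 + [1] * 8


def classes_in(chunk):
    """Return the sorted list of distinct classes present in a chunk of labels."""
    return sorted(set(chunk))


# A contiguous slice taken from one end of a sorted dataset holds a single class.
first_quarter = labels[:4]
print(classes_in(first_quarter))  # [0]

# Consecutive mini-batches are also single-class until the class boundary,
# which is where a sudden loss spike would be expected.
batch_size = 4
batches = [labels[i:i + batch_size] for i in range(0, len(labels), batch_size)]
print([classes_in(b) for b in batches])  # [[0], [0], [1], [1]]
```

A model that is trained and validated only on such single-class chunks can score 100% without having learned to tell the classes apart, which is consistent with the roughly 50% Accuracy on test_set.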

 

So the dataset needs to be shuffled, but how can we do that?

 

First, the index correspondence between train_set and train_target must stay intact. If the two were shuffled independently of each other, the dataset would become meaningless.

To write this in a more Pythonic way, we use zip.

shuffle_data = [[x, y] for x, y in zip(train_set, train_target)]

zip pairs the two up, turning each sample into a single [x, y] list.
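
To make the pairing concrete, here is a small sketch on toy data. The file names and labels are hypothetical stand-ins for the real train_set and train_target.

```python
# Hypothetical toy data: file names stand in for the real samples.
train_set = ["dog_0.jpg", "dog_1.jpg", "cat_0.jpg"]
train_target = [0, 0, 1]

# Pair each sample with its label so the two always move together.
shuffle_data = [[x, y] for x, y in zip(train_set, train_target)]
print(shuffle_data)
# [['dog_0.jpg', 0], ['dog_1.jpg', 0], ['cat_0.jpg', 1]]
```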

 

Now, to do the actual shuffling, we import the random module.

import random

random.shuffle(shuffle_data)  # shuffles the list of pairs in place

x_train = [n[0] for n in shuffle_data]
y_train = [n[1] for n in shuffle_data]

random.shuffle shuffles the list in place; after that, we pull the train_set values back out from index 0 of each pair and the train_target values from index 1.
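
Putting the steps together, the approach might be wrapped up roughly as follows. This is a sketch rather than the exact code used for the model: the function name shuffle_pairs, the seed parameter, and the toy data are all assumptions added for illustration.

```python
import random


def shuffle_pairs(samples, targets, seed=None):
    """Shuffle samples and targets together; return two new, still-aligned lists."""
    if len(samples) != len(targets):
        raise ValueError("samples and targets must have the same length")
    paired = list(zip(samples, targets))
    random.Random(seed).shuffle(paired)  # private RNG, so the global state is untouched
    if not paired:
        return [], []
    x_shuffled, y_shuffled = zip(*paired)  # "unzip" back into two sequences
    return list(x_shuffled), list(y_shuffled)


# Hypothetical toy data where each label is recoverable from the sample name.
train_set = [f"dog_{i}" for i in range(5)] + [f"cat_{i}" for i in range(5)]
train_target = [0] * 5 + [1] * 5

x_train, y_train = shuffle_pairs(train_set, train_target, seed=42)

# Every sample should still sit next to its own label.
assert all(x.startswith("cat") == (y == 1) for x, y in zip(x_train, y_train))
# Nothing is lost or duplicated by shuffling.
assert sorted(x_train) == sorted(train_set)
```

Passing a seed makes the order reproducible between runs. If train_set and train_target are NumPy arrays rather than lists, indexing both with the same permutation from numpy.random.permutation is a common alternative.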


📂 Reference

https://jybaek.tistory.com/781

 
