Doby's Lab

Normalization, uint8 -> float64 RAM, Runtime Crash

도비(Doby) 2022. 12. 25. 17:27

🤔 Problem

While preprocessing an image dataset, I tried to normalize it with Min Max Scaling.
Because it is an image dataset, the min and max values are 0 and 255, so dividing the dataset by 255.0 is enough.
(+ I divide by 255.0 because the division casts the data from an int type to a float type.)

In code, it looks like this.

def minmax_scaler(dataset):
    # Scale pixel values from [0, 255] to [0.0, 1.0]; the result is float64.
    dataset = dataset / 255.0
    return dataset


This is where the problem appears. Monitoring system RAM shows that usage exceeds the available memory.
Once RAM is exhausted, the runtime crashes as well.


😀 Solution

๊ธฐ์กด์˜ ๋ฐ์ดํ„ฐ์…‹์€ ์ด๋ฏธ์ง€์˜ ํ”ฝ์…€ ๊ฐ’๋“ค์„ ์ „๋ถ€ ๋‹ด๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ํ”ฝ์…€ ๊ฐ’์˜ ๋ฒ”์œ„๋Š” 0 ~ 255์ž…๋‹ˆ๋‹ค.
ํ”ฝ์…€ ๊ฐ’์˜ ๋ฒ”์œ„์— ๋”ฐ๋ผ ๋ฐ์ดํ„ฐ์…‹์€ uint8 type์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

Dividing by 255.0 during Min Max Scaling changes the dataset to float64.
Each pixel used to take only 8 bits and now takes 64 bits, which eats up memory quickly.
With memory use growing 8x, a dataset of 8,000 images ends up occupying about 4.19 GB.
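
To make the 8x growth concrete, here is a minimal NumPy sketch. The image size is an assumption: the post does not state it, but 256 x 256 single-channel images are consistent with the 4.19 GB figure. The variable names are illustrative only.

```python
import numpy as np

# A small stand-in batch: 4 images, assumed 256 x 256, single channel.
batch = np.zeros((4, 256, 256), dtype=np.uint8)
scaled = batch / 255.0  # NumPy promotes uint8 / float to float64

print(batch.dtype, batch.nbytes)    # 1 byte per pixel
print(scaled.dtype, scaled.nbytes)  # 8 bytes per pixel

# Estimate the full dataset size without allocating it.
n_images, height, width = 8000, 256, 256  # height and width are assumed
n_pixels = n_images * height * width
bytes_uint8 = n_pixels * np.dtype(np.uint8).itemsize
bytes_float64 = n_pixels * np.dtype(np.float64).itemsize
print(bytes_uint8 / 1e9, bytes_float64 / 1e9)  # sizes in GB (10**9 bytes)
```

Under that assumption the uint8 data needs roughly 0.52 GB, while the float64 result needs roughly 4.19 GB, and both exist at the same time while the division runs.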

์ด์— ๋”ฐ๋ผ ๋ฐ์ดํ„ฐ์…‹์„ ๋ถ„์‚ฐ์ ์œผ๋กœ ๋‚˜๋ˆ„์–ด ๊ฐ€์ ธ์™€์•ผ ํ•ฉ๋‹ˆ๋‹ค.
๋ถ„์‚ฐ์ ์œผ๋กœ ๋‚˜๋ˆ„์–ด ๊ฐ€์ ธ์˜ค๋ฉด์„œ Normalization์„ ํ•˜๊ณ , ๋ถ€๋ถ„ ๋ฐ์ดํ„ฐ์…‹์„ ๋ฒ„๋ ค์•ผ ํ•ฉ๋‹ˆ๋‹ค.

๋ฐ์ดํ„ฐ์…‹์„ ๋ฒ„๋ฆฌ๋Š”(= ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ๋น„์šฐ๋Š”) ๋ฐฉ๋ฒ•์€ ํŒŒ์ด์ฌ์—์„œ๋Š” del์ด๋ผ๋Š” ๊ฐ์ฒด๋ฅผ ๋ฉ”๋ชจ๋ฆฌ์—์„œ ์ง€์›Œ๋ฒ„๋ฆฌ๋Š” ํ‚ค์›Œ๋“œ๋ฅผ ์ด์šฉํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค.
์ฝ”๋“œ๋ฅผ ์•„๋ž˜์ฒ˜๋Ÿผ ์งœ์ฃผ๋ฉด ๋Ÿฐํƒ€์ž„์ด ๋‹ค์šด๋˜๋Š” ์ด์Šˆ๋Š” ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

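import numpy as np

dataset = []

# dataset_path: list of paths to the partial dataset files
for dataset_pt_path in dataset_path:
    dataset_pt = np.load(dataset_pt_path, allow_pickle=True)

    # Normalize each image in this part and keep the scaled copy
    for x in dataset_pt:
        dataset.append(minmax_scaler(x))

    # Drop the reference to the raw part so its memory can be reclaimed
    del dataset_pt

dataset = np.array(dataset)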
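
A note on what del does in the loop above, as a small sketch: del removes a name, and the array's memory is released only once nothing else refers to it. The example assumes CPython, where reference counting reclaims an object as soon as its last reference goes away. The names are illustrative.

```python
import weakref
import numpy as np

part = np.zeros((100, 256, 256), dtype=np.uint8)
alive = weakref.ref(part)  # observe the array without keeping it alive

other = part               # a second reference to the same array
del part                   # removes the name only
still_alive = alive() is not None  # the array survives through `other`

del other                  # the last reference is gone
freed = alive() is None    # in CPython the array is reclaimed here
print(still_alive, freed)
```

Any remaining reference counts, including a leftover loop variable that is a view into the array, so del frees memory only when it removes the last one.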

In short, load the dataset in parts, and once a loaded part has been used (= fit), delete it with the del keyword.
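
The load, use, delete cycle can be sketched end to end as follows. This is a minimal sketch under assumptions, not the post's exact pipeline: the part files are small synthetic .npy files written to a temporary directory, and a running pixel sum stands in for the real fit step.

```python
import os
import tempfile
import numpy as np

def minmax_scaler(dataset):
    return dataset / 255.0

# Write three small synthetic parts (stand-ins for the real files).
rng = np.random.default_rng(0)
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(3):
    path = os.path.join(tmp_dir, f"part_{i}.npy")
    np.save(path, rng.integers(0, 256, size=(10, 32, 32), dtype=np.uint8))
    paths.append(path)

pixel_sum, pixel_count = 0.0, 0
for path in paths:
    part = np.load(path)              # load one uint8 part
    scaled = minmax_scaler(part)      # float64 copy of this part only
    del part                          # the raw part is no longer needed
    pixel_sum += float(scaled.sum())  # placeholder for the real fit step
    pixel_count += scaled.size
    del scaled                        # release before loading the next part

mean_pixel = pixel_sum / pixel_count
print(mean_pixel)
```

Only one part and its scaled copy are held at a time here, so peak memory depends on the part size rather than on the size of the whole dataset.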


📂 Reference

https://stackoverflow.com/questions/69288593/how-to-prevent-ram-from-filling-up-in-image-classification-dl

 
