Cloud Classification with a CNN

Junseokk 2021. 8. 25. 19:33


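Most of us have looked up at the sky at some point and been struck by how beautiful the clouds were.
We see clouds every day, but knowing which type we are looking at makes them even more impressive.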

- Goal: use a CNN to classify cloud images and identify the cloud type

- Data used: CCSN data set

https://www.kaggle.com/nakendraprasathk/cloud-image-classification-dataset

Zhang, J. L., Liu, P., Zhang, F., & Song, Q. Q. (2018). CloudNet: Ground‐based cloud classification with deep convolutional neural network. Geophysical Research Letters, 45, 8665–8672. https://doi.org/10.1029/2018GL077787

- Data set: about 200 jpg files for each cloud type

High level      Cirrus (Ci), Cirrocumulus (Cc), Cirrostratus (Cs), Contrail (Ct)
Mid level       Altocumulus (Ac), Altostratus (As)
Low level       Stratus (St), Stratocumulus (Sc), Nimbostratus (Ns)
Vertical        Cumulonimbus (Cb), Cumulus (Cu)
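For readable output later on, the abbreviations can be kept in a small lookup. This is only a sketch: the names `CLOUD_TYPES` and `describe` are made up for illustration, and it assumes the class folder names in the data set match the abbreviations.

```python
# Hypothetical lookup table: abbreviation -> (full name, level).
# CLOUD_TYPES and describe() are illustrative names, not part of the
# data set or of Keras.
CLOUD_TYPES = {
    "Ci": ("Cirrus", "high"),
    "Cc": ("Cirrocumulus", "high"),
    "Cs": ("Cirrostratus", "high"),
    "Ct": ("Contrail", "high"),
    "Ac": ("Altocumulus", "mid"),
    "As": ("Altostratus", "mid"),
    "St": ("Stratus", "low"),
    "Sc": ("Stratocumulus", "low"),
    "Ns": ("Nimbostratus", "low"),
    "Cb": ("Cumulonimbus", "vertical"),
    "Cu": ("Cumulus", "vertical"),
}

def describe(abbr):
    """Return a readable label such as 'Cb (Cumulonimbus, vertical)'."""
    name, level = CLOUD_TYPES[abbr]
    return f"{abbr} ({name}, {level})"

print(describe("Cb"))
print(len(CLOUD_TYPES), "classes")
```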

- Modules used

import os, glob
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from PIL import Image
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img, array_to_img
# keras.backend.tensorflow_backend is not available in TensorFlow 2.x-era Keras;
# device placement is done with tf.device instead.

- The CCSN data set does not contain enough images to train a model on.

- ImageDataGenerator is used to bring each cloud type up to at least 1,500 images.
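A quick check of how many augmented copies per source image that target implies. This is a sketch under the post's own figures (about 200 originals per class, a target of 1,500); the helper name `copies_needed` is made up for illustration.

```python
import math

def copies_needed(n_original, target):
    """Smallest number of augmented copies per source image so that
    originals plus generated images reach at least `target`."""
    if n_original >= target:
        return 0
    return math.ceil((target - n_original) / n_original)

n_original = 200   # approximate images per class, as stated for the data set
target = 1500      # goal stated in the post

k = copies_needed(n_original, target)
print(k)                       # copies needed per source image
print(n_original * (1 + k))    # resulting class size with k copies
print(n_original * (1 + 8))    # class size with 8 copies per source image
```

The augmentation loop in the post stops after 8 generated images per source image, which comfortably clears the target for a class of about 200 images.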

def IDG(fname):  # takes an image file path and writes augmented copies next to it
    ImageDG = ImageDataGenerator(rescale=1. / 255,
                                 rotation_range=15,        # range of random rotation, in degrees
                                 width_shift_range=0.1,    # random horizontal shift, as a fraction of the width
                                 height_shift_range=0.1,   # random vertical shift, as a fraction of the height
                                 horizontal_flip=True,     # random horizontal flip
                                 zoom_range=0.1,           # random zoom in/out range
                                 fill_mode='nearest')      # how pixels outside the input boundary are filled
    img = tf.keras.preprocessing.image.load_img(fname)  # load the file as a PIL image
    x = img_to_array(img)
    x = x.reshape((1,) + x.shape)  # add a batch axis: (1, height, width, channels)
    # Save generated files in the same class folder as the source image.
    # (Joining the first four '/'-separated parts of the path pointed at the train root instead.)
    save = os.path.dirname(fname)
    # file_name_freq was never defined; the source file name is used as the prefix instead.
    prefix = 'new' + os.path.splitext(os.path.basename(fname))[0]
    i = 0
    for batch in ImageDG.flow(x, batch_size=1, save_to_dir=save, save_prefix=prefix,
                              save_format='jpg'):
        i += 1
        if i > 7:  # stop after 8 generated images per source image
            break



folder_list = os.listdir('./archive/data/train')
for f in folder_list:
    fname = "./archive/data/train/" + f + "/"
    file_list = os.listdir(fname)  # listed once, before any new files are written
    for i in file_list:
        filename = fname + i
        IDG(filename)

- Convert the image data into training data

img_dir = "./archive/data/train/"
categories = os.listdir(img_dir)  # one sub-folder per cloud type
num_classes = len(categories)

image_w = 64  # resize every image to 64 x 64 x 3
image_h = 64

X = []
y = []

for idx, cat in enumerate(categories):  # idx is the integer label for category cat
    img_dir_detail = os.path.join(img_dir, cat)
    files = glob.glob(img_dir_detail + "/*.jpg")
    for i, f in enumerate(files):
        try:
            img = Image.open(f)
            img = img.convert('RGB')
            img = img.resize((image_w, image_h))  # resize the image
            data = np.asarray(img)
            X.append(data)
            y.append(idx)
            if i % 300 == 0:  # print progress every 300 files
                print(cat, " : ", f)
        except Exception:
            print(cat, str(i), ": error while loading")

X = np.array(X)  # convert to arrays
y = np.array(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)  # 70% train, 30% test

- Preprocess the training data

from tensorflow.keras.utils import to_categorical

print(X_train.shape)  # check the data sizes
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

# Pixel values are 0-255 after conversion to an array; scale them to 0-1
X_train = X_train.astype(float) / 255.0
X_test = X_test.astype(float) / 255.0

# One-hot encode the integer labels, e.g. 1 -> [0, 1, 0, 0, ...]
# num_classes is passed so both splits get the same number of columns
y_train = to_categorical(y_train, num_classes=num_classes)
y_test = to_categorical(y_test, num_classes=num_classes)
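To make the label conversion concrete, here is a pure-Python sketch of one-hot encoding. It is not the Keras implementation; `one_hot` is an illustrative helper that mirrors what `to_categorical` produces for integer labels.

```python
def one_hot(labels, num_classes):
    """Pure-Python equivalent of one-hot encoding integer class labels."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0   # put a 1 at the index of the class
        rows.append(row)
    return rows

y = [1, 0, 3]
print(one_hot(y, num_classes=4))

# If the width is inferred from the data (max label + 1), a split that
# happens to lack the highest class index ends up with fewer columns.
partial = [1, 0]
print(len(one_hot(partial, num_classes=max(partial) + 1)[0]))
```

When `num_classes` is omitted, `to_categorical` infers the width from the largest label it sees, so passing the class count explicitly keeps the train and test label arrays the same shape.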

 

- Building the model

image_w = 64
image_h = 64

with tf.device('/device:CPU:0'):  # no GPU available, so run on the CPU
    model = Sequential()  # declare the model

    # Two (convolution, pooling) blocks. Hidden layers use ReLU;
    # the output layer uses softmax because this is multi-class classification.
    model.add(Conv2D(32, (3, 3), padding="same", input_shape=X_train.shape[1:], activation="relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))  # reduce overfitting

    model.add(Conv2D(64, (3, 3), padding="same", activation="relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    # Fully connected layers
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))

    # categorical cross-entropy loss for multi-class classification
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    model_dir = './model'
    os.makedirs(model_dir, exist_ok=True)  # make sure the output folder exists
    # Note: newer Keras releases expect a checkpoint path ending in .keras or .h5
    model_path = model_dir + "/cloud_classify.model"
    # Save the best model so far and stop early when val_loss stops improving
    checkpoint = ModelCheckpoint(filepath=model_path, monitor='val_loss', verbose=1, save_best_only=True)
    early_stopping = EarlyStopping(monitor='val_loss', patience=6)

- model.summary()
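The parameter counts that `model.summary()` reports can be reproduced by hand. The sketch below does the arithmetic for the first model, assuming 11 classes (one per cloud type in the data set); the layer labels and helper names are illustrative and need not match the names Keras prints.

```python
def conv2d_params(kernel, c_in, c_out):
    """Weights plus biases of a standard 2D convolution with a square kernel."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

num_classes = 11   # assumption: one class per cloud type
side = 64          # input is 64 x 64 x 3

layers = {}
layers["conv2d_1"] = conv2d_params(3, 3, 32)
side //= 2         # first 2x2 max pooling: 64 -> 32
layers["conv2d_2"] = conv2d_params(3, 32, 64)
side //= 2         # second 2x2 max pooling: 32 -> 16
flat = side * side * 64   # size of the flattened feature map
layers["dense_1"] = dense_params(flat, 256)
layers["dense_2"] = dense_params(256, num_classes)

for name, n in layers.items():
    print(f"{name:10s} {n:>10,}")
print(f"{'total':10s} {sum(layers.values()):>10,}")
```

Almost all of the parameters sit in the first dense layer after `Flatten`, so the size of the last feature map matters far more than the number of convolution filters.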

- Training, then checking accuracy and loss

Accuracy on the test set was lower than expected, and the loss was too high.

I decided to stack two more Conv2D layers and train again.
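The training call itself is not shown in the post. The following is a minimal sketch of how the `checkpoint` and `early_stopping` callbacks could be passed to `fit`. The helper names `train` and `best_epoch`, the 20% validation split, the epoch count, and the batch size are all assumptions, not the settings actually used.

```python
def train(model, X_train, y_train, callbacks, epochs=50, batch_size=32):
    """Fit a compiled Keras model and return its per-epoch metrics.

    Some validation data is required because both callbacks monitor
    'val_loss'. validation_split=0.2 is an assumed value.
    """
    history = model.fit(
        X_train, y_train,
        validation_split=0.2,
        epochs=epochs,
        batch_size=batch_size,
        callbacks=callbacks,
    )
    return history.history  # dict: metric name -> list of per-epoch values

def best_epoch(metrics):
    """1-based index of the epoch with the lowest validation loss."""
    val_loss = metrics["val_loss"]
    return val_loss.index(min(val_loss)) + 1
```

With the objects defined in the post this would be called as `metrics = train(model, X_train, y_train, [checkpoint, early_stopping])`, followed by `model.evaluate(X_test, y_test)` for the test-set loss and accuracy.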

- Building model2

with tf.device('/device:CPU:0'):
    model2 = Sequential()

    model2.add(Conv2D(32, (3, 3), padding="same", input_shape=X_train.shape[1:], activation="relu"))
    model2.add(MaxPooling2D(pool_size=(2, 2)))
    model2.add(Dropout(0.25))

    model2.add(Conv2D(64, (3, 3), padding="same", activation="relu"))
    model2.add(MaxPooling2D(pool_size=(2, 2)))
    model2.add(Dropout(0.25))

    model2.add(Conv2D(128, (3, 3), padding="same", activation="relu"))  # newly added
    model2.add(Conv2D(128, (3, 3), padding="same", activation="relu"))  # newly added
    model2.add(MaxPooling2D(pool_size=(2, 2)))
    model2.add(Dropout(0.25))

    model2.add(Flatten())
    model2.add(Dense(256, activation='relu'))
    model2.add(Dropout(0.5))
    model2.add(Dense(num_classes, activation='softmax'))

    model2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    model_dir = './model2'
    os.makedirs(model_dir, exist_ok=True)  # make sure the output folder exists
    # Note: newer Keras releases expect a checkpoint path ending in .keras or .h5
    model_path = model_dir + "/cloud_classify.model2"

    checkpoint = ModelCheckpoint(filepath=model_path, monitor='val_loss', verbose=1, save_best_only=True)
    early_stopping = EarlyStopping(monitor='val_loss', patience=6)

- model2 accuracy and loss

model2 performs better than the original model, but it is still not good enough for practical use.

Training on a CPU takes too long, however, so I plan to build a more advanced model later, once better hardware is available.

▶ Ways to improve the CNN model

1. Add layers

2. Use 1x1 conv filters (fewer parameters)

3. Residual learning block (need to study this)

4. Depth-wise separable convolution (need to study this)

5. Augment the training data
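Items 2 and 4 are both about cutting parameters, which a little arithmetic makes concrete. The sketch below compares a standard 3x3 convolution from 128 to 128 channels (the size of the second added layer in model2) with a 1x1 bottleneck and with a depth-wise separable convolution. The helper names and the bottleneck width of 32 are arbitrary choices for illustration, and the separable count assumes a single bias vector on the point-wise step.

```python
def conv2d_params(kernel, c_in, c_out):
    """Standard convolution: one kernel x kernel x c_in filter per output channel, plus biases."""
    return kernel * kernel * c_in * c_out + c_out

def bottleneck_params(kernel, c_in, c_mid, c_out):
    """A 1x1 convolution that shrinks the channels to c_mid, then a kernel x kernel convolution."""
    return conv2d_params(1, c_in, c_mid) + conv2d_params(kernel, c_mid, c_out)

def separable_params(kernel, c_in, c_out):
    """Depth-wise separable convolution: one kernel x kernel filter per input
    channel, then a 1x1 point-wise convolution with one bias vector."""
    return kernel * kernel * c_in + c_in * c_out + c_out

standard = conv2d_params(3, 128, 128)
bottleneck = bottleneck_params(3, 128, 32, 128)   # c_mid=32 is an arbitrary choice
separable = separable_params(3, 128, 128)

print(f"standard 3x3   : {standard:,}")
print(f"1x1 bottleneck : {bottleneck:,}")
print(f"separable 3x3  : {separable:,}")
```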

- Using the trained model: predict the cloud type for test_set images (not used in training)

from tensorflow.keras.models import load_model  # only needed to reload a saved checkpoint

path = './archive/data/test/'
category = os.listdir('./archive/data/train')  # must be in the same order as `categories` during training

image_w = 64
image_h = 64

X = []
filenames = []
files = glob.glob(path + "/*.*")
for f in files:
    img = Image.open(f)
    img = img.convert("RGB")
    img = img.resize((image_w, image_h))
    data = np.asarray(img)
    filenames.append(f)
    X.append(data)

X = np.array(X).astype(float) / 255.0  # apply the same 0-1 scaling used for training
prediction_test = model.predict(X)

for f, probs in zip(filenames, prediction_test):
    label = probs.argmax()  # index of the largest probability, e.g. [0.0, 0.0, ..., 1.0, 0.0]
    print("////////////////////")
    # os.path.basename works with both '/' and '\\' path separators
    print(os.path.basename(f) + " predicted cloud type: " + category[label])
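Taking only the `argmax` hides how close the runner-up class was, which matters for look-alike clouds. A small sketch of reporting the top few classes instead; `top_k` is an illustrative helper, and the probabilities below are made up, not output from the model.

```python
def top_k(probabilities, categories, k=3):
    """Return the k most likely (category, probability) pairs, best first."""
    ranked = sorted(zip(categories, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Made-up softmax output for a 4-class example; not a result from the model.
categories = ["Ac", "Cb", "Cu", "St"]
probabilities = [0.05, 0.48, 0.42, 0.05]

for name, p in top_k(probabilities, categories, k=2):
    print(f"{name}: {p:.2f}")
```

The same function accepts one row of `prediction_test` together with the `category` list from the prediction code.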

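Prediction for 3.jpg: Cb (cumulonimbus) / prediction for 32.jpg: Cc (cirrocumulus) / prediction for 182.jpg: Ct (contrail)

→ For 3.jpg, cumulonimbus and cumulus are hard to tell apart by eye, so it is difficult to say whether the prediction is correct,

but the predictions for 32.jpg and 182.jpg are correct.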

- Comparison with photos I took myself

'1.jpg': the predicted St is correct.

'2.jpg': the predicted Ac is correct.

'3.jpg': the cloud looks closer to Cs than to the predicted Ac. (incorrect prediction)

- Limitations

1. The model's accuracy is not satisfactory.

2. A cloud can look different depending on the viewing angle, so cloud types with similar shapes are hard to tell apart.
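One way to see which look-alike types are actually being mixed up is to count the wrong (true, predicted) pairs on a labelled set. A minimal sketch; `confusion_pairs` is an illustrative helper and the labels below are made up, not results from the model.

```python
from collections import Counter

def confusion_pairs(true_labels, predicted_labels):
    """Count (true, predicted) pairs where the prediction was wrong."""
    return Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )

# Made-up labels for illustration only; not results from the model.
y_true = ["Cb", "Cb", "Cu", "Cs", "Ac", "Cu"]
y_pred = ["Cu", "Cb", "Cb", "Ac", "Ac", "Cb"]

for (t, p), n in confusion_pairs(y_true, y_pred).most_common():
    print(f"{t} predicted as {p}: {n}")
```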

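- Ideas for taking this further

1. Show the cloud type in a floating label when the cloud region of an image is touched

2. Predict lightning by combining this with the earlier post on checking the weather from photo files

- Lightning mostly occurs with Cb (cumulonimbus). Weather and cloud data could be used to analyze the conditions under which lightning occurs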

License: Attribution, ShareAlike
