Object Detection Tutorial: Building a PyTorch Dataset from the DOTA Dataset

What Is Object Detection?

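Object detection is a vision task that predicts both the class and the location (bounding box) of each object in an image. DOTA (Dataset for Object deTection in Aerial images) is a dataset for detecting objects in high-resolution overhead imagery captured from platforms such as satellites and aircraft. Compared with typical object detection benchmarks, its backgrounds are far more complex and many of its objects are rotated. In this post, we look at how to build an object detection dataset with PyTorch.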

DOTA vs. General-Purpose Datasets

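DOTA: rotated boxes (8 corner coordinates, or a box plus an angle), high-resolution images of 4K or more, objects in arbitrary orientations

Pascal VOC / COCO: axis-aligned boxes (e.g., xmin, ymin, xmax, ymax), images typically around 500 to 640 pixels on the long side, objects that are mostly upright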

Building a DOTA Dataset with PyTorch

DOTA Dataset Characteristics

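About 2,800 images
15 classes
Resolutions up to 4,000×4,000 pixels or more (images are usually split into patches before use)
More than 188,000 object instances
Rotated bounding boxes, stored as 8-value (4-point) polygon coordinates
Labels stored as .txt files (polygon coordinates + class name + difficulty)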
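
Because the images are too large to feed to a model directly, they are usually cut into overlapping patches. The following is a minimal sketch of how the patch positions could be computed. The function name `patch_origins` and the default patch size (1024) and overlap (200) are illustrative assumptions, not values prescribed by DOTA.

```python
def patch_origins(width, height, patch_size=1024, overlap=200):
    """Return (left, top) corners of square patches that cover the image.

    patch_size and overlap are illustrative defaults, not fixed by DOTA.
    """
    stride = patch_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than patch_size")

    def starts(length):
        # An image side no longer than one patch gets a single window at 0.
        if length <= patch_size:
            return [0]
        positions = list(range(0, length - patch_size, stride))
        # Add a final window that sits flush with the image border.
        positions.append(length - patch_size)
        return positions

    return [(x, y) for y in starts(height) for x in starts(width)]


origins = patch_origins(4000, 4000)
print(len(origins), origins[:3])
```

Each patch would then be cropped with something like `image.crop((x, y, x + patch_size, y + patch_size))`, and the polygon coordinates shifted by `(x, y)` so that the labels stay aligned with the patch.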

DOTA Label Format Example (.txt file)

First 8 numbers: polygon vertices (x1, y1, x2, y2, x3, y3, x4, y4)
Class name: plane, ship, ...
Difficulty flag: 0 (not difficult), 1 (difficult)

1156.0 425.0 1220.0 425.0 1220.0 473.0 1156.0 473.0 plane 0
396.0 395.0 424.0 395.0 424.0 427.0 396.0 427.0 small-vehicle 0
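
As a small illustration of this format, the sketch below parses a single label line into its three parts. The helper name `parse_dota_line` is hypothetical, and treating the difficulty field as optional (defaulting to 0) is an assumption made here for robustness.

```python
def parse_dota_line(line):
    """Parse one DOTA annotation line.

    Returns (polygon, class_name, difficulty), or None for lines that do not
    contain 8 numeric coordinates followed by a class name.
    """
    parts = line.strip().split()
    if len(parts) < 9:
        return None
    try:
        coords = [float(v) for v in parts[:8]]
    except ValueError:
        return None
    # Group the flat list into four (x, y) vertices.
    polygon = list(zip(coords[0::2], coords[1::2]))
    class_name = parts[8]
    # Assumption: the difficulty flag is optional and defaults to 0.
    difficulty = int(parts[9]) if len(parts) > 9 else 0
    return polygon, class_name, difficulty


sample = "1156.0 425.0 1220.0 425.0 1220.0 473.0 1156.0 473.0 plane 0"
polygon, class_name, difficulty = parse_dota_line(sample)
print(polygon, class_name, difficulty)
```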

PyTorch Code Example

import os
import torch
from torch.utils.data import Dataset
from PIL import Image

class DOTADataset(Dataset):
    def __init__(self, image_dir, anno_dir, transform=None, classes=None):
        self.image_dir = image_dir
        self.anno_dir = anno_dir
        # Sort so that each index maps to the same file on every run
        self.image_files = sorted(f for f in os.listdir(image_dir) if f.endswith('.png'))
        self.transform = transform
        self.classes = classes or [
            'plane', 'ship', 'storage-tank', 'baseball-diamond',
            'tennis-court', 'basketball-court', 'ground-track-field',
            'harbor', 'bridge', 'large-vehicle', 'small-vehicle',
            'helicopter', 'roundabout', 'soccer-ball-field', 'swimming-pool'
        ]
        self.class_to_idx = {c: i + 1 for i, c in enumerate(self.classes)}  # 0 is reserved for background

    def __getitem__(self, idx):
        img_name = self.image_files[idx]
        img_path = os.path.join(self.image_dir, img_name)
        anno_path = os.path.join(self.anno_dir, os.path.splitext(img_name)[0] + '.txt')

        image = Image.open(img_path).convert("RGB")
        polygons, labels = [], []

        with open(anno_path, 'r') as f:
            for line in f:
                parts = line.strip().split()
                if len(parts) < 9:  # skip lines without 8 coordinates and a class name
                    continue
                coords = list(map(float, parts[:8]))  # four vertices: x1, y1, ..., x4, y4
                class_name = parts[8]
                if class_name not in self.class_to_idx:
                    continue
                polygons.append(coords)
                labels.append(self.class_to_idx[class_name])

        target = {
            # reshape keeps the (N, 8) shape even when the image has no objects
            'polygons': torch.tensor(polygons, dtype=torch.float32).reshape(-1, 8),
            'labels': torch.tensor(labels, dtype=torch.int64)
        }

        if self.transform:
            image = self.transform(image)

        return image, target

    def __len__(self):
        return len(self.image_files)
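
Each image has a different number of objects, so the per-image targets cannot be stacked into one tensor by the default batching logic. One common workaround is a custom collate function that keeps the samples as lists. The sketch below uses a hypothetical name, `detection_collate`, and plain Python stand-ins for the samples so that it runs on its own.

```python
def detection_collate(batch):
    """Keep images and targets as lists instead of stacking them.

    Each element of `batch` is an (image, target) pair shaped like the
    output of DOTADataset.__getitem__.
    """
    images, targets = zip(*batch)
    return list(images), list(targets)


# Stand-in samples (plain Python objects) just to show the resulting structure.
fake_batch = [
    ("image_0", {"polygons": [[0.0] * 8], "labels": [1]}),
    ("image_1", {"polygons": [], "labels": []}),
]
images, targets = detection_collate(fake_batch)
print(len(images), len(targets))
```

With the real dataset, this function would be passed to the loader as `DataLoader(dataset, batch_size=2, collate_fn=detection_collate)`. Note that `transform` in the class above is applied to the image only, so geometric transforms such as resizing or flipping would also need to be applied to the polygons.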

Things to Watch for When Applying YOLO to DOTA

The standard YOLOv5 and YOLOv8 detection models handle only axis-aligned boxes, so working with rotated boxes requires the following:

Polygon → minimum-area bounding rectangle (e.g., with cv2.minAreaRect)
Coordinate conversion: 8 points → (cx, cy, w, h, angle)
Use a model or toolbox that supports oriented boxes (YOLOv5-OBB, MMRotate, etc.)
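
The two conversions above can be sketched in plain Python as follows. The function names are hypothetical, and `polygon_to_obb` assumes the four points are listed in order around a shape that is close to a rectangle. For arbitrary quadrilaterals, cv2.minAreaRect is the more robust choice. Angle conventions also differ between libraries, so the angle here (degrees, measured along the first edge) is only one possible definition.

```python
import math


def polygon_to_hbb(coords):
    """Axis-aligned box (xmin, ymin, xmax, ymax) enclosing an 8-value polygon."""
    xs, ys = coords[0::2], coords[1::2]
    return min(xs), min(ys), max(xs), max(ys)


def polygon_to_obb(coords):
    """Approximate (cx, cy, w, h, angle_in_degrees) for an 8-value polygon.

    Assumes the points go around a near-rectangle in order; w follows the
    first edge (p1 -> p2) and h follows the second edge (p2 -> p3).
    """
    x1, y1, x2, y2, x3, y3, x4, y4 = coords
    cx = (x1 + x2 + x3 + x4) / 4.0
    cy = (y1 + y2 + y3 + y4) / 4.0
    w = math.hypot(x2 - x1, y2 - y1)
    h = math.hypot(x3 - x2, y3 - y2)
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    return cx, cy, w, h, angle


plane = [1156.0, 425.0, 1220.0, 425.0, 1220.0, 473.0, 1156.0, 473.0]
print(polygon_to_hbb(plane))
print(polygon_to_obb(plane))
```

The axis-aligned version is enough for a standard YOLO detector, at the cost of loose boxes around rotated objects. The oriented version is what an oriented-box model would consume, after converting to that model's own angle convention.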

Wrapping Up

We have seen how to build an object detection dataset with PyTorch. Next time, we will look at how to define and train a model in PyTorch.
