Paper Title
Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise Semantic Labeling
Paper Authors
Paper Abstract
Given an aerial image, aerial scene parsing (ASP) aims to interpret the semantic structure of the image content, e.g., by assigning a semantic label to every pixel of the image. With the popularization of data-driven methods, the past decades have witnessed promising progress on ASP for high-resolution aerial images, achieved by approaching the problem through either tile-level scene classification or segmentation-based image analysis. However, the former scheme often produces results with tile-wise boundaries, while the latter must handle the complex modeling process from pixels to semantics, which often requires large-scale, well-annotated image samples with pixel-wise semantic labels. In this paper, we address these issues in ASP, moving from tile-level scene classification to pixel-wise semantic labeling. Specifically, we first revisit aerial image interpretation through a literature review. We then present Million-AID, a large-scale scene classification dataset that contains one million aerial images. On this dataset, we also report benchmarking experiments using classical convolutional neural networks (CNNs). Finally, we perform ASP by unifying tile-level scene classification and object-based image analysis to achieve pixel-wise semantic labeling. Extensive experiments show that Million-AID is a challenging yet useful dataset, which can serve as a benchmark for evaluating newly developed algorithms. When transferring knowledge from Million-AID, fine-tuned CNN models pretrained on Million-AID consistently outperform those pretrained on ImageNet for aerial scene classification. Moreover, our hierarchical multi-task learning method achieves state-of-the-art pixel-wise classification on the challenging GID, bridging tile-level scene classification and pixel-wise semantic labeling for aerial image interpretation.
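The tile-wise boundary effect mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the function names, the toy image format, and the threshold "classifier" standing in for a CNN are all assumptions made for illustration. The sketch labels each pixel with the class predicted for the tile that contains it, so label changes can only occur along tile edges.

```python
from typing import Callable, List, Sequence

# Hypothetical toy format: a 2-D grid of pixel intensities in [0, 1].
Image = Sequence[Sequence[float]]


def tile_labels_to_pixel_map(
    image: Image,
    tile_size: int,
    classify_tile: Callable[[List[List[float]]], int],
) -> List[List[int]]:
    """Label every pixel with the class predicted for its enclosing tile."""
    height, width = len(image), len(image[0])
    label_map = [[0] * width for _ in range(height)]
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            bottom = min(top + tile_size, height)
            right = min(left + tile_size, width)
            tile = [list(row[left:right]) for row in image[top:bottom]]
            label = classify_tile(tile)
            # One label is copied to the whole tile, which is what
            # produces blocky, tile-aligned boundaries.
            for r in range(top, bottom):
                for c in range(left, right):
                    label_map[r][c] = label
    return label_map


def mean_threshold_classifier(tile: List[List[float]]) -> int:
    """Stand-in for a scene classifier (assumption): class 1 if mean > 0.5."""
    values = [v for row in tile for v in row]
    return 1 if sum(values) / len(values) > 0.5 else 0


# A region with a diagonal edge; the tile-level result cannot follow it.
image = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
label_map = tile_labels_to_pixel_map(image, 2, mean_threshold_classifier)
for row in label_map:
    print(row)
```

In this toy input the bright region has a diagonal edge, but the output map marks only the top-left 2x2 tile as class 1. Pixel-wise semantic labeling is meant to close this gap between tile-aligned and true region boundaries.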
