GroundingCarDD: Text-Guided Multimodal Phrase Grounding for Car Damage Recognition


RMIT University

Abstract

The rising incidence of vehicle accidents and the growth of online car marketplaces underscore the need for efficient and accurate car damage inspection in rental services, insurance claims processing, and vehicle maintenance. Traditional manual inspection is time-consuming, error-prone, and vulnerable to fraud, which motivates automated solutions. Although existing object detection models are effective in many scenarios, they struggle to identify minor damage on car bodies and to distinguish actual damage from car features, reflections, or shadows that resemble it. This limitation stems from a lack of effective methods for extracting the distinctive features of minor damage and from a lack of contextual understanding beyond visual cues. To address this challenge, we introduce GroundingCarDD, a text-guided multimodal phrase grounding framework that leverages both visual and textual information to localize and classify car damage precisely. By fusing visual and textual features with a context-aware attention mechanism, GroundingCarDD outperforms state-of-the-art methods. Specifically, on a combined dataset of public and curated private data, it achieves a mean Average Precision (mAP) of 64.1 and a miss detection rate of only 14.4%, surpassing the state-of-the-art detectors YOLOv9 and DETR. On the public CarDD dataset, it attains an AP50 of 80 and a recall of 86.7, exceeding the performance of existing models. GroundingCarDD has the potential to transform the automotive industry by enabling automated and accurate car damage assessment. The method can bring transparency to online insurance claims processing, optimize vehicle servicing, and enhance online car sales, ultimately benefiting consumers, businesses, and the industry as a whole.
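The abstract reports recall and a miss detection rate alongside AP. The snippet below is a minimal sketch of how such detection metrics are commonly computed. It assumes (our assumption, not necessarily the paper's exact protocol) that a ground-truth damage box counts as detected when a prediction of the same class overlaps it with an intersection-over-union (IoU) of at least 0.5, that predictions are matched greedily one-to-one in descending score order, and that the miss detection rate is the fraction of ground-truth boxes left unmatched (1 - recall). The function names `iou` and `recall_and_miss_rate` and the example boxes are hypothetical.

```python
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_and_miss_rate(
    gts: List[Tuple[str, Box]],
    preds: List[Tuple[str, Box, float]],
    iou_thr: float = 0.5,
) -> Tuple[float, float]:
    """Greedy one-to-one matching; returns (recall, miss_rate).

    Hypothetical helper: assumes miss rate = unmatched ground truths / all ground truths.
    """
    if not gts:
        raise ValueError("need at least one ground-truth box")
    matched = [False] * len(gts)
    # Highest-confidence predictions claim ground-truth boxes first.
    for label, box, _score in sorted(preds, key=lambda p: p[2], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, (gt_label, gt_box) in enumerate(gts):
            if matched[j] or gt_label != label:
                continue  # each ground truth is matched at most once, same class only
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
    recall = sum(matched) / len(gts)
    return recall, 1.0 - recall


# Illustrative example (toy boxes, not data from the paper).
gts = [("dent", (0, 0, 10, 10)), ("scratch", (20, 20, 30, 30)), ("crack", (40, 40, 50, 50))]
preds = [
    ("dent", (1, 1, 10, 10), 0.9),       # IoU 0.81 -> hit
    ("scratch", (50, 50, 60, 60), 0.8),  # no overlap -> scratch is missed
    ("crack", (40, 40, 50, 50), 0.7),    # exact match -> hit
]
recall, miss_rate = recall_and_miss_rate(gts, preds)
print(f"recall={recall:.3f} miss_rate={miss_rate:.3f}")  # recall=0.667 miss_rate=0.333
```

Here one of three damages is missed because the scratch prediction is misplaced. AP50 goes one step further: it sweeps the confidence threshold and summarizes precision over the resulting recall levels at the same 0.5 IoU threshold, which this sketch omits.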





Model

GroundingCarDD is a phrase grounding model that bridges the gap between visual and textual information. By accurately aligning textual descriptions of car damage with specific image regions, it offers a robust solution for precise damage assessment. This approach supports a range of applications, including insurance claims processing, automotive repair, and autonomous vehicle safety.

Overview of the GroundingCarDD model architecture.
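To illustrate how phrase grounding can align damage descriptions with image regions, the sketch below shows a generic text-guided cross-attention step followed by region-phrase scoring, in the spirit of grounding detectors such as GLIP. It is not the paper's context-aware attention mechanism, whose details are not given here. It assumes precomputed region and phrase embeddings of equal dimension, omits learned query/key/value projections and multi-head structure, and fuses in one direction only (regions attend to phrases). All names (`cross_attend`, `ground_phrases`) and the toy embeddings are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul_transposed(a: Matrix, b: Matrix) -> Matrix:
    """Return a @ b.T, i.e. the dot product of every row of a with every row of b."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(regions: Matrix, phrases: Matrix) -> Matrix:
    """Regions (queries) attend to phrase embeddings (keys and values).

    Scaled dot-product attention with a residual connection; no learned
    projections (a simplifying assumption for this sketch).
    """
    d = len(regions[0])
    scores = matmul_transposed(regions, phrases)
    fused = []
    for region, row in zip(regions, scores):
        weights = softmax([s / math.sqrt(d) for s in row])
        # Text context: attention-weighted sum of phrase embeddings.
        context = [sum(w * p[k] for w, p in zip(weights, phrases)) for k in range(d)]
        fused.append([r + c for r, c in zip(region, context)])
    return fused


def ground_phrases(regions: Matrix, phrases: Matrix) -> List[int]:
    """Assign each region the index of its best-aligned phrase."""
    fused = cross_attend(regions, phrases)
    logits = matmul_transposed(fused, phrases)  # region-phrase alignment scores
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Toy 4-d embeddings (illustrative only): phrase 0 = "dent", phrase 1 = "scratch".
phrases = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
regions = [[0.9, 0.1, 0.0, 0.0], [0.1, 0.8, 0.0, 0.0]]
print(ground_phrases(regions, phrases))  # [0, 1]
```

Each region is assigned the phrase with the highest alignment logit. A full grounding model would also regress box coordinates and threshold these scores, so that regions matching no damage phrase (for example, reflections or shadows) can be rejected.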





Acknowledgement

This website is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.