Visual Computing Laboratory
Research Areas


Physical AI
Generative AI
Multimodal AI

Intelligent Surveillance Camera

Person re-identification (re-ID) aims to match the same person across different camera views. Significant progress has been made in recent years; however, most studies are confined to short-term re-ID, which assumes that a person's clothing remains unchanged. In real-world scenarios, people change their clothes over time, so long-term re-ID must rely on clothing-invariant, identity-related cues such as body shape, hair color, and other biometric traits.
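
For intuition, the sketch below shows the retrieval step shared by most re-ID pipelines: embed person crops into a feature space and rank gallery candidates by cosine similarity. The ResNet-50 backbone and crop size here are illustrative stand-ins, not our model; a long-term re-ID system would swap in a backbone trained for clothing-invariant features.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Minimal sketch: embed query/gallery crops with a generic ImageNet backbone
# and rank gallery detections by cosine similarity. The backbone is a
# placeholder; long-term re-ID needs clothing-invariant features instead.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classifier; keep 2048-d features
backbone.eval()

@torch.no_grad()
def embed(crops):
    # crops: (N, 3, 256, 128) normalized person crops
    return F.normalize(backbone(crops), dim=1)  # L2-normalize for cosine similarity

query = embed(torch.randn(1, 3, 256, 128))      # one query detection
gallery = embed(torch.randn(10, 3, 256, 128))   # ten gallery detections
scores = query @ gallery.T                      # (1, 10) cosine similarities
print(scores.argsort(dim=1, descending=True))   # gallery indices, best match first
```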

Point Cloud Scene Understanding

Point cloud segmentation assigns a semantic or instance label to each point and is essential for 3D scene understanding. While 3D segmentation focuses on static frames, 4D segmentation incorporates temporal information to handle dynamic changes such as moving objects. Despite recent advances, challenges like temporal consistency and computational cost remain. Our research aims to develop robust and efficient models for both 3D and 4D segmentation to enhance real-world scene understanding.
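
The sketch below illustrates the basic per-point formulation in the PointNet style: a shared MLP produces per-point features, a max-pool supplies scene-level context, and a head predicts one class per point. It is a toy model with illustrative widths and class count, not a specific published architecture; 4D segmentation would add a temporal dimension on top.

```python
import torch
import torch.nn as nn

# PointNet-style sketch of semantic segmentation: shared MLP -> per-point
# features, max-pool -> global context, head -> one class per point.
class PointSegNet(nn.Module):
    def __init__(self, num_classes=13):  # class count is illustrative
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        self.head = nn.Linear(128 + 128, num_classes)

    def forward(self, points):                    # points: (B, N, 3) xyz
        feats = self.point_mlp(points)            # (B, N, 128) per-point features
        ctx = feats.max(dim=1).values             # (B, 128) global scene context
        ctx = ctx.unsqueeze(1).expand_as(feats)   # broadcast back to every point
        return self.head(torch.cat([feats, ctx], dim=-1))  # (B, N, num_classes)

logits = PointSegNet()(torch.randn(2, 1024, 3))
print(logits.argmax(dim=-1).shape)  # (2, 1024): one semantic label per point
```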

Open-Vocabulary Learning

Open-vocabulary learning enables models to recognize unseen categories by leveraging semantic information like text or word embeddings, moving beyond fixed label sets. While recent advances in vision-language models have improved generalization, challenges remain in visual-semantic alignment and domain robustness. Our research focuses on building scalable, interpretable systems to enhance adaptability in real-world scenarios.
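
As a concrete illustration, here is a minimal sketch using an off-the-shelf vision-language model (CLIP via Hugging Face transformers): because the label set is plain text, new categories can be queried without retraining. This shows the mechanism, not our system; the checkpoint and prompts are examples only.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Open-vocabulary recognition sketch: candidate classes are free-form text,
# so an unseen category is just another prompt.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of a zebra", "a photo of an okapi", "a photo of a horse"]
image = Image.new("RGB", (224, 224), "gray")  # placeholder; use a real photo

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))  # per-label matching probabilities
```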

Text-to-Image Generation / Editing

Text-to-image generation and image editing are key areas in image synthesis that enable creating and modifying images based on natural language descriptions. Recent advances with diffusion and transformer-based models have greatly improved image quality and control. However, challenges such as semantic alignment and consistency remain. Our research aims to develop controllable and semantically grounded models to enhance both text-to-image generation and image editing in open-domain scenarios.
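
As a concrete example, the sketch below generates an image from a prompt with a public diffusion pipeline (Hugging Face diffusers); the guidance scale is the standard knob trading semantic alignment against diversity. The checkpoint and prompt are illustrative, not our method.

```python
import torch
from diffusers import StableDiffusionPipeline

# Prompt-driven generation with an off-the-shelf diffusion pipeline.
# guidance_scale controls how strongly sampling follows the text prompt.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a watercolor painting of a lighthouse at dusk",
    guidance_scale=7.5,        # higher = closer prompt adherence
    num_inference_steps=30,    # denoising steps: quality vs. speed
).images[0]
image.save("lighthouse.png")
```
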
Zero-Shot Learning

Zero-shot learning (ZSL) enables recognition of novel categories without labeled examples by using semantic information like attributes or text. Compositional ZSL extends this by identifying new combinations of known attributes and objects, improving generalization. Despite progress, challenges remain in representation learning and scalability. Our research focuses on developing robust, interpretable models to enhance zero-shot and compositional zero-shot learning for real-world applications.
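
The sketch below shows the classic embedding-based ZSL formulation: a projection learned on seen classes maps image features into the attribute space, where unseen classes are recognized by nearest-neighbor matching against their attribute signatures. All dimensions and the random attribute matrix are placeholders for real annotations (e.g., AwA-style attributes).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Attribute-based ZSL sketch: project image features into attribute space,
# then classify by the nearest unseen-class attribute vector.
feat_dim, attr_dim, num_unseen = 2048, 85, 10   # illustrative dimensions
proj = nn.Linear(feat_dim, attr_dim)            # would be learned on seen classes
unseen_attrs = F.normalize(torch.rand(num_unseen, attr_dim), dim=1)  # placeholder

@torch.no_grad()
def predict(image_feats):                       # (N, feat_dim) backbone features
    z = F.normalize(proj(image_feats), dim=1)   # map into attribute space
    return (z @ unseen_attrs.T).argmax(dim=1)   # nearest unseen class

print(predict(torch.randn(4, feat_dim)))        # predicted unseen-class indices
```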