Roham’s Page
I am a senior undergraduate student at the University of California, San Diego, majoring in Data Science. My research explores multimodal reasoning, vision-language models, and adversarial machine learning.
I have had the privilege of contributing to research in several labs across multiple academic institutions. More information about these experiences, along with my entrepreneurial endeavors, can be found in my CV or below.
• At UC San Diego’s McAuley Lab (CSE Department), I first-authored and led the development of TetrisBench, a dataset designed to evaluate vision-language models on spatial fit, uncovering several significant visual reasoning gaps.
• At MIT CSAIL’s Spoken Language Systems Lab, I played a key role in the Neural Codec Resynthesis project, designing large-scale listening experiments for subjective evaluations.
• At UC San Diego’s Adaptive Computing and Embedded Systems (ACES) Lab (ECE Department), I collaborated with Ph.D. students on the EveGuard project, enhancing privacy in speech recognition systems through adversarial robustness techniques.
• At UC San Diego’s Hu Lab (Halıcıoğlu Data Science Institute), I explored large language model-driven procedural generation techniques in Unreal Engine to create scalable 3D city environments for autonomous system simulations.
• I am currently leading another research project within UC San Diego’s CSE Department that evaluates multimodal models’ ability to reason from different visual perspectives. The work has uncovered a large gap in their capabilities, and I am now developing a model.
Publications
TetrisBench: A Spatial Fit Benchmark for Vision-Language Models
Submitted to CVPR 2025
Roham Mehrabi, Tianyang Liu, Lei Zhang, Julian McAuley
A Closer Look at Neural Codec Resynthesis: Bridging the Gap Between Codec & Waveform Generation
Alexander H. Liu, Yuan Gong, Roham Mehrabi, James R. Glass
A revised version was later released, replacing my co-author credit with an acknowledgment.
Accepted to: Audio Imagination: NeurIPS 2024 Workshop on AI-Driven Speech, Music, and Sound Generation
Blogs
Spatial Reasoning in AI: How Autonomous Vehicles and Robots See the World
Oct 1, 2024
This article explores how autonomous vehicles and robots perceive and understand their environments through spatial reasoning.
Adversarial Attacks in AI: How ChatGPT Can Be Hacked
Sep 20, 2024
This article talks about the subtle manipulations known as adversarial attacks that can deceive AI systems like ChatGPT into making incorrect predictions or outputs.
Vision-Language Models: The AI That’s Learning to See and Speak
Sep 16, 2024
This article provides an overview of vision-language models and their significance in the field of AI, discussing how these models integrate visual and textual information.
Entrepreneurship
In addition to my research and academic pursuits, I have an entrepreneurial background, having co-founded multiple startups.
• I co-founded Dart.cx, where we developed a conversational AI phone agent to answer over 180 patient calls for medical clinics.
• Prior to Dart.cx, I co-founded Socale, a social networking app that anonymously matches college students in the same class based on shared interests. The app garnered over 1,800 downloads, 13,000 messages, and 30,000 user sessions from UCSD students. 📱🚀
• Additionally, I founded RAZI, an organization that provides data-driven marketing strategies to startups and small businesses. I led a team of 47 students from universities such as UC Berkeley, UCLA, and UCI, and worked on marketing campaigns for startups valued at $3-7 million. Through RAZI, I also organized 36 workshops and hosted 7 prominent speakers in the entrepreneurial space. 🎯
Here are 6 of the workshops I conducted at RAZI:
