Katherine Xu

I'm a first-year Computer Science PhD student in the GRASP Lab at the University of Pennsylvania, advised by Prof. Jianbo Shi and Prof. James Gee. My research interests in computer vision and machine learning include image segmentation, generative models, and 3D scene understanding. I also enjoy applying AI in the medical, climate, and robotics domains.

Previously, I spent 4.5 wonderful years at MIT, where I completed my bachelor's degree in Computer Science with a minor in Mathematics and my master's degree in Computer Science with a concentration in AI. I am grateful for the opportunities to conduct research at MIT and intern twice at Meta, including Meta AI (FAIR).

Email  /  Twitter  /  LinkedIn  /  GitHub

Research
Modeling Extreme Heat Risk in Urban Areas Using Computer Vision and Data Analysis

Katherine Xu
MIT MEng Thesis, 2023
Paper in preparation for submission to a workshop or journal

I developed a model that estimates the extreme heat risk of an urban area at the census tract level. To construct this model, I fine-tuned a vision transformer to segment risk factors from aerial images. I also incorporated heat hazard and vulnerability factors from land surface temperature, building, and socioeconomic datasets. This research focuses on developing a heat risk model for Boston, which experiences intense urban heat islands.

Chemistry Insights for Large Pretrained GNNs
Katherine Xu, Janice Lan
NeurIPS AI for Science Workshop, 2022
Paper

During my internship at Meta AI, I worked with the Open Catalyst team, which uses AI to discover catalysts for renewable energy storage. Large graph neural networks (GNNs) have shown good progress on the Open Catalyst 2020 (OC20) dataset in predicting the forces and energies of atoms and systems, but we have little understanding of how or why these models work. Hence, we present perturbation analyses of GNN predictions on OC20 and observe evidence that aligns with chemical intuition.

A Novel Digital Algorithm for Identifying Liver Steatosis Using Smartphone-Captured Images
Katherine Xu*, Siavash Raigani*, Angela Shih, Sofia G. Baptista, Ivy Rosales, Nicola M. Parry, Stuti G. Shroff, Joseph Misdraji, Korkut Uygun, Heidi Yeh, Katherine Fairchild, Leigh Anne Dageforde (* equal contribution)
Transplantation Direct, 2022
Paper  /  Poster

Linking Threat Tactics, Techniques, and Patterns with Defensive Weaknesses, Vulnerabilities and Affected Platform Configurations for Cyber Hunting
Erik Hemberg, Jonathan Kelly, Michal Shlapentokh-Rothman, Bryn Reinstadler, Katherine Xu, Nick Rutar, Una-May O'Reilly
arXiv, 2021
Preprint

Projects
Rendering and Extracting Transparent Objects using TensoRF and Feature Field Distillation
MIT 6.S980 Machine Learning for Inverse Graphics, Fall 2022
Website  /  Code

Object Recognition for Selective Clutter Clearing using PointCLIP
MIT 6.4212 Robotic Manipulation, Fall 2022
Paper

Lyric and Audio-Based Music Video Generation
MIT 6.869 Advances in Computer Vision, Spring 2022
Code

3D Subway Surfers: A Live Action Version of the Mobile Game
MIT 6.835 Intelligent Multimodal User Interfaces, Spring 2022

My team built a live-action version of the Subway Surfers mobile game that enables gameplay using full-body movements, gestures, and speech commands. We wrote Python code to detect the player's pose using OpenCV and MediaPipe and to recognize speech commands using the SpeechRecognition library and the Google Cloud Speech API. Our 4 volunteers rated the game highly, with a mean score of 4.5/5.0. We received positive feedback that the commands are intuitive and that the system is responsive to the player's movements.

Large Scale Image Completion via Co-Modulated GANs
MIT 6.S898 Deep Learning, Fall 2021
Website

Creating a Chatbot for Three-Way Conversations
MIT 6.S898 Deep Learning, Fall 2021

Natural language processing research often features dialogue systems and chatbots that synthesize 2-way conversations, but few systems involve more participants. We trained a Seq2Seq model on the Persona-Chat dataset and adapted it to train our chatbot for 3-way conversations. We improved our Seq2Seq model performance by adding attention and adjusting the embedding size and number of layers. Our work demonstrates that we can create a chatbot that generates engaging, consistent, rational, and creative responses for 3-way conversations by training a Seq2Seq model on Persona-Chat.

Template credit to Jon Barron.