Tracking global NO₂ levels over time is crucial for monitoring climate change. To address this, the Cohen Research Group developed the Berkeley High-Resolution NO₂ Product (BEHR), which uses the WRF-Chem model to produce high-resolution NO₂ estimates that account for spatial and seasonal variation in the data recorded by the Ozone Monitoring Instrument (OMI) aboard NASA's Aura satellite. Building on this work, I laid the groundwork for a machine learning model intended to replace the computationally expensive WRF-Chem step. I aligned over 47K WRF-Chem pixels with their corresponding OMI pixels, improving the correspondence between the two datasets for model training, and engineered the geolocation data to streamline analysis and improve interpretability.
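As a rough illustration of the kind of pixel alignment involved, the sketch below matches each WRF-Chem grid cell to its nearest OMI pixel center with a KD-tree. The function and array names are hypothetical, and a real alignment would likely need to account for the full OMI pixel footprint rather than just its center.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_wrf_to_omi(wrf_lat, wrf_lon, omi_lat, omi_lon):
    """Assign each WRF-Chem grid cell to the nearest OMI pixel center.

    Inputs are 1-D arrays of latitudes/longitudes in degrees.
    Nearest-center matching is a simplification used here only to
    illustrate the idea of aligning the two grids.
    """
    # Build a KD-tree over OMI pixel centers (treating lat/lon as planar,
    # a reasonable approximation over a regional domain).
    tree = cKDTree(np.column_stack([omi_lat, omi_lon]))
    # For every WRF-Chem cell, find the index of the closest OMI pixel.
    _, nearest_omi_idx = tree.query(np.column_stack([wrf_lat, wrf_lon]))
    return nearest_omi_idx
```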
Part-of-speech (POS) tagging is a core component of Natural Language Processing (NLP) that identifies the grammatical category of each word in a sentence, aiding semantic understanding. I developed a POS tagger using a Long Short-Term Memory (LSTM) network trained with mini-batch stochastic gradient descent on top of pre-trained GloVe embeddings. To improve performance, I made the LSTM bi-directional, added dropout layers to prevent overfitting, and used 300-dimensional embeddings for the 500,000 most frequent words to capture richer context. By refining model configurations and experimenting with hyperparameters, I gained practical insight into optimizing neural networks for real-world language-understanding applications.
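A minimal PyTorch sketch of this kind of architecture is shown below; the class name, hidden size, and dropout rate are illustrative assumptions rather than the exact configuration used.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Bi-directional LSTM POS tagger over pre-trained word embeddings."""

    def __init__(self, embeddings, hidden_dim, num_tags, dropout=0.5):
        super().__init__()
        # Frozen pre-trained vectors, e.g. GloVe: shape (vocab_size, 300).
        self.embed = nn.Embedding.from_pretrained(embeddings, freeze=True)
        self.lstm = nn.LSTM(embeddings.size(1), hidden_dim,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        # Forward and backward hidden states are concatenated.
        self.out = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        x = self.embed(token_ids)          # (batch, seq_len, 300)
        x, _ = self.lstm(x)                # (batch, seq_len, 2 * hidden_dim)
        return self.out(self.dropout(x))   # per-token tag scores
```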
Members of the same linguistic community vary considerably in how they speak and express the same ideas, and this variation can often be traced to social differences and social interaction. In this study, I explored sociolinguistic variation in the "yes" and "no" response variants used by English speakers at UC Berkeley. Collecting auditory linguistic data over two weeks, I tested for significant correlations between variant usage and the social factors of gender, race, formality of the relationship, and situational context.
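As one illustration of how an association between a response variant and a social factor could be tested, the toy sketch below runs a chi-square test of independence on a contingency table; the data and column names are invented, and the study's actual analysis may have used a different method.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical tidy table of observations: one row per recorded response.
df = pd.DataFrame({
    "variant": ["yeah", "yes", "nah", "yep", "no", "yeah"],
    "context": ["casual", "formal", "casual", "casual", "formal", "casual"],
})

# Cross-tabulate variant against situational context and test whether
# the two are independent.
table = pd.crosstab(df["variant"], df["context"])
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.3f}")
```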
Languages exhibit recognizable patterns when forming new words, and identifying these patterns can offer valuable insight into language structure. In a collaborative project at UC Berkeley, we developed a neural network model to determine whether generated compound words are valid in Standard American English. Using the LADEC dataset, we trained a multilayer perceptron on 7.8K compound words represented by 2.3K features. Through extensive training and optimization, we achieved over 94% accuracy in classifying compound words, validating the model's ability to learn word-formation patterns. This project strengthened my expertise in NLP, neural network optimization, and data-driven linguistic analysis, and showed how machine learning can connect computational methods with linguistic structure to offer new perspectives on word formation.
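A minimal scikit-learn sketch of this kind of classifier follows; the file paths, layer sizes, and train/test split are illustrative assumptions, not the project's actual code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hypothetical feature matrix: ~7.8K compound candidates x ~2.3K features,
# with a binary label indicating whether each compound is valid.
X = np.load("ladec_features.npy")   # placeholder path
y = np.load("ladec_labels.npy")     # placeholder path

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# A small multilayer perceptron; the layer sizes here are illustrative.
clf = MLPClassifier(hidden_layer_sizes=(256, 64), max_iter=200)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```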
I designed and developed a multiplayer game that merges the classic gameplay of Pac-Man with the iconic Pokémon universe. Players control Pokémon characters as they navigate procedurally generated mazes, completing objectives while avoiding obstacles. To elevate the user experience, I implemented gameplay saving using Git, created intuitive menus, and integrated a dynamic HUD display. This project not only honed my game development skills but also allowed me to explore the intricacies of real-time multiplayer mechanics and user interface design.
(*Note: This project was created for educational purposes and does not infringe on any copyrights held by Nintendo, Game Freak, or Creatures. All credit for Pokémon character design goes to the respective companies.)
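As a rough illustration of the procedural maze generation mentioned above, the sketch below carves a maze with randomized depth-first search. It is written in Python for consistency with the other examples and is a simplified stand-in, not the game's actual algorithm.

```python
import random

def generate_maze(width, height, seed=None):
    """Carve a maze on a width x height cell grid via randomized DFS.

    Returns a dict mapping each cell (x, y) to the set of neighboring
    cells it is connected to (i.e., the walls between them are removed).
    """
    rng = random.Random(seed)
    connections = {(x, y): set() for x in range(width) for y in range(height)}
    visited = {(0, 0)}
    stack = [(0, 0)]
    while stack:
        x, y = stack[-1]
        # Unvisited orthogonal neighbors of the current cell.
        neighbors = [(x + dx, y + dy)
                     for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if (x + dx, y + dy) in connections
                     and (x + dx, y + dy) not in visited]
        if neighbors:
            nxt = rng.choice(neighbors)
            # Knock down the wall between the current cell and the neighbor.
            connections[(x, y)].add(nxt)
            connections[nxt].add((x, y))
            visited.add(nxt)
            stack.append(nxt)
        else:
            stack.pop()
    return connections
```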