Tracking global NO₂ levels over time is crucial for monitoring climate change. To address this, the Cohen Research Group developed the Berkeley High-Resolution NO₂ Product (BEHR), which uses the WRF-Chem model to produce high-resolution NO₂ estimates that account for spatial and seasonal variation in the data recorded by the Ozone Monitoring Instrument (OMI) aboard NASA's Aura satellite. Building on this work, I laid the groundwork for a machine learning model intended to replace the computationally expensive WRF-Chem step. I aligned over 47K WRF-Chem pixels with their corresponding OMI pixels, improving the correspondence between the two datasets for model training, and engineered the geolocation data to streamline analysis and improve interpretability.
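As a rough illustration of the kind of pixel alignment involved, the sketch below matches each WRF-Chem grid cell to its nearest OMI pixel center with a KD-tree. The function and array names are hypothetical, and a real alignment would likely need to account for the full OMI pixel footprint rather than just its center.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_wrf_to_omi(wrf_lat, wrf_lon, omi_lat, omi_lon):
    """Assign each WRF-Chem grid cell to the nearest OMI pixel center.

    Inputs are 1-D arrays of latitudes/longitudes in degrees.
    Nearest-center matching is a simplification used here only to
    illustrate the idea of aligning the two grids.
    """
    # Build a KD-tree over OMI pixel centers (treating lat/lon as planar,
    # a reasonable approximation over a regional domain).
    tree = cKDTree(np.column_stack([omi_lat, omi_lon]))
    # For every WRF-Chem cell, find the index of the closest OMI pixel.
    _, nearest_omi_idx = tree.query(np.column_stack([wrf_lat, wrf_lon]))
    return nearest_omi_idx
```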
Part-of-speech (POS) tagging is a core component of Natural Language Processing (NLP) that identifies the grammatical category of each word in a sentence, aiding semantic understanding. I developed a POS tagger using a Long Short-Term Memory (LSTM) network trained with mini-batch stochastic gradient descent on top of pre-trained GloVe embeddings. To improve performance, I made the LSTM bi-directional, added dropout layers to prevent overfitting, and used 300-dimensional embeddings for the 500,000 most frequent words to capture richer context. By refining model configurations and experimenting with hyperparameters, I gained practical insight into optimizing neural networks for real-world language-understanding applications.
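A minimal PyTorch sketch of this kind of architecture is shown below; the class name, hidden size, and dropout rate are illustrative assumptions rather than the exact configuration used.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Bi-directional LSTM POS tagger over pre-trained word embeddings."""

    def __init__(self, embeddings, hidden_dim, num_tags, dropout=0.5):
        super().__init__()
        # Frozen pre-trained vectors, e.g. GloVe: shape (vocab_size, 300).
        self.embed = nn.Embedding.from_pretrained(embeddings, freeze=True)
        self.lstm = nn.LSTM(embeddings.size(1), hidden_dim,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        # Forward and backward hidden states are concatenated.
        self.out = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        x = self.embed(token_ids)          # (batch, seq_len, 300)
        x, _ = self.lstm(x)                # (batch, seq_len, 2 * hidden_dim)
        return self.out(self.dropout(x))   # per-token tag scores
```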
Members of the same linguistic community vary considerably in how they speak and express the same ideas, and this variation can often be traced to social differences and social interaction. In this study, I explored sociolinguistic variation in the "yes" and "no" response variants used by English speakers at UC Berkeley. Collecting auditory linguistic data over two weeks, I tested for significant correlations between variant usage and the social factors of gender, race, formality of the relationship, and situational context.
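As one illustration of how an association between a response variant and a social factor could be tested, the toy sketch below runs a chi-square test of independence on a contingency table; the data and column names are invented, and the study's actual analysis may have used a different method.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical tidy table of observations: one row per recorded response.
df = pd.DataFrame({
    "variant": ["yeah", "yes", "nah", "yep", "no", "yeah"],
    "context": ["casual", "formal", "casual", "casual", "formal", "casual"],
})

# Cross-tabulate variant against situational context and test whether
# the two are independent.
table = pd.crosstab(df["variant"], df["context"])
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.3f}")
```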
Languages exhibit recognizable patterns when forming new words, and identifying these patterns can offer valuable insight into language structure. In a collaborative project at UC Berkeley, we developed a neural network model to determine whether generated compound words are valid in Standard American English. Using the LADEC dataset, we trained a multilayer perceptron on 7.8K compound words represented by 2.3K features. Through extensive training and optimization, we achieved over 94% accuracy in classifying compound words, validating the model's ability to learn word-formation patterns. This project strengthened my expertise in NLP, neural network optimization, and data-driven linguistic analysis, and showed how machine learning can connect computational methods with linguistic structure to offer new perspectives on word formation.
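A minimal scikit-learn sketch of this kind of classifier follows; the file paths, layer sizes, and train/test split are illustrative assumptions, not the project's actual code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hypothetical feature matrix: ~7.8K compound candidates x ~2.3K features,
# with a binary label indicating whether each compound is valid.
X = np.load("ladec_features.npy")   # placeholder path
y = np.load("ladec_labels.npy")     # placeholder path

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# A small multilayer perceptron; the layer sizes here are illustrative.
clf = MLPClassifier(hidden_layer_sizes=(256, 64), max_iter=200)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```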
I designed and developed a multiplayer game that merges the classic gameplay of Pac-Man with the iconic Pokémon universe. Players control Pokémon characters as they navigate procedurally generated mazes, completing objectives while avoiding obstacles. To elevate the user experience, I implemented gameplay saving using Git, created intuitive menus, and integrated a dynamic HUD display. This project not only honed my game development skills but also allowed me to explore the intricacies of real-time multiplayer mechanics and user interface design.
(*Note: This project was created for educational purposes and does not infringe on any copyrights held by Nintendo, Game Freak, or Creatures. All credit for Pokémon character design goes to the respective companies.)
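As a rough illustration of the procedural maze generation mentioned above, the sketch below carves a maze with randomized depth-first search. It is written in Python for consistency with the other examples and is a simplified stand-in, not the game's actual algorithm.

```python
import random

def generate_maze(width, height, seed=None):
    """Carve a maze on a width x height cell grid via randomized DFS.

    Returns a dict mapping each cell (x, y) to the set of neighboring
    cells it is connected to (i.e., the walls between them are removed).
    """
    rng = random.Random(seed)
    connections = {(x, y): set() for x in range(width) for y in range(height)}
    visited = {(0, 0)}
    stack = [(0, 0)]
    while stack:
        x, y = stack[-1]
        # Unvisited orthogonal neighbors of the current cell.
        neighbors = [(x + dx, y + dy)
                     for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if (x + dx, y + dy) in connections
                     and (x + dx, y + dy) not in visited]
        if neighbors:
            nxt = rng.choice(neighbors)
            # Knock down the wall between the current cell and the neighbor.
            connections[(x, y)].add(nxt)
            connections[nxt].add((x, y))
            visited.add(nxt)
            stack.append(nxt)
        else:
            stack.pop()
    return connections
```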