Type:        Senior Thesis
Institution: Rose-Hulman Institute of Technology
Advisor:     Dr. Michael Wollowski
Duration:    Aug 2024 - Present
Developed an efficient safety training technique to make a reasoning model evaluate the impact of its actions from the perspective of others and restrain itself if harm is expected;
Demonstrated performance exceeding SFT and RL in removing persistent backdoors from reasoning-based sleeper agents, with a 40-50% reduction in the harmful behavior;
Presented an emergency security patch for frontier model training pipelines and a new approach to superalignment that uses empathy as a continuous self-learning mechanism for ethical behavior.