Type:         Senior Thesis

Institution: Rose-Hulman Institute of Technology

Advisor:     Dr. Michael Wollowski

Duration:   Aug 2024 - Present

Developed an efficient safety-training technique that teaches a reasoning model to evaluate the impact of its actions from the perspective of others and to restrain itself when it expects harm;

Outperformed supervised fine-tuning (SFT) and reinforcement learning (RL) at removing persistent backdoors from reasoning-based sleeper agents, reducing the harmful behavior by 40-50%;

Presented an emergency security patch for frontier model training pipelines and a new approach to superalignment that uses empathy as a continuous self-learning mechanism for ethical behavior.




























