Hacker News

Huh? I never claimed they made a breakthrough. My point is that if they really have an alignment or interpretability breakthrough, they won't have to tell us by virtue signaling about how safe it is. Users will just be able to tell because it will eliminate or drastically reduce problems like #3 in the OP. The outcomes of prompt changes remaining unpredictable tells us it's still a black box.

