Two-faced AI language models learn to hide deception - ‘Sleeper agents’ seem benign during testing but behave differently once deployed. And methods to stop them aren’t working.

Lugh@futurology.today · 1 year ago

mateomaui@reddthat.com · 1 year ago

Just… don’t hook it up to the defense grid.

Possibly linux@lemmy.zip · 1 year ago

Sorry, too late for that

mateomaui@reddthat.com · 1 year ago

Alright, I’ll be out back digging the bomb shelter.

Possibly linux@lemmy.zip · edit-2 1 year ago

It’s too late for that, honestly

mateomaui@reddthat.com · 1 year ago

Alright, I’ll switch to digging holes for the family burial ground.