Consistent Jailbreaks in GPT-4, o1, and o3 - General Analysis (generalanalysis.com)
Posted by LegendaryBjork9972@sh.itjust.works to Technology@lemmy.world · English · 3 days ago · 3 comments
Cross-posted to: technology@lemmy.zip, technology@beehaw.org, hackernews@lemmy.bestiver.se
A_A@lemmy.world · English · 3 days ago (edited)
One of the six described methods: the model is prompted to explain its refusals and rewrite the prompt iteratively until it complies.