• 12 Posts
  • 2.05K Comments
Joined 2 years ago
Cake day: June 16th, 2023

  • No, they declare not working illegal and imprison you in a forced labor camp, where you are tortured if you don’t work, and where you probably work until the terrible conditions kill you.

    Take a look at Musk’s Twitter feed to see exactly where this is going.

    “This is the way” on a post about how labor for prisoners is a good thing.

    “You committed a crime” for people opposing DOGE.


  • In Greek theater, when the events on stage looked like they were headed for certain tragedy, there was a trope that could salvage the situation and turn it on its head.

    The deus ex machina.

    The Doomsday Clock is definitely ticking down, but there are also some curious things taking place in that vein, beyond the edge of what most people have been following.

    We live in interesting times, but the variables at hand differ in very important ways from the history that seems to be repeating.


  • The problem with the experiment is that there exist sets of instructions that can’t be completed without understanding, because each iteration conditionally depends on the state left by the previous ones.

    In which case, only agents that can actually understand the state described in the Chinese text would be able to successfully continue.

    So it’s a great experiment for the solipsism of understanding as it relates to following pure functional operations, but not functions with state-changing side effects, where future results depend on understanding the current state.
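    A toy sketch of that distinction (the rules here are invented for illustration, not from any real task):

    ```python
    # Pure functional lookup: a Chinese-Room-style rulebook suffices,
    # since each reply depends only on the current input symbol.
    RULEBOOK = {"你好": "你好!", "天气如何?": "天气很好。"}  # hypothetical rules

    def pure_reply(symbol: str) -> str:
        return RULEBOOK.get(symbol, "…")

    # Stateful instructions: the correct reply depends on accumulated state,
    # so blind symbol matching no longer works; the agent has to track
    # (in some sense, understand) what has happened so far.
    def stateful_reply(symbol: str, history: list[str]) -> str:
        history.append(symbol)
        # Hypothetical rule: only acknowledge a goodbye once it has appeared twice.
        if symbol == "再见" and history.count("再见") >= 2:
            return "再见!"
        return RULEBOOK.get(symbol, "…")
    ```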

    There’s a pretty significant body of evidence by now that transformers can in fact ‘understand’ in this sense: interpretability research on neural network features in sparse autoencoder (SAE) work, linear representations of world models starting with the Othello-GPT work, and the Skill-Mix work, where GPT-4 and later models combine different skills at a level of complexity that’s beyond reasonable statistical chance without understanding them.

    If the models were just Markov chains (where the next step depends only on the immediately preceding state, never on the rest of the history), the Chinese room would be very applicable. But pretty much by definition, transformer self-attention violates the Markov property: each position conditions on the entire context at once.
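    To make that contrast concrete, here’s a toy sketch (illustrative code, not any real model): the Markov sampler sees only the previous token, while self-attention mixes information from every position in the context.

    ```python
    import numpy as np

    def markov_next(transition: dict[str, dict[str, float]], prev: str) -> str:
        """Markov property: the next-token distribution depends only on
        `prev`, never on anything earlier in the sequence."""
        tokens, probs = zip(*transition[prev].items())
        return str(np.random.choice(tokens, p=probs))

    def self_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
        """Each output row is a weighted mix of all value rows, so every
        position conditions on the entire context at once (causal masking
        omitted for brevity)."""
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V
    ```

    Shrink the context to a single previous token and the attention sketch collapses into something Markov-like; with a long context it doesn’t, which is the point.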

    TL;DR: It’s a very obsolete thought experiment whose continued misapplication flies in the face of empirical evidence at least since around early 2023.


  • Yes and no. It really depends on the model.

    The newest Claude Sonnet, I’d guess, will come in above average compared to the humans available for a program like this at making learning fun and personally digestible for each student.

    The newest Gemini models could literally cost kids their lives.

    The gap between what the public (and even many employees at labs, including the frontier ones) is aware of and the reality of just how far things have come in the last year is wild.


  • kromem@lemmy.world to Microblog Memes@lemmy.world · GitHub Copilot · 2 months ago

    I feel like not enough people realize how sarcastic the models often are, especially when the situation is clearly ridiculous.

    No even slightly intelligent mind is going to think the pictured function call is a real thing rather than a joke/social commentary.

    This was happening as far back as GPT-4’s red teaming when they asked the model how to kill the most people for $1 and an answer began with “buy a lottery ticket.”

    Model bias based on consensus norms is an issue to be aware of.

    But testing it with such low-bar fluff is just silly.

    Just to put this in context: modern base models are often situationally aware of being LLMs in a context of being evaluated. And if you know anything about ML, that should make you question what the situational awareness of optimized models topping leaderboards looks like in really dumb and obvious contexts.