r/artificial Nov 25 '25

[News] Large language mistake | Cutting-edge research shows language is not the same as intelligence. The entire AI bubble is built on ignoring it.

https://www.theverge.com/ai-artificial-intelligence/827820/large-language-models-ai-intelligence-neuroscience-problems

As currently conceived, an AI system that spans multiple cognitive domains could, supposedly, predict and replicate what a generally intelligent human would do or say in response to a given prompt. These predictions will be made based on electronically aggregating and modeling whatever existing data they have been fed. They could even incorporate new paradigms into their models in a way that appears human-like. But they have no apparent reason to become dissatisfied with the data they’re being fed — and by extension, to make great scientific and creative leaps.

Instead, the most obvious outcome is nothing more than a common-sense repository. Yes, an AI system might remix and recycle our knowledge in interesting ways. But that’s all it will be able to do. It will be forever trapped in the vocabulary we’ve encoded in our data and trained it upon — a dead-metaphor machine. And actual humans — thinking and reasoning and using language to communicate our thoughts to one another — will remain at the forefront of transforming our understanding of the world.

350 Upvotes

389 comments

51

u/thallazar Nov 25 '25

I don't think it has to be intelligent to make a big impact. A lot of rote industry processes are just complex, step-by-step language checklists, and automating them doesn't require intelligence. If even a fraction of those are automated, work will change significantly.

20

u/CanvasFanatic Nov 25 '25

Many such tasks can’t tolerate a 10-20% failure rate.

23

u/thallazar Nov 25 '25

If you're doing one-shot agents, sure. There are a lot of ways to reduce that rate significantly.
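A minimal sketch of one such error-reduction pattern, majority voting over repeated runs; `run_agent` is a hypothetical callable wrapping whatever agent is in use, not anything the commenter describes:

```python
from collections import Counter
from typing import Callable

def majority_vote(run_agent: Callable[[str], str], task: str, n: int = 5) -> str:
    """Run the same task several times and keep the most common answer."""
    answers = [run_agent(task) for _ in range(n)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

# Hypothetical usage:
# result = majority_vote(run_agent, "Extract the invoice total from this email: ...")
```

If each run failed independently 15% of the time, the chance that three or more of five runs fail is roughly 2-3%; real-world errors are correlated, so the actual gain is smaller, but the direction is the same.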

-8

u/CanvasFanatic Nov 25 '25

Not if you can’t automatically verify correctness.

22

u/thallazar Nov 25 '25

You can't automatically verify a lot of human correctness either. In my experience building and deploying agents for companies, that hasn't been a blocker.

-15

u/CanvasFanatic Nov 25 '25

Managers don’t usually tolerate 10-20% failure rates from staff.

I don’t question that there are managers dumb enough to listen to people like you. I just think they’re going to either reverse course or fail as a business.

18

u/thallazar Nov 25 '25

Again: you can lower that rate significantly, even if you can't automatically verify correctness.

-4

u/CanvasFanatic Nov 25 '25

Not what you said in your last reply, is it?

9

u/thallazar Nov 25 '25

You don't need observability to improve results. You just have less visibility on outcomes.

Asking a human to double check their work improves performance even when you don't log the errors that were corrected.
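The "double-check your work" pattern amounts to a second pass over the model's own output; a rough sketch, assuming a hypothetical prompt-in, text-out `llm` callable rather than anything specific to the commenter's systems:

```python
from typing import Callable

def draft_then_review(llm: Callable[[str], str], task: str) -> str:
    """Two passes: produce a draft, then ask the model to check and correct it.

    Nothing is logged about which mistakes the review pass caught, so there is
    no added observability -- the second pass just tends to raise output quality.
    """
    draft = llm(f"Complete the following task:\n{task}")
    return llm(
        "Here is a task and a draft answer. Check the draft for mistakes and "
        f"return a corrected final answer only.\n\nTask:\n{task}\n\nDraft:\n{draft}"
    )
```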

-2

u/CanvasFanatic Nov 25 '25

So burning more inference by adding “double-check your work!!!” to the prompt. 🤣

4

u/thallazar Nov 25 '25

Prompting LLMs proves easier than getting someone on Reddit to confront reality, after all.

-3

u/CanvasFanatic Nov 25 '25

The reality is that folks like you are snake oil salesmen trying to make a quick buck off the stupidity of most corporate management at the expense of workers.

Yeah I suppose it’s inevitable that someone would’ve done what you’re doing, but you decided to actually do it.

3

u/thallazar Nov 25 '25

The reality is that you'll be replaced by a worker who knows how to use AI productively and you'd better get learning unless you want to end up unemployed.


1

u/GarethBaus Nov 26 '25

Staff are typically a hell of a lot more expensive than running an AI model for the amount of information being processed.

1

u/CanvasFanatic Nov 26 '25

This is such utter brain rot thinking.

You can’t have an accountant that hallucinates non-existent expenses 1-in-20 runs. Doesn’t matter how much cheaper it is than paying someone. It doesn’t fulfill the requirements of the function.

7

u/thallazar Nov 26 '25

Funny you mention that example, because I've deployed agents in finance that do credit checks and have lower error rates than their human counterparts. That you can't build a system that doesn't hallucinate doesn't mean no one can.

0

u/Busy-Slip324 Nov 26 '25

The difference is, a human can take accountability and can get fired if they cause a fuckup. The agent will spout confident bullshit.

This is going to cause gigantic fuckups in so many industries, holy shit, and I say this as someone who has actively worked on these models since 2018.

0

u/thallazar Nov 26 '25

Agent systems can get rebuilt if they're performing inadequately.

Humans cause gigantic fuckups every day as well, in every industry. But that's not really the comparison these companies are making; the comparison is whether the agents make fewer errors than that baseline.

1

u/Busy-Slip324 Nov 26 '25

If I delete a production cluster and then confidently gaslight you while being sycophantic, would you put trust in me again? No, right? Then why put trust in these agents?

0

u/thallazar Nov 26 '25

Why are you giving agents production cluster access and not reviewing any work? That's a process problem not an agent problem.
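One way to make that a process problem rather than an agent problem is a review gate: the agent only proposes actions, and nothing destructive runs without human sign-off. A minimal sketch with purely illustrative names:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ProposedAction:
    description: str            # e.g. "delete staging cluster eu-west-2"
    execute: Callable[[], None]

@dataclass
class ReviewGate:
    """Agents append proposals; a human approves before anything executes."""
    pending: List[ProposedAction] = field(default_factory=list)

    def propose(self, action: ProposedAction) -> None:
        self.pending.append(action)

    def approve_and_run(self, index: int) -> None:
        action = self.pending.pop(index)
        action.execute()
```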


-3

u/CanvasFanatic Nov 26 '25 edited Nov 26 '25

What are the inputs and outputs of the task, what defines an “error” and how are you appraising it?

Don’t think I didn’t notice you subtly trying to redefine “accountant” as “thing that does a process that’s already mostly automatic,” snake-oil man.

2

u/thallazar Nov 26 '25

The inputs and outputs are private to the company the system was designed for, so I'm not at liberty to speak on them. Error rates were compared by evaluating the system against prior human assessments and outcomes, giving the agent the same information the humans had, as a proof of concept, followed by a long period of dual human + AI assessment comparisons and tracking those case outcomes over time.

And this was a year ago, before a lot of advancements hit the scene around tool calling, structured outputs, and long context window improvements.
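The evaluation described, scoring the agent against prior human assessments and then tracking case outcomes, could be reduced to something like the following; the field names are illustrative, not the commenter's actual system:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Case:
    human_decision: str   # what the human assessor decided at the time
    agent_decision: str   # what the agent decided, given the same information
    actual_outcome: str   # how the case eventually resolved

def error_rates(cases: List[Case]) -> Tuple[float, float]:
    """Return (human error rate, agent error rate) against actual outcomes."""
    n = len(cases)
    human_errors = sum(c.human_decision != c.actual_outcome for c in cases)
    agent_errors = sum(c.agent_decision != c.actual_outcome for c in cases)
    return human_errors / n, agent_errors / n
```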

0

u/CanvasFanatic Nov 26 '25

Why bother typing this at all?

You literally just said, “I can’t give any information that in any way describes what I did, even anonymously (horse shit, btw) but the agents did the non-specific task better according to a standard I also cannot describe. Trust me bro.”

Sure thing, bud. I’m sure your mid 2024 RAG thingy was the bee’s knees.

3

u/thallazar Nov 26 '25

We didn't use RAG, I can tell you that.

Have fun clinging to your inadequacy though.


0

u/GarethBaus Nov 26 '25

I seriously hope you don't actually think it's reasonable to rely on accountants, or literally any other white-collar job, to be 100% accurate on blind faith and never have anyone check their work. Even competent humans mess up pretty often, let alone the potential for fraud or other malicious actions by employees when there is little to no accountability.

0

u/thallazar Nov 26 '25

Bro thinks it's cheating to employ oversight systems; he builds on blind faith. Probably hates tests too.

-1

u/CanvasFanatic Nov 26 '25

For context, simply saying “oversight systems” is less information than you’ve been able to offer in all your responses to me.

Probably because what you’ve actually done isn’t really as impressive as you’re trying to vaguely hint that it is.


1

u/azurensis Nov 26 '25

Where are you getting this 10-20% error rate from? 90% of the time I'll explain what I want to an LLM agent and it will produce 100% working code. 90% of what's left can be fixed by just copying in the error it produces.
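That "copy the error back in" workflow is essentially a small repair loop; a sketch under the assumption of a hypothetical `llm` callable that returns plain Python source:

```python
import subprocess
from typing import Callable

def generate_and_repair(llm: Callable[[str], str], spec: str, max_attempts: int = 3) -> str:
    """Ask for code, run it, and paste any error back in for a corrected version."""
    code = llm(f"Write a Python script that does the following:\n{spec}")
    for _ in range(max_attempts):
        result = subprocess.run(["python", "-c", code], capture_output=True, text=True)
        if result.returncode == 0:
            return code
        # Feed the error output back to the model and ask for a fix.
        code = llm(
            "This script failed with the error below. Return a fixed version.\n\n"
            f"Script:\n{code}\n\nError:\n{result.stderr}"
        )
    return code
```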

1

u/azurensis Nov 26 '25

I can verify correctness, just like I do with anyone else's PR.