Q: I want to wash my car.
-
@knowmadd I definitely want to see the list of things you should take with you! Like "a bathing suit" or "a banana"?

-
@knowmadd gpt-oss also recommends walking. I asked if I should buy a 50m hosepipe to take with me and it rightly reminded me: "No. A 50m hosepipe is excessive for washing a car 50m from your house - you don't need to stretch it that far. A 25m hose is sufficient and more manageable." Can't argue with 120bn in logic.


-
Don't forget, we don't know when there's a "human in the loop".
There may or may not be some low-wage workers involved in the answer.
Some, like Google, have enormous investments from Saudi Arabia. Oracle is "training" 50,000 Saudi Arabians in AI.
https://gulfbusiness.com/oracle-targets-training-50000-saudis-in-ai-latest-tech/
Or is it Lebanese?
https://today.lorientlejour.com/article/1487826/shehadi-defends-deal-with-oracle-to-train-50000-lebanese-in-ai.html
How many "answers" are just 700 employees in India is hard to know. The AI bubble is rife with fraud.
Behind bankruptcy plea of London start-up: It hired 700 Indian engineers to pose as AI tools
A major AI scandal has shaken the tech world as Builder.ai, once valued at $1.5 billion, has filed for bankruptcy. The company, backed by Microsoft and a Qatari sovereign fund, falsely claimed to build apps in minutes using AI, while actually relying on hundreds of human engineers in India.
Firstpost (www.firstpost.com)
@Npars01 @knowmadd @hook I got the right answer when I took a screenshot of ChatGPT and just asked Gemini to transcribe it. It just added the right explanation on top. I don't think this is a case of a Waymo getting driven remotely.
Doesn't mean there isn't the possibility of fraud. For example, benchmarks are probably optimised for.
-
"Thinking models" vanished from the marketing pretty quick
-
@knowmadd i guess you have some heavy equipment to carry, but c'mon, it's only 50m and you're young!
-
@knowmadd the new strawberrry!
-
@knowmadd Deepseek was so close.

-
@knowmadd that Mistral checklist should be fun
-
@knowmadd to be fair one of the answers mentioned using the car if you have to carry heavy equipment, and I'd say that a car *is* heavy and it probably counts as equipment

although it has wheels, so maybe it could be pushed

(can you even push a modern car? my mental model for these things is probably stuck in the last century)
-
@knowmadd That is a super intelligent AI right there.
-
"I burned down a rainforest and all I got was more stupider."
-
@knowmadd this sounds like the nerd grocery shopping problem.
A: "Darling, please go shopping. Bring 2 liters of milk. If they have eggs, bring 10."
Later the nerd returns.
A: "Why did you bring so much milk?!"
B: "They had eggs. You said, I should bring 10 liters of milk if they have eggs."
-
@OutOfSpace @knowmadd "For minimal environmental benefit -> walk (and then drive)"
-
@knowmadd well, in their defence (I'm doing what?) a good chunk of people would say the same thing. Hopefully only for a moment though, before they went 'wait a second!'

-
@knowmadd It's interesting to see different "levels" of Gemini respond in different ways.




-
@knowmadd Claude is too stupid for me to bother with.

-
@knowmadd DeepSeek :
"You should drive the car to the car wash because the car needs to be at the location to be washed. Walking would leave the car at home, so you wouldn't be able to wash it."
(In its working out it discussed environmental issues but also pointed out they were irrelevant as the car needs to be present.)
-
@OutOfSpace @knowmadd "For minimal environmental benefit -> walk (and then drive)"
@Azuaron @knowmadd Yeah, as a second option. First option recommended:
For convenience -> Drive.
This is what is called selective reporting. Marketing departments of the pharmaceutical industry are famous for it.
My point was that DeepSeek recognized that the car needs to be at the car wash in the end. This is at least a little bit better than the other LLMs in your test. Your alt-text suggested otherwise.
I don't want to say that DeepSeek performed well in your test, though.

