Critical Section

Archive: November 2, 2024


AI vs pirate puzzle

Saturday, 11/02/24 07:30 PM

The other day I posted a pirate puzzle: find R.

What's R?

You could probably figure out how to proceed:

Okay, so we have five equations with five unknowns. A little algebra, and we should be good. Once we know x, we can subtract 3+4+5 from x^2 and we're done.
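For anyone who wants to see that algebra spelled out, here is a minimal sketch in Python. The figure isn't reproduced in this text, so the setup is an assumption: a square of side x, the triangles of area 3 and 4 sharing one corner of the square, and the triangle of area 5 sitting in the opposite corner. The names a, b, c, d are hypothetical labels for the legs those triangles cut from the sides. Under that assumption the five equations are a*x/2 = 3, b*x/2 = 4, c*d/2 = 5, a + c = x, and b + d = x.

```python
import math

def solve_square(area_a=3.0, area_b=4.0, area_c=5.0):
    """Solve the ASSUMED five-equation system for its positive solution.

    Assumed layout (not confirmed by the post): triangles area_a and
    area_b share a corner of the square, area_c is in the opposite corner.
    """
    # Substituting a = 2*area_a/x and b = 2*area_b/x into
    # (x - a)*(x - b)/2 = area_c gives a quadratic in s = x**2:
    #   s**2 - 2*(area_a + area_b + area_c)*s + 4*area_a*area_b = 0
    total = area_a + area_b + area_c
    # Take the larger root; the smaller one would make c and d negative.
    s = total + math.sqrt(total**2 - 4 * area_a * area_b)
    x = math.sqrt(s)
    a = 2 * area_a / x
    b = 2 * area_b / x
    c = x - a
    d = x - b
    return x, a, b, c, d

def residuals(x, a, b, c, d):
    """Plug the values back into the five assumed equations."""
    return [a * x / 2 - 3, b * x / 2 - 4, c * d / 2 - 5, a + c - x, b + d - x]

x, a, b, c, d = solve_square()
print(round(x, 2))  # 4.67
print(all(abs(r) < 1e-9 for r in residuals(x, a, b, c, d)))  # True
```

The `residuals` helper is the "test your answers" step: every entry should be zero, up to rounding, if the values really satisfy the original equations.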

So in 2024 we could just ask an AI chatbot to do this for us, right?

Well, no. I first went to ChatGPT, and it got massively wrapped around the axle, and gave me non-solutions with negative numbers and square roots and all kinds of junk. I asked it to test its answers, and it did, and then it apologized for giving me the wrong answers (!) and tried again, and again failed. Boo.

So I tried Claude, and it was worse. It kept telling me how to solve the problem and asking me if I wanted it to continue, without actually solving it. When I asked it to actually keep going until it had an answer, it tried, and then informed me the problem was unsolvable. Fail.

Next I tried Perplexity, which I usually use as a browser search plugin, but sadly it fared no better, giving me wrong answers, and then failing to plug the values into the original equations to test them.

Finally I tried Llama. It kept getting wrong answers after some convoluted guessing. When I asked it to check, it did, admitted it was wrong, tried again, but got the same wrong answers. It also observed that the problem might be unsolvable!

Wow. So AI chatbots, impressive as they are, cannot do algebra?

Okay, back to ChatGPT again. This time I informed it this was simple: five equations in five unknowns, just use algebra, give me positive solutions. This did the trick. The problem was it didn't know which approach to take.

Impressively, in the result there was a link to the Python program it wrote and ran to get the solution. (That little blue "(>_)" at the end...)

And also, the answers actually work :)

Let's check out the program, shall we?

There you go.

Armed with these answers - especially x = 4.67 - now we can compute R:

And so R = 9.8. Does this feel right? Well, 3+4+5 = 12, the whole square is about 21.8, and R appears to be a bit less than half of the square, so yeah, it works.
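That last bit of arithmetic is easy to double-check. A quick sketch, using the rounded x = 4.67 from above (the function name `remaining_area` is just an illustrative label):

```python
def remaining_area(x, known_areas):
    """Area left over in a square of side x after removing the known regions."""
    return x**2 - sum(known_areas)

x = 4.67                            # rounded side length quoted above
r = remaining_area(x, [3, 4, 5])    # R is whatever is left of the square
print(round(r, 1))                  # 9.8
print(round(r / x**2, 2))           # 0.45, a bit less than half of the square
```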

The most fascinating thing about this for me was the AI chatbots getting the wrong answers, and then having to coach ChatGPT into getting the right one. So much like a person. We have now *totally* blown past the Turing test!
