AI: The danger of plausible but wrong

Why you shouldn't get AI to do something you can't do

There's no doubt AI is an exciting technology - the demos and APIs you can use right now to generate images and text from a simple prompt are impressive.

But there's a big issue visible in many of the AI-created works: things are plausible - but wrong.

It's most obvious with artworks - hands with six fingers, extra arms in a picture, shadows in odd places, strange mergings of sea and sky. The AI knows enough to know what a hand looks like - the colours, the shape - merging the form from the various hands it's seen before. But it doesn't know what a hand is. There's no understanding that hands (generally) have five fingers, so when it blends multiple image sources it happily adds a sixth.

It's easier to explain with text. Given the question "What year did Sir Francis Drake die?", which of these is the correct answer:

A) Banana B) 2014 C) 1589 D) 1782

Now you, like an AI, understand the form of the answer even if you don't know the answer itself. We're looking for a year, so A is straight out. With a very basic knowledge of English history, you can rule out 2014 as way too recent. That leaves two reasonably plausible answers. If you know Francis Drake is a famous Elizabethan explorer, you can probably pick out C) as the most likely answer in the group, but D) doesn't seem wildly wrong if you just know him as 'old guy from history'.

Of course, C isn't right either. Drake died in 1596, but 1589 is close enough that, if you trusted the source, you wouldn't think to double-check it.

Real-life examples

Race Across the World

This flaw has been particularly obvious to me recently as I've been testing out Google's Bard AI with my Race Across the World predictions game: World Race Predictions.

Over the last few weeks, I've been asking Bard who it thinks will win each episode, and the results are often wildly inaccurate while remaining plausible if you've never seen the show. The first week, Bard seemed to know the teams, but also thought they were heading to (or had already reached) Tokyo. More recently it invented entire teams from nothing, and this week it presented teams from series 1 and 2 as the current contestants.

Bard seems aware of the show's format and always presents the teams as pairs, but it mentions incorrect, fictional or former contestants more often than the correct answer. If you asked an AI to write a blog post on the show and only gave it a cursory check, you could easily believe these were all active teams, while the AI made up nonsense about contestants from previous series.

Passing the Bar

The legal podcast Opening Arguments had a similar experience posing practice bar exam questions to ChatGPT. Frequently the AI would offer a legitimate-sounding legal analysis, yet still pick the wrong answer. It sounds right, but isn't - arguably a worse situation than a wildly incorrect answer that sets off alarm bells.

AI as your co-pilot

And that's pretty much where I end up. AI is smart enough to be dangerous - you're tempted to leave it to work alone, but it isn't always ready to be unsupervised. If you haven't seen the TV show, don't know the law, or aren't prepared to double-check when Sir Francis Drake died, don't trust AI to do it for you.

GitHub's Copilot AI has its problems around provenance and code licensing, but I love its branding as a 'co-pilot' - there to assist you rather than lead you. If you can't code at all, code-generation AI can't do it all for you, because how would you know it's doing what you want?

But if you can write a blog post on a subject, or code a CRUD controller yourself, AI can help you get that done faster. Just be sure to read the result through afterwards to make sure it isn't hiding any plausible-looking mistakes.
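
To make that concrete, here's a minimal sketch of the sort of CRUD controller you might ask an AI to draft. It assumes Express with an in-memory store, and the /tasks routes and Task shape are purely illustrative - nothing here comes from a real project. The point is that small branches like the 404 check are exactly the kind of detail a plausible-looking generated version can quietly get wrong, and exactly what you should be reading for.

```typescript
// Minimal in-memory CRUD controller sketch (illustrative only).
import express, { Request, Response } from "express";

interface Task {
  id: number;
  title: string;
  done: boolean;
}

const app = express();
app.use(express.json());

let tasks: Task[] = [];
let nextId = 1;

// Create: add a new task from the request body.
app.post("/tasks", (req: Request, res: Response) => {
  const task: Task = { id: nextId++, title: req.body.title, done: false };
  tasks.push(task);
  res.status(201).json(task);
});

// Read: fetch a single task by id.
app.get("/tasks/:id", (req: Request, res: Response) => {
  const task = tasks.find((t) => t.id === Number(req.params.id));
  if (!task) {
    // The kind of branch a plausible-but-wrong draft might skip entirely.
    res.status(404).json({ error: "not found" });
    return;
  }
  res.json(task);
});

// Update: change title and/or done flag on an existing task.
app.put("/tasks/:id", (req: Request, res: Response) => {
  const task = tasks.find((t) => t.id === Number(req.params.id));
  if (!task) {
    res.status(404).json({ error: "not found" });
    return;
  }
  task.title = req.body.title ?? task.title;
  task.done = req.body.done ?? task.done;
  res.json(task);
});

// Delete: remove a task by id.
app.delete("/tasks/:id", (req: Request, res: Response) => {
  tasks = tasks.filter((t) => t.id !== Number(req.params.id));
  res.status(204).end();
});

app.listen(3000);
```

None of this is hard to write yourself - which is exactly why it's a good candidate for AI assistance, and exactly why you're equipped to spot where the generated version only looks right.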