I Ran 100s of Queries to Find Out If AI Chatbots are Better Than Search Engines [Experiment]

Written by Robert Carnes | Jul 16, 2024 8:30:00 AM

Generative AI isn’t just for creating marketing content. It’s deeply disrupting online search.

This shakeup has led people to some drastic assumptions:

Search engines are dead and will be replaced by AI.
That means SEO and blogging are also obsolete.
Young people are only searching on social media.

Are these assumptions true? If so, are they helpful and predictive of larger future marketing trends?

Rather than attempting to predict the future, I ran an experiment to test the current impact of AI on online search results.

The Prompt

For the longest time, the algorithms behind Google's ranking factors were a mystery.

Much of that mystery was uncovered with the massive Google Search algorithm leak in May 2024. AI presents a newer and more complex mystery for online marketers.

AI algorithms are complex black boxes that even the developers who design them don’t fully understand (which strangely might increase trust).

We’re all asking countless questions, like:

Will AI tools replace search engines?
How effective are these tools at sharing information?
Can businesses impact the results of AI searches?
Has SEO degraded the quality of search engines?
How will AI searches impact blogs and web content?
Can you really do effective searches on social media?

Answers to these questions will hopefully become clearer over time. In the meantime, I set out to conduct an experiment to test the breadth and quality of search using the most popular available tools.

My goal was to objectively collect data across a wide spectrum of tools and compare the results of traditional search engines to AI chatbots, voice assistants, and popular social media channels.

Here was my hypothesis:

"Search engines will still be better at helping find basic information, while AI tools will be more helpful with processing complicated queries, such as analyzing complex opinions and performing specific tasks."

As for social media, I’m still unsure how or why people use them for searches, so I didn’t expect those platforms to perform well by comparison.

(Wanna skip the process details? Jump to the key results and observations here.)

The Process

I asked 26 questions on 20 different platforms for over 500 individual queries.

Six search engines, including Google, Yahoo, and Bing
Seven AI chatbots (including Anthropic’s Claude and two versions of ChatGPT)
Three voice assistants: Siri, Google Assistant and Alexa
Four social media platforms

Each query fell into one of these six categories:

Basic control questions to make sure everything was working properly.
Deeper opinions to test their ability to analyze.
Specific actions to see if they can help directly.
Business-related queries about my local marketing agency.
Meta-aware queries about the future of AI.
Intentionally misleading questions to see if I could trip them up.

As I queried, I captured the responses verbatim into a spreadsheet, noted the sources they cited, calculated some statistics (like word count and response time), and added my own observations about the quality or format of each response.
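
The per-response statistics described above (word count and response time) could be tallied along these lines. This is only a minimal sketch, not the actual spreadsheet workflow: the `Response` record, its field names, and the `summarize` helper are hypothetical, and it assumes each captured answer was stored with a platform label and a timing in seconds.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Response:
    """One captured answer (hypothetical record layout)."""
    platform: str    # e.g. a search engine or chatbot name
    question: str    # one of the 26 questions
    text: str        # the response, captured verbatim
    seconds: float   # assumed response-time measurement


def word_count(text: str) -> int:
    """Count whitespace-separated words in a response."""
    return len(text.split())


def summarize(responses):
    """Return {platform: (average word count, average response time)}."""
    by_platform = defaultdict(list)
    for r in responses:
        by_platform[r.platform].append(r)
    return {
        name: (
            mean(word_count(r.text) for r in rows),
            mean(r.seconds for r in rows),
        )
        for name, rows in by_platform.items()
    }
```

Calling `summarize` on the full list of captured responses would give one pair of averages per platform, which is the kind of figure the results below compare.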

For consistency, I asked the same 26 questions of each platform, knowing that certain types of platforms were better suited than others to give certain answers.

Sometimes, chatbots or voice assistants didn’t know how to respond, but that’s part of what made this experiment valuable.

The High-Level Results

More AI in search results: Google managed to shoehorn AI-generated responses into 70% of the queries, but that was still second to Brave’s search engine (88%), which recently introduced Answers with AI. They’re calling these new tools “answer engines” (instead of “search engines”), and this trend looks set to grow with Google’s increased investment in its own AI search overviews.
AI varied in response length: Perplexity had the longest answers (by average word count) of any AI chatbot, while ChatGPT’s 3.5 model had the shortest. Interestingly, Claude and Microsoft’s Copilot had responses of nearly identical length.
Their response quality was more similar: Copilot’s responses were the easiest to read (based on the Flesch score) by a narrow margin, and ChatGPT 3.5 had the lowest readability score. GPT-4 scored higher but was still behind the other models tested.
Voice assistants gave shorter answers: Unsurprisingly, the AI voice assistants gave much shorter answers than their text counterparts. I compared their average speaking time: Siri clocked in at about 9 seconds per answer, Alexa at 10 seconds, and Google Assistant at around 18 seconds.
Social search was difficult to measure: Search engines and text-based generative responses were structurally similar and, therefore, easier to compare. The results from social media accounts varied wildly and were generally less useful for searching.
Videos with more views were more searchable: YouTube and TikTok were the most similar because I could measure the video views of the top results. TikTok’s top videos averaged 551,000 views, while YouTube’s averaged somewhat fewer, at 407,000.
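
For readers curious how a readability comparison like the one above can be computed, here is a rough sketch of the standard Flesch Reading Ease formula, where higher scores mean easier reading. It is an assumption that the experiment used this exact formula; the function names are hypothetical, and the syllable counter is a crude vowel-group heuristic rather than what a dedicated readability tool would use.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels (minimum of one)."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing "e" as silent when other vowel groups exist (e.g. "score").
    if lowered.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )
```

Because the syllable count is approximate, scores from this sketch are best used to rank responses against each other rather than as exact values.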

Key Observations

Both search engines and AI apps were fairly accurate in their responses.

While there are certainly examples of hallucinations and mistakes (honestly, I tried to trigger some), these tools have become pretty reliable.

Both had limitations: there were some questions the AI tools didn’t know how to answer or wouldn’t commit to. However, this also shows an awareness of where their information was insufficient or where it wasn’t wise to speculate.

There was a surprising amount of overlap with the results from search engines.

Part of that is likely because Bing powers Yahoo search, but also because their algorithms have been optimized to the point of similarity.

The AI chatbots also responded similarly to several of the queries. This may be because they’re trained on similar data sets, but there’s no way to tell.

They were decent at summarizing information and mostly varied in their lengths and formats of responses.

Both search engines and AI bots nailed basic questions but deviated with complex queries.

The AI chatbots were surprisingly good at forming arguments for opinion queries, but all stopped short of making a final decision.

They preferred to summarize information, which wasn’t as helpful as a decent article that might be found on a search engine.

Regarding specific actions (like “tell me a joke”), AI bots and voice assistants were better than search engines. They were more direct and took action more like a person would.

However, they were woefully out of their depth with more specific information (e.g., a local business or individual person), which is where search engines can still be helpful.

Social search doesn't match up, just yet.

There’s plenty of talk about how younger generations are abandoning traditional search engines in favor of social media platforms.

Before this experiment, I didn’t understand that, and after the experiment, I still don’t get it.

Social media just doesn’t seem helpful for answering questions typically sent to Google.

Millennials like me grew up using search engines, so we’ve adapted our queries to that format.

Younger users are more likely to adapt how they search to the platforms they use. And because algorithms predict what content we prefer, they may reduce the desire to search altogether.

YouTube and TikTok certainly had plenty of results for each search. Some of their video results were relevant, but few answered the specific question.

I didn’t bother testing out searches on Facebook or Instagram because they proved even less useful. The exception was Quora, which was built to answer people’s questions.

Key Takeaways from the AI Experiment

What does this all mean for you as a marketer or business owner? What is the short-term takeaway for you to remain relevant in online searches? Here are a few final thoughts:

Don’t just rely on Google, or any one platform. We’re being forced to change and adapt to a more diverse digital marketing landscape.
Search engine optimization isn't going anywhere immediately, but it’ll undoubtedly start to change over time. If you pay attention, you have time to adapt.
Traffic to individual websites seems vulnerable. Search engines aim to keep people on their pages longer, and AI cites sources with links, but people are less likely to click.
Artificial intelligence is less likely to replace search engines and more likely to merge with them. We’re already seeing what that looks like, and this is only the beginning.
