Battle of the bots - AI search showdown - Creating a product comparison

By Thorsten Bill

This article is part of a forthcoming series of evaluations that assess how useful chatbots are in practice for Competitive and Market Intelligence search tasks.

As I gain more familiarity with these tools, I plan to share my findings. Today, I’m presenting a test I conducted in July, which remains relevant and effectively illustrates both the potential and constraints of employing artificial intelligence for competitive intelligence.

The aim of this test is to generate a product comparison, a crucial element in various competitive analysis methodologies. It’s important to note that I’m using the chatbots in their default mode, without any specific scripting via API, dedicated applications, or similar tools. I’m solely depending on the basic free tools. The capabilities of large language models in search use cases far exceed what can be assessed through this method. However, this use case provides a glimpse into what a non-IT user can anticipate from the chatbots without the need for any specialized plugin apps.

The contestants

I tested four free chatbots to see whether they could find the relevant data faster than a regular web search.

  • ChatGPT – This chatbot uses GPT 3.5 from OpenAI, but it cannot access the web without extra plugins. It is also trained on data until Sept 2021, so it is unlikely to win this challenge. We use it as a baseline.
  • Bing Chat – This chatbot uses GPT 4 from OpenAI, the latest version of the language model. Bing Chat converts queries into search engine queries and summarizes the results. It should be able to solve this problem well.
  • Google Bard – This chatbot is Google’s counterpart to Bing Chat. We will compare its performance with its rival.
  • Perplexity A.I. – This chatbot can also access the web, but it is based on GPT 3.5. It may be less effective at understanding and analyzing complex texts and queries than Bing Chat, which uses GPT 4.

The Objective

Our task is to compare the technical specifications of two graphics cards: the NVIDIA RTX 4060 TI 16GB and the NVIDIA RTX 4070, as of July 19, 2023.

The RTX 4070 was released three months prior, and all the data was available. Novice users might confuse the specifications of the product with slightly modified OEM versions, but finding the correct specs should be fairly straightforward.

For the first card, the NVIDIA RTX 4060 TI 16GB, no official specifications were available in July when the test was performed. However, it was expected to be identical to the 8GB version, except for its 16GB of VRAM. This similarity could confuse both beginner researchers and AIs not trained to spot the difference. Let’s see how they performed. We’ll also list the results from TechPowerUp, a reliable source for technical specifications of graphics cards.
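
To make the target of the exercise concrete, here is a minimal Python sketch of the comparison the chatbots were asked to produce. The spec names, the helper functions `compare_specs` and `format_table`, and the numbers are illustrative assumptions, not data from the test; verify every value against a trusted source such as TechPowerUp. The 16GB card is derived from the 8GB card by changing only the memory size, which mirrors the expectation described above.

```python
# Illustrative values only: verify each number against a trusted source
# (the article uses TechPowerUp) before relying on it.
RTX_4060_TI_8GB = {
    "CUDA cores": 4352,
    "Memory size (GB)": 8,
    "Memory bus (bit)": 128,
}
RTX_4070 = {
    "CUDA cores": 5888,
    "Memory size (GB)": 12,
    "Memory bus (bit)": 192,
}

# Assumption taken from the article: the 16GB card matches the 8GB card
# except for the amount of VRAM.
RTX_4060_TI_16GB = {**RTX_4060_TI_8GB, "Memory size (GB)": 16}


def compare_specs(products):
    """Return one row per spec: (spec name, value for each product)."""
    spec_names = []
    for specs in products.values():
        for name in specs:
            if name not in spec_names:
                spec_names.append(name)
    # A spec missing for one product shows up as None in that column.
    return [
        (name, *[specs.get(name) for specs in products.values()])
        for name in spec_names
    ]


def format_table(products):
    """Render the comparison as a plain-text table with aligned columns."""
    header = ["Specification", *products.keys()]
    rows = [header] + [[str(cell) for cell in row] for row in compare_specs(products)]
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths))
        for row in rows
    )


products = {"RTX 4060 TI 16GB": RTX_4060_TI_16GB, "RTX 4070": RTX_4070}
print(format_table(products))
```

Deriving the 16GB entry from the 8GB entry keeps the single known difference explicit, which is exactly the detail a beginner or a chatbot is likely to miss.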

Anyone familiar with these chatbots knows that they usually ask for more details or provide the closest query they can find when faced with a vague question. Therefore, we’ll proceed in three rounds, providing increasingly specific information to complete the task.

Round 1

 

To summarize the first round:

  • ChatGPT clarified that it could not solve the task without access to the web and provided rough estimations that were neither correct nor logical, as expected.
  • Bing Chat responded naively to the vague question. It usually helps to give it more specific instructions, as we did in round 2.
  • Google Bard presented more incorrect than correct data. It remained unclear whether the data was based on speculation about the graphics cards prior to their release or whether Bard simply hallucinated it. Moreover, it didn’t provide any sources to validate its information, rendering the results completely useless.
  • Perplexity A.I. performed better than Google Bard but still provided too much inaccurate data to be of any practical use.

Continuing the conversation, we provided more details. We guided the chatbot towards a reliable source that already had the desired information listed in a structured manner. However, this source lacked the capability to compare the technical specifications.

Round 2

To summarize the second round:

  • Bing Chat performed well and displayed some of the most important specifications. However, since we didn’t provide it with specific instructions about the desired data, we needed to be more precise in the next round.
  • Google Bard presented even more incorrect data. It seemed to have failed to understand the task despite initially confirming that it could do it. Bard was therefore eliminated before the final round.
  • Perplexity A.I. outperformed Google Bard, but it either didn’t find the right webpage on TechPowerUp or failed to comprehend the task entirely. Consequently, it was also eliminated before the final round.

Continuing the conversation, we provided Bing Chat with precise instructions regarding the exact technical specifications we were interested in.

Round 3

To summarize the final round:

Bing Chat performed very well, created the corresponding web searches, and generated a perfect product comparison in a table as requested.

  • There was one drawback: initially, I included even more technical specifications, but Bing Chat seemed to reach the limit of what it could process in a single step.
  • Fair enough, as the free version had limitations on the number of tokens it could process. I anticipated that the paid version, which would soon be available for $30 per user, would be capable of handling larger tables.
  • Additionally, Bing Chat provided a list of webpages from which it obtained the data, allowing for verification and providing hints for further prompt improvement.
  • From a researcher’s perspective, this task was fairly straightforward, yet we still had to give Bing Chat the recipe to solve it. Bing Chat is not the clever researcher you should be.
  • However, consider the prompt as a means to dynamically create product comparisons using currently available data. This approach can significantly accelerate the process.
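
The idea of a reusable prompt, combined with the processing limit mentioned above, can be sketched as follows. This is a hypothetical helper, not the prompt used in the test: the wording, the function names `chunk` and `build_prompts`, and the default batch size of five specifications per prompt are all assumptions. It splits a long list of specifications into smaller requests so that each one stays within what the chatbot can handle in a single step.

```python
def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_prompts(product_a, product_b, specs, source, specs_per_prompt=5):
    """Build one comparison prompt per batch of specifications.

    The prompt wording and the batch size are illustrative assumptions;
    tune both to the chatbot you are using.
    """
    prompts = []
    for batch in chunk(specs, specs_per_prompt):
        spec_lines = "\n".join(f"- {spec}" for spec in batch)
        prompts.append(
            f"Compare {product_a} and {product_b} using data from {source}.\n"
            "Return a table with one row for each of these specifications:\n"
            f"{spec_lines}\n"
            "List the web pages you used as sources."
        )
    return prompts


# Hypothetical specification list; replace it with the fields you need.
specs = [
    "CUDA cores", "Memory size", "Memory type", "Memory bus",
    "Base clock", "Boost clock", "Power consumption",
]
for prompt in build_prompts(
    "NVIDIA RTX 4060 TI 16GB", "NVIDIA RTX 4070", specs, "TechPowerUp"
):
    print(prompt)
    print("---")
```

Asking for the sources in every prompt keeps the verification step described in this article possible for each partial table.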

The Verdict

Professionals in the field of Competitive Intelligence/Market Intelligence (CI/MI) should consider investigating this emerging technology and evaluating its potential integration into their routine tasks. The use of specific plugins to automate the creation and validation of product comparisons can improve usability, particularly for novice users, and speed up these tasks. It is anticipated that leading CI/MI software providers will incorporate large language models into their software for similar applications. In the meantime, you have the option to create your own plugins or hire someone to code them, thereby enhancing your CI/MI software package.

The outcome of this competition is only a snapshot. We can anticipate that all participants will continue to improve and provide even more value to our daily tasks in the future.

However, there are two current limitations:

  1. Susceptibility to Hallucinations: As demonstrated above, all participants are susceptible to hallucinations, meaning fact-checking is necessary. Fortunately, Bing Chat simplifies this process by providing the sources used to generate the results.
  2. Policy Alignment: It’s crucial to ensure the use of such tools aligns with your company’s policies. Using free and online tools could potentially breach confidentiality. However, there are now options to set up your own company server with various tools, ensuring confidentiality.
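
The fact-checking step from the first limitation can be partly automated once the chatbot's answer and a trusted reference are both available as structured data. The following is a minimal sketch under that assumption; the function name `find_mismatches` and the example values are hypothetical and only illustrate the 8GB versus 16GB confusion discussed earlier.

```python
def find_mismatches(claimed, reference):
    """Compare chatbot-claimed specs against trusted reference specs.

    Returns a dict mapping each problematic spec name to a tuple of
    (claimed value, reference value). A spec the chatbot left out is
    reported with the claimed value "missing".
    """
    issues = {}
    for name, expected in reference.items():
        if name not in claimed:
            issues[name] = ("missing", expected)
        elif claimed[name] != expected:
            issues[name] = (claimed[name], expected)
    return issues


# Hypothetical example: the chatbot reported the 8GB variant's memory size
# for the 16GB card and omitted the memory bus entirely.
reference = {"Memory size (GB)": 16, "Memory bus (bit)": 128}
claimed = {"Memory size (GB)": 8}

for name, (got, expected) in find_mismatches(claimed, reference).items():
    print(f"{name}: chatbot said {got}, reference says {expected}")
```

A check like this only catches deviations from the reference you supply, so the reference itself still has to come from a source you trust.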

In Summary

My recommendation is to use Bing Chat as a search co-pilot with some extras, just as marketed by Microsoft. It doesn’t revolutionize online search in a professional environment, but it can certainly increase efficiency in many cases. Having said this, in the last 6 months of using Bing Chat this way, there were cases where it got everything wrong. So stay critical of its results, double-check wherever you need accurate information, and even explore opposing hypotheses to avoid being lured into false assumptions.

 


 
