Evaluating AI-generated content is tricky, mainly because consumers have neither access to nor control over the information or algorithms used to link and categorize concepts. This means that when evaluating AI we have to consider two key elements: the information input and the information output. Input directly influences the output generated by AI systems. As such, the first thing to assess is the quality of our initial prompt. As we have seen, the more we refine our prompt, the more precise the output.
Prompt:
Example:
Imagine that you have fed the following prompt to a generative AI chatbot such as Gemini:
Notice how adding a role changed the output from the AI. In addition to the requested information, we also received extra tips related to our input of “budget”. Likewise, assigning it the role of a “college student” made the tone less formal and more approachable. This example is fairly low stakes in terms of potential harm from AI-generated content.
However, the more complex the request, the more room there is for known issues with AI to appear in the results: algorithmic bias, misinformation, and fabrications (hallucinations).