
Evaluating AI Output
Key Ideas
Evaluative Judgement: a framework that positions knowledge makers to confront the unknown and manage what they do not know as they work in AI-mediated contexts.
Good evaluative judgement requires that a writer’s claims are defensible within the standards of their rhetorical situation; however, learners often do not know what makes a good text (Bearman et al., 2024).
It is difficult for people who lack knowledge about what counts as defensible within disciplinary standards—and why those standards exist—to judge AI outputs appropriately within the knowledge making context. Without this awareness, they also cannot fully take part in the communities for which they are creating meaning (Bearman et al., 2024; Eaton et al., under review; Walton et al., 2025).

The "black box" nature of generative AI makes it difficult to evaluate outputs. Complete verification of AI content is impossible, and AI systems have no inherent commitment to accuracy. This means that looking only at outputs misses important factors for judging whether they are true and trustworthy within the knowledge making context.
This leads to a “perverse outcome of making individual students entirely responsible for the truth of the claims in the work they submit” (Flenady & Sparrow, 2025, p. 5). Because learners cannot fully verify AI-generated content, they become accountable for material whose origins they cannot trace.
Evaluative judgement is a promising framework that has been proposed to support learners as they use AI outputs. These evaluations could help learners contend with what Zednik (2021) and Bearman and Ajjawi (2023) highlighted as the "black box" of AI outputs, where the processes and sources behind what AI generates often remain obscure.
Some questions we can ask to improve students’ evaluative judgements include (see Eaton et al., under review):
1. What can you confirm is valid for this context based on your prior knowledge?
2. What do you not know is valid, and how can you validate or invalidate that knowledge?
3. What cannot be validated, and how do you address this to ensure the knowledge you create is valid?
Let’s see these questions in action with an example of AI-generated output:
NOTE. A significant issue is determining whether these assessments are reliable or whether they reflect user error, since effective AI outputs depend on users who can craft strong prompts and incorporate the results skillfully.

Remember
Evaluative judgement can allow people to become knowledge mediators: disruptors of epistemic processes that exist, in part, beyond them.
Bearman et al. (2024) noted that learners might struggle to assess quality effectively. When working with AI, users continually evaluate outputs to determine how these tools might serve their purposes. Walton et al. (2025) showed that, through these judgements, learners also evaluate when AI tools will or will not be useful in future work.