@bibliolater@ai minor comment: the LLM data are not being compared to multiple responses by a single person on the same task, because repeated testing of the same person is not a general feature of the primary human experimental literature involved. So, as far as I can make out, the levels of human self-consistency are simply imputed/assumed. Doesn’t mean the difference isn’t there, just that the empirical basis for it seems somewhat anecdotal.