evaluation-llms topic

List evaluation-llms repositories

AttrScore

Code, datasets, and models for the paper "Automatic Evaluation of Attribution by Large Language Models"

OSU-NLP-Group

Topics: attribution, chatgpt, evaluation-llms, gpt-4

[NeurIPS'25] MLLM-CompBench evaluates the comparative reasoning of MLLMs with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, state, emotion, tem...

RaptorMai

Topics: benchmark, evaluation-llms, foundation-models, human-annotation