Phi-4-reasoning-vision-15B

Overview

Phi-4-reasoning-vision-15B is a broadly capable model that can be used for a wide array of vision-language tasks such as image captioning, asking questions about images, reading documents and receipts, helping with homework, interfering about changes in sequences of images, and much more. Beyond these general capabilities it excels at math and science reasoning and at understanding and grounding elements on computer and mobile screens.

Benchmark Results

We've tested on the following benchmarks:

Benchmark	Link
AI2D	lmms-lab/ai2d · Datasets at Hugging Face
HallusionBench	lmms-lab/HallusionBench · Datasets at Hugging Face
MathVerse	AI4Math/MathVerse · Datasets at Hugging Face
MathVision	MathLLMs/MathVision · Datasets at Hugging Face
MathVista	AI4Math/MathVista · Datasets at Hugging Face
MMMU	MMMU/MMMU · Datasets at Hugging Face
MMStar	Lin-Chen/MMStar · Datasets at Hugging Face
ScreenSpot v2	Voxel51/ScreenSpot-v2 · Datasets at Hugging Face
WeMath	We-Math/We-Math · Datasets at Hugging Face
ZEROBench	jonathan-roberts1/zerobench · Datasets at Hugging Face

Benchmark Results

Benchmark	Score
AI2D_TEST	84.8
HallusionBench	64.4
MathVerse_MINI	44.9
MathVision_MINI	36.2
MathVista_MINI	75.2
MMMU_VAL	54.3
MMStar	64.5
ScreenSpot_v2_Desktop	87.1
ScreenSpot_v2_Mobile	88.6
ScreenSpot_v2_Web	88.8
WeMath	50.1
ZEROBench_sub	17.7

Hosting the Model

Recommended: The easiest way to get started is using Azure Foundry hosting, which requires no GPU hardware or model downloads. Alternatively, you can self-host with vLLM if you have GPU resources available.

Azure Foundry Hosting (Recommended)

Deploy Phi-4-Reasoning-Vision-15B on Azure Foundry without needing to download weights or manage GPU infrastructure.

Setup:

Deploy the model on Azure Foundry and obtain your endpoint URL, API key and deployment name.

Use the following sample script, be sure to replace the following:

IMAGE_PATH, ENDPOINT_BASE, API_KEY, DEPLOYMENT_NAME
Optional: content of the payload message

import base64
import os
import requests

IMAGE_PATH = "<replace_with_your_image>.jpg"

ENDPOINT_BASE = "<your_base_endpoint_url>"
API_KEY = "<your_api_key_here>"
DEPLOYMENT_NAME = "Phi-4-Reasoning-Vision-15B" # replace here with your deployment name

def main():
    with open(IMAGE_PATH, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    payload = {
        "model": "Phi-4-Reasoning-Vision-15B",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in detail."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{image_b64}"
                        },
                    },
                ],
            }
        ],
        "max_tokens": 4096,
        "temperature": 0.0,
    }

    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        "azureml-model-deployment": DEPLOYMENT_NAME,
    }

    url = f"{ENDPOINT_BASE}/v1/chat/completions"
    print(f"Requesting: {url}")

    resp = requests.post(url, json=payload, headers=headers, timeout=120)
    resp.raise_for_status()

    result = resp.json()
    print("\n--- Response ---")
    print(result["choices"][0]["message"]["content"])


if __name__ == "__main__":
    main()

That's it! No GPU or model downloads required.

Self-hosting with Transformers

If you have access to GPU resources, you can run inference using HuggingFace Transformers library. This requires a GPU machine with sufficient VRAM (e.g., 40GB or more).

Instructions available here.

Self-hosting with vLLM

If you have access to GPU resources, you can self-host using vLLM. This requires a GPU machine with sufficient VRAM (e.g., 40GB or more).

Instructions available here.

Citation

If you use Phi-4-Reasoning-Vision in your research, please use the following BibTeX entry.

@article{phi4vr14b2026,
  title={Phi-4-Vision-Reasoning Technical Report},
  author={Aneja, Jyoti and Harrison, Michael and Joshi, Neel and LaBonte, Tyler and Langford, John and Salinas, Eduardo and Ward, Rachel},
  journal={arXiv:2511.19663},
  year={2026}
}

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
huggingface-transformers		huggingface-transformers
vllm		vllm
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
TRANSPARENCY_NOTE.md		TRANSPARENCY_NOTE.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Phi-4-reasoning-vision-15B

Overview

Benchmark Results

We've tested on the following benchmarks:

Benchmark Results

Hosting the Model

Azure Foundry Hosting (Recommended)

Self-hosting with Transformers

Self-hosting with vLLM

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Phi-4-reasoning-vision-15B

Overview

Benchmark Results

We've tested on the following benchmarks:

Benchmark Results

Hosting the Model

Azure Foundry Hosting (Recommended)

Self-hosting with Transformers

Self-hosting with vLLM

Citation

About

Resources

License

Code of conduct

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages