Testing GPT-5 on Vision Tasks
OpenAI released GPT-5, the newest model in their GPT series.
GPT-5 has advanced reasoning capabilities and, like many recent models by OpenAI, multimodal support.
This means that you can both prompt GPT-5 with one or more images and ask for an answer, but also prompt the model to spend more time reasoning before answering.
This blogpost covers results related to
- GPT-5 for Document Understanding and OCR
- GPT-5 for Defect Detection
- GPT-5 for Object Counting
- GPT-5 for Object Detection: Benchmarks with RF100-VL
- Reasoning and Visual Task Performance with GPT-5

https://blog.roboflow.com/gpt-5-vision-multimodal-evaluation/