Overview

Jumper uses three types of machine learning models to help you search through your footage:

Visual search models

AI systems that can “see” and understand images and video frames, then connect that understanding to text queries or other images.

Speech models

AI systems that can transcribe spoken words into searchable text.

Face detection models

AI systems that can identify and group similar faces together, allowing you to search for specific people.

All models run entirely on your local machine, ensuring your footage never leaves your computer.

Visual Search Models

Visual search models are AI systems that can “see” and understand images and video frames, then connect that understanding to text queries or other images. This is what makes Jumper’s visual search work.

How visual search models work:
These models learn correlations between text and visual data from their training material. When you search for something like “a person walking through a door,” the model has learned to understand:
  • What “a person” looks like visually
  • What “walking” means in terms of motion and pose
  • What “a door” is and how it appears in different contexts
  • How these elements relate to each other in a scene
When you perform a text search, the model identifies the frames whose visual content best matches your query. Thanks to efficient algorithms, this works even across thousands of hours of video. The model doesn’t just match keywords: it understands the semantic meaning of your query and finds frames that match that meaning, even if the exact words never appear in the footage. A simplified sketch of this matching follows the list below.

Search methods:
  • Text search: Enter a natural language query to find matching visual content
  • Image search: Use an image or frame as your search input instead of text. The model uses the same underlying approach to find visually similar content across your footage
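Both search methods come down to the same mechanism: the model maps text and frames into a shared embedding space, then ranks frames by how close their embeddings sit to the query embedding. The sketch below illustrates the idea with the open-source CLIP model via the sentence-transformers library; Jumper’s actual models and indexing differ, and the file names are made up.

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# One model maps both text and images into the same vector space
model = SentenceTransformer("clip-ViT-B-32")

# During analysis: embed sampled frames (file names here are hypothetical)
frame_paths = ["frame_0001.jpg", "frame_0002.jpg", "frame_0003.jpg"]
frame_embeddings = model.encode([Image.open(p) for p in frame_paths])

# Text search: embed the query, then rank frames by cosine similarity
query_embedding = model.encode("a person walking through a door")
scores = util.cos_sim(query_embedding, frame_embeddings)[0].tolist()

# Image search works the same way: embed a reference frame instead of text
ref_embedding = model.encode(Image.open("frame_0002.jpg"))
image_scores = util.cos_sim(ref_embedding, frame_embeddings)[0].tolist()

for path, score in sorted(zip(frame_paths, scores), key=lambda x: -x[1]):
    print(f"{path}: {score:.3f}")
```

Because the frame embeddings can be computed once during analysis and stored, each new search only needs to embed the query, which is what keeps search fast across large libraries.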
What this means for you:
  • You can search using natural, conversational language
  • You can search using images or frames from your footage
  • The model understands context and relationships between objects
  • It works across different languages (multilingual models)
  • Higher resolution models can detect finer details like text on signs or small objects
Model characteristics:
  • Size: Larger models generally offer better accuracy but require more memory and processing power
  • Resolution: Higher resolution models (384×384, 512×512) analyze frames in more detail, useful for detecting text, small objects, or fine visual details
  • Speed: Smaller models analyze faster, while larger models take more time but provide better results
For a complete list of available models and their specifications, see the List of Machine Learning Models.

Speech Models

For speech search, Jumper uses Whisper models developed by OpenAI. These models transcribe spoken words in your footage, making dialogue searchable.

How speech models work:
The Whisper model listens to the audio track of your media files and converts speech into text. This transcription happens during the analysis phase, so once your media is analyzed, you can search for any word or phrase that was spoken. A minimal example follows the list below.

Key features:
  • Supports 111 languages automatically
  • Handles different accents, background noise, and audio quality
  • Transcribes dialogue with timestamps
  • Works entirely offline with no cloud processing required
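As a rough illustration of what transcription produces, here is the open-source openai-whisper package generating timestamped segments; Jumper ships its own optimized variants, and the file name is hypothetical.

```python
import whisper

# Smaller checkpoints analyze faster; larger ones are more accurate
model = whisper.load_model("base")

# Whisper extracts the audio track and transcribes it with timestamps
result = model.transcribe("interview.mov")

# The start/end times are what make every spoken phrase searchable
for segment in result["segments"]:
    print(f'[{segment["start"]:.1f}s - {segment["end"]:.1f}s]{segment["text"]}')
```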
Platform differences:
The exact Whisper model variant depends on your platform. Apple M-series Macs use an optimized version that takes advantage of Apple’s hardware acceleration, resulting in faster analysis and a smaller model size.

Face Detection Models

Face detection requires a Jumper Pro license

Face detection uses specialized AI models to identify and recognize people in your footage. These models analyze faces frame by frame, extract facial features, and group similar faces together; a simplified sketch of the grouping step appears after the list below.

How face detection models work:
The face detection algorithm processes each frame of your media to:
  • Detect faces in the frame
  • Extract facial features from each detected face
  • Group similar faces together, even when lighting, angles, or expressions change
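To make the grouping step concrete, here is a minimal sketch that clusters faces by the similarity of their embedding vectors, assuming a face recognition model has already turned each detected face into an embedding (that step is not shown). Jumper’s actual pipeline is more robust; the threshold here is illustrative.

```python
import numpy as np

def group_faces(embeddings: np.ndarray, threshold: float = 0.7) -> list[list[int]]:
    """Greedily group face indices whose embeddings are similar enough."""
    # L2-normalize so that a dot product equals cosine similarity
    embs = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    clusters: list[list[int]] = []    # each cluster is a list of face indices
    exemplars: list[np.ndarray] = []  # one representative embedding per cluster
    for i, emb in enumerate(embs):
        sims = [float(ex @ emb) for ex in exemplars]
        if sims and max(sims) >= threshold:
            clusters[sims.index(max(sims))].append(i)  # likely the same person
        else:
            clusters.append([i])                       # a new person
            exemplars.append(emb)
    return clusters

# e.g. 10 detected faces with 512-dimensional embeddings -> groups of indices
print(group_faces(np.random.randn(10, 512)))
```

Because the embeddings capture identity rather than raw pixels, the same person tends to land in the same group even as lighting, angle, or expression changes.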
What this means for you:
  • Automatically identify all appearances of a person across your entire footage library
  • Search for specific people using the @ syntax (e.g., @John sitting on a bench); a hypothetical sketch of such a query follows this list
  • Organize people into Collections to keep different productions separate
  • Find every scene where someone appears, even if they’re in the background
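As a purely hypothetical illustration of how an @ query could combine the two model types, the sketch below splits the query into a person filter and a visual search query; none of these helper names come from Jumper.

```python
# parse_person_query is a hypothetical helper, not part of Jumper's API
def parse_person_query(query: str) -> tuple[str | None, str]:
    """Split '@John sitting on a bench' into ('John', 'sitting on a bench')."""
    if query.startswith("@"):
        person, _, rest = query[1:].partition(" ")
        return person, rest.strip()
    return None, query

person, text_query = parse_person_query("@John sitting on a bench")
# 1. person     -> restrict results to frames where that face group appears
# 2. text_query -> rank the remaining frames with the visual search model
print(person, "|", text_query)  # John | sitting on a bench
```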
For more information about using face detection, see Face detection.

Choosing the Right Model

Depending on your specific workflow, use case, and hardware setup, you may want to choose a model other than the default. Some suggestions are listed below, or feel free to explore all the available models.

Fastest

Quick analysis with good accuracy, 256×256 resolution. Great for most workflows.

Fast

Larger model with higher accuracy, 256×256 resolution.

Accurate

Top-tier accuracy with 384×384 resolution. Benefits from more detailed and verbose searches; great multilingual support.

Most Accurate

Highest accuracy with 512×512 resolution. Benefits from more detailed and verbose searches; great multilingual support.

For a complete list of all available models with detailed specifications, see the List of Machine Learning Models.