Choosing the Right Model

Depending on your workflow, use case, and hardware, you may want to choose a model other than the default. Some suggestions are listed below; you can also explore the full list of available models.

Fast

V2 Medium: Quick analysis with good accuracy. Great for most workflows

Accurate

V2 Large high-res: Higher accuracy with 384×384 resolution. Good for detailed searches

More Accurate

V2 XLarge x-high-res: Top-tier accuracy with 512×512 resolution. Best for complex searches

Most Accurate

V2 Multilingual x-high-res: Highest accuracy with multilingual support. Ideal for international content

Visual models

The table below lists all available visual models and their properties.

Model name (as displayed) | Resolution | Size on disk | Information
V2 Medium (default) | 256×256 | ~1.5 GB | Default model, bundled with the app installer. Balanced accuracy vs. performance, with improved semantic understanding over the V1 models. Works well on most modern computers.
V2 Multilingual x-high-res | 512×512 | ~2 GB | NEW, MOST ACCURATE MODEL 💪 Fantastic search result quality in both English and non-English languages. Improves accuracy on multilingual benchmarks by 50% compared to the default model. Highly recommended!
V2 Medium high-res | 384×384 | ~1.5 GB | Same parameter count as V2 Medium, but processes frames at 384×384 for more image detail. Good if you need slightly finer detail than 256×256, but requires more RAM/VRAM.
V2 Medium x-high-res | 512×512 | ~1.5 GB | Same parameter count as V2 Medium, at an even higher resolution (512×512). Ideal for text detection/OCR or detailed reverse image searches. Uses significantly more memory and produces larger analysis files.
V2 Large | 256×256 | ~3.3 GB | Larger model with higher accuracy in challenging scenarios.
V2 Large high-res | 384×384 | ~3.3 GB | Same as V2 Large, but at a higher resolution (384×384). Ideal for text detection/OCR or detailed reverse image searches.
V2 Large x-high-res | 512×512 | ~3.3 GB | Same parameters as V2 Large, but at 512×512 input resolution. Ideal for text detection/OCR or detailed reverse image searches.
V2 XLarge | 256×256 | ~4.5 GB | Even larger V2 model with improved recognition of subtle elements. Ideal if you want top-tier accuracy and have the hardware to run it.
V2 XLarge high-res | 384×384 | ~4.5 GB | Higher-resolution variant of V2 XLarge.
V2 XLarge x-high-res | 512×512 | ~4.5 GB | The most demanding V2 model in terms of resolution and parameter count. Even more accurate, but also even more resource-intensive.
Medium | 256×256 | ~812 MB | Legacy V1 model, formerly the default and bundled with the application. Reasonably accurate, but outperformed by V2 Medium in most scenarios. If your hardware can handle V2 Medium, prefer that instead.
Medium x-high-res | 512×512 | ~812 MB | Same as V1 Medium, but at 512×512. Useful for text detection, reverse image searches, etc. Produces larger analysis files than some bigger V1 models, purely because of the higher frame resolution.
Large multilingual | 256×256 | ~1.48 GB | V1 multilingual model with improved accuracy for non-English text. The V2 equivalents are multilingual by default.
Large | 256×256 | ~2.61 GB | Larger V1 model. Prefer the V2 alternatives.
Large high-res | 384×384 | ~2.61 GB | Same as V1 Large, but at a higher resolution. Demands more resources.
XLarge high-res | 384×384 | ~3.51 GB | Largest V1 model. High accuracy, but surpassed by the newer V2 XLarge. Recommended only for legacy compatibility if you can’t run V2.
XLarge multilingual | 256×256 | ~4.51 GB | Largest V1 multilingual model. Very high accuracy for non-English searches. V2 models are multilingual by default, but V1 multilingual models can be slightly better than V2 models for certain languages.
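The resolution trade-off in this table comes down to simple arithmetic: pixels per frame grow with the square of the side length, so 384×384 is 2.25× and 512×512 is 4× the pixels of the 256×256 default, while the size on disk stays the same within a model family. The sketch below is a hypothetical helper, not part of Jumper: the names `VisualModel`, `pixels_vs_default`, and `candidates` are illustrative assumptions. It encodes the V2 rows above and filters them by a minimum resolution and a disk budget. Disk size is only one constraint. Higher resolutions also need more RAM/VRAM and produce larger analysis files, and the sketch does not model either.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VisualModel:
    name: str
    side: int       # input resolution is side x side pixels
    size_gb: float  # approximate size on disk, taken from the table


# V2 rows from the table (approximate sizes); hypothetical data structure.
V2_MODELS = [
    VisualModel("V2 Medium", 256, 1.5),
    VisualModel("V2 Medium high-res", 384, 1.5),
    VisualModel("V2 Medium x-high-res", 512, 1.5),
    VisualModel("V2 Multilingual x-high-res", 512, 2.0),
    VisualModel("V2 Large", 256, 3.3),
    VisualModel("V2 Large high-res", 384, 3.3),
    VisualModel("V2 Large x-high-res", 512, 3.3),
    VisualModel("V2 XLarge", 256, 4.5),
    VisualModel("V2 XLarge high-res", 384, 4.5),
    VisualModel("V2 XLarge x-high-res", 512, 4.5),
]


def pixels_vs_default(model: VisualModel) -> float:
    """Pixels per frame relative to the 256x256 default."""
    return (model.side * model.side) / (256 * 256)


def candidates(min_side: int, max_size_gb: float) -> List[VisualModel]:
    """Models at or above min_side that fit the disk budget, in table order."""
    return [m for m in V2_MODELS if m.side >= min_side and m.size_gb <= max_size_gb]


if __name__ == "__main__":
    # Example: OCR-style work (512x512) with at most ~2 GB of disk for the model.
    for m in candidates(min_side=512, max_size_gb=2.0):
        print(f"{m.name}: {pixels_vs_default(m):.2f}x pixels per frame")
```

With a 512×512 minimum and a 2 GB budget, only V2 Medium x-high-res and V2 Multilingual x-high-res qualify. Both process 4× the pixels of the default per frame.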

Speech models

For transcription, Jumper uses Whisper models developed by OpenAI. The exact model build and its size on disk depend on your platform.

Platform | Model variant | Size on disk | Notes
Windows / Intel Mac | whisper-large-v3-turbo | ~1.62 GB | Bundled with the Windows and Intel Mac installers.
Apple M-series Mac | whisper-large-v3-turbo | ~467 MB | Uses a quantized version converted to Apple’s MLX framework for hardware acceleration (faster analysis, smaller model size). Bundled with the Apple M-series installer.
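To see which of these Whisper builds applies to a given machine, you can detect the platform with Python's standard `platform` module. The sketch below is a hypothetical helper, not part of Jumper, and only restates the table. It assumes Apple M-series Macs report `platform.system() == "Darwin"` and `platform.machine() == "arm64"`. That holds for a native arm64 Python, but a Python running under Rosetta reports `x86_64`. The helper returns `None` for platforms the table does not list.

```python
import platform
from typing import Optional


def bundled_whisper_variant(system: Optional[str] = None,
                            machine: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: describe the Whisper build listed for a platform."""
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()

    # Apple M-series Macs: quantized MLX conversion (~467 MB).
    # An x86_64 Python running under Rosetta reports "x86_64" here instead.
    if system == "Darwin" and machine in ("arm64", "aarch64"):
        return "whisper-large-v3-turbo (quantized, MLX, ~467 MB)"

    # Windows and Intel Macs: the full model (~1.62 GB).
    if system in ("Windows", "Darwin"):
        return "whisper-large-v3-turbo (~1.62 GB)"

    # Platforms not listed in the table.
    return None


if __name__ == "__main__":
    print(bundled_whisper_variant())
```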