Choosing the Right Model
Depending on your workflow, use case, and hardware setup, you may want to choose a model other than the default. Some suggestions are listed below, or feel free to explore the full list of available models.
Fastest
V2 Medium: Quick analysis with good accuracy, 256×256 resolution. Great for most workflows.
Fast
V2 XLarge: Larger model with higher accuracy, 256×256 resolution.
Accurate
V1 Multilingual high-res: Top-tier accuracy with 384×384 resolution.
Benefits from more detailed and verbose searches, great multilingual support.
Most Accurate
V2 Multilingual x-high-res: Highest accuracy with 512×512 resolution.
Benefits from more detailed and verbose searches, great multilingual support.
Visual models
This is a more comprehensive list of the models and their properties.

Model Name (Display) | Resolution | Size on Disk | Information |
---|---|---|---|
V2 Medium (Default) | 256×256 | ~1.5GB | Default model, bundled in the app installer. Balances accuracy and performance, with improved semantic understanding from the V2 model improvements. Works well on most modern computers. |
V2 Multilingual x-high-res | 512×512 | ~2GB | NEW, MOST ACCURATE MODEL 💪 Fantastic search result quality in both English and non-English languages. Benefits from more detailed and “verbose” searches, e.g. “a metal sign saying XYZ” instead of just “XYZ”. |
V1 Multilingual high-res | 384×384 | ~4GB | Very accurate search result quality in both English and non-English languages. Benefits from more detailed and “verbose” searches, e.g. “a metal sign saying XYZ” instead of just “XYZ”. |
V2 Medium high-res | 384×384 | ~1.5GB | Same parameter count as V2 Medium, but analyzes frames at 384×384 for more image detail. Good if you need slightly finer detail than 256×256, but be aware that it requires more RAM/VRAM. |
V2 Medium x-high-res | 512×512 | ~1.5GB | Same parameter count as V2 Medium, but even higher resolution (512×512). Ideal for text detection/OCR or detailed reverse image searches. Uses significantly more memory and produces larger analysis files. |
V2 Large | 256×256 | ~3.3GB | Larger model with higher accuracy in challenging scenarios. |
V2 Large high-res | 384×384 | ~3.3GB | Same as V2 Large but with higher resolution (384×384). Ideal for text detection/OCR or detailed reverse image searches. |
V2 Large x-high-res | 512×512 | ~3.3GB | Same parameters as V2 Large, but at 512×512 input resolution. Ideal for text detection/OCR or detailed reverse image searches. |
V2 XLarge | 256×256 | ~4.5GB | Even larger V2 model. Offers improved recognition of subtle elements. Ideal if you want top-tier accuracy and have the computer to run it. |
V2 XLarge high-res | 384×384 | ~4.5GB | A higher-resolution variant of V2 XLarge. |
V2 XLarge x-high-res | 512×512 | ~4.5GB | The heaviest V2 model in both resolution and parameter count. Even more accurate, and even more resource-demanding. |
Medium | 256×256 | ~812MB | Legacy V1 model and the previous default, formerly bundled with the application. Reasonably accurate but outperformed by V2 Medium in most scenarios. If your hardware can handle V2 Medium, prefer that instead. |
Medium x-high-res | 512×512 | ~812MB | Same as V1 Medium but at 512×512. Useful for text detection, reverse image searches, etc. Produces larger analysis files than some bigger V1 models, purely due to the higher frame resolution. |
Large multilingual | 256×256 | ~1.48GB | V1 Multilingual version that improves accuracy for non-English text. V2 equivalents are multilingual by default. |
Large | 256×256 | ~2.61GB | Larger V1 model. Prefer V2 alternatives. |
Large high-res | 384×384 | ~2.61GB | Same as V1 Large but higher resolution. Demands more resources. |
XLarge high-res | 384×384 | ~3.51GB | Largest V1 model. High accuracy, but overshadowed by the new V2 XLarge. Recommended only for legacy compatibility if you can’t run V2. |
XLarge multilingual | 256×256 | ~4.51GB | Largest V1 multilingual model. Very high accuracy for non-English searches. V2 models are multilingual by default, but V1 multilingual models can be slightly better than V2 models on certain languages. |
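
As a rough illustration of how the table can be used, the sketch below filters a subset of the V2 rows by disk budget and minimum resolution, and computes how many more pixels per frame a higher resolution implies. The `Model` class and the `candidates` and `relative_pixels` helpers are hypothetical names made up for this example and are not part of Jumper; the sizes are the approximate values from the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Model:
    name: str
    resolution: int  # frames are analyzed at resolution x resolution pixels
    size_gb: float   # approximate size on disk, taken from the table


# A subset of the table rows (V2 models only), for illustration.
MODELS = [
    Model("V2 Medium", 256, 1.5),
    Model("V2 Medium high-res", 384, 1.5),
    Model("V2 Medium x-high-res", 512, 1.5),
    Model("V2 Multilingual x-high-res", 512, 2.0),
    Model("V2 Large", 256, 3.3),
    Model("V2 Large high-res", 384, 3.3),
    Model("V2 Large x-high-res", 512, 3.3),
    Model("V2 XLarge", 256, 4.5),
    Model("V2 XLarge high-res", 384, 4.5),
    Model("V2 XLarge x-high-res", 512, 4.5),
]


def candidates(max_size_gb: float, min_resolution: int = 256) -> List[str]:
    """Return names of models that fit the disk budget and resolution floor."""
    return [
        m.name
        for m in MODELS
        if m.size_gb <= max_size_gb and m.resolution >= min_resolution
    ]


def relative_pixels(resolution: int, baseline: int = 256) -> float:
    """How many times more pixels per frame than the baseline resolution."""
    return (resolution / baseline) ** 2


if __name__ == "__main__":
    print(candidates(max_size_gb=2.0, min_resolution=512))
    print(relative_pixels(384), relative_pixels(512))
```

A 512×512 model processes four times as many pixels per frame as a 256×256 one (and 384×384 processes 2.25 times as many), which is consistent with the higher memory use noted for the high-res variants.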
Speech models
For transcriptions, Jumper uses Whisper models developed by OpenAI. The exact model depends on your platform.

Platform | Model Variant | Size on Disk | Notes |
---|---|---|---|
Windows / Intel Mac | whisper-large-v3-turbo | ~1.62GB | Bundled with Windows and Intel Mac installers. |
Apple M-series Macs | whisper-large-v3-turbo | ~467MB | Uses a quantized version converted to Apple’s MLX framework for hardware acceleration (faster analysis, smaller model size). Bundled with Apple M-series installer. |
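
A minimal sketch of how the platform split in the table could be detected from Python, using the standard library `platform` module. The function name `bundled_whisper_variant` is hypothetical, and this is not a description of how Jumper itself selects the model. Note that a Python interpreter running under Rosetta on an M-series Mac reports `x86_64`, so the result reflects the interpreter's architecture.

```python
import platform
from typing import Optional, Tuple


def bundled_whisper_variant(
    system: Optional[str] = None, machine: Optional[str] = None
) -> Tuple[str, str]:
    """Map a platform to the (variant, approximate size) listed in the table.

    Arguments default to the values reported for the running interpreter.
    """
    system = system or platform.system()
    machine = machine or platform.machine()

    # Apple M-series Macs report "Darwin" with an "arm64" machine type.
    if system == "Darwin" and machine == "arm64":
        return ("whisper-large-v3-turbo (quantized, MLX)", "~467MB")

    # Windows and Intel Macs share the same bundled variant.
    if system == "Windows" or (system == "Darwin" and machine == "x86_64"):
        return ("whisper-large-v3-turbo", "~1.62GB")

    raise ValueError(f"Platform not listed in the table: {system} {machine}")
```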