Machine Learning Models
Jumper uses powerful AI models to help you search through your videos visually. These models understand what’s in your footage and let you find moments using natural language.
Choosing the Right Model
Depending on your workflow, use case, and hardware, you may want a model other than the default. Some suggestions are listed below, or you can explore the full list of available models.
Fast
V2 Medium: Quick analysis with good accuracy. Great for most workflows.
Accurate
V2 Large high-res: Higher accuracy with 384×384 resolution. Good for detailed searches.
More Accurate
V2 XLarge x-high-res: Top-tier accuracy with 512×512 resolution. Best for complex searches.
Most Accurate
V2 Multilingual x-high-res: Highest accuracy with multilingual support. Ideal for international content.
Visual models
The table below lists all available visual models and their properties.
Model Name (Display) | Resolution | Size on Disk | Information |
---|---|---|---|
V2 Medium (Default) | 256×256 | ~1.5GB | Default model, bundled with the app installer. Balances accuracy and performance, with improved semantic understanding from the V2 model improvements. Works well on most modern computers. |
V2 Multilingual x-high-res | 512×512 | ~2GB | NEW, MOST ACCURATE MODEL 💪 Excellent search quality in both English and non-English languages. Improves accuracy on multilingual benchmarks by 50% compared to the default model. Highly recommended! |
V2 Medium high-res | 384×384 | ~1.5GB | Same parameter count as V2 Medium, but processes frames at 384×384 for more image detail. Good if you need slightly finer detail than 256×256, but be aware that it requires more RAM/VRAM. |
V2 Medium x-high-res | 512×512 | ~1.5GB | Same parameter count as V2 Medium, but even higher resolution (512×512). Ideal for text detection/OCR or detailed reverse image searches. Uses significantly more memory and produces larger analysis files. |
V2 Large | 256×256 | ~3.3GB | Larger model with higher accuracy in challenging scenarios. |
V2 Large high-res | 384×384 | ~3.3GB | Same as V2 Large but with higher resolution (384×384). Ideal for text detection/OCR or detailed reverse image searches. |
V2 Large x-high-res | 512×512 | ~3.3GB | Same parameters as V2 Large, but at 512×512 input resolution. Ideal for text detection/OCR or detailed reverse image searches. |
V2 XLarge | 256×256 | ~4.5GB | Even larger V2 model. Offers improved recognition of subtle elements. Ideal if you want top-tier accuracy and have the computer to run it. |
V2 XLarge high-res | 384×384 | ~4.5GB | A higher-resolution variant of V2 XLarge. |
V2 XLarge x-high-res | 512×512 | ~4.5GB | The heaviest V2 model in both resolution and parameter count. Even more accurate, and even more resource-demanding. |
Medium | 256×256 | ~812MB | Legacy V1 model and the previous default, formerly bundled with the application. Reasonably accurate but outperformed by V2 Medium in most scenarios. If your hardware can handle V2 Medium, prefer that instead. |
Medium x-high-res | 512×512 | ~812MB | Same as V1 Medium but at 512×512. Useful for text detection, reverse image searches, etc. Produces larger analysis files than some bigger V1 models, purely due to the higher frame resolution. |
Large multilingual | 256×256 | ~1.48GB | V1 Multilingual version that improves accuracy for non-English text. V2 equivalents are multilingual by default. |
Large | 256×256 | ~2.61GB | Larger V1 model. Prefer V2 alternatives. |
Large high-res | 384×384 | ~2.61GB | Same as V1 Large but higher resolution. Demands more resources. |
XLarge high-res | 384×384 | ~3.51GB | Largest V1 model. High accuracy, but overshadowed by the new V2 XLarge. Recommended only for legacy compatibility if you can’t run V2. |
XLarge multilingual | 256×256 | ~4.51GB | Largest V1 multilingual model. Very high accuracy for non-English searches. V2 models are multilingual by default, but V1 multilingual models can be slightly better than V2 models on certain languages. |
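
To make the trade-offs in the table easier to compare, here is a minimal Python sketch that encodes the V2 rows and shortlists models by disk budget and minimum resolution. It is not part of Jumper and does not use any Jumper API: the `VisualModel` class and the `shortlist` and `relative_pixels` helpers are hypothetical names introduced for illustration. Sorting larger models first assumes that a bigger model is the more accurate choice, which the table suggests but does not guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualModel:
    name: str
    resolution: int  # side of the square input frame, in pixels
    size_gb: float   # approximate size on disk, from the table


# V2 rows copied from the table above (sizes are approximate).
V2_MODELS = [
    VisualModel("V2 Medium", 256, 1.5),
    VisualModel("V2 Multilingual x-high-res", 512, 2.0),
    VisualModel("V2 Medium high-res", 384, 1.5),
    VisualModel("V2 Medium x-high-res", 512, 1.5),
    VisualModel("V2 Large", 256, 3.3),
    VisualModel("V2 Large high-res", 384, 3.3),
    VisualModel("V2 Large x-high-res", 512, 3.3),
    VisualModel("V2 XLarge", 256, 4.5),
    VisualModel("V2 XLarge high-res", 384, 4.5),
    VisualModel("V2 XLarge x-high-res", 512, 4.5),
]


def shortlist(max_size_gb, min_resolution=256):
    """Return models within the disk budget and at or above the resolution floor.

    Results are ordered largest first (by size on disk, then resolution).
    That ordering is an assumption made for this sketch, not a ranking from Jumper.
    """
    fits = [
        m for m in V2_MODELS
        if m.size_gb <= max_size_gb and m.resolution >= min_resolution
    ]
    return sorted(fits, key=lambda m: (m.size_gb, m.resolution), reverse=True)


def relative_pixels(model, baseline=256):
    """Pixels per frame relative to a baseline square resolution."""
    return (model.resolution / baseline) ** 2


if __name__ == "__main__":
    for model in shortlist(max_size_gb=2.0, min_resolution=512):
        print(model.name, f"{relative_pixels(model):.2f}x pixels per frame")
```

The pixel ratio gives a rough sense of why the higher-resolution variants need more memory: a 512×512 frame has four times the pixels of a 256×256 frame. Actual RAM/VRAM use and analysis file size depend on the model internals and are not derived from this ratio alone.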
Speech models
For transcriptions, Jumper uses Whisper models developed by OpenAI. The exact model depends on your platform.
Platform | Model Variant | Size on Disk | Notes |
---|---|---|---|
Windows / Intel Mac | whisper-large-v3-turbo | ~1.62GB | Bundled with Windows and Intel Mac installers. |
Apple M-series Macs | whisper-large-v3-turbo | ~467MB | Uses a quantized version converted to Apple’s MLX framework for hardware acceleration (faster analysis, smaller model size). Bundled with the Apple M-series installer. |
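
If you are unsure which row applies to your machine, the following Python sketch maps an operating system and CPU architecture to the rows of the table above. It only illustrates the table and is not how Jumper itself detects the platform. The `speech_model_for` function and the dictionary keys are hypothetical. The only library calls used are `platform.system()` and `platform.machine()` from the Python standard library.

```python
import platform


def speech_model_for(system, machine):
    """Return the speech-model table row for an OS / CPU pair, or None if unlisted.

    `system` and `machine` are expected in the form returned by
    platform.system() and platform.machine().
    """
    if system == "Darwin" and machine == "arm64":
        # Apple M-series Macs: quantized MLX conversion, smaller on disk.
        return {
            "platform": "Apple M-series Macs",
            "model": "whisper-large-v3-turbo",
            "size": "~467MB",
        }
    if system in ("Windows", "Darwin"):
        # Windows, or a Mac that is not reporting arm64 (treated as Intel here).
        return {
            "platform": "Windows / Intel Mac",
            "model": "whisper-large-v3-turbo",
            "size": "~1.62GB",
        }
    return None  # platform not covered by the table


if __name__ == "__main__":
    print(speech_model_for(platform.system(), platform.machine()))
```

Note that a Python interpreter running under Rosetta on an M-series Mac reports `x86_64`, so this sketch would classify it as an Intel Mac.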