| V2 Medium (Default) | 256×256 | ~1.5GB | Default model, bundled with the app installer. Balances accuracy and performance, with improved semantic understanding from the V2 model generation. Works well on most modern computers. |
| V2 Multilingual x-high-res | 512×512 | ~2GB | NEW, MOST ACCURATE MODEL 💪 Fantastic search result quality in both English and non-English languages. Benefits from more detailed, “verbose” searches, e.g. “a metal sign saying XYZ” instead of just “XYZ”. |
| V1 Multilingual high-res | 384×384 | ~4GB | Very accurate search result quality in both English and non-English languages. Benefits from more detailed, “verbose” searches, e.g. “a metal sign saying XYZ” instead of just “XYZ”. |
| V2 Medium high-res | 384×384 | ~1.5GB | Same parameter count as V2 Medium, but analyzes frames at 384×384 for more image detail. Good if you need slightly finer detail than 256×256, but it requires more RAM/VRAM. |
| V2 Medium x-high-res | 512×512 | ~1.5GB | Same parameter count as V2 Medium, but even higher resolution (512×512). Ideal for text detection/OCR or detailed reverse image searches. Uses significantly more memory and produces larger analysis files. |
| V2 Large | 256×256 | ~3.3GB | Larger model with higher accuracy in challenging scenarios. |
| V2 Large high-res | 384×384 | ~3.3GB | Same as V2 Large but with higher resolution (384×384). Ideal for text detection/OCR or detailed reverse image searches. |
| V2 Large x-high-res | 512×512 | ~3.3GB | Same parameters as V2 Large, but at 512×512 input resolution. Ideal for text detection/OCR or detailed reverse image searches. |
| V2 XLarge | 256×256 | ~4.5GB | Even larger V2 model with improved recognition of subtle elements. Ideal if you want top-tier accuracy and have the hardware to run it. |
| V2 XLarge high-res | 384×384 | ~4.5GB | A higher-resolution variant of V2 XLarge. |
| V2 XLarge x-high-res | 512×512 | ~4.5GB | The heaviest V2 model in terms of resolution and parameter requirements. Even more accurate, but even more resource-demanding. |
| Medium | 256×256 | ~812MB | Legacy V1 model and the previous default, formerly bundled with the application. Reasonably accurate, but outperformed by V2 Medium in most scenarios. If your hardware can handle V2 Medium, prefer it instead. |
| Medium x-high-res | 512×512 | ~812MB | Same as V1 Medium but at 512×512. Useful for text detection, reverse image searches, etc. Produces larger analysis files than some bigger V1 models, purely due to the higher frame resolution. |
| Large multilingual | 256×256 | ~1.48GB | V1 Multilingual version that improves accuracy for non-English text. V2 equivalents are multilingual by default. |
| Large | 256×256 | ~2.61GB | Larger V1 model. Prefer V2 alternatives. |
| Large high-res | 384×384 | ~2.61GB | Same as V1 Large but higher resolution. Demands more resources. |
| XLarge high-res | 384×384 | ~3.51GB | Largest non-multilingual V1 model. High accuracy, but overshadowed by the new V2 XLarge. Recommended only for legacy compatibility if you can’t run V2. |
| XLarge multilingual | 256×256 | ~4.51GB | Largest V1 multilingual model. Very high accuracy for non-English searches. V2 models are multilingual by default, but V1 multilingual models can be slightly better than V2 models for certain languages. |
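The high-res and x-high-res variants keep the same parameter count as their base model, yet they need more memory. One reason is simply that each step up in input size multiplies the number of pixels per analyzed frame. The sketch below does only that arithmetic for the three input sizes in the table. It is an illustration, not a model of the app's actual RAM/VRAM use or analysis-file size, which also depend on implementation details not covered here.

```python
# Square input resolutions used by the models in the table above.
RESOLUTIONS = {
    "standard": 256,
    "high-res": 384,
    "x-high-res": 512,
}


def pixels_per_frame(side: int) -> int:
    """Number of pixels in one square analyzed frame of the given side length."""
    return side * side


def relative_to_standard(side: int, base: int = 256) -> float:
    """How many times more pixels a frame has than a base x base frame."""
    return pixels_per_frame(side) / pixels_per_frame(base)


if __name__ == "__main__":
    for name, side in RESOLUTIONS.items():
        print(
            f"{name:>10}: {side}x{side} = {pixels_per_frame(side):,} px "
            f"({relative_to_standard(side):.2f}x the 256x256 input)"
        )
```

Running it prints 65,536, 147,456 and 262,144 pixels per frame, i.e. 1.00×, 2.25× and 4.00× the 256×256 input, which is why the x-high-res variants are the most resource-demanding at a given model size.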