KazTEB Leaderboard
Kazakh language extension for the Massive Text Embedding Benchmark
This is a new and ongoing project dedicated to a comprehensive evaluation of existing text embedding models on datasets designed for Kazakh language tasks. Link to the project code.
Currently, the leaderboard supports only 3 tasks: retrieval, classification, and bitext mining, all based on existing human-annotated datasets. The aim of this project is to extend the list to the 8 tasks proposed in MTEB and to cover multiple domains within each task. The test datasets will be drawn from real data sources, without synthetic samples.
1 | 0.6435 | 8192 | 560M | 4096 | 0.6435 |
TODO:
- API-based Model Evaluation: Adding results of closed-source models such as Google's Gemini embeddings.
- Dynamic Data Loading: Switching to API-based result fetching for real-time updates without manual JSON uploads.
Contact: arysbatyr@gmail.com