KazTEB Leaderboard
Kazakh language extension for the Massive Text Embedding Benchmark
This is a new and ongoing project dedicated to a comprehensive evaluation of existing text embedding models on datasets designed for Kazakh language tasks. Link to the project code.
Currently, the leaderboard supports only 3 tasks (retrieval, classification, and bitext mining), all based on existing human-annotated datasets. The aim of this project is to extend the list to the 8 tasks proposed in MTEB and to cover multiple domains within each task. The test datasets are planned to be acquired from real data sources, without using synthetic samples.
10 | 0.7174 | 2048 | Unknown | 3072 | 0.7174 |
TODO:
- Dynamic Data Loading: Switch to API-based result fetching for real-time updates without manual JSON uploads.
Contact: arysbatyr@gmail.com