Analysis of Consistency of Large Language Models for Low-Resource Languages like Turkish with Min-P and Top-P Sampling Parameters (Türkçe gibi Az Kaynaklı Diller için Büyük Dil Modeli Tutarlılığının Min-P ve Top-P Örnekleme Parametreleri ile Analizi)


Uzumcu T., GANİZ M. C.

33rd IEEE Conference on Signal Processing and Communications Applications, SIU 2025, İstanbul, Türkiye, 25-28 June 2025 (Full Text Paper)

  • Publication Type: Conference Paper / Full Text Paper
  • DOI: 10.1109/siu66497.2025.11112080
  • City of Publication: İstanbul
  • Country of Publication: Türkiye
  • Keywords: Large Language Models, Low-Resource Languages, Min-P, Sampling Strategies, Temperature Sampling, Top-P, Turkish Text Generation
  • Marmara University Affiliated: Yes

Abstract

Large Language Models (LLMs) struggle to maintain language consistency in low-resource languages like Turkish when sampling with high temperature parameters. This study investigates how min-p, a recently introduced sampling parameter that filters out low-probability tokens relative to the most likely token, and the established top-p parameter affect Turkish text generation in open-source LLMs trained predominantly on English. The effectiveness of min-p in maintaining Turkish consistency across different temperature and top-p settings is evaluated using Supreme Court decision summaries. Detailed experiments demonstrate that min-p sampling significantly increases linguistic consistency at high temperatures, allowing for greater creativity without compromising consistency.
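The two truncation rules compared in the abstract can be illustrated with a small sketch. This is a minimal, self-contained illustration of the standard definitions (top-p keeps the smallest set of tokens whose cumulative probability reaches the threshold; min-p keeps tokens whose probability is at least `min_p` times the top token's probability), not the paper's experimental code; the example distribution is hypothetical.

```python
import numpy as np

def top_p_filter(probs, top_p=0.85):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    order = np.argsort(probs)[::-1]          # sort token indices by descending probability
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, top_p) + 1  # include the token that crosses the threshold
    mask = np.zeros_like(probs, dtype=bool)
    mask[order[:cutoff]] = True
    return mask

def min_p_filter(probs, min_p=0.1):
    """Keep tokens whose probability is at least min_p times the top token's probability."""
    return probs >= min_p * probs.max()

def sample(probs, mask, rng):
    """Renormalize the surviving tokens and draw one index."""
    filtered = np.where(mask, probs, 0.0)
    return rng.choice(len(probs), p=filtered / filtered.sum())

# Hypothetical distribution after high-temperature scaling: a few plausible
# tokens followed by a flat tail of unlikely ones (e.g. wrong-language tokens).
probs = np.array([0.30, 0.25, 0.15, 0.10] + [0.02] * 10)
print(min_p_filter(probs, 0.1).sum())    # number of tokens surviving min-p
print(top_p_filter(probs, 0.85).sum())   # number of tokens surviving top-p
```

Because the min-p threshold scales with the top token's probability, flattening the distribution at high temperature still excludes the unlikely tail, whereas a fixed top-p mass admits more tail tokens, which is the mechanism behind the consistency difference the study measures.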