Comparison of K-Means and Fuzzy C-Means Clustering Algorithms on Water Quality Parameter: Case Study of Ergene Basin of 17 Stations


Arslan Çene G., Parim C., Çene E.

3rd International Congress on Engineering and Natural Sciences Studies (ICENSS-2023), Ankara, Türkiye, 24-25 May 2023, pp. 1-2

  • Publication Type: Conference Paper / Abstract
  • City of Publication: Ankara
  • Country of Publication: Türkiye
  • Pages: pp. 1-2
  • Marmara University Affiliated: Yes

Abstract

Water quality parameters are important measures of the health and safety of water sources, which can be affected by various natural and human-induced factors, and several such parameters are used to assess water quality. The aim of this study is to group 17 water stations in the Ergene Basin, Türkiye, using the k-means and fuzzy c-means clustering algorithms, both of which are unsupervised machine learning methods. For this purpose, 15 water-related variables covering the period 1985-2013 are used. Different numbers of clusters are examined with both algorithms, and the optimal number of clusters is found to be 4. These clusters are named high-quality water, slightly polluted water, polluted water, and highly polluted water. The selected water parameters are biochemical oxygen demand (BOD5), chloride (Cl-), dissolved oxygen (DO), electrical conductivity (EC), aluminum (Al), ammonium–nitrogen (NH4-N), nitrite–nitrogen (NO2-N), nitrate–nitrogen (NO3-N), orthophosphate (o-PO4), potential of hydrogen (pH), permanganate value (pV), suspended solids (SS), temperature (T), total dissolved solids (TDS), and turbidity (Turb). The cluster centers are used to identify the characteristics of the stations. The first cluster has the lowest BOD5, Al, NO2-N, and T averages and the highest DO average. The second cluster has the lowest Cl-, EC, NH4-N, o-PO4, pV, SS, TDS, and Turb averages and the highest NO3-N, pH, and T averages. The third cluster has the lowest DO average and the highest Cl-, EC, Al, NH4-N, NO2-N, o-PO4, and TDS averages. The fourth cluster has the lowest NO3-N and pH averages and the highest BOD5, pV, SS, and Turb averages. K-means and fuzzy c-means clustering give similar results across both stations and years. Water quality at most stations in the basin improved after 2006, whereas water quality at a few stations worsened after 1990.
Keywords: K-means clustering, Fuzzy c-means clustering, Water Quality, Ergene Basin, Machine Learning
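
The comparison described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the abstract does not state the software, the preprocessing, or the fuzzy c-means settings, so the z-score standardization, the fuzzifier m = 2.0, the random initialization, and the function names `standardize`, `kmeans`, and `fuzzy_cmeans` are all assumptions. The data matrix is synthetic and only matches the study's shape (17 stations, 15 variables).

```python
import numpy as np


def standardize(X):
    # Assumption: z-score each column so variables on different scales
    # contribute equally to Euclidean distances.
    return (X - X.mean(axis=0)) / X.std(axis=0)


def kmeans(X, k, n_iter=100, seed=0):
    # Plain Lloyd iterations with k distinct rows as initial centers.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def fuzzy_cmeans(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    # Standard fuzzy c-means updates; m > 1 is the fuzzifier (assumed 2.0).
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # memberships of each point sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers


# Synthetic placeholder data: 17 stations x 15 variables (not the study's data).
rng = np.random.default_rng(42)
X = rng.normal(size=(17, 15))
Z = standardize(X)

km_labels, km_centers = kmeans(Z, k=4)
U, fcm_centers = fuzzy_cmeans(Z, c=4)
fcm_labels = U.argmax(axis=1)  # harden memberships for comparison with k-means
```

Each row of `km_centers` or `fcm_centers` is a cluster's average profile over the 15 variables, which is the quantity the abstract uses to characterize the four clusters. Cluster indices are arbitrary in both methods, so comparing `km_labels` with `fcm_labels` requires matching clusters first, for example by nearest centers.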