Development of high-order harmonic generation code using CUDA

Thesis Type: Master's

Institution: Marmara Üniversitesi (Marmara University), Institute of Pure and Applied Sciences, Department of Physics, Turkey

Approval Date: 2019

Thesis Language: English

Student: OZAN OĞUZ

Supervisor: Erdi Ata Bleda

Abstract:

High-order harmonic generation (HHG) is a nonlinear process that occurs when an intense laser field interacts with a target (gas, plasma, or solid). The coherent harmonics generated by this process can serve as a source of intense attosecond pulses with sub-femtosecond durations, extending from the extreme ultraviolet to the soft X-ray region, collectively called the XUV region. Prior to this thesis, we developed two codes to simulate high-order harmonic generation from a single-electron Rydberg atom. The first code ran on only one central processing unit (CPU) thread (serial version), whereas the second code, which we developed for this thesis, could run on multiple CPU threads under the symmetric multiprocessing (SMP) architecture by partially parallelizing the time propagator of the initial quantum state of the single-electron Rydberg atom prior to its interaction with the laser field.
It takes approximately 5 to 6 hours of computing time to obtain publication-quality results with our serial code. The same simulation with the parallel code takes only 2 to 2.5 hours. Although we did not achieve full parallelization with the second code, the results were satisfactory enough to motivate reducing the elapsed time of the underlying simulations even further by using graphics processing units (GPUs). The use of GPUs in scientific computing has grown rapidly over the last decade. Computational tasks that were too complex to execute a few years ago are now within reach thanks to the power of GPUs. The technology company nVIDIA has developed a series of graphics cards, named Tesla, specifically for scientific computing. These cards contain thousands of cores (CUDA cores) capable of double-precision calculations at very high performance. The use of GPUs not only enables scientists to carry out complex computational tasks but also reduces the overall cost significantly. For example, at the time of writing this thesis, a 20-core (40-thread) CPU costs around US$3,200, whereas a mid-grade GPU with 2560 CUDA cores costs around US$550. We think it is also important to point out that the software architecture of a GPU is very different from that of a CPU. In this thesis, we implemented the partially parallel algorithm on the GPU in four different kernel function designs using CUDA. In the first part of this thesis, we investigated single-bit errors on GPUs without an error-correcting code (ECC) feature and then proposed an ECC scheme for HHG simulations based on Triple Modular Redundancy (TMR). In the second part, the results of each kernel function implementation on a non-ECC GPU are compared with the results of serial CPU simulations for the dipole, the dipole acceleration, and the HHG spectrum.
In the last part, we benchmarked the performance of the wave function propagator components of the HHG simulations for different kernel function implementations.
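
To illustrate the idea behind Triple Modular Redundancy mentioned in the abstract, the following is a minimal sketch of a bitwise majority vote over three redundant copies of a double-precision value. It is not the ECC scheme proposed in the thesis, whose details are not given here; the function names `to_bits`, `from_bits`, `tmr_vote`, and `flip_bit` are hypothetical, and the sketch assumes that at most one of the three copies is corrupted at any given bit position.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a double as its 64-bit pattern without undefined behavior.
inline std::uint64_t to_bits(double x) {
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Reinterpret a 64-bit pattern as a double.
inline double from_bits(std::uint64_t u) {
    double x;
    std::memcpy(&x, &u, sizeof x);
    return x;
}

// Bitwise majority vote over three redundant copies of one value.
// Each output bit takes the value held by at least two of the three inputs,
// so a single flipped bit in any one copy is outvoted by the other two.
inline double tmr_vote(double a, double b, double c) {
    const std::uint64_t ua = to_bits(a);
    const std::uint64_t ub = to_bits(b);
    const std::uint64_t uc = to_bits(c);
    return from_bits((ua & ub) | (ua & uc) | (ub & uc));
}

// Hypothetical helper for experiments: flip one bit (0..63) of a double
// to mimic a single-bit error in memory.
inline double flip_bit(double x, unsigned bit) {
    return from_bits(to_bits(x) ^ (std::uint64_t{1} << bit));
}
```

For example, `tmr_vote(flip_bit(v, 52), v, v)` returns `v` unchanged, because the corrupted copy is outvoted. The voting expression uses only integer bitwise operations, so it could presumably be placed in a CUDA `__device__` function as well, at the cost of storing and updating each protected value three times.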