Using Spark for Data Processing on HPC Infrastructure

Abstract

This diploma thesis describes the Apache Hadoop and Spark technologies. The first part explains these technologies and the implementation of selected algorithms. The second part is devoted to the design of a graphical client for launching the implemented algorithms on HPC as a Service. The main goal was to compare different implementations of the algorithms using Hadoop and Spark across a range of datasets on the HPC infrastructure at the IT4Innovations technology center.

Description

Subject(s)

HPC, Hadoop, Spark, Machine learning, Parallelism measure

Citation