Apache Spark Overview
Introduction to Apache Spark
Introduction to Apache Spark
Last updated 06th March, 2020
Apache Spark is an open-source distributed, general-purpose cluster-computing framework that is much faster than the previous cluster computing framework, Hadoop MapReduce, thanks to features like in-memory processing and lazy evaluation.
Apache Spark is the leading platform for large-scale SQL, batch processing, stream processing and machine learning, with an easy-to-use API, and for coding in Spark, you have the options of using different programming languages, including Java, Scala, Python, R and SQL. It can be run locally in a single machine or in a cluster of computers to distribute its tasks.
Apache Spark stores data in RDD (Resilient Distributed Datasets), which is an immutable distributed collection of objects, and then divides it into different logical partitions, so it can process each part in parallel, in different nodes of the cluster. Task parallelism and in-memory computing are the key to being ultra-fast in Apache Spark.
Apache Spark creates a core process called driver program in driver node and several executors in worker nodes and then driver program will distribute the processing task among executors to be executed in parallel. The number of the workers can be scaled up and down to optimise the workload based on available resources and computation nodes.
Different components of Apache Spark are:
Please send us your questions, feedback and suggestions to improve the service:
Zachęcamy do przesyłania sugestii, które pomogą nam ulepszyć naszą dokumentację.
Obrazy, zawartość, struktura - podziel się swoim pomysłem, my dołożymy wszelkich starań, aby wprowadzić ulepszenia.
Zgłoszenie przesłane za pomocą tego formularza nie zostanie obsłużone. Skorzystaj z formularza "Utwórz zgłoszenie" .
Dziękujemy. Twoja opinia jest dla nas bardzo cenna.
Dostęp do OVHcloud Community Przesyłaj pytania, zdobywaj informacje, publikuj treści i kontaktuj się z innymi użytkownikami OVHcloud Community.
Porozmawiaj ze społecznością OVHcloud