PySpark StorageLevel

PySpark StorageLevel is used to decide how RDD should be stored in memory. It also determines the weather serialize RDD and weather to replicate RDD partitions. In Apache Spark, it is responsible for RDD should be saved in the memory or should it be stored over the disk, or in both. It contains commonly-used PySpark StorageLevels, static constant like MEMORY_ONLY.

The following code block consist the class definition of a StorageLevel-

class pyspark.StorageLevel(useDisk, useMemory, useOffHeap, deserialized, replication = 1)

Class Variables

There are different PySpark StorageLevels to decide the storage of RDD, such as:

DISK_ONLY: StorageLevel(True, False, False, False, 1)
DISK_ONLY_2: StorageLevel(True, False, False, False, 2)
MEMORY_AND_DISK: StorageLevel(True, True, False, False, 1)
MEMORY_AND_DISK_2: StorageLevel(True, True, False, False, 2)
MEMORY_AND_DISK_SER: StorageLevel(True, True, False, False, 1)
MEMORY_AND_DISK_SER_2: StorageLevel(True, True, False, False, 2)
MEMORY_ONLY: StorageLevel(False, True, False, False, 1)
MEMORY_ONLY_2: StorageLevel(False, True, False, False, 2)
MEMORY_ONLY_SER: StorageLevel(False, True, False, False, 1)
MEMORY_ONLY_SER_2: StorageLevel(False, True, False, False, 2)
OFF_HEAP: StorageLevel(True, True, True, False, 1)

Instance Method

Example of PySpark StorageLevel

Here we use the storage level Memory_And_Disk_2, which means RDD partition will have replication of 2.

from pyspark import SparkContext
import pyspark
sc = SparkContext (
  "local",
  "StorageLevel app"
)
rdd1 = sc.parallelize([1,2])
rdd1.persist( pyspark.StorageLevel.MEMORY_AND_DISK_2 )
rdd1.getStorageLevel()
print(rdd1.getStorageLevel())

Output:

Disk Memory Serialized 2x Replicated

Next TopicPySpark Profiler

← prev next →

For Videos Join Our Youtube Channel: Join Now

Feedback

Send your Feedback to [email protected]

Help Others, Please Share

Learn Latest Tutorials

Splunk

SPSS

Swagger

Transact-SQL

Tumblr

ReactJS

Regex

Reinforcement Learning

R Programming

RxJS

React Native

Python Design Patterns

Python Pillow

Python Turtle

Keras

Preparation

Aptitude

Reasoning

Verbal Ability

Interview Questions

Company Questions

Trending Technologies

Artificial Intelligence

AWS

Selenium

Cloud Computing

Hadoop

ReactJS

Data Science

Angular 7

Blockchain

Git

Machine Learning

DevOps

B.Tech / MCA

DBMS

Data Structures

DAA

Operating System

Computer Network

Compiler Design

Computer Organization

Discrete Mathematics

Ethical Hacking

Computer Graphics

Software Engineering

Web Technology

Cyber Security

Automata

C Programming

C++

Java

.Net

Python

Programs

Control System

Data Mining

Data Warehouse

Javatpoint Services

JavaTpoint offers too many high quality services. Mail us on [email protected], to get more information about given services.

Website Designing
Website Development
Java Development
PHP Development
WordPress
Graphic Designing
Logo
Digital Marketing
On Page and Off Page SEO
PPC
Content Development
Corporate Training
Classroom and Online Training
Data Entry

Training For College Campus

JavaTpoint offers college campus training on Core Java, Advance Java, .Net, Android, Hadoop, PHP, Web Technology and Python. Please mail your requirement at [email protected]
Duration: 1 week to 2 week

^{Like/Subscribe us for latest updates or newsletter}

PySpark Tutorial