Exploring data using graphics and visualization

 

In this assignment you will use the churn data: churn_data.txt


Read data into a data frame using the function read.csv() with the following options:

header=T, stringsAsFactors=F

Assume that you saved the file churn_data.txt in the C:/Datasets folder. You can then read the file into a data frame as follows:

file <- "C:/Datasets/churn_data.txt"
churnData <- read.csv(file, stringsAsFactors = FALSE, header = TRUE)
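
The reading step can be tried without the real file. The following is a minimal sketch, assuming a comma-separated file with a header row; because churn_data.txt is not reproduced here, it writes a tiny made-up file to a temporary path and reads it back. The rows and values are hypothetical, and only the column names State, Area.Code, CustServ.Calls, and Churn are taken from the assignment.

```r
# Hypothetical three-row stand-in for churn_data.txt (values are made up).
tmpFile <- tempfile(fileext = ".txt")
writeLines(c(
  "State,Area.Code,CustServ.Calls,Churn",
  "CA,415,1,False",
  "NY,408,4,True",
  "CA,510,0,False"
), tmpFile)

# Same call as in the assignment, pointed at the temporary file.
churnData <- read.csv(tmpFile, stringsAsFactors = FALSE, header = TRUE)

colnames(churnData)  # column names
dim(churnData)       # number of rows, then number of columns
str(churnData)       # column types chosen by read.csv()
```

In this toy file read.csv() converts the True/False text to a logical column. If the real file stores Churn in another form (for example as text), comparisons on that column need to be adjusted accordingly, so it is worth checking str(churnData) first.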

A) Print the name of the columns.

Hint: colnames() function.

B) Print the number of rows and columns

Hint: dim()

C) Count the number of records (customers) per state.

Hint: table() function.

D) Find the mean, median, standard deviation, and variance of the nightly charges (the column Night.Charge in the data).
The R functions to be used are mean(), median(), sd(), var().

E) Find the maximum and minimum values of international charges (Intl.Charge), customer service calls (CustServ.Calls), and daily charges (Day.Charge).

F) Use summary() function to print information about the distribution of the following features:

"Eve.Charge"     "Night.Mins"     "Night.Calls"    "Night.Charge"   "Intl.Mins"      "Intl.Calls"

What are the min and max values printed by the summary() function for these features?

Check textbook page 34 for a sample.

G) Use unique() function to print the distinct values of the following columns:

State, Area.Code, and Churn.

H) Extract the subset of data for the churned customers (i.e., Churn=True). How many rows are in the subset?
Hint: Use the subset() function. Check the lecture notes and textbook for samples.

I) Extract the subset of data for customers who made more than 3 customer service calls (CustServ.Calls). How many rows are in the subset?

J) Extract the subset of churned customers with no international plan (Int.l.Plan) and no voice mail plan (VMail.Plan). How many rows are in the subset?

K) Extract the data for customers from California (i.e., State is CA) who did not churn but made more than 2 customer service calls.

L) What is the mean of customer service calls for the customers that did not churn (i.e., Churn=False)?
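
The functions named in the hints above can be sketched on a small made-up data frame. This is an illustration only, not a solution for the real data: the five rows are hypothetical, Churn is assumed to be a logical column, and the plan columns are assumed to hold "yes"/"no" text. Only the column names come from the assignment.

```r
# Hypothetical five-row stand-in for churnData (all values are made up).
churnData <- data.frame(
  State          = c("CA", "NY", "CA", "TX", "CA"),
  Int.l.Plan     = c("no", "yes", "no", "no", "no"),
  VMail.Plan     = c("no", "no", "yes", "no", "no"),
  Night.Charge   = c(9, 11, 10, 8, 12),
  CustServ.Calls = c(3, 5, 1, 4, 0),
  Churn          = c(FALSE, TRUE, FALSE, TRUE, TRUE),  # assumed logical
  stringsAsFactors = FALSE
)

table(churnData$State)             # number of rows per state
mean(churnData$Night.Charge)       # central tendency and spread
median(churnData$Night.Charge)
sd(churnData$Night.Charge)
var(churnData$Night.Charge)
range(churnData$CustServ.Calls)    # minimum and maximum in one call
summary(churnData[, c("Night.Charge", "CustServ.Calls")])
unique(churnData$State)            # distinct values

# subset() keeps the rows for which the condition is TRUE.
churned        <- subset(churnData, Churn == TRUE)
manyCalls      <- subset(churnData, CustServ.Calls > 3)
churnedNoPlans <- subset(churnData,
                         Churn == TRUE & Int.l.Plan == "no" & VMail.Plan == "no")
caStayed       <- subset(churnData,
                         State == "CA" & Churn == FALSE & CustServ.Calls > 2)
nrow(churned)                      # row count of a subset

# Mean of one column within a subset.
meanCallsStayed <- mean(subset(churnData, Churn == FALSE)$CustServ.Calls)
meanCallsStayed
```

Conditions are combined with & inside subset(), and nrow() answers the "how many rows" questions. If Churn is stored as text in the real file, the comparison would be against that text value instead of TRUE or FALSE.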

Question 2 (related to the above)

 

In this assignment, we will explore the churn data using graphics and visualization. One of the primary reasons for performing exploratory data analysis (EDA) is to investigate the variables, examine the distributions of the categorical variables, look at the histograms of the numeric variables, and explore the relationships among sets of variables.

Although we are not going to develop any models for this project, in a real-world project our task is to identify patterns in the data that will help to reduce the proportion of churners.

We will use the same data set as in the Week 2 assignment:

Data file: churn_data.txt

All graphics in this assignment must be plotted with the ggplot2 library, so you need to install it first:

install.packages("ggplot2")

Before using any functions from the library, load it into your R session with

library(ggplot2)

Here is how you can read data into a data frame named churnData:

churnData <- read.csv(filePath, stringsAsFactors = FALSE, header = TRUE)

where filePath is the location of the churn_data.txt file. For example, if you saved the file in C:/tmp, then you should use C:/tmp/churn_data.txt

The variables in the file churn_data.txt are

State: Categorical, for the 50 states and the District of Columbia.
Account length: Integer-valued, how long account has been active.
Area code: Categorical
Phone number: Essentially a surrogate for customer ID.
International plan: Dichotomous categorical, yes or no.
Voice mail plan: Dichotomous categorical, yes or no.
Number of voice mail messages: Integer-valued.
Total day minutes: Continuous, minutes customer used service during the day.
Total day calls: Integer-valued.
Total day charge: Continuous, perhaps based on above two variables.
Total eve minutes: Continuous, minutes customer used service during the evening.
Total eve calls: Integer-valued.
Total eve charge: Continuous, perhaps based on above two variables.
Total night minutes: Continuous, minutes customer used service during the night.
Total night calls: Integer-valued.
Total night charge: Continuous, perhaps based on above two variables.
Total international minutes: Continuous, minutes customer used service to make international calls.
Total international calls: Integer-valued.
Total international charge: Continuous, perhaps based on above two variables.
Number of calls to customer service: Integer-valued.
Churn: Target. Indicator of whether the customer has left the company (true or false).

Part 1. Bar Charts

A bar chart is a histogram for discrete data: it records the frequency of every value of a categorical variable.

1.) Vertical Bar Charts

Plot the bar charts of State, Area.Code, Int.l.Plan, VMail.Plan, CustServ.Calls, and Churn.

Use the theme() function to change the text size, location, color, etc. (An example is given in the textbook on page 61.)

The following is the bar chart for State. As an example, the x-axis labels are bold and rotated 90 degrees, which can be set in the theme() function using

axis.text.x = element_text(face="bold", angle=90, vjust=0.5, size=11)

Similarly, the parameter colour="#990000" is used for the color of the x-axis title. So, the following options for axis.title.x and axis.text.x in the theme() function display the title and text of the x-axis as shown in the figure below:

axis.title.x = element_text(face="bold", colour="#990000", size=12), axis.text.x = element_text(face="bold", angle=90, vjust=0.5, size=11)

[Figure: bar chart of State]
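
A minimal sketch of how the theme() options above might be combined with geom_bar(). The data frame is a hypothetical stand-in with made-up State values; the element_text() settings are the ones quoted in the text.

```r
library(ggplot2)

# Hypothetical stand-in for the State column of churnData (values are made up).
churnData <- data.frame(State = c("CA", "NY", "CA", "TX", "NY", "CA"),
                        stringsAsFactors = FALSE)

# geom_bar() counts the rows for each distinct value of x.
p <- ggplot(churnData, aes(x = State)) +
  geom_bar() +
  theme(
    axis.title.x = element_text(face = "bold", colour = "#990000", size = 12),
    axis.text.x  = element_text(face = "bold", angle = 90, vjust = 0.5, size = 11)
  )
print(p)
```

For categories that are coded as numbers, such as Area.Code or CustServ.Calls, wrapping the column in factor() inside aes() gives a discrete axis with one tick per distinct value.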

2.) Horizontal Bar Charts

Create a horizontal bar chart of CustServ.Calls.

Hint: Textbook page 49.

[Figure: horizontal bar chart]
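
One common way to get horizontal bars in ggplot2 is to build the vertical chart and add coord_flip(). The sketch below assumes this approach (the textbook may use a different one) and uses made-up call counts.

```r
library(ggplot2)

# Hypothetical stand-in for the CustServ.Calls column (values are made up).
churnData <- data.frame(CustServ.Calls = c(1, 0, 2, 1, 3, 1, 2, 0))

# factor() makes each distinct number of calls its own category;
# coord_flip() swaps the axes so the bars run horizontally.
p <- ggplot(churnData, aes(x = factor(CustServ.Calls))) +
  geom_bar() +
  coord_flip() +
  labs(x = "CustServ.Calls", y = "count")
print(p)
```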

3.) Horizontal Bar Charts with Sorted Categories

Create a horizontal bar chart of CustServ.Calls in which the categories are sorted by their counts.

Hint: Textbook pages 50-51

[Figure: sorted horizontal bar chart]
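
Sorting the bars requires the category order to follow the counts rather than the category labels. A minimal sketch, assuming the counts are computed first with table() and the categories are then reordered with reorder(); the data values and the helper name callCounts are hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for the CustServ.Calls column (values are made up).
churnData <- data.frame(CustServ.Calls = c(1, 0, 2, 1, 3, 1, 2, 0, 1, 0))

# Count the rows per distinct value, then store the result as a data frame.
callCounts <- as.data.frame(table(churnData$CustServ.Calls),
                            stringsAsFactors = FALSE)
colnames(callCounts) <- c("CustServ.Calls", "count")

# reorder() turns the category into a factor whose levels are sorted by count.
callCounts$CustServ.Calls <- reorder(callCounts$CustServ.Calls, callCounts$count)

# stat = "identity" plots the precomputed counts instead of counting again.
p <- ggplot(callCounts, aes(x = CustServ.Calls, y = count)) +
  geom_bar(stat = "identity") +
  coord_flip()
print(p)
```

Because ggplot2 draws factor levels in order, the bar with the smallest count ends up at the bottom of the flipped chart and the largest at the top.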

Part 2: Histograms and Density Plots

The histogram and the density plot are two visualizations that help you quickly examine the distribution of a numerical variable.

A basic histogram bins a variable into fixed-width buckets and returns the number of data points that fall into each bucket. You can think of a density plot as a “continuous histogram” of a variable, except the area under the density plot is equal to 1.

1.) Plot the histograms of Account.Length, VMail.Message, Day.Mins, and Intl.Calls.

Based on the histograms, comment on whether any of the variables have outliers, are close to the normal distribution, are multi-modal, or are skewed.

The histogram for Account.Length is shown below:

[Figure: histogram of Account.Length]
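
A minimal histogram sketch with geom_histogram(). The Account.Length values below are simulated, not taken from the real file, and the bin width of 10 is an arbitrary assumption that should be tuned for each variable.

```r
library(ggplot2)

# Simulated stand-in for Account.Length (hypothetical values, fixed seed).
set.seed(1)
churnData <- data.frame(Account.Length = round(rnorm(200, mean = 100, sd = 40)))

# binwidth sets the width of each bucket on the x-axis.
p <- ggplot(churnData, aes(x = Account.Length)) +
  geom_histogram(binwidth = 10, fill = "gray", colour = "black")
print(p)
```

Trying a few different bin widths is usually worthwhile, since a very wide bin can hide a second mode and a very narrow one can make noise look like structure.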

2.) Plot the density plots of Account.Length, VMail.Message, Day.Mins, and Intl.Calls.

Based on the density plots, comment on whether any of the variables have outliers, are close to the normal distribution, are multi-modal, or are skewed.

As a sample, the density plot for VMail.Message is shown below:

[Figure: density plot of VMail.Message]
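
A minimal density plot sketch with geom_density(), followed by a rough numerical check that the area under a kernel density estimate is close to 1. The VMail.Message values are simulated (many zeros plus a cluster of larger values) purely to produce a two-peaked shape; they are not the real data. The check uses base R's density() with its default settings, which may differ slightly from what ggplot2 draws.

```r
library(ggplot2)

# Simulated stand-in for VMail.Message (hypothetical values, fixed seed).
set.seed(2)
churnData <- data.frame(
  VMail.Message = c(rep(0, 140), round(rnorm(60, mean = 30, sd = 7)))
)

p <- ggplot(churnData, aes(x = VMail.Message)) +
  geom_density()
print(p)

# Approximate the area under the estimated curve with the trapezoid rule.
d <- density(churnData$VMail.Message)
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area  # should be close to 1
```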

Part 3. Scatter Plots

In addition to examining variables in isolation, you’ll often want to look at the relationship between two variables.

Part A)

Plot scatter plots for the following pairs: Eve.Mins vs. Day.Mins, Day.Mins vs. Day.Charge, Eve.Mins vs. Eve.Charge, and Day.Mins vs. Day.Calls.

Based on the plots, is there any relationship between the two features in each pair?

The scatter plot of Eve.Mins vs Day.Mins is given below:

[Figure: scatter plot of Eve.Mins vs. Day.Mins]

Part B)

For the scatter plots in Part A, add color to distinguish churn and no-churn data points. Simply add aes(color=Churn) to the geom_point() function as shown below:

geom_point(aes(color=Churn))

[Figure: scatter plot colored by Churn]
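
A minimal sketch of both scatter plot variants on simulated data. Everything numeric here is hypothetical: Day.Mins is drawn at random, Day.Charge is set to an assumed flat rate of 0.17 per minute only so that the pair shows a clear linear pattern, and the Churn flags are assigned arbitrarily.

```r
library(ggplot2)

# Simulated stand-in for churnData (hypothetical values, fixed seed).
set.seed(42)
dayMins <- runif(100, min = 50, max = 300)
churnData <- data.frame(
  Day.Mins   = dayMins,
  Day.Charge = 0.17 * dayMins,                     # assumed rate, illustration only
  Churn      = rep(c(FALSE, FALSE, FALSE, TRUE), 25)  # arbitrary flags
)

# Part A: plain scatter plot of one pair.
pPlain <- ggplot(churnData, aes(x = Day.Mins, y = Day.Charge)) +
  geom_point()

# Part B: same plot, with points colored by Churn.
pColored <- ggplot(churnData, aes(x = Day.Mins, y = Day.Charge)) +
  geom_point(aes(color = Churn))
print(pColored)

# A correlation coefficient can back up what the plot suggests.
r <- cor(churnData$Day.Mins, churnData$Day.Charge)
r
```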

Part 4. Box Plots

A box-and-whiskers plot describes the distribution of a continuous variable by plotting its five-number summary: the minimum, lower quartile (25th percentile), median (50th percentile), upper quartile (75th percentile), and maximum.

Plot the box plots of CustServ.Calls, Night.Calls, and Intl.Charge by Churn.

Which of the features have outliers? (can you spot them in the box plot?)

What is the median of Night.Calls for customers that did not churn? (from the box plot)

The following is the box plot of CustServ.Calls.

[Figure: box plot of CustServ.Calls]

Hint: You can find detailed information and samples of box plots at

https://ggplot2.tidyverse.org/reference/geom_boxplot.html
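
A minimal sketch of a box plot grouped by Churn, with a numeric cross-check of the medians that the plot displays. The eight rows are made up, and Churn is assumed to be a logical column.

```r
library(ggplot2)

# Hypothetical stand-in for churnData (values are made up).
churnData <- data.frame(
  Night.Calls = c(90, 100, 110, 95, 105, 120, 80, 101),
  Churn       = c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)
)

# One box per Churn group; points beyond the whiskers are drawn individually.
p <- ggplot(churnData, aes(x = Churn, y = Night.Calls)) +
  geom_boxplot()
print(p)

# The median read off the plot can be verified numerically per group.
medians <- tapply(churnData$Night.Calls, churnData$Churn, median)
medians
```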

Part 5. Dodged and Stacked Bar Charts

A) Display a dodged bar chart of Int.l.Plan by Churn.

Hint: Textbook pages 60-61.

[Figure: dodged bar chart]

B) Display a stacked bar chart of CustServ.Calls by Churn.

[Figure: stacked bar chart]
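
Dodged and stacked bar charts differ only in the position argument of geom_bar(). A minimal sketch on made-up data, assuming Churn is a logical column and Int.l.Plan holds "yes"/"no" text:

```r
library(ggplot2)

# Hypothetical stand-in for churnData (values are made up).
churnData <- data.frame(
  Int.l.Plan     = c("no", "yes", "no", "no", "yes", "no", "yes", "no"),
  CustServ.Calls = c(1, 4, 0, 1, 5, 2, 1, 4),
  Churn          = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Dodged: one bar per Churn value, placed side by side within each category.
pDodge <- ggplot(churnData, aes(x = Int.l.Plan, fill = Churn)) +
  geom_bar(position = "dodge")

# Stacked: the Churn counts are piled on top of each other in a single bar.
pStack <- ggplot(churnData, aes(x = factor(CustServ.Calls), fill = Churn)) +
  geom_bar(position = "stack") +
  labs(x = "CustServ.Calls")

print(pDodge)
print(pStack)
```

A dodged chart makes it easier to compare the two Churn groups within a category, while a stacked chart keeps the total count per category visible.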
