
Check the attachments

Please read the instructions and questions carefully in the “Assignment_5_2024_Fall.pdf” file and use “Auto.csv” to finish the assignment. You should submit both 1) your R code and 2) a PDF report with answers through the link “Submit Assignment 5 Here”.

Guidelines:

· Use only R for this assignment

· Submit both R code and Report on findings

· Work is to be done individually for this assignment

1. In this problem, you will generate simulated data, and then perform K-means clustering on the data.

1.1 Generate a simulated data set with 30 observations in each of two classes (i.e. 60 observations in total), and 2 variables.

Code Hint: The first four lines of code should be:

set.seed(2)
x=matrix(rnorm(60*2), ncol=2)
x[1:30,1]=x[1:30,1]+3
x[1:30,2]=x[1:30,2]-4

1.2 Perform K-means clustering of the observations with K = 2. Plot the data with each observation colored according to its cluster assignment (nstart=20). Take a screenshot of your plot. What is the total within-cluster sum of squares?
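As a minimal sketch of what 1.2 asks for, the block below rebuilds the simulated data from the code hint, runs `kmeans()` with 20 random starts, and colors the points by cluster. The object name `km2`, the color offset, and the plot labels are illustrative choices, not part of the assignment.

```r
# Simulated data from the code hint in 1.1
set.seed(2)
x <- matrix(rnorm(60 * 2), ncol = 2)
x[1:30, 1] <- x[1:30, 1] + 3
x[1:30, 2] <- x[1:30, 2] - 4

# K-means with K = 2, keeping the best of 20 random starts
km2 <- kmeans(x, centers = 2, nstart = 20)

# Color each observation by its cluster assignment
# (the +1 offset only skips black as the first color)
plot(x, col = km2$cluster + 1, pch = 20,
     main = "K-means clustering, K = 2",
     xlab = "Variable 1", ylab = "Variable 2")

# Total within-cluster sum of squares
print(km2$tot.withinss)
```

The `tot.withinss` component is the sum of the per-cluster values in `withinss`, so it is the figure to report as the total within-cluster sum of squares.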

1.3 Perform K-means clustering with K = 3. Plot the data with each observation colored according to its cluster assignment (nstart=20). Take a screenshot of your plot. What is the total within-cluster sum of squares?

1.4 Now perform K-means clustering with K = 4. Plot the data with each observation colored according to its cluster assignment (nstart=20). Take a screenshot of your plot. What is the total within-cluster sum of squares?
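One possible way to handle 1.2 through 1.4 together is to fit K = 2, 3, and 4 in a loop and collect the total within-cluster sum of squares for each. This is only a sketch; the names `fits` and `wss` are hypothetical, and the plots could just as well be drawn one at a time.

```r
# Simulated data from the code hint in 1.1
set.seed(2)
x <- matrix(rnorm(60 * 2), ncol = 2)
x[1:30, 1] <- x[1:30, 1] + 3
x[1:30, 2] <- x[1:30, 2] - 4

# Fit K-means for K = 2, 3, 4, each with 20 random starts
ks <- 2:4
fits <- lapply(ks, function(k) kmeans(x, centers = k, nstart = 20))
names(fits) <- paste0("K=", ks)

# One colored scatter plot per value of K
for (i in seq_along(ks)) {
  plot(x, col = fits[[i]]$cluster + 1, pch = 20,
       main = paste("K-means clustering, K =", ks[i]),
       xlab = "Variable 1", ylab = "Variable 2")
}

# Total within-cluster sum of squares for each K
wss <- sapply(fits, function(f) f$tot.withinss)
print(wss)
```

The total within-cluster sum of squares typically shrinks as K grows, so a lower value for K = 3 or K = 4 does not by itself mean the extra clusters are meaningful.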

1.5 Using the scale() function, perform K-means clustering with K = 2 on the data after scaling each variable to have standard deviation one. Take a screenshot of your plot. What is the total within-cluster sum of squares now? How do these results compare to those obtained in 1.2?
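A hedged sketch of the scaled version follows. It assumes the same simulated data as the code hint; `xs`, `km2_raw`, and `km2_scaled` are illustrative names. The cross-tabulation at the end is one way to check whether the scaled and unscaled fits group the observations the same way.

```r
# Simulated data from the code hint in 1.1
set.seed(2)
x <- matrix(rnorm(60 * 2), ncol = 2)
x[1:30, 1] <- x[1:30, 1] + 3
x[1:30, 2] <- x[1:30, 2] - 4

# Center each column and scale it to standard deviation one
xs <- scale(x)
print(apply(xs, 2, sd))

# K-means with K = 2 on the raw and on the scaled data
km2_raw    <- kmeans(x,  centers = 2, nstart = 20)
km2_scaled <- kmeans(xs, centers = 2, nstart = 20)

plot(xs, col = km2_scaled$cluster + 1, pch = 20,
     main = "K-means clustering on scaled data, K = 2",
     xlab = "Variable 1 (scaled)", ylab = "Variable 2 (scaled)")

# Total within-cluster sum of squares on the scaled data
print(km2_scaled$tot.withinss)

# Compare memberships; cluster labels are arbitrary, so look at
# the pattern of counts rather than the label numbers
print(table(raw = km2_raw$cluster, scaled = km2_scaled$cluster))
```

Because scaling changes the units of both variables, the two sums of squares are not on the same scale; comparing cluster memberships is usually more informative than comparing the raw numbers.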


2. Consider the USArrests data. We will now perform hierarchical clustering on the states. The USArrests dataset ships with base R (in the datasets package), so you do not need to load any libraries.

2.1 Plot the hierarchical clustering dendrogram using complete linkage clustering with Euclidean distance as the dissimilarity measure. Take a screenshot of your plot.
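A minimal sketch for 2.1, assuming the built-in `USArrests` data frame: `dist()` computes Euclidean distances by default, and `hclust()` with `method = "complete"` performs complete linkage. The name `hc_complete` and the plot styling are illustrative.

```r
# Euclidean distances between states (dist() defaults to "euclidean")
d_raw <- dist(USArrests)

# Complete linkage hierarchical clustering
hc_complete <- hclust(d_raw, method = "complete")

# Dendrogram with state names as leaf labels
plot(hc_complete,
     main = "USArrests: complete linkage, Euclidean distance",
     xlab = "", sub = "", cex = 0.7)
```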

2.2 Cut the dendrogram at a height that results in three distinct clusters. Which states belong to which clusters? You need to provide state names for each cluster (e.g. Cluster 1 has Alabama, Alaska,…).
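One way to get three clusters without reading a height off the plot is `cutree()` with `k = 3`, which cuts the tree at whatever height yields three groups. The sketch below then uses `split()` to list the state names per cluster; `clusters` and `states_by_cluster` are hypothetical names.

```r
# Complete linkage on Euclidean distances, as in 2.1
hc_complete <- hclust(dist(USArrests), method = "complete")

# Cut the tree so that exactly three clusters result
clusters <- cutree(hc_complete, k = 3)

# Number of states in each cluster
print(table(clusters))

# State names grouped by cluster
states_by_cluster <- split(names(clusters), clusters)
print(states_by_cluster)
```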

2.3 Hierarchically cluster the states using complete linkage and Euclidean distance, after scaling the variables to have standard deviation one.

a) Take a screenshot of your plot.

b) What effect does scaling the variables have on the hierarchical clustering obtained?

c) In your opinion, should the variables be scaled before the inter-observation dissimilarities are computed? Provide a justification for your answer.
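The scaled clustering in 2.3 can be sketched as below. Printing the per-variable standard deviations first shows how differently the four variables are spread before scaling, and the cross-tabulation is one possible way to see how the three-cluster solution changes. All object names here are illustrative.

```r
# Spread of each variable before scaling
print(apply(USArrests, 2, sd))

# Scale every variable to standard deviation one
us_scaled <- scale(USArrests)

# Complete linkage on Euclidean distances, raw versus scaled
hc_raw    <- hclust(dist(USArrests), method = "complete")
hc_scaled <- hclust(dist(us_scaled), method = "complete")

plot(hc_scaled,
     main = "USArrests (scaled): complete linkage",
     xlab = "", sub = "", cex = 0.7)

# Compare the three-cluster cuts; labels are arbitrary, so read
# the table for how states move between groups
cut_compare <- table(raw = cutree(hc_raw, k = 3),
                     scaled = cutree(hc_scaled, k = 3))
print(cut_compare)
```

On unscaled data, variables with larger numeric spread contribute more to the Euclidean distance, which is worth keeping in mind when answering b) and c).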

2.4 After scaling the variables to have standard deviation one, plot the hierarchical clustering dendrogram using average linkage clustering with Euclidean distance as the dissimilarity measure. Take a screenshot of your plot.

2.5 After scaling the variables to have standard deviation one, plot the hierarchical clustering dendrogram using single linkage clustering with Euclidean distance as the dissimilarity measure. Take a screenshot of your plot.
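For 2.4 and 2.5, only the `method` argument of `hclust()` changes. The sketch below draws the average and single linkage dendrograms side by side for comparison; for separate screenshots, the `par(mfrow = ...)` lines can simply be dropped. Object names are illustrative.

```r
# Scale the variables, then compute Euclidean distances once
us_scaled <- scale(USArrests)
d_scaled  <- dist(us_scaled)

# Average and single linkage on the same distance matrix
hc_average <- hclust(d_scaled, method = "average")
hc_single  <- hclust(d_scaled, method = "single")

# Two dendrograms in one plotting window
old_par <- par(mfrow = c(1, 2))
plot(hc_average, main = "Average linkage (scaled)",
     xlab = "", sub = "", cex = 0.6)
plot(hc_single, main = "Single linkage (scaled)",
     xlab = "", sub = "", cex = 0.6)
par(old_par)  # restore the previous layout
```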

What to submit:

1. R code.

a. Should include all the code to accomplish the tasks.

b. Clear and concise comments to indicate what part of the assignment each code chunk pertains to.

c. Code should be easily readable.

d. Filename should be in the format of: LastnameFirstname_A5.R

2. Report.

a. Take screenshots of your outputs in R Studio and answer all the questions.

b. Submit in PDF format.

c. Answer questions clearly and concisely.

d. Include appropriate plots. Make sure the plots are properly labeled.

e. The assignment will be graded on the correctness of the answers, comprehensiveness of the analysis, clarity of results’ presentation and neatness of the report.
