VSA R Training (Fall 2022)
Week 3 Project Assignment
This lesson is called Week 3 Project Assignment, part of the VSA R Training (Fall 2022) course.
For this week's assignment we're asking you to read a dataset into your RMarkdown report and to do something with it; we don't mind what!
If you're using your own .csv file you'll need to add it to the Gist (shown in the video).
If you're using a TidyTuesday dataset please include the code you used (also shown in the video!)
Have any questions? Put them below and we will help you out!
R Programming
Welcome back.
Programming Assignment 1 Review
- In-class Walkthrough
Dealing with Missing Data
- Specialness of NAs
- You Should Care About NAs
- Strategy 1: Total Eradication
- Strategy 2: Handle on Subsets
- Strategy 3: Imputation
Visualizing Data
- Base Plotting System
- Other Plotting Systems in R
- Graphics Devices
Combining and Transforming Data Frames
- Columnwise Combination
- Rowwise Combination
Final Project Discussion
- Project Proposal Guidelines
- Final Project Outline
Assignment 1 Discussion
IMHO, this assignment was the single hardest thing you'll be asked to do in this class.
Specialness of NAs
NA is a special object in R that represents "a value whose value is unknown". Confusing, right? Missing values are an inevitability in real-world data.
PRO TIP: See ?NA for R's documentation on the nuances of NA.
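A few lines at the console illustrate what makes NA special. This is only a minimal sketch in base R; the vector `x` is invented for illustration.

```r
# A small made-up vector with one unknown value
x <- c(1, NA, 3)

# Arithmetic and comparisons involving NA are themselves unknown
NA + 1     # NA
NA == NA   # NA, not TRUE

# Use is.na() to test for missingness instead of ==
is.na(x)   # FALSE TRUE FALSE

# Many summary functions return NA unless told to skip missing values
mean(x)                # NA
mean(x, na.rm = TRUE)  # 2
```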
You Should Care About NAs
It's common for introductory programmers to think of missing values as problems to address, but that isn't always the case! NA can actually hold valuable information. For example, imagine that you get a dump of data from Salesforce or some other CRM system with information like customer_name, date_of_first_contact, and date_of_second_contact.
Depending on how the system was set up, date_of_second_contact may have dates only for customers who have been contacted at least twice, and be NA everywhere else. This is valuable information! If you want to build a model of 1-contact conversion, you could use the presence/absence of NA to help you identify the 1-contact customers that belong in your model.
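The idea can be sketched with a tiny invented extract. The customer names and dates below are made up, and the helper column `one_contact` is a hypothetical name rather than anything a CRM would provide.

```r
# Hypothetical CRM extract; all values are invented for illustration
customers <- data.frame(
  customer_name = c("Acme", "Globex", "Initech"),
  date_of_first_contact = as.Date(c("2022-01-05", "2022-02-10", "2022-03-15")),
  date_of_second_contact = as.Date(c("2022-01-20", NA, NA))
)

# NA in the second-contact column marks customers contacted exactly once
customers$one_contact <- is.na(customers$date_of_second_contact)

# Keep only the 1-contact customers for the conversion model
one_contact_customers <- customers[customers$one_contact, ]
one_contact_customers$customer_name
```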
Strategy 1: Total Eradication
The first approach you may take to dealing with NA values is to simply drop them from your data. If you don't think these missing data have any business value and your dataset is big enough that you can afford to drop some rows / columns, this is the right move for you.
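As a rough sketch of this strategy in base R, assuming a small made-up data frame `df`:

```r
# Made-up data with a few missing values
df <- data.frame(
  id = 1:5,
  height = c(1.7, NA, 1.6, 1.8, NA),
  weight = c(65, 70, NA, 80, 75)
)

# Drop every row that contains at least one NA
complete_rows <- df[complete.cases(df), ]

# na.omit() keeps the same rows in a single call
same_rows <- identical(complete_rows$id, na.omit(df)$id)

# Alternatively, drop every column that contains at least one NA
complete_cols <- df[, colSums(is.na(df)) == 0, drop = FALSE]
```

Here only rows 1 and 4 survive the row-wise drop, which shows how quickly this approach can shrink a small dataset.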
Strategy 2: Handle on Subsets
You may find the "remove all the NAs everywhere" strategy a bit too aggressive for your use case. If you have a 100-variable dataset and a single variable (column) is 90% NA values, do you really want to drop every row where that variable is NA? A better approach might be to selectively subset out columns where missing values are most severe before using complete.cases() to remove rows.
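One way this might look in practice is sketched below. The data frame is invented, and the 50% cutoff is an arbitrary assumption that you would tune for your own data.

```r
set.seed(1)

# Made-up data: one column is almost entirely missing
df <- data.frame(
  a = rnorm(10),
  b = c(NA, rnorm(9)),
  mostly_missing = c(1, rep(NA, 9))
)

# Fraction of NA values in each column
na_fraction <- colMeans(is.na(df))

# Keep columns that are at most 50% missing (arbitrary threshold)
df_subset <- df[, na_fraction <= 0.5, drop = FALSE]

# Only now remove the remaining incomplete rows
df_clean <- df_subset[complete.cases(df_subset), ]

nrow(df_clean)                # 9 rows kept
nrow(df[complete.cases(df), ]) # 0 rows if applied to the full data frame
```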
Strategy 3: Imputation
A final strategy, particularly useful in modeling contexts, is to use some imputation strategy to replace NA values with reasonable alternatives. One common approach (and my favorite) is the roughfix method. It works like this:
- For numeric columns, replace NAs with the column median
- For categorical columns, replace NAs with the most common value
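The two rules above can be sketched in base R as follows. The helpers `most_common()` and `rough_impute()` are hypothetical names written for this illustration; the randomForest package provides `na.roughfix()` for the same purpose, and this hand-rolled version is not a copy of it.

```r
# Most common non-missing value of a vector (hypothetical helper)
most_common <- function(x) {
  counts <- table(x)  # table() ignores NA by default
  names(counts)[which.max(counts)]
}

# Median for numeric columns, most common value otherwise (hypothetical helper)
rough_impute <- function(df) {
  for (col in names(df)) {
    missing <- is.na(df[[col]])
    if (!any(missing)) next
    if (is.numeric(df[[col]])) {
      df[[col]][missing] <- median(df[[col]], na.rm = TRUE)
    } else {
      df[[col]][missing] <- most_common(df[[col]])
    }
  }
  df
}

# Made-up example data
people <- data.frame(
  age = c(30, NA, 50, 40),
  city = c("Austin", "Boston", NA, "Boston"),
  stringsAsFactors = FALSE
)
imputed <- rough_impute(people)
imputed
```

In this example the missing age becomes 40 (the median of 30, 40, 50) and the missing city becomes "Boston".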
Base Plotting System
R's built-in plotting tools, called "the base plotting system", are one of its most popular features.
The essential idea of the base plotting system is to build up plots in layers. You first create a simple plot, then "add on" a legend, more variables, other plot types, etc.
See "The Base Plotting System" in the programming supplement.
Creating a Scatter Plot
Let's start with a simple scatter plot to answer the question: are sepal length and sepal width related?
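A minimal sketch using the built-in `iris` dataset might look like this. The titles, colors, and the added correlation are illustrative choices, not taken from the course material.

```r
# iris ships with R, so no data loading is needed
plot(
  x = iris$Sepal.Length,
  y = iris$Sepal.Width,
  main = "Sepal Length vs. Sepal Width",
  xlab = "Sepal length (cm)",
  ylab = "Sepal width (cm)",
  pch = 19,
  col = as.integer(iris$Species)  # one palette color per species
)

# "Add on" a legend as a second layer
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)

# A single number to complement the picture
sepal_cor <- cor(iris$Sepal.Length, iris$Sepal.Width)
sepal_cor
```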
Histograms and Densities
Histograms and densities are useful for examining distributions.
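For instance, a histogram with a density curve layered on top could be sketched like this, again using `iris`; the number of breaks is an arbitrary choice.

```r
# Draw the histogram on the density scale so a curve can be overlaid
hist(
  iris$Sepal.Length,
  breaks = 15,
  freq = FALSE,
  main = "Distribution of Sepal Length",
  xlab = "Sepal length (cm)"
)

# Layer a kernel density estimate on top of the histogram
sepal_density <- density(iris$Sepal.Length)
lines(sepal_density, lwd = 2)
```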
Multi-variable line charts
You can add more than one variable to these plots!
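A rough sketch of a two-series line chart follows. The sales figures are invented, and the key step is computing a shared y-axis range before drawing the first line so the second series is not clipped.

```r
# Made-up daily sales for two products
days <- 1:10
sales_a <- c(5, 7, 6, 9, 11, 10, 13, 12, 15, 14)
sales_b <- c(8, 8, 9, 7, 10, 12, 11, 13, 12, 16)

# Compute a y-axis range that fits both series
y_range <- range(sales_a, sales_b)

# First layer: one line
plot(days, sales_a, type = "l", ylim = y_range,
     xlab = "Day", ylab = "Units sold", main = "Daily Sales")

# Second layer: add the other variable with lines()
lines(days, sales_b, lty = 2, col = "blue")

# Third layer: a legend
legend("topleft", legend = c("Product A", "Product B"),
       lty = c(1, 2), col = c("black", "blue"))
```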
Creating a Grid of plots
You can combine multiple plots in a grid layout. See "The Base Plotting System" for an example.
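As a small illustration (not the supplement's own example), `par(mfrow = ...)` splits the plotting area into a grid that subsequent plots fill row by row:

```r
# Split the device into 1 row and 2 columns, remembering the old setting
old_par <- par(mfrow = c(1, 2))

# Each new plot fills the next cell of the grid
plot(iris$Sepal.Length, iris$Sepal.Width,
     xlab = "Sepal length (cm)", ylab = "Sepal width (cm)")
hist(iris$Petal.Length, main = "Petal Length", xlab = "Petal length (cm)")

# Restore the previous layout so later plots are not affected
par(old_par)
```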
Other Plotting Systems in R
We don't have time in this short class to go into great depth on data visualization, but I want you to know that there are a bunch of cool visualization libraries a short install.packages() away!
- dygraphs: high-level library for creating interactive charts that can be embedded directly in HTML
- ggplot2: one of the most popular packages in the R world, based on the "grammar of graphics" approach to building plots
- googleVis: send your data to the Google Charts API to make fancy interactive visualizations
- plotly: easy-to-use library for creating interactive data visualizations and dashboards
A Note On Graphics Devices
When R (or any other program!) creates plots, it needs to know where to put them! When you call plot() or other commands from within an RStudio session, the default is to just display the resulting figure in the "Plots" pane. However, you can use other graphics devices (places to put visual output) to tell R to put your figures elsewhere.
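A minimal sketch of sending a plot to a file instead of the screen is shown below. It uses the `pdf()` device and a temporary file path chosen for illustration; devices such as `png()` and `jpeg()` follow the same open, draw, close pattern.

```r
# Hypothetical output location inside R's temporary directory
out_file <- file.path(tempdir(), "sepal_scatter.pdf")

# Open a PDF graphics device; plots now go to the file
pdf(file = out_file, width = 7, height = 5)
plot(iris$Sepal.Length, iris$Sepal.Width,
     xlab = "Sepal length (cm)", ylab = "Sepal width (cm)")

# Close the device so the file is actually written
dev.off()

file.exists(out_file)
```

Forgetting `dev.off()` is a common source of empty or corrupt output files.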
Columnwise Combination With cbind
In situations where you have multiple data frames with the same rows but different columns, you can combine them column-wise with R's cbind() command. Note that this command will only work if the two data frames to be joined have the same number of rows AND those rows refer to the same observation.
cbind = "column-wise bind"
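A short sketch with two invented data frames whose rows describe the same three students, in the same order:

```r
# Same observations (rows), different variables (columns)
names_df <- data.frame(id = 1:3, name = c("Ann", "Bo", "Cy"))
scores_df <- data.frame(math = c(90, 85, 70), art = c(60, 95, 80))

# Glue the columns side by side
combined <- cbind(names_df, scores_df)
dim(combined)  # 3 rows, 4 columns
```

Note that `cbind()` does no matching: if the rows were in a different order, the result would silently pair the wrong values.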
Column Matching with merge
It's common in data science workflows to have two mismatched tables of data from different sources and to want to combine them by matching on one or more keys. Think JOIN in SQL or VLOOKUP in Excel. To perform this operation in R, you can use the merge() command.
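The sketch below shows an inner join and a left join with `merge()`. The two tables and the key name `customer_id` are invented for illustration.

```r
# Two mismatched tables sharing a key column
customers <- data.frame(customer_id = c(1, 2, 3), name = c("Ann", "Bo", "Cy"))
orders <- data.frame(customer_id = c(1, 1, 3, 4), amount = c(20, 35, 15, 50))

# Inner join: only keys present in both tables
inner <- merge(customers, orders, by = "customer_id")

# Left join: keep every customer, filling unmatched columns with NA
left <- merge(customers, orders, by = "customer_id", all.x = TRUE)

nrow(inner)  # 3
nrow(left)   # 4
```

Setting `all.y = TRUE` or `all = TRUE` gives the right and full outer joins respectively.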
Rowwise Combination With rbind
So far we've talked about merging columns from different tables. But what if you want to merge rows? For example, imagine that you are a researcher in a lab studying some natural phenomenon. You may take multiple samples (measuring the same variables) and then want to put them together into a single data frame to build a model. For this case, we can use R's rbind() function.
rbind = "row-wise bind"
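For example, two invented samples that measure the same variables can be stacked like this:

```r
# Two samples with identical columns
sample_1 <- data.frame(temp_c = c(20.1, 20.5), ph = c(7.0, 7.2))
sample_2 <- data.frame(temp_c = c(19.8, 21.0, 20.3), ph = c(6.9, 7.1, 7.0))

# Stack the rows into a single data frame
all_samples <- rbind(sample_1, sample_2)
nrow(all_samples)  # 5
```

For data frames, `rbind()` matches columns by name, so both inputs need the same column names.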
Rowwise Combination of Many Tables with rbindlist
What if you have 5 tables? 10? 1000? Use data.table::rbindlist().
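A hedged sketch of stacking a whole list of tables is shown below. The list of 100 small data frames is generated purely for illustration, and the base R fallback is included only so the example still runs if data.table is not installed.

```r
# Build a list of 100 small made-up tables with identical columns
tables <- lapply(1:100, function(i) data.frame(batch = i, value = c(i, i * 2)))

if (requireNamespace("data.table", quietly = TRUE)) {
  # Stack every table in the list in one call
  stacked <- data.table::rbindlist(tables)
} else {
  # Base R fallback; fine for small inputs, slower for many tables
  stacked <- do.call(rbind, tables)
}

nrow(stacked)  # 200
```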
Your Final Project Proposal is Due in Week 4
Choosing External Packages
- You need to choose one data retrieval package, one statistics package, and one visualization package from this list
- You may change which packages you actually use in the final project
- You may use a package that isn't on the list (as long as you clear it with me)
- You may use more than just 3 external packages
What Your Proposal Should Cover
- What data set do you plan to use? Where can others find it? What variables does it contain?
- What is the question you're trying to answer?
- What packages do you plan to use?
Your Final Project is Due in Week 5
- let's go through the Final Project description
Additional Resources
Plotting in R : graphics devices
Paths: Relative vs absolute | listing files in a directory in R
R programming Assignment 3 week 4
Course 2 R Programming, Assignment 3 (Week 4), under Data Science by Johns Hopkins University: 1 Plot the 30-day mortality rates for heart attack, 3 Ranking hospitals by outcome in a state, 4 Ranking hospitals in all states.
mGalarnyk / assignment3.md
R Programming Project 3
github repo for rest of specialization: Data Science Coursera
The zip file containing the data can be downloaded here: Assignment 3 Data
Part 1 Plot the 30-day mortality rates for heart attack (outcome.R)
Part 2 Finding the best hospital in a state (best.R)
Part 3 Ranking hospitals by outcome in a state (rankhospital.r)
Part 4 Ranking hospitals in all states (rankall.r)
vickkiee commented Jul 7, 2020
Thank you for your solution
ghost commented Jul 15, 2020 • edited by ghost
The course isn't for the student's learning ability but for student's cheating ability. This course didn't teach me or explain me to find the solution to solve the assignment; in contrast, it encouraged me to google and copy other's code. I am feeling that I am in the wrong course. I am about to quit. I have no idea why JHU and instructor created irresponsible course like this.
sushilsushil commented Jul 18, 2020
Yo everyone feels the same way. I didn't like the constant saliva sound of that guy. Very poor course design too.
BouNaj commented Jul 25, 2020
I share the same feeling. I was frustrated the content does in no way prepare you to do anything of the sort the assignments require.
Chiagozie-Umeano commented Aug 4, 2020
I thought I was the only one having this difficulty in writing code and functions. Was feeling bad for choosing this course, we are not properly taught how to write these codes. I have being feeling to quit the course..
abduljan71 commented Aug 7, 2020
Hey, I am with it. I am doing the same. The course itself does not help you that much for you to understand and do your own coding. I feel very bad.
shaunkok64 commented Aug 16, 2020 • edited
Thank you for your solution. I am glad that there are so many people feeling the same. As I don't have any knowledge regarding coding in general and I wanted to take this course to learn something new such as Data Science. This course could be great if designed properly but the course is not that learner friendly, in some cases even people with experience with coding may not able to cope it and this has led me thoughts to quit. This course are expecting learners to do their own coding without giving guidance to learners. WHY! There are so little guidance given to readers until I had to Google Search for answers. I need to take sometime to study it after I have passed this.
Gareeeb7 commented Aug 27, 2020
Utha re ba ba uthle Coursera ko
Thanks for your Solution
Fahad-98 commented Aug 30, 2020
How do I submit my assignment.
SaneClouxz commented Sep 14, 2020
"How do I submit my assignment."
It's a multiple-choice quiz which you'll solve using the functions provided in this solution.
nfabregas commented Sep 25, 2020
Same here. The problem is simple: the course don't give you the tools and knowledge that someone with no experience with code needs to do the Assignments. It looks like assignments are from another course.
100% agree with D-Se:
Coursera lecture: "The drivetrain of a car looks like this"
Swirl: "A car can move hurr durrr"
Assignment: "build a car and test it, or else fail the course"
anikvh commented Sep 25, 2020
I agree with the comments above, there is a huge gap between the lectures and the assignments. Thank you so much for sharing this!
Songsuperjing commented Sep 27, 2020
love the above comments. so true.
Arnab-eco commented Oct 8, 2020 • edited
I totally agree with the comments. Unfortunately, The instructor thinks that everyone is a master level programmer. His presentations lack the necessary insight. His assignments are way over the mark. I understand that if someone can complete these assignments, they can learn a lot. But frankly, it's like asking someone to write a novel after teaching them how to read. The swirl exercises do teach the basics, but is absolutely hopeless as it doesn't even allow us to make the smallest changes to a code.
benthecoder commented Oct 18, 2020 • edited
I'm glad many are learning this specialization even today, I was hoping if anyone can direct me to any discord servers or like reddit groups to talk about the assignments, for discussions and to learn from each other. Thanks! And yea the assignments are crazy hard, but to be fair the coursera courses can't fit everything into their videos, so i believe it's encouraged to find ways to create a solution for yourself with the abundance of resources online.
is this part wrong? return(dt[order(state, get(outcome), `hospital name`), head(.SD, num), by = state, .SDcols = c("hospital name")])
I can't seem to get the right answers with this part. Anyone has a revision of this part of the code?
siyangni commented Nov 26, 2020 • edited
This course is just designed to make me feel bad. I was in the honor's college while I was a senior, now I am getting my master in sociology. Throughout my academic career so far I've never Googled anyone else's assignment. And this course makes me do this for every assignment!!!!!! By the way, I have a decent knowledge of programming where I gain from learning Python. I thought this class would be easy for me (after quickly going through the lecture videos), yet from the first assignment I began to scratch my head for an answer.
Guess what? the following courses for this specialization are just no better. At first sight I thought the instructors may have problems with their pedagogy. After going through several courses of JHU's Data Science Specialization, I highly doubt it's not just pedagogy, it is their attitude. There is no way the three instructors who should be incredibly smart people cannot find the embarrassingly obvious large gap between the course material and the assignments/quizzes in every one of their courses. And they rush through every course in this specialization. I paid for those courses though! I really want to file a complaint on Coursera.
amingraphy commented Nov 28, 2020
I agree the course it useless. Instead I started learning by reading a time-consuming textbook, called "discovering statistics using R". I even think the material provided by Roger Peng are not all in the same basic level. He doesn't teach basic tools of R, but then he jumps to using multi-core computation on your computer to speed up the calculation! It is just ridiculous
DanEscasa commented Dec 30, 2020 • edited
Thanks for the hard work. I chose not to shorten the unwieldingly long column names, just used switch :
outcome <- switch(outcome,
                  "heart attack" = "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack",
                  "heart failure" = "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure",
                  "pneumonia" = "Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia",
                  NULL)
if (is.null(outcome)) {
  stop(" : invalid outcome")
}
<rant> Isn't there a way to preserve the indentation in a code block? </rant>
As to Roger Peng's teaching, I was looking for a functional programming-oriented approach. He still treats it like a procedural language. I should write up something on that in codementor.io
charlenelch13 commented Jan 3, 2021
Thank you so much for the sharing!! It is sooo helpful! I had zero knowledge about programming before taking this course. I feel frustrated about learning the logics from the course materials. They are so vague and not supported by daily examples. I don't know how they can assume students to know how to finish the assignments... Thank you for guiding me!!
EmilieWaite18 commented Mar 23, 2021
I have never been so frustrated by anything. I have had to look up every single answer to these assignments.
anaidcandido commented Jun 1, 2021
Many thanks for sharing!! I also thought I was the only one struggling with the course but I'm "glad" to see is the course itself. Now I am able to analyze and compare what I was doing. Thanks a lot!
yeho-bt commented Jun 13, 2021
Hmmmm, I am not alone feeling like this.
jdpm93 commented Oct 5, 2021
I have the same problem, the gap between the lessons, swirl, and the assignments is horrible and frustrating. Some exercises require the use of things never covered in the videos or swirl. I see that many have sent feedback but they are not taking action on this. Also the video lessons need to be updated: it's 2021 and the videos were recorded in 2015. This is insane. Thank you for sharing.
codobene commented Dec 16, 2021
truth is, the world is not meant for everyone to be kept alive - it just needs the upper 10% of people that manage this course well. Same is true for jobs in general, stock market, etc etc get it quickly, or work overtime to compensate for your idiocy, or become social darvinism's fish food.
i blindly copy code that I find here, fail to reproduce anything, and take drugs. good night. good luck!
Juanvelz commented Mar 24, 2022
These people should learn how to teach before offering a course. On the other hand, they know very well how to discourage students.
emakello commented Mar 27, 2022
The essence of taking an online course is to learn skills that can help you in your career or academics. While the courses are stimulating, it makes no sense to bring assignments that discourage rather than encourage learners. Some people like me had never coded before and I spend hours trying to do this thing without success. I wish there was someone who could teach coding from scratch without assuming any prior knowledge.
cfsobral commented Jan 22, 2023
Thank you for help. I believe that the course already expects us to look for solutions like this one from mGalarnyk, otherwise it would not make sense to pass on an assignment of this complexity for beginners to do, because many like me would get frustrated and give up the course, as I read in some comments made here. Thank you mGalarnyk
Doc-OmSa commented Apr 9, 2023
Thanks mGalarnyk. I have a feeling that this course is definitely not for beginners, unlikely for intermediates as well. The assignments are too advanced. I have been struggling to understand functions. I am a beginner having taken the Google Data Analytics course previously. This is too advanced. Would likely leave. Just wanted to ask if there are better courses more suitable for someone who is a beginner and going to intermediate level? Another platform?
- Programming Assignment 3 - R Programming
- by Wagner Pinheiro
- Last updated almost 8 years ago
R-Programming-Assignment-Week-3. Introduction. This second programming assignment will require you to write an R function that is able to cache potentially time-consuming computations. For example, taking the mean of a numeric vector is typically a fast operation. However, for a very long vector, it may take too long to compute the mean, especially ...
R Programming - Week 3 Assignment. by Ken Wood. Last updated almost 4 years ago.
This video contains the code for programming assignment-3 and the step by step instructions for submission of assignment which includes forking, cloning repo...
## Put comments here that give an overall description of what your
## functions do
## These functions written in partial fulfillment of Coursera Data Science: R Programming
## Week 3 Assignment; week beginning January 18, 2016; GitHub user: PamlaM
## Write a short comment describing this function
In order to complete this assignment, you must do the following: Fork the GitHub repository containing the stub R files to create a copy under your own account. Clone your forked GitHub repository to your computer so that you can edit the files locally on your own machine. Edit the R file contained in the git repository and place your solution ...
The R script, FUN.R, contains three stand-alone functions which download and read data before arranging it in a specific order defined by the assignment instructions. Please note that if you are doing this assignment as part of your Coursera course you are asked to save each function in a stand-alone R file.
Programming assignment, week 3 (Coursera)
Week 3 programming assignment_b.r
In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessar...
Programming Assignment 3: Quiz >> R Programming
1. What result is returned by the following code? best("SC", "heart attack")
- CAROLINAS HOSPITAL SYSTEM
- MCLEOD MEDICAL CENTER - DARLINGTON
- MUSC MEDICAL CENTER
- MARY BLACK MEMORIAL HOSPITAL
- LAKE CITY COMMUNITY HOSPITAL
- GRAND STRAND REG MED CENTER
2. What result is returned by the following code? best ...
Programming assignment 3 for Coursera "R Programming" course by Johns Hopkins University. best.R
best <- function (state, outcome) {
  ## Read outcome data
  ## Check that state and outcome are valid
  ## Return hospital name in that state with lowest 30-day death
Edit the R file contained in the git repository and place your solution in that file (please do not rename the file). Commit your completed R file into YOUR git repository and push your git branch to the GitHub repository under your account. Submit to Coursera the URL to your GitHub repository that contains the completed R code for the assignment.
There will be an object called 'iris' in your workspace. In this dataset, what is the mean of 'Sepal.Length' for the species virginica? (Please only enter the numeric result and nothing else.)
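One possible way to compute this with base R is sketched below, assuming the standard `iris` dataset that ships with R; the variable names are illustrative.

```r
# Subset Sepal.Length to the virginica rows, then average
virginica_mean <- mean(iris$Sepal.Length[iris$Species == "virginica"])
virginica_mean  # 6.588

# tapply() gives the mean for every species in one call
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)
species_means
```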
R programming Assignment 3 week 4 Fan Ouyang 8/21/2017. Course 2 R Programming, Assignment 3 (Week 4), under Data Science by Johns Hopkins University. 1 Plot the 30-day mortality rates for heart attack
R Programming Week 3 Programming Assignments 2: Lexical Scoping. by Louis Stanley. Last updated almost 2 years ago.
Ryan Tillis - R Programming - Data Science - Quiz 3 - Coursera. by Ryan Tillis. Last updated almost 8 years ago.
R-Programming Project 3. Contribute to nishantsbi/R-Programming-Assignment-3 development by creating an account on GitHub.
[Programming Assignment 3] R Programming. by Anderson Hitoshi Uyekita. Last updated over 2 years ago.
Programming Assignment 3 - R Programming. by Wagner Pinheiro. Last updated over 7 years ago.