Create New Variables in R with dplyr
Have you ever stared at a data frame, yearning to extract hidden truths but stumped by the lack of the perfect variable? What if I told you the answer lies not in brute force data collection but in the power of crafting just the right variables?
This article explains the magic of dplyr, the library that lets you shape your data like clay, molding it into forms that reveal its deepest secrets. We’ll explore how to conjure new variables from existing ones, bend calculations to your will, and ultimately ask the right questions about your data. So, are you ready to unlock the true potential of your data? Let’s start and see what insights await!
In data analysis, R stands out as a powerful tool. In this article, we cover the basics of data manipulation with dplyr: creating, modifying, and removing variables.
Understanding the Code
Let’s start by breaking down the provided R code, step by step, to ensure a clear understanding.
Before we start, make sure you read the following:
- Comprehensive Guide: How to install RStudio
- How to Import and Install Packages in R: A Comprehensive Guide
Setting Up the Data Frame
Creating the Data Frame
We create a data frame with 10 rows and 4 columns (name, age, gender, and score) containing information about individuals. The output below shows its first five rows.
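A minimal sketch that rebuilds this data frame from the values printed in the article; the original code may differ in detail, and only the name ‘df’ comes from the article itself.

```r
library(dplyr)

# Example data reconstructed from the printed output
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

dim(df)  # 10 rows, 4 columns
```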
## name age gender score
## 1 Alice 25 F 85
## 2 Bob 32 M 76
## 3 Charlie 28 M 92
## 4 David 24 M 81
## 5 Eve 27 F 88
Exploring the Data
Utilizing the head function, we gain a quick overview of the first five rows of the data frame.
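Note that head() returns six rows by default, so printing exactly five rows, as in the output above, needs n = 5. A minimal sketch, assuming the reconstructed ‘df’ (‘first_five’ is a hypothetical name):

```r
library(dplyr)

# Example data reconstructed from the printed output (repeated so this snippet runs on its own)
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# head() shows 6 rows by default; n = 5 limits it to the first five
first_five <- head(df, n = 5)
first_five
```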
Adding Variables
Introducing Grades
We introduce a new variable, ‘grade,’ that categorizes individuals by score. In the output, scores of 90 and above receive an A, scores from 80 to 89 a B, and lower scores a C.
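A minimal sketch of the grading step with case_when(). The cut-offs (90 and 80) are our assumption, chosen because they reproduce the grades printed below; ‘df_grade’ is a hypothetical name, and the original code may use different logic.

```r
library(dplyr)

# Example data reconstructed from the printed output (repeated so this snippet runs on its own)
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# case_when() checks conditions top to bottom and uses the first match;
# TRUE acts as the catch-all for everything below 80 (assumed thresholds)
df_grade <- df %>%
  mutate(grade = case_when(
    score >= 90 ~ "A",
    score >= 80 ~ "B",
    TRUE        ~ "C"
  ))
df_grade
```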
## name age gender score grade
## 1 Alice 25 F 85 B
## 2 Bob 32 M 76 C
## 3 Charlie 28 M 92 A
## 4 David 24 M 81 B
## 5 Eve 27 F 88 B
## 6 Frank 29 M 79 C
## 7 Grace 31 F 94 A
## 8 Harry 26 M 83 B
## 9 Ivy 30 F 90 A
## 10 Jack 33 M 86 B
Age Modification
We overwrite the ‘age’ variable, incrementing each individual’s age by one.
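A sketch of the age update: reusing an existing column name inside mutate() overwrites that column instead of adding a new one. The result goes into a new object, ‘df_older’ (a hypothetical name), so ‘df’ stays unchanged for the later steps, which is consistent with the later outputs still showing the original ages.

```r
library(dplyr)

# Example data reconstructed from the printed output (repeated so this snippet runs on its own)
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# Same column name on the left-hand side, so 'age' is replaced in place
df_older <- df %>%
  mutate(age = age + 1)
df_older
```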
## name age gender score
## 1 Alice 26 F 85
## 2 Bob 33 M 76
## 3 Charlie 29 M 92
## 4 David 25 M 81
## 5 Eve 28 F 88
## 6 Frank 30 M 79
## 7 Grace 32 F 94
## 8 Harry 27 M 83
## 9 Ivy 31 F 90
## 10 Jack 34 M 86
Introducing Height and Weight
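Two new variables, ‘height’ and ‘weight,’ are filled with random values drawn from specified ranges. Because the values are random, your numbers will differ from those below unless you fix the random seed with set.seed().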
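A sketch of the height and weight step. The ranges (150 to 200 cm, 50 to 100 kg) and the seed are assumptions that merely fit the printed values; runif() draws uniform random numbers, and n() supplies the number of rows. ‘df_body’ is a hypothetical name.

```r
library(dplyr)

# Example data reconstructed from the printed output (repeated so this snippet runs on its own)
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

set.seed(123)  # hypothetical seed so the random values are reproducible
df_body <- df %>%
  mutate(
    height = runif(n(), min = 150, max = 200),  # assumed range, cm
    weight = runif(n(), min = 50, max = 100)    # assumed range, kg
  )
df_body
```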
## name age gender score height weight
## 1 Alice 25 F 85 159.0612 72.20515
## 2 Bob 32 M 76 174.5540 63.71620
## 3 Charlie 28 M 92 198.6179 75.73870
## 4 David 24 M 81 156.6031 73.91578
## 5 Eve 27 F 88 167.1612 97.90217
## 6 Frank 29 M 79 193.2816 54.55027
## 7 Grace 31 F 94 177.4369 98.49600
## 8 Harry 26 M 83 187.4201 97.41685
## 9 Ivy 30 F 90 191.7643 54.76353
## 10 Jack 33 M 86 178.5300 85.79550
Variable Deletion
The ‘score’ variable is removed from the data frame.
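Dropping a column can be sketched with select() and a minus sign; mutate(score = NULL) is an equivalent alternative. The object name ‘df_no_score’ is hypothetical.

```r
library(dplyr)

# Example data reconstructed from the printed output (repeated so this snippet runs on its own)
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# A minus sign inside select() drops the named column
df_no_score <- df %>%
  select(-score)

# Alternative: assigning NULL inside mutate() also removes the column
df_no_score2 <- df %>%
  mutate(score = NULL)
df_no_score
```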
## name age gender
## 1 Alice 25 F
## 2 Bob 32 M
## 3 Charlie 28 M
## 4 David 24 M
## 5 Eve 27 F
## 6 Frank 29 M
## 7 Grace 31 F
## 8 Harry 26 M
## 9 Ivy 30 F
## 10 Jack 33 M
Rounding Numeric Variables
The numeric variables in the data frame are selected and rounded using the mutate_all function. In current dplyr, mutate_all is superseded by mutate() combined with across().
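A sketch of the rounding step. Since the output shows only ‘age’ and ‘score’, we assume the numeric columns were selected first; the second version shows the across() form that supersedes mutate_all() in dplyr 1.0 and later. Both object names are hypothetical.

```r
library(dplyr)

# Example data reconstructed from the printed output (repeated so this snippet runs on its own)
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# Keep the numeric columns, then apply round() to every remaining column
df_rounded <- df %>%
  select(where(is.numeric)) %>%
  mutate_all(round)

# Equivalent form with across(), the recommended replacement for mutate_all()
df_rounded2 <- df %>%
  select(where(is.numeric)) %>%
  mutate(across(everything(), round))
df_rounded
```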
## age score
## 1 25 85
## 2 32 76
## 3 28 92
## 4 24 81
## 5 27 88
## 6 29 79
## 7 31 94
## 8 26 83
## 9 30 90
## 10 33 86
Text Transformation
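The ‘gender’ variable is converted to uppercase for uniformity. Because the values are already uppercase, the output looks unchanged; the step matters when the data mixes cases such as ‘f’ and ‘F’.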
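A minimal sketch of the uppercase conversion using base R's toupper() inside mutate(); ‘df_upper’ is a hypothetical name.

```r
library(dplyr)

# Example data reconstructed from the printed output (repeated so this snippet runs on its own)
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# toupper() is base R; writing back to 'gender' overwrites the column
df_upper <- df %>%
  mutate(gender = toupper(gender))
df_upper
```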
## name age gender score
## 1 Alice 25 F 85
## 2 Bob 32 M 76
## 3 Charlie 28 M 92
## 4 David 24 M 81
## 5 Eve 27 F 88
## 6 Frank 29 M 79
## 7 Grace 31 F 94
## 8 Harry 26 M 83
## 9 Ivy 30 F 90
## 10 Jack 33 M 86
Ranking and Pass/Fail Classification
Ranking Individuals
We introduce a new variable, ‘rank,’ by grouping the data by gender and ranking scores within each group, with 1 assigned to the highest score.
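A sketch of the grouped ranking, which is our reconstruction rather than necessarily the original code. rank(-score) ranks the negated scores, so the highest score in each gender gets rank 1 and the result is numeric (<dbl>), as in the printed tibble; ungroup() removes the grouping afterwards. ‘df_ranked’ is a hypothetical name.

```r
library(dplyr)

# Example data reconstructed from the printed output (repeated so this snippet runs on its own)
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# Rank within each gender; negating score makes the highest score rank 1
df_ranked <- df %>%
  group_by(gender) %>%
  mutate(rank = rank(-score)) %>%
  ungroup()
df_ranked
```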
## # A tibble: 10 × 5
## name age gender score rank
## <chr> <dbl> <chr> <dbl> <dbl>
## 1 Alice 25 F 85 4
## 2 Bob 32 M 76 6
## 3 Charlie 28 M 92 1
## 4 David 24 M 81 4
## 5 Eve 27 F 88 3
## 6 Frank 29 M 79 5
## 7 Grace 31 F 94 1
## 8 Harry 26 M 83 3
## 9 Ivy 30 F 90 2
## 10 Jack 33 M 86 2
Pass/Fail Classification
A ‘pass’ variable is introduced, categorizing individuals as ‘Yes’ or ‘No’ based on their scores. In the output, every score of 80 or above is marked ‘Yes’.
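A sketch of the pass/fail step with dplyr's if_else(). The pass mark of 80 is an assumption inferred from the printed output, and ‘df_pass’ is a hypothetical name.

```r
library(dplyr)

# Example data reconstructed from the printed output (repeated so this snippet runs on its own)
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# Assumed pass mark of 80; if_else() requires both outcomes to have the same type
df_pass <- df %>%
  mutate(pass = if_else(score >= 80, "Yes", "No"))
df_pass
```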
## name age gender score pass
## 1 Alice 25 F 85 Yes
## 2 Bob 32 M 76 No
## 3 Charlie 28 M 92 Yes
## 4 David 24 M 81 Yes
## 5 Eve 27 F 88 Yes
## 6 Frank 29 M 79 No
## 7 Grace 31 F 94 Yes
## 8 Harry 26 M 83 Yes
## 9 Ivy 30 F 90 Yes
## 10 Jack 33 M 86 Yes
Body Mass Index (BMI) Calculation
BMI Calculation
We generate dummy data for weight (kg) and height (cm), creating a new data frame, ‘df1.’ BMI is then calculated as weight divided by the square of height in meters and added as a new variable, ‘bmi.’
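A sketch of the BMI step. The uniform ranges, the seed, and the unit assumption (kilograms and centimeters) are ours; dividing height by 100 converts centimeters to meters before squaring. tibble() is used because the printed result is a tibble.

```r
library(dplyr)

set.seed(42)  # hypothetical seed; the article's values were random
df1 <- tibble(
  weight = runif(100, min = 50, max = 100),   # kg, assumed range
  height = runif(100, min = 150, max = 200)   # cm, assumed range
)

# BMI = weight (kg) / height (m)^2, so convert centimeters to meters first
df1 <- df1 %>%
  mutate(bmi = weight / (height / 100)^2)
df1
```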
## # A tibble: 100 × 3
## weight height bmi
## <dbl> <dbl> <dbl>
## 1 64.4 174. 21.3
## 2 89.4 163. 33.5
## 3 70.4 170. 24.5
## 4 94.2 188. 26.6
## 5 97.0 169. 33.8
## 6 52.3 186. 15.2
## 7 76.4 187. 21.9
## 8 94.6 174. 31.1
## 9 77.6 166. 28.0
## 10 72.8 156. 30.0
## # ℹ 90 more rows
Log Transformation
A new variable, ‘log_score,’ is introduced by taking the natural logarithm of the ‘score’ variable.
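A minimal sketch: base R's log() uses base e by default, which matches the printed values (log(85) is about 4.4427). ‘df_log’ is a hypothetical name; use log10() or log(x, base = 2) if another base is needed.

```r
library(dplyr)

# Example data reconstructed from the printed output (repeated so this snippet runs on its own)
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# log() without a base argument is the natural logarithm
df_log <- df %>%
  mutate(log_score = log(score))
df_log
```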
## name age gender score log_score
## 1 Alice 25 F 85 4.442651
## 2 Bob 32 M 76 4.330733
## 3 Charlie 28 M 92 4.521789
## 4 David 24 M 81 4.394449
## 5 Eve 27 F 88 4.477337
## 6 Frank 29 M 79 4.369448
## 7 Grace 31 F 94 4.543295
## 8 Harry 26 M 83 4.418841
## 9 Ivy 30 F 90 4.499810
## 10 Jack 33 M 86 4.454347
Unique Identifier
Combining the ‘name’ and ‘age’ variables, joined by an underscore, creates an ‘id’ variable.
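A sketch of the ID step using base R's paste() with sep = "_"; paste0(name, "_", age) would give the same result. ‘df_id’ is a hypothetical name.

```r
library(dplyr)

# Example data reconstructed from the printed output (repeated so this snippet runs on its own)
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# paste() converts age to text and joins the two values with "_"
df_id <- df %>%
  mutate(id = paste(name, age, sep = "_"))
df_id
```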
## name age gender score id
## 1 Alice 25 F 85 Alice_25
## 2 Bob 32 M 76 Bob_32
## 3 Charlie 28 M 92 Charlie_28
## 4 David 24 M 81 David_24
## 5 Eve 27 F 88 Eve_27
## 6 Frank 29 M 79 Frank_29
## 7 Grace 31 F 94 Grace_31
## 8 Harry 26 M 83 Harry_26
## 9 Ivy 30 F 90 Ivy_30
## 10 Jack 33 M 86 Jack_33
Data Structure
We use the glimpse function to inspect the structure of the ‘df’ data frame.
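The layout printed below (row and column counts, then one line per column with its type and first values) is what dplyr's glimpse() produces, so a minimal sketch of this step is a single call on the reconstructed ‘df’.

```r
library(dplyr)

# Example data reconstructed from the printed output (repeated so this snippet runs on its own)
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# glimpse() prints a transposed summary and invisibly returns its input
glimpse(df)
```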
## Rows: 10
## Columns: 4
## $ name <chr> "Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "H…
## $ age <dbl> 25, 32, 28, 24, 27, 29, 31, 26, 30, 33
## $ gender <chr> "F", "M", "M", "M", "F", "M", "F", "M", "F", "M"
## $ score <dbl> 85, 76, 92, 81, 88, 79, 94, 83, 90, 86