Create New Variables in R with dplyr

RStudioDataLab
6 min read · Dec 20, 2023


Have you ever stared at a data frame, yearning to extract hidden truths but stumped by the lack of the perfect variable? What if I told you the answer lies not in brute force data collection but in the power of crafting just the right variables?

This article explains the magic of dplyr, the library that lets you shape your data like clay, molding it into forms that reveal its deepest secrets. We’ll explore how to conjure new variables from existing ones, bend calculations to your will, and ultimately ask the right questions about your data. So, are you ready to unlock the true potential of your data? Let's get started and see what insights await!

In data analysis, R stands out as a powerful tool. In this article, we will cover the basics of data manipulation with dplyr: adding, modifying, and removing variables in a data frame.

Understanding the Code

Let’s start by breaking down the provided R code, step by step, to ensure a clear understanding.


Setting Up the Data Frame

Creating the Data Frame

We initiate a data frame with 10 rows and 4 columns containing information about individuals.
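The original code for this step is not shown, so here is a minimal sketch of how such a data frame could be built with base R. The values are copied from the output printed in this article; the object name df matches the name used later in the text.

```r
# Build the example data frame; values are taken from the printed output.
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86),
  stringsAsFactors = FALSE  # keep name and gender as character columns
)

# Confirm the shape: 10 rows and 4 columns.
dim(df)
```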

##      name age gender score
## 1 Alice 25 F 85
## 2 Bob 32 M 76
## 3 Charlie 28 M 92
## 4 David 24 M 81
## 5 Eve 27 F 88

Exploring the Data

Utilizing the head function, we gain a quick overview of the first five rows of the data frame.
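As a small sketch of that preview step: head() returns six rows by default, so a five-row preview needs n = 5. The data frame is rebuilt here from the printed values so the snippet runs on its own; the name preview is only illustrative.

```r
# Rebuild the example data frame (values taken from the printed output).
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# head() defaults to six rows; ask for five explicitly.
preview <- head(df, n = 5)
print(preview)
```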


Adding Variables

Introducing Grades

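We introduce a new variable, ‘grade,’ categorizing individuals based on their scores. Judging from the output, scores of 90 and above receive an A, scores from 80 to 89 a B, and anything lower a C.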
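One way to write this step is with mutate() and case_when(). This is a sketch, not necessarily the original code: the cutoffs (90 for A, 80 for B) are an assumption inferred from the printed grades, and df_graded is an illustrative name.

```r
library(dplyr)

# Rebuild the example data frame (values taken from the printed output).
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# case_when() checks conditions in order and uses the first match.
# Cutoffs are assumed from the output, not stated in the article.
df_graded <- df %>%
  mutate(grade = case_when(
    score >= 90 ~ "A",
    score >= 80 ~ "B",
    TRUE        ~ "C"   # everything below 80
  ))

print(df_graded)
```

Because the conditions are evaluated top to bottom, the second branch only ever sees scores below 90, so there is no need to write it as a two-sided range.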

##       name age gender score grade
## 1 Alice 25 F 85 B
## 2 Bob 32 M 76 C
## 3 Charlie 28 M 92 A
## 4 David 24 M 81 B
## 5 Eve 27 F 88 B
## 6 Frank 29 M 79 C
## 7 Grace 31 F 94 A
## 8 Harry 26 M 83 B
## 9 Ivy 30 F 90 A
## 10 Jack 33 M 86 B

Age Modification

The ‘age’ variable undergoes a transformation, incrementing each individual’s age by one.
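A minimal sketch of this transformation: naming an existing column on the left-hand side of mutate() overwrites it instead of adding a new one. The name df_older is illustrative.

```r
library(dplyr)

# Rebuild the example data frame (values taken from the printed output).
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# Reusing the column name replaces the existing 'age' values.
df_older <- df %>%
  mutate(age = age + 1)

print(df_older)
```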

##       name age gender score
## 1 Alice 26 F 85
## 2 Bob 33 M 76
## 3 Charlie 29 M 92
## 4 David 25 M 81
## 5 Eve 28 F 88
## 6 Frank 30 M 79
## 7 Grace 32 F 94
## 8 Harry 27 M 83
## 9 Ivy 31 F 90
## 10 Jack 34 M 86

Introducing Height and Weight

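Two new variables, ‘height’ and ‘weight,’ are created with random values. In the output below, heights fall roughly between 150 and 200 and weights roughly between 50 and 100, which suggests centimetres and kilograms.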
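The sketch below shows one way to generate such columns with runif(). The ranges (150 to 200 for height, 50 to 100 for weight) and the seed are assumptions chosen to resemble the printed values, so the numbers will not match the article's output exactly. The name df_body is illustrative.

```r
library(dplyr)

# Rebuild the example data frame (values taken from the printed output).
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

set.seed(123)  # assumed seed, only so the sketch is reproducible

# n() gives the number of rows, so each person gets one random draw.
df_body <- df %>%
  mutate(
    height = runif(n(), min = 150, max = 200),  # assumed range
    weight = runif(n(), min = 50,  max = 100)   # assumed range
  )

print(df_body)
```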

##       name age gender score   height   weight
## 1 Alice 25 F 85 159.0612 72.20515
## 2 Bob 32 M 76 174.5540 63.71620
## 3 Charlie 28 M 92 198.6179 75.73870
## 4 David 24 M 81 156.6031 73.91578
## 5 Eve 27 F 88 167.1612 97.90217
## 6 Frank 29 M 79 193.2816 54.55027
## 7 Grace 31 F 94 177.4369 98.49600
## 8 Harry 26 M 83 187.4201 97.41685
## 9 Ivy 30 F 90 191.7643 54.76353
## 10 Jack 33 M 86 178.5300 85.79550


Variable Deletion

The ‘score’ variable is removed from the data frame.
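A minimal sketch of dropping a column: in select(), a minus sign in front of a column name excludes it and keeps everything else. The name df_no_score is illustrative.

```r
library(dplyr)

# Rebuild the example data frame (values taken from the printed output).
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# Drop 'score' and keep the remaining columns in their original order.
df_no_score <- df %>%
  select(-score)

print(df_no_score)
```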

##       name age gender
## 1 Alice 25 F
## 2 Bob 32 M
## 3 Charlie 28 M
## 4 David 24 M
## 5 Eve 27 F
## 6 Frank 29 M
## 7 Grace 31 F
## 8 Harry 26 M
## 9 Ivy 30 F
## 10 Jack 33 M

Rounding Numeric Variables

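Numeric variables in the data frame are rounded using the mutate_all function. The output keeps only the numeric columns, ‘age’ and ‘score,’ and because both already hold whole numbers, rounding leaves them unchanged. Note that mutate_all is superseded in dplyr 1.0.0 and later, where across() is the recommended replacement.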
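Here is a sketch of the same idea written with across(), the replacement for the mutate_all family since dplyr 1.0.0. Selecting the numeric columns first is an assumption based on the output, which shows only age and score. The name df_rounded is illustrative.

```r
library(dplyr)

# Rebuild the example data frame (values taken from the printed output).
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# Keep only numeric columns, then apply round() to each of them.
df_rounded <- df %>%
  select(where(is.numeric)) %>%
  mutate(across(everything(), round))

print(df_rounded)
```

To round the numeric columns while keeping name and gender in the result, mutate(across(where(is.numeric), round)) on the full data frame would be one option.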

##    age score
## 1 25 85
## 2 32 76
## 3 28 92
## 4 24 81
## 5 27 88
## 6 29 79
## 7 31 94
## 8 26 83
## 9 30 90
## 10 33 86

Text Transformation

The ‘gender’ variable is converted to uppercase, enhancing uniformity.
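A minimal sketch using base R's toupper() inside mutate(). The gender values in this data set are already uppercase, so the result looks identical to the input; the call matters when the raw data mixes "f" and "F". The name df_upper is illustrative.

```r
library(dplyr)

# Rebuild the example data frame (values taken from the printed output).
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# toupper() converts every character in each string to uppercase.
df_upper <- df %>%
  mutate(gender = toupper(gender))

print(df_upper)
```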

##       name age gender score
## 1 Alice 25 F 85
## 2 Bob 32 M 76
## 3 Charlie 28 M 92
## 4 David 24 M 81
## 5 Eve 27 F 88
## 6 Frank 29 M 79
## 7 Grace 31 F 94
## 8 Harry 26 M 83
## 9 Ivy 30 F 90
## 10 Jack 33 M 86

Ranking and Pass/Fail Classification

Ranking Individuals

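We introduce a new variable, ‘rank,’ by grouping the data by gender and assigning ranks based on scores within each group. Rank 1 goes to the highest score in each gender group, so the female and male rankings are independent of each other.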
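The grouped ranking can be sketched with group_by() and base R's rank(). Negating the score is an assumption inferred from the output, where the top scorer in each group has rank 1. The name df_ranked is illustrative.

```r
library(dplyr)

# Rebuild the example data frame (values taken from the printed output).
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# rank() sorts ascending, so rank the negated score to put the
# highest score first. The ranking restarts within each gender.
df_ranked <- df %>%
  group_by(gender) %>%
  mutate(rank = rank(-score)) %>%
  ungroup()  # drop the grouping so later steps are not affected

print(df_ranked)
```

Calling ungroup() at the end is a defensive habit: a grouped tibble silently changes how later mutate() and summarise() calls behave.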

## # A tibble: 10 × 5
## name age gender score rank
## <chr> <dbl> <chr> <dbl> <dbl>
## 1 Alice 25 F 85 4
## 2 Bob 32 M 76 6
## 3 Charlie 28 M 92 1
## 4 David 24 M 81 4
## 5 Eve 27 F 88 3
## 6 Frank 29 M 79 5
## 7 Grace 31 F 94 1
## 8 Harry 26 M 83 3
## 9 Ivy 30 F 90 2
## 10 Jack 33 M 86 2

Pass/Fail Classification

A ‘pass’ variable is introduced, categorizing individuals as ‘Yes’ or ‘No’ based on their scores. The output is consistent with a cutoff of 80: a score of 81 passes, while 79 does not.
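A two-way classification like this fits if_else(). The cutoff of 80 is an assumption inferred from the output (81 passes, 79 does not), and df_pass is an illustrative name.

```r
library(dplyr)

# Rebuild the example data frame (values taken from the printed output).
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# if_else() requires both branches to have the same type.
# The cutoff of 80 is assumed from the output.
df_pass <- df %>%
  mutate(pass = if_else(score >= 80, "Yes", "No"))

print(df_pass)
```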

##       name age gender score pass
## 1 Alice 25 F 85 Yes
## 2 Bob 32 M 76 No
## 3 Charlie 28 M 92 Yes
## 4 David 24 M 81 Yes
## 5 Eve 27 F 88 Yes
## 6 Frank 29 M 79 No
## 7 Grace 31 F 94 Yes
## 8 Harry 26 M 83 Yes
## 9 Ivy 30 F 90 Yes
## 10 Jack 33 M 86 Yes

Body Mass Index (BMI) Calculation

BMI Calculation

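We generate dummy data for weight and height, creating a new data frame ‘df1’ with 100 rows. The BMI is calculated and added as a new variable, ‘bmi.’ With weight in kilograms and height in centimetres, BMI is weight divided by the square of height in metres; for the first row, 64.4 / 1.74² ≈ 21.3.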
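A sketch of the BMI step follows. The random ranges and the seed are assumptions chosen to resemble the printed values, so the numbers will differ from the article's output; the formula assumes weight in kilograms and height in centimetres, which agrees with the printed rows.

```r
library(dplyr)

set.seed(42)  # assumed seed, only so the sketch is reproducible

# Dummy data: 100 people with random weight (kg) and height (cm).
# The ranges are assumptions, not taken from the original code.
df1 <- tibble(
  weight = runif(100, min = 50,  max = 100),
  height = runif(100, min = 150, max = 200)
)

# BMI = weight in kg divided by the square of height in metres.
df1 <- df1 %>%
  mutate(bmi = weight / (height / 100)^2)

print(df1)
```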

## # A tibble: 100 × 3
## weight height bmi
## <dbl> <dbl> <dbl>
## 1 64.4 174. 21.3
## 2 89.4 163. 33.5
## 3 70.4 170. 24.5
## 4 94.2 188. 26.6
## 5 97.0 169. 33.8
## 6 52.3 186. 15.2
## 7 76.4 187. 21.9
## 8 94.6 174. 31.1
## 9 77.6 166. 28.0
## 10 72.8 156. 30.0
## # ℹ 90 more rows

Log Transformation

A new variable, ‘log_score,’ is introduced by taking the natural logarithm of the ‘score’ variable.
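In R, log() computes the natural logarithm by default, which matches the printed values (for example, log(85) is about 4.442651). A minimal sketch, with df_log as an illustrative name:

```r
library(dplyr)

# Rebuild the example data frame (values taken from the printed output).
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# log() is the natural log; use log10() or log2() for other bases.
df_log <- df %>%
  mutate(log_score = log(score))

print(df_log)
```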

##       name age gender score log_score
## 1 Alice 25 F 85 4.442651
## 2 Bob 32 M 76 4.330733
## 3 Charlie 28 M 92 4.521789
## 4 David 24 M 81 4.394449
## 5 Eve 27 F 88 4.477337
## 6 Frank 29 M 79 4.369448
## 7 Grace 31 F 94 4.543295
## 8 Harry 26 M 83 4.418841
## 9 Ivy 30 F 90 4.499810
## 10 Jack 33 M 86 4.454347

Unique Identifier

Combining the ‘name’ and ‘age’ variables, separated by an underscore, creates an ‘id’ variable.
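One way to build such an identifier is paste() with an underscore separator, which matches the printed format (for example, Alice_25). This is a sketch; the original may have used a different function. The name df_id is illustrative.

```r
library(dplyr)

# Rebuild the example data frame (values taken from the printed output).
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# paste() converts the numeric age to text and joins it to the name.
df_id <- df %>%
  mutate(id = paste(name, age, sep = "_"))

print(df_id)
```

Such an id is unique only as long as no two people share both name and age, which holds for these ten rows.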

##       name age gender score         id
## 1 Alice 25 F 85 Alice_25
## 2 Bob 32 M 76 Bob_32
## 3 Charlie 28 M 92 Charlie_28
## 4 David 24 M 81 David_24
## 5 Eve 27 F 88 Eve_27
## 6 Frank 29 M 79 Frank_29
## 7 Grace 31 F 94 Grace_31
## 8 Harry 26 M 83 Harry_26
## 9 Ivy 30 F 90 Ivy_30
## 10 Jack 33 M 86 Jack_33

Data Structure

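We use the glimpse function to inspect the structure of the ‘df’ data frame. It lists the number of rows and columns, then each column's name, type, and first few values.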
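The output format shown here (Rows, Columns, then one line per column) is what dplyr's glimpse() prints, so the call was presumably along these lines:

```r
library(dplyr)

# Rebuild the example data frame (values taken from the printed output).
df <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Harry", "Ivy", "Jack"),
  age    = c(25, 32, 28, 24, 27, 29, 31, 26, 30, 33),
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "F", "M"),
  score  = c(85, 76, 92, 81, 88, 79, 94, 83, 90, 86)
)

# glimpse() prints one line per column: name, type, and leading values.
glimpse(df)
```

Base R's str(df) gives a similar summary without loading dplyr.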

## Rows: 10
## Columns: 4
## $ name <chr> "Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "H…
## $ age <dbl> 25, 32, 28, 24, 27, 29, 31, 26, 30, 33
## $ gender <chr> "F", "M", "M", "M", "F", "M", "F", "M", "F", "M"
## $ score <dbl> 85, 76, 92, 81, 88, 79, 94, 83, 90, 86
