Timing in R: Best Practices for Accurate Measurements

RStudioDataLab
7 min read · Sep 8, 2023


Do you want to learn how to make your R code faster and more efficient? Do you want to know how long your code takes to run and where the bottlenecks are? Do you want to impress your friends and teachers with your data analysis skills?

If you answered yes to these questions, this tutorial is for you!

In this tutorial, you will learn how to measure the running time of your R code using different functions and packages. Measuring the running time of your code can help you identify and fix slow or inefficient parts of your code. It can also help you compare solutions or approaches to the same problem.

You will learn how to use five different methods to measure the running time of your code:

  • Sys.time: A simple way to measure the elapsed time between two points in your code.
  • system.time: A way to measure the user time, system time, and elapsed time of a single expression or function call.
  • tictoc: A package that allows you to create nested timers and log the results in a list or a file.
  • rbenchmark: A package that allows you to benchmark multiple expressions or functions and compare their results in a table or a plot.
  • microbenchmark: A package that allows you to measure the running time of concise expressions or functions with high precision.

By the end of this tutorial, you will be able to measure your R code’s running time like a pro!

Ready to get started? Let’s go!


What You Need

To follow this tutorial, you will need:

  1. A computer with R and RStudio installed. RStudio is free to download.
  2. Some sample data. We will use the mtcars dataset that ships with R. This dataset contains information about 32 cars, such as miles per gallon, number of cylinders, horsepower, weight, and more. You can load it by typing data(mtcars) in the console window of RStudio.
  3. Some sample code. We will use simple code snippets that perform calculations or manipulations on the mtcars dataset. You can copy and paste them from this tutorial or write them yourself.


How to Use Sys.time

The first method we will learn is Sys.time. This function returns the current date and time as an object of class POSIXct. We can measure the elapsed time between two points in our code by calling it at each point and subtracting the start time from the end time.

To use Sys.time, we need to follow these steps:

  1. Assign the current date and time to a variable before running our code. It will be our start time.
  2. Run our code.
  3. Assign the current date and time to another variable after running our code. It will be our end time.
  4. Subtract the start time from the end time to get the elapsed time.

For example, we want to measure how long it takes to calculate the mean miles per gallon for each number of cylinders in the mtcars dataset.

We can use Sys.time like this:
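A minimal sketch of these four steps follows. It assumes the per-cylinder mean is computed with aggregate(); the variable names start_time, end_time, and mean_mpg_by_cyl are illustrative choices.

```r
# Load the built-in dataset (ships with base R)
data(mtcars)

# 1. Record the start time
start_time <- Sys.time()

# 2. Run the code to be timed: mean mpg for each number of cylinders
mean_mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# 3. Record the end time
end_time <- Sys.time()

# 4. Subtract the start time from the end time (gives a difftime object)
elapsed_time <- end_time - start_time
print(elapsed_time)

# Convert to a plain number of seconds for further calculations
elapsed_secs <- as.numeric(elapsed_time, units = "secs")
```

Passing units = "secs" to as.numeric() avoids surprises when R picks minutes or hours as the display unit for longer runs.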

This code will produce an output like this (the exact value depends on your machine):
Time difference of 20.42383 secs

As you can see, Sys.time has measured the elapsed time between our start and end points, and the subtraction has returned it as a time difference (difftime) object. You can also convert this object to a numeric value using as.numeric(elapsed_time).

Sys.time is a simple and easy way to measure the running time of your code, but it has some limitations. For example, it only measures the elapsed time, not the user or system time. It may also not be accurate or precise enough for very short code snippets. For these cases, we need to use other methods.

How to Use system.time

The second method we will learn is system.time. This function evaluates an expression or a function call and returns the user time, system time, and elapsed time as an object of class proc_time. The user time is the CPU time spent executing the user code, the system time is the CPU time spent by the operating system on behalf of that code (for example, on system calls), and the elapsed time is the wall-clock time that passed while the expression ran.

To use system.time, we need to follow these steps:

  • Wrap our code in an expression or a function call and pass it as an argument to system.time.
  • Assign the output of system.time to a variable or print it to the console.

For example, we want to measure how long it takes to calculate the mean miles per gallon for each number of cylinders in the mtcars dataset using system.time.

We can use system.time like this:
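A minimal sketch, again assuming aggregate() as the code being timed; the names timing and mean_mpg_by_cyl are illustrative.

```r
# Load the built-in dataset
data(mtcars)

# Pass the expression to system.time(); the assignment inside still
# creates mean_mpg_by_cyl in the calling environment
timing <- system.time(
  mean_mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
)

# Print the user, system, and elapsed times (in seconds)
print(timing)

# Extract just the elapsed (wall-clock) time as a plain number
elapsed_secs <- timing[["elapsed"]]
```

Extracting a single component by name, as in timing[["elapsed"]], is usually more convenient than converting the whole object when you only need one figure.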

This code will print three numbers, labelled user, system, and elapsed, all in seconds.

As you can see, system.time has measured our code’s user, system, and elapsed time and returned them as a proc_time object. You can also convert this object to a numeric vector using as.numeric(system.time(…)).

system.time is a more comprehensive and accurate way to measure the running time of your code than Sys.time, but it still has some limitations.

For example, it only measures the running time of a single expression or function call, not several at once. It may also not be precise enough for very short code snippets, and its results can vary from one run to the next. For these cases, we need to use other methods.
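To see this run-to-run variation for yourself, one option is to repeat the measurement several times with replicate(). This is only a sketch: time_once is a hypothetical helper name, and ten repetitions is an arbitrary choice.

```r
# Load the built-in dataset
data(mtcars)

# Hypothetical helper: time a single run and return the elapsed seconds
time_once <- function() {
  system.time(aggregate(mpg ~ cyl, data = mtcars, FUN = mean))[["elapsed"]]
}

# Repeat the measurement 10 times and collect the elapsed times
elapsed_runs <- replicate(10, time_once())

# Summarise the spread of the measurements
print(summary(elapsed_runs))
```

For a snippet this small, many of the values may be reported as 0, which illustrates the precision limit mentioned above.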

How to Use tictoc

The third method we will learn is tictoc. This package provides a simple way to create nested timers and log the results in a list or a file. A timer is a pair of tic and toc functions that start and stop the stopwatch.

The tic function can take a msg argument to label the timer, and the toc function can take a log argument (log = TRUE) to save the result in tictoc's log in addition to printing it.

To use tictoc, we need to follow these steps:

  1. Install and load the tictoc package by typing install.packages("tictoc") and library("tictoc") in the console window of RStudio.
  2. Insert tic and toc functions before and after the code blocks we want to measure.
  3. Optionally, give names to our timers and log our results.

For example, we want to measure how long it takes to calculate the mean miles per gallon for each number of cylinders in the mtcars dataset using tictoc.

We can use tictoc like this:
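A minimal sketch, assuming the tictoc package is installed and that aggregate() is the code being timed; the label "Mean MPG by CYL" matches the output shown in this section.

```r
# install.packages("tictoc")  # run once if the package is not installed
library(tictoc)

# Load the built-in dataset
data(mtcars)

# Start a timer labelled "Mean MPG by CYL"
tic("Mean MPG by CYL")

# The code to be timed
mean_mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Stop the timer; toc() prints the label and elapsed time,
# and invisibly returns a list with the start and stop times
timing <- toc()
```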

This code will produce an output like this:

Mean MPG by CYL: 3.61 sec elapsed.

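As you can see, tictoc has measured the elapsed time between our tic and toc functions and printed it with our timer name. If you call toc(log = TRUE), you can also retrieve the saved results as a list using tic.log(). To keep them in a file, write that list out yourself with a base R function such as writeLines().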
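Nested timers and logging can be sketched as follows. This assumes tictoc is installed; the timer labels are illustrative, and writing the log to a temporary file with base R's writeLines() is just one assumed way to persist it.

```r
library(tictoc)

# Load the built-in dataset
data(mtcars)

# Start with an empty log
tic.clearlog()

tic("Whole analysis")        # outer timer
tic("Mean MPG by CYL")       # inner (nested) timer
mean_mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
toc(log = TRUE, quiet = TRUE)  # stops the inner timer and logs it
toc(log = TRUE, quiet = TRUE)  # stops the outer timer and logs it

# Retrieve the logged results as a list of formatted messages
timing_log <- tic.log(format = TRUE)
print(unlist(timing_log))

# Save the log to a text file using base R
log_file <- tempfile(fileext = ".txt")
writeLines(unlist(timing_log), log_file)
```

Timers are kept on a stack, so each toc() closes the most recently opened tic().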

How to Use benchmark

The fourth method we will learn is benchmark, a function from the rbenchmark package. This function runs one or more expressions a set number of times and collects the results in a data frame. The results can include the user time, system time, elapsed time, and relative time for each expression.

To use benchmark, we need to follow these steps:

  • Load the rbenchmark package by typing library("rbenchmark") in the console window of RStudio. If it is not installed, run install.packages("rbenchmark") first.
  • Write the expressions we want to benchmark. Wrapping each one in a small function makes it easy to refer to by a short name.
  • Pass our expressions or function calls as arguments to benchmark, and assign the output to a variable or print it to the console.
  • Optionally, specify other arguments to benchmark, such as the number of replications (replications), the columns to display (columns), and the column to sort by (order). Times are reported in seconds.

For example, let’s say we want to benchmark three different ways to calculate the mean miles per gallon for each number of cylinders in the mtcars dataset:
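The sketch below assumes rbenchmark is installed. The three approaches (aggregate(), tapply(), and split() with sapply()) are illustrative choices, and the wrapper names expr1, expr2, and expr3 are assumed to be small functions so that each call is re-evaluated on every replication.

```r
# install.packages("rbenchmark")  # run once if the package is not installed
library(rbenchmark)

# Load the built-in dataset
data(mtcars)

# Three illustrative ways to compute the mean mpg per number of cylinders
expr1 <- function() aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
expr2 <- function() tapply(mtcars$mpg, mtcars$cyl, mean)
expr3 <- function() sapply(split(mtcars$mpg, mtcars$cyl), mean)

# Run each call 100 times and keep four columns of the results
res <- benchmark(
  expr1(), expr2(), expr3(),
  replications = 100,
  columns = c("test", "replications", "elapsed", "relative")
)
print(res)
```

Passing the calls themselves, rather than precomputed results, matters: benchmark re-evaluates whatever expression it receives, so a variable holding a finished result would only time a variable lookup.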

This code will produce a table with one row per test and the following columns:

  • The name of the test.
  • The number of replications.
  • The total elapsed time, in seconds, over all replications.
  • The relative time compared to the fastest test.

How to Use order

The next function we will learn is order, from base R. It lets you sort the benchmark results by different criteria, such as elapsed time, user time, system time, or relative time.

To use order, we need to follow these steps:

  • Run a benchmark and assign the output to a variable.
  • Pass the column we want to sort by (for example, res$elapsed) as an argument to order.
  • Subset our variable by using square brackets and the output of order.

For example, we want to sort our previous results by elapsed time in ascending order. We can use order like this:

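# Run benchmark and assign the output to res.
# Assumes expr1, expr2, and expr3 are functions wrapping the three approaches.
# rbenchmark reports times in seconds and has no unit argument.
res <- benchmark(expr1(), expr2(), expr3(),
                 replications = 100,
                 columns = c("test", "replications", "elapsed", "relative"))

# Sort by elapsed time in ascending order
res[order(res$elapsed), ]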

This code will print the same results table with its rows ordered from the smallest elapsed time to the largest.

How to Interpret the Results

Now that you know how to use benchmark and order, you may wonder how to interpret the results and what they mean for your code performance.

Here are some tips and guidelines to help you interpret the results:

  • The relative time is the ratio of a test's elapsed time to the elapsed time of the fastest test. The fastest test therefore has a relative time of 1, and a value of 2 means a test took twice as long as the best one.
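The relative column can be reproduced by hand to make the definition concrete. The elapsed times below are hypothetical values chosen for illustration, not measured results.

```r
# Hypothetical elapsed times in seconds for three tests (not real measurements)
elapsed <- c(test_a = 2, test_b = 5, test_c = 8)

# Relative time: each elapsed time divided by the fastest (smallest) one
relative <- elapsed / min(elapsed)
print(relative)
# test_a is the fastest (1.0); test_b took 2.5 times and test_c 4 times as long
```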

Based on these definitions, you can use the following rules of thumb to compare and improve your code performance:

Conclusion

Congratulations! You have learned how to measure and compare the running time of your R code using different functions and packages. You have also learned how to interpret the results and use them to improve your code performance.

Benchmarking is a powerful technique to help you find and fix slow or inefficient parts of your code. It can also help you choose the best solution or approach for your problem or task.

I hope you enjoyed this tutorial and found it helpful. If you have any questions or feedback, please contact me at info@data03.online or visit my website, “Data Analysis”.

To learn more about data analysis and RStudio, check out my other tutorials and courses at data03.online/p/order-now.html. I have tutorials on topics such as data manipulation, data visualization, data modelling, and more.

Originally published at https://www.data03.online.

Written by RStudioDataLab

I am a doctoral scholar, certified data analyst, freelancer, and blogger, offering complimentary tutorials to enrich our scientific community's knowledge.