Monte Carlo Simulations Using R

Introduction

I first encountered Monte Carlo simulations in my work planning cellular networks using radio planning software. These software tools are specialised systems used to aid cellular network design, site selection and capacity modelling. The latter is often implemented using Monte Carlo simulations to model the behaviour of users, their geographic locations within the radio cell and their services. Simplistically speaking, when using these tools you generally tick a few boxes, enter some values and hit ‘run’. Although I have used a number of radio planning systems, my knowledge of how these capacity models actually worked below the surface was effectively zero. I could tell someone what a Monte Carlo simulation was used for, but I had no idea how it worked, so I thought it might be interesting to try some examples using R.

Definition

Monte Carlo methods cover a broad range of applications and refer to the use of algorithms employing random samples to derive a numerical solution for deterministic problems (i.e. those in which randomness plays no role). These methods can be used to derive the outcome of any problem that has uncertainty in terms of its input variables. By the law of large numbers the average of the results of repeated simulations should converge on the expected value as the number of iterations is increased.
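As a quick illustration of that convergence, assuming nothing beyond base R's runif(), the sample mean of uniform draws on \((0, 1)\) (expected value 0.5) can be watched converging as the sample size grows:

```r
# Law of large numbers: the sample mean of uniform draws
# converges on the expected value of 0.5 as n increases
set.seed(42)
for (n in c(10, 1000, 100000)) {
  cat(n, "draws:", mean(runif(n)), "\n")
}
```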

I read a number of definitions online and in several textbooks and, frankly, I wasn’t much the wiser on the topic. It wasn’t until I played around with a few examples that it made a little more sense. The example provided in the main Wikipedia entry seemed like a fairly interesting application, so I tried my hand at reproducing it using R.

Calculating the Value of Pi

In this example, a Monte Carlo simulation is used to calculate the value of pi, using simple geometry and randomly generated points. Consider a unit square (i.e. a square with an area equal to one) containing a quadrant of a circle with radius equal to 1 and an area equal to \(\pi/4\). It follows that the ratio of their areas is \(\pi/4\).
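The quadrant's area can be confirmed numerically by integrating \(\sqrt{1 - x^2}\) over \([0, 1]\), which should return \(\pi/4 \approx 0.7854\); a quick sanity check using base R's integrate():

```r
# Area under the quadrant boundary y = sqrt(1 - x^2) on [0, 1],
# which equals the ratio of quadrant area to unit-square area
quadrant_area <- integrate(function(x) sqrt(1 - x^2), 0, 1)$value
quadrant_area   # approximately 0.7854
pi / 4          # the exact value for comparison
```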

There are several ways of plotting a circle using R and I chose to do it using some familiar equations and plotly. First I created a sequence for the angles in the quadrant, and then calculated the values of \(x\) and \(y\).

# Define angle from 0 to 90 degrees 
theta <- seq(0, 90, 0.1)

# Convert to radians
theta <- theta * pi/180

# Circle parameters 
r <- 1
x <- r * cos(theta)
y <- r * sin(theta)

# Create data frame
circ <- data.frame(x, y)

Once I had created the data frame, the plot was simple enough. The circle is drawn using a scatter plot and the square is added in using the shapes attribute of the layout() function with appropriate boundary values.

# Plot (requires the plotly package)
library(plotly)

plot_ly(circ,
        x = ~x,
        y = ~y,
        type= 'scatter',
        mode = 'markers',
        width = 600, 
        height = 600) %>%
layout(showlegend = FALSE, 
       shapes = list(type = "rect",
                     line = list(color = "blue"),
                     opacity = 0.3,
                     x0 = 0, x1 = 1, xref = "x",
                     y0 = 0, y1 = 1, yref = "y")) %>%
config(displayModeBar = F) # Hide the mode bar
Figure 1: Unit Square with Quadrant (r = 1)


If points are uniformly plotted within the unit square, then the proportion of the points inside the quadrant compared to the total number of points allows one to approximate the value of \(\pi\), by observing that:

\[\frac{\Sigma\; points_{inside}}{\Sigma\; points_{total}} \approx \frac{\pi}{4}\]

Distributing the points randomly within the unit square is where the Monte Carlo method comes into play. Each point is allocated an \((x, y)\) coordinate with \(0 < x < 1\) and \(0 < y < 1\). In R, this can be done using the runif() function, which draws from a uniform distribution on \((0, 1)\). Points inside the quadrant are coloured blue and those outside are coloured red. The approximate value of \(\pi\) can then be determined for the number of samples \(n\).

# Number of samples
n = 1000

# x coordinates
ptx <- runif(n)
# y coordinates
pty <- runif(n)

dfpt <- data.frame(ptx, pty)
  
# Add in radius
dfpt$ptr <- sqrt(dfpt$ptx^2 + dfpt$pty^2)
  
# Add in color of point
dfpt$col <- ifelse(dfpt$ptr < 1, "blue", "red")

# Calculate pi estimate
pt_in  <- sum(dfpt$col == "blue")
pt_out <- sum(dfpt$col == "red")
dfpt$pi <- pt_in / (pt_in + pt_out) * 4

# Plotly
plot_ly(circ, # circle
        x = ~x,
        y = ~y,
        type = 'scatter',
        mode = 'markers',
        width = 600, 
        height = 600) %>%

layout(showlegend = FALSE,
       shapes = list(type = "rect", # square
                     line = list(color = "blue"),
                     opacity = 0.3,
                     x0 = 0, x1 = 1, xref = "x",
                     y0 = 0, y1 = 1, yref = "y")) %>%
  
add_markers(data = dfpt, # data points
            x = ~ptx,
            y = ~pty,
            marker = list(color = ~col)) %>%

config(displayModeBar = F)
Figure 2: n = 1000, \(\pi\) = 3.124


Even with only 1000 points, the approximation of pi is correct to two significant digits. Increasing the number of samples \(n\) generally improves the accuracy of the estimation, although any individual run may still produce a worse estimate. The real power of the method comes from running the simulation many times and averaging the results.

# Number of samples
n = 1000

# Number of simulations
s = 100

# Initialise results object
pi_result <- c()

for (i in 1:s){
  
  # x coordinates
  ptx <- runif(n)
  # y coordinates
  pty <- runif(n)
  
  # Coerce
  dfpt <- data.frame(ptx, pty)
  
  # Add in radius
  dfpt$ptr <- sqrt(dfpt$ptx^2 + dfpt$pty^2)
  
  # Add in color of point
  dfpt$col <- ifelse(dfpt$ptr < 1, "blue", "red")

  # Calculate PI
  pt_in <- length(which(dfpt$col == "blue"))
  pt_out <- length(which(dfpt$col == "red"))
  pi_calc <- pt_in/(pt_out+pt_in)*4
  pi_tmp <- data.frame(i, pi_calc)
  pi_result <- rbind(pi_result, pi_tmp)
}
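Growing pi_result with rbind() inside the loop works fine at this scale, but the same experiment can be written more compactly (and usually faster) with replicate(); a sketch that is equivalent to the loop above:

```r
set.seed(1)
n <- 1000   # samples per simulation
s <- 100    # number of simulations

# One simulation: draw n random points in the unit square and
# estimate pi from the fraction that fall inside the quadrant
estimate_pi <- function(n) {
  4 * mean(runif(n)^2 + runif(n)^2 < 1)
}

pi_estimates <- replicate(s, estimate_pi(n))
mean(pi_estimates)
```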

Running the simulation 100 times and plotting the result for each simulation shows how the estimates can vary considerably. I tabulated the results for the first 10 simulations to demonstrate this effect.

Simulation    Approximation of Pi
1             3.156
2             3.136
3             3.116
4             3.192
5             3.116
6             3.108
7             3.132
8             3.152
9             3.100
10            3.232
Table 1: First ten simulations, n = 1000


# Mean result
pi_result$pi_mean <- mean(pi_result$pi_calc)
# Pi value to 6 sig. digits
pi_result$pi_actual <- 3.14159

# Legend
l <- list(bordercolor = "#D3D3D3",
          borderwidth = 2,
          xanchor = 'center',
          orientation = 'h',
          y = 1.1,
          x = 0.5)

# Plot
plot_ly(pi_result,
        x = ~i,
        y = ~pi_calc,
        name = 'Estimate',
        type = 'scatter',
        mode = 'markers') %>%
add_lines(y = ~pi_actual, name = 'Pi') %>% 
add_lines(y = ~pi_mean, name = 'Mean Estimate') %>%
config(displayModeBar = F) %>%
layout(yaxis = list(title = "Pi"),
       xaxis = list(title = "Simulation Number"),
       legend = l)

Figure 3: Estimations of Pi (n = 1000, 100 Simulations)


Plotting the rolling mean of the estimations shows how the approximation converges towards the value of pi as the simulation count increases.
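The rolling mean itself is not shown in the listing above; it can be computed as a cumulative mean of the per-simulation estimates. A minimal sketch, which regenerates the estimates rather than reusing pi_result so it can run standalone:

```r
set.seed(1)
# Per-simulation estimates of pi (100 simulations, 1000 points each)
pi_calc <- replicate(100, 4 * mean(runif(1000)^2 + runif(1000)^2 < 1))

# Rolling (cumulative) mean after each simulation
roll_mean <- cumsum(pi_calc) / seq_along(pi_calc)
tail(roll_mean, 1)  # the final averaged estimate
```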

Figure 4: Rolling Mean of Estimations of Pi (n = 1000, 100 Simulations)


Finally, I set \(n\) = 10,000 and \(s\) = 10,000 to see how this improved the result. The final approximation was 3.141523, which agrees with \(\pi\) to five significant digits.
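This level of accuracy matches what a simple binomial model predicts: each point falls inside the quadrant with probability \(p = \pi/4\), so a single estimate based on \(n\) points has standard error \(4\sqrt{p(1-p)/n}\), and averaging \(s\) independent simulations shrinks it by a further factor of \(\sqrt{s}\). A quick check of the expected error for these settings:

```r
p <- pi / 4
n <- 10000   # points per simulation
s <- 10000   # number of simulations

se_single   <- 4 * sqrt(p * (1 - p) / n)  # one simulation
se_averaged <- se_single / sqrt(s)        # mean of s simulations
se_averaged  # roughly 0.00016
```

The observed error of about 7e-05 is well within this range.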


Figure 5: Rolling Mean of Estimations of Pi (n = 10000, 10000 Simulations)


Conclusion

This post has provided a simple example of how to perform and visualise a Monte Carlo simulation using R. Monte Carlo simulations are used in many domains in engineering and science and I hope to revisit this with some other applications.
