8

How to generate a sequence of numbers, which would have a specific correlation (for example 0.56) and would consist of.. say 50 numbers with R program? Ty.

2
  • 2
    You ask for a vector of numbers with a specific correlation. I presume correlation with itself then or did you mean correlation with another set of numbers? Hint: correlation involves 2 vectors of data! If you do mean autocorrelation, do you mean lag 1 correlation = 0.56? Commented Nov 8, 2012 at 14:52
  • 5
    If you do want two sets with a specified correlation between them, mvrnorm in the MASS package is a good place to start. Commented Nov 8, 2012 at 14:54

3 Answers 3

24

Assuming you mean two normal/Gaussian vectors of values with correlation 0.56

We can use mvrnorm() from package MASS

require(MASS)
out <- mvrnorm(50, mu = c(0,0), Sigma = matrix(c(1,0.56,0.56,1), ncol = 2),
               empirical = TRUE)

which gives

> cor(out)
     [,1] [,2]
[1,] 1.00 0.56
[2,] 0.56 1.00

The empirical = TRUE bit is important otherwise the actual correlation achieved is subject to randomness too and will not be exactly the stated value with larger discrepancies for smaller samples.

Assuming you mean a lag 1 correlation of 0.56 & Gaussian random variables

For this one you can use the arima.sim() function:

> arima.sim(list(ar = 0.56), n = 50)
Time Series:
Start = 1 
End = 50 
Frequency = 1 
 [1]  0.62125233 -0.04742303  0.57468608 -0.07201988 -1.91416757 -1.11827563
 [7]  0.15718249  0.63217365 -1.24635896 -0.22950855 -0.79918784  0.31892842
[13]  0.33335688 -1.24328177 -0.79056890  1.08443057  0.55553819  0.33460674
[19] -0.33037659 -0.65244221  0.70461755  0.61450122  0.53731454  0.19563672
[25]  1.73945110  1.27119241  0.82484460  1.58382861  1.81619212 -0.94462052
[31] -1.36024898 -0.30964390 -0.94963216 -3.75725819 -1.77342095 -1.20963799
[37] -1.76325350 -1.20556172 -0.94684678 -0.85407649  0.14922226 -0.31109945
[43]  0.39456259  0.89610859 -0.70913792 -2.27954408 -1.14722464  0.39140446
[49]  0.66376227  1.63275483
Sign up to request clarification or add additional context in comments.

1 Comment

Do you how to the the same but for a log-normal distribution with a no-semidefinite matrix?
9

If you don't want to specify those matrices, other options are corgen from ecodist:

library("ecodist")
xy <- corgen(len = 50, r = 0.56, epsilon = 0.01)

Or rolling your own:

simcor <- function (n, xmean, xsd, ymean, ysd, correlation) {
    x <- rnorm(n)
    y <- rnorm(n)
    z <- correlation * scale(x)[,1] + sqrt(1 - correlation^2) * 
             scale(resid(lm(y ~ x)))[,1]
    xresult <- xmean + xsd * scale(x)[,1]
    yresult <- ymean + ysd * z
    data.frame(x=xresult,y=yresult)
}

Test

> r <- simcor(n = 50, xmean = 12, ymean = 30, xsd = 3, ysd = 4, correlation = 0.56)
> cor(r$x,r$y)
[1] 0.56

Comments

7

Use rmvnorm from the mvtnorm package to sample from the multivariate normal distribution. For example for correlation of 0.56:

library("mvtnorm")
foo <- rmvnorm(10000,c(0,0),matrix(c(1,0.56,0.56,1),2,2))

Test:

> cor(foo[,1],foo[,2])
[1] 0.5611207

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.