How to generate a sequence of numbers, which would have a specific correlation (for example 0.56) and would consist of.. say 50 numbers with R program? Ty.
3 Answers
Assuming you mean two normal/Gaussian vectors of values with correlation 0.56
We can use mvrnorm() from package MASS
require(MASS)
out <- mvrnorm(50, mu = c(0,0), Sigma = matrix(c(1,0.56,0.56,1), ncol = 2),
empirical = TRUE)
which gives
> cor(out)
[,1] [,2]
[1,] 1.00 0.56
[2,] 0.56 1.00
The empirical = TRUE bit is important otherwise the actual correlation achieved is subject to randomness too and will not be exactly the stated value with larger discrepancies for smaller samples.
Assuming you mean a lag 1 correlation of 0.56 & Gaussian random variables
For this one you can use the arima.sim() function:
> arima.sim(list(ar = 0.56), n = 50)
Time Series:
Start = 1
End = 50
Frequency = 1
[1] 0.62125233 -0.04742303 0.57468608 -0.07201988 -1.91416757 -1.11827563
[7] 0.15718249 0.63217365 -1.24635896 -0.22950855 -0.79918784 0.31892842
[13] 0.33335688 -1.24328177 -0.79056890 1.08443057 0.55553819 0.33460674
[19] -0.33037659 -0.65244221 0.70461755 0.61450122 0.53731454 0.19563672
[25] 1.73945110 1.27119241 0.82484460 1.58382861 1.81619212 -0.94462052
[31] -1.36024898 -0.30964390 -0.94963216 -3.75725819 -1.77342095 -1.20963799
[37] -1.76325350 -1.20556172 -0.94684678 -0.85407649 0.14922226 -0.31109945
[43] 0.39456259 0.89610859 -0.70913792 -2.27954408 -1.14722464 0.39140446
[49] 0.66376227 1.63275483
1 Comment
sparkle
Do you how to the the same but for a log-normal distribution with a no-semidefinite matrix?
If you don't want to specify those matrices, other options are corgen from ecodist:
library("ecodist")
xy <- corgen(len = 50, r = 0.56, epsilon = 0.01)
Or rolling your own:
simcor <- function (n, xmean, xsd, ymean, ysd, correlation) {
x <- rnorm(n)
y <- rnorm(n)
z <- correlation * scale(x)[,1] + sqrt(1 - correlation^2) *
scale(resid(lm(y ~ x)))[,1]
xresult <- xmean + xsd * scale(x)[,1]
yresult <- ymean + ysd * z
data.frame(x=xresult,y=yresult)
}
Test
> r <- simcor(n = 50, xmean = 12, ymean = 30, xsd = 3, ysd = 4, correlation = 0.56)
> cor(r$x,r$y)
[1] 0.56
mvrnormin theMASSpackage is a good place to start.