Visualizing Data – Page 2 – Scripts & Statistics

How to plot animated maps with gganimate

Intro

Leipzig is the largest city in the federal state of Saxony, Germany. Since about 2009, Leipzig has been one of the fastest-growing German cities. Between 2006 and 2016, the population grew from about 506,000 to about 580,000 inhabitants. Most of this growth can be attributed to inward migration.

In this blog post, I show how to create an animated map of Leipzig visualizing the migration balances for each of Leipzig's 63 districts (“Ortsteile”).

The data

The data I will visualize in this blog post are provided by the local Statistical Bureau. They can be downloaded as a .csv file from the following website. The shapefile I use is not publicly available, but it can be purchased from the Statistical Bureau for about 25 euros.

R packages

The following code will install and/or load the R packages required for this blog post.

if (!require("pacman")) install.packages("pacman")
pacman::p_load(readr, rgdal, ggplot2, reshape2)
pacman::p_load_current_gh("dgrtwo/gganimate")

While the packages readr and rgdal are required for importing the .csv and the shapefile, ggplot2 and gganimate are needed to plot and animate the data. The reshape2 package is required for transforming data frames from wide to long format.

After importing the .csv file containing the migration data, we add a key variable (id) to identify each of Leipzig's 63 districts. Using the melt() function of the reshape2 package, we then reshape the data frame from wide to long format.

# import csv file
df.migrat <- read_csv2("./Data_LE/Bevölkerungsbewegung_Wanderungen_Wanderungssaldo.csv",
                      col_types = cols(`2015` = col_number()),
                      n_max = 63,
                      locale = locale(encoding = "latin1"),
                      trim_ws = TRUE)
# add id variable
df.migrat$id <- c(1:63)
# drop variables not needed
df.migrat <- df.migrat[,c(1,18,2:17)]
# change name of first variable
colnames(df.migrat)[1] <- 'district'
# convert from wide to long
df.migrat <- reshape2::melt(df.migrat, id.vars=c("district","id"))
# change name of third variable
colnames(df.migrat)[3] <- 'year'
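The wide-to-long step can be tried on a toy table without the real data. The following is a minimal sketch that uses only base R to mimic what reshape2::melt() does above; the district names, the values, and the object names wide, year_cols and long are made up for illustration.

```r
# Toy data in wide format: one row per district, one column per year
# (district names and values are invented for illustration).
wide <- data.frame(district = c("A", "B"),
                   id = 1:2,
                   `2014` = c(120, -40),
                   `2015` = c(300, 15),
                   check.names = FALSE)

# Columns that hold the yearly values
year_cols <- setdiff(names(wide), c("district", "id"))

# Stack the year columns: one row per district and year
long <- data.frame(district = rep(wide$district, times = length(year_cols)),
                   id       = rep(wide$id, times = length(year_cols)),
                   year     = rep(year_cols, each = nrow(wide)),
                   value    = unlist(wide[year_cols], use.names = FALSE))

long
```

The result has four rows (two districts times two years). melt() produces the same layout, except that the stacked column is initially called variable, which is why it is renamed to year above.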

With the next code snippet, we import the shapefile using the readOGR() function of the rgdal package. The object we receive (sh.file) is of class SpatialPolygonsDataFrame. After adding a key variable, we transform this object into an ordinary data frame using the fortify() function of the ggplot2 package. We then join both data frames by the key variable id. Before we plot the data, we create a new variable (co) that categorizes the migration balance variable (value).

# import shapefile
sh.file <- readOGR("ot.shp", layer="ot")
# add id variable
sh.file$id <- c(1:63)
# Create a data frame
sh.file <- fortify(sh.file, region = 'id')
# Merge data frames by id variable
mydata <- merge(sh.file, df.migrat, by='id')
# create a categorized variable
mydata$co <- ifelse(mydata$value >= 1000, 4,
                   ifelse(mydata$value >= 0, 3,
                                ifelse(mydata$value >= -1000, 2, 1)))
# define levels and labels
mydata$co  <- factor(mydata$co, levels = c(4:1),
                     labels = c('> 1000', '0-1000', '-1000-0', '< -1000'))
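To see what the join and the categorization do, here is a small sketch with invented numbers. It is not the code used for the map: the objects shape and balance are hypothetical stand-ins for the fortified shapefile and the migration data, and cut() is used as an alternative to the nested ifelse() calls, with breaks chosen to give the same four classes.

```r
# Hypothetical polygon vertices: two districts with three vertices each
shape <- data.frame(id   = c(1, 1, 1, 2, 2, 2),
                    long = c(0, 1, 0, 1, 2, 1),
                    lat  = c(0, 0, 1, 0, 0, 1))

# Hypothetical migration balances: two years per district
balance <- data.frame(id    = c(1, 1, 2, 2),
                      year  = c(2014, 2015, 2014, 2015),
                      value = c(-1200, 350, 0, 1000))

# Every vertex is repeated once per year of its district
merged <- merge(shape, balance, by = "id")

# Same classes as the nested ifelse(): left-closed intervals
merged$co <- cut(merged$value,
                 breaks = c(-Inf, -1000, 0, 1000, Inf),
                 right  = FALSE,
                 labels = c("< -1000", "-1000-0", "0-1000", "> 1000"))

# Cross-check against the nested ifelse() from the post
nested <- ifelse(merged$value >= 1000, 4,
                 ifelse(merged$value >= 0, 3,
                        ifelse(merged$value >= -1000, 2, 1)))

table(merged$co)
```

Note that cut() orders the levels from lowest to highest class, whereas the factor() call above lists them from highest to lowest; this only affects the order of the legend entries.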

Moreover, we define a theme suitable for plotting maps with ggplot2. The following theme is a customized version of a theme I found on Stack Overflow.

theme_opts <- list(theme(panel.grid.minor = element_blank(),
                         panel.grid.major = element_blank(),
                         panel.background = element_blank(),
                         plot.background = element_blank(),
                         panel.border = element_blank(),
                         axis.line = element_blank(),
                         axis.text.x = element_blank(),
                         axis.text.y = element_blank(),
                         axis.ticks = element_blank(),
                         axis.title.x = element_blank(),
                         axis.title.y = element_blank(),
                         legend.position="right",
                         plot.title = element_text(size=16)))

Plotting the data

Finally, we can plot the data. The first plot (static) visualizes the migration data for the whole period of time (2000–2015).

ggplot(mydata, aes(x = long, y = lat, group = group, fill=value)) +
  theme_opts + 
  coord_equal() +
  geom_polygon(color = 'grey') +
  labs(title = "Migration balance between 2000 and 2015",
       subtitle = "Static Map",
       caption = 'Data source: Stadt Leipzig, Amt für Statistik und Wahlen, 2015') +
  scale_fill_distiller(name='Migration\nbalance', direction = 1, palette='RdYlGn')

To create an animated plot, we add the frame aesthetic provided by the gganimate package. With interval = 5 we define the time (in seconds) between the “slides”.

p <- ggplot(mydata, aes(x = long, y = lat, group = group, fill=value, frame=year)) +
  theme_opts + 
  coord_equal() +
  geom_polygon(color = 'grey') +
  labs(title = "Migration balance in year: ",
       subtitle = "Animated Map",
       caption = 'Data source: Stadt Leipzig, Amt für Statistik und Wahlen, 2015') +
  scale_fill_distiller(name='Migration\nbalance', direction = 1, palette='RdYlGn')

gg_animate(p, interval = 5)

How to add a background image to ggplot2 graphs

When producing so-called infographics, it is rather common to use an image rather than a mere grid as the background. In this blog post, I will show how to use a background image with ggplot2.

Packages required

The following code will install and/or load the R packages required for this blog post.

if (!require("pacman")) install.packages("pacman")
pacman::p_load(jpeg, png, ggplot2, grid, neuropsychology)

Choosing the data

The data set I will be using in this blog post is named diamonds and is part of the ggplot2 package. It contains information about – surprise, surprise – diamonds, e.g. price and cut (Fair, Good, Very Good, Premium, Ideal). Using the tapply() function, we create a table returning the maximum price per cut. Since we need the data to be organized in a data frame, we must transform the table using the data.frame() function.

mydata <- data.frame(price = tapply(diamonds$price, diamonds$cut, max))
mydata$cut <- rownames(mydata)

cut	price
Fair	18574
Good	18788
Very Good	18818
Premium	18823
Ideal	18806
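The same two steps can be followed on a handful of invented rows, which makes it easier to see what tapply() returns. This is only a sketch; the data frame toy and its values are made up and have nothing to do with the diamonds data.

```r
# Invented prices for three cut categories
toy <- data.frame(price = c(300, 450, 500, 700, 650),
                  cut   = factor(c("Fair", "Good", "Fair", "Ideal", "Good"),
                                 levels = c("Fair", "Good", "Ideal")))

# Maximum price per cut: a named one-dimensional array
max_by_cut <- tapply(toy$price, toy$cut, max)

# Turn it into a data frame; the names become row names ...
toy_max <- data.frame(price = max_by_cut)

# ... which we copy into an ordinary column
toy_max$cut <- rownames(toy_max)

toy_max
```

Because the category names end up as row names, the second line of the original snippet is needed to make cut available as a regular variable for ggplot2.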

Importing the background image

The background image we will be using in this blog post is a JPG file. Since the image imitates a blackboard, we name it “blackboard.jpg”. The image file must be imported using the readJPEG() function of the jpeg package. The imported image is stored in an object named imgage, which is the name used in the plotting code below.

imgage <- jpeg::readJPEG("blackboard.jpg")

To import other image file formats, different packages and functions must be used. The next code snippet shows how to import PNG images.

imgage <- png::readPNG("blackboard.png")

Drawing the plot

In the next step, we actually draw a bar chart with a background image. To make blackboard.jpg the background image, we need to combine the annotation_custom() function of the ggplot2 package and the rasterGrob() function of the grid package.

# bar chart of the maximum price per cut, drawn on top of the background image
ggplot(mydata, aes(cut, price, fill = -price)) +
  ggtitle("Bar chart with background image") +
  # hide the fill legend (guide = FALSE is deprecated in recent ggplot2 versions)
  scale_fill_continuous(guide = "none") +
  # stretch the image over the whole panel
  annotation_custom(rasterGrob(imgage, 
                               width = unit(1, "npc"), 
                               height = unit(1, "npc")), 
                    -Inf, Inf, -Inf, Inf) +
  geom_bar(stat = "identity", position = "dodge", width = .75, colour = 'white') +
  # leave 25% headroom above the tallest bar for the labels
  scale_y_continuous('Price in $', limits = c(0, max(mydata$price) + max(mydata$price) / 4)) +
  scale_x_discrete('Cut') +
  geom_text(aes(label = round(price)), size = 7, fontface = 2, 
            colour = 'white', hjust = 0.5, vjust = -1)

Adding opacity

Using the specification alpha = 0.5, we add 50% opacity to the bars. alpha ranges between 0 and 1, with higher values indicating greater opacity.

# same chart with semi-transparent bars and the neuropsychology theme
ggplot(mydata, aes(cut, price, fill = -price)) +
  theme_neuropsychology() +
  ggtitle("Bar chart with background image") +
  # hide the fill legend (guide = FALSE is deprecated in recent ggplot2 versions)
  scale_fill_continuous(guide = "none") +
  annotation_custom(rasterGrob(imgage, 
                               width = unit(1, "npc"), 
                               height = unit(1, "npc")), 
                    -Inf, Inf, -Inf, Inf) +
  # alpha = 0.5 makes the bars 50% opaque
  geom_bar(stat = "identity", position = "dodge", width = .75, colour = 'white', alpha = 0.5) +
  scale_y_continuous('Price in $', limits = c(0, max(mydata$price) + max(mydata$price) / 4)) +
  scale_x_discrete('Cut') +
  geom_text(aes(label = round(price)), size = 7, fontface = 2, 
            colour = 'white', hjust = 0.5, vjust = -1)

The recently published R package neuropsychology contains a theme named theme_neuropsychology(). This theme may be used to get bigger axis titles as well as bigger axis and legend text.

How to combine box and jitter plots using R and ggplot2

R makes it easy to combine different kinds of plots into one overall graph. This may be useful to visualize both basic measures of central tendency (median, quartiles etc.) and the distribution of a certain variable. Moreover, so-called cut-off values can be added to the graph.

In this blog post, I show how to combine box and jitter plots using the ggplot2 package.

First of all, we need to install and load the R packages required for the following steps. Since we want to do the installation and loading using the pacman package, we need to check whether this package has already been installed. If not, it will be installed and loaded. If yes, it will just be loaded (line 1). Furthermore, we need the R packages ggplot2 and Hmisc. This time, the p_load function checks whether these packages have already been installed and either installs and loads them or just loads them (line 2).

if (!require("pacman")) install.packages("pacman")
pacman::p_load(ggplot2, Hmisc)

In a second step, we create three random variables (var.scale, var.group, var.cutoff) with n=300.

var.scale is a numeric variable with a mean value of about 50 and a standard deviation of about 17.
var.group is a factor variable comprising the groups male and female.
var.cutoff was calculated based on var.scale using predefined cut-off values (0–40 = low, 41–60 = medium, > 60 = high).

var.scale <- round(rnorm(300, 50, 17))
var.group <- rbinom(300, 1, .5)
var.group <- factor(var.group, 
                     levels = c(0:1), 
                     labels = c("male", "female"))

var.cutoff <- ifelse(var.scale <= 40, 1, 
                     ifelse(var.scale > 40 & var.scale <= 60, 2, 3))

var.cutoff <- factor(var.cutoff, 
                     levels = c(3:1), 
                     labels = c("high", "medium", "low"))

The describe() function of the Hmisc package returns some basic measures of central tendency.

Hmisc::describe(var.scale)

## var.scale 
##       n missing  unique    Info    Mean     .05     .10     .25     .50 
##     300       0      71       1   51.25   24.00   30.90   41.00   50.00 
##     .75     .90     .95 
##   63.25   70.00   76.00 
## 
## lowest :   8  10  14  16  17, highest:  85  97 100 102 104

Hmisc::describe(var.group)

## var.group 
##       n missing  unique 
##     300       0       2 
## 
## male (141, 47%), female (159, 53%)

Hmisc::describe(var.cutoff)

## var.cutoff 
##       n missing  unique 
##     300       0       3 
## 
## high (87, 29%), medium (141, 47%), low (72, 24%)

Since the ggplot2 package requires the variables to be in a data frame, we have to create a new data frame df comprising our predefined variables using the data.frame() function.

df <- data.frame(var.scale, var.cutoff, var.group)

Using the functions xlab(), ylab() and ggtitle(), axis labels and plot title will be defined.

Box plots will be created using the geom_boxplot() function, with width specifying the boxes' width :-).

Jitter plots will be created using the geom_jitter() function. In addition, specifications have been made for colour and position and size of the dots.

ggplot(df) +
  xlab("Group") +
  ylab("Scale") +
  ggtitle("Combination of Box and Jitter Plot") + 
  geom_boxplot(aes(var.group, var.scale), 
               width=0.5) + 
  geom_jitter(aes(var.group, var.scale, colour = var.cutoff), 
              position = position_jitter(width = .15, height=-0.7),
              size=2) +
  scale_y_continuous(limits=c(0, 101), 
                     breaks = seq(0, 110, 10)) +
  scale_color_manual(name="Legend", 
                     values=c("red", "blue3", "green3"))

Finally, the functions scale_y_continuous() and scale_color_manual() are used to format the Y-axis and the legend.

Analyzing LEGIDA police reports. Part II: Word frequencies

In Part II of my series on the press releases published by the Leipzig Police Department (Polizeidirektion Leipzig) on the occasion of the demonstrations of the xenophobic LEGIDA movement, I show which words are used most frequently in these reports.

## Warning in readChar(con, 5L, useBytes = TRUE): cannot open compressed file 'C:/
## ProgrammeNK/GDrive/Projects/R/Polizeiberichte/Legida.RData', probable reason
## 'No such file or directory'

## Error in readChar(con, 5L, useBytes = TRUE): cannot open the connection

The word frequency count covers (almost) only meaningful words. So-called stopwords were excluded from the analysis. The stopword list used can be downloaded via the following link.

library(ggplot2)

plt.words <- ggplot(df.words, aes(interval, freq, fill = 500 - freq)) +
  geom_bar(stat="identity", position="dodge", width = 0.75) + 
  scale_size_area() +
  scale_y_continuous('', limits=c(0, max(df.words$freq)+10), breaks = seq(0, max(df.words$freq)+10, by = 20)) +
  scale_x_discrete('') +
  theme(legend.position="none") + 
  coord_flip() +
  ggtitle("Häufigste Wortnennungen") +  # "Most frequently mentioned words"
  geom_text(aes(label = paste0(percent, '%'), ymax = 0), size = 3, fontface=2, 
            hjust = -0.5, vjust = 0.2)

plt.words

With the wordcloud package, the same data can also be displayed as a word cloud.

colfunc <- colorRampPalette(c("blue", "red"))

set.seed(4)
par(mar = c(0, 0, 0, 0))
wordcloud::wordcloud(txt.wc, 
                     scale=c(3,.3),
                     min.freq=3,
                     max.words=150,
                     random.order=FALSE,
                     colors = colfunc(200))

Looking at the word frequency counts, it becomes apparent that the police press releases are often concerned with placing events in time and space. On the one hand, times of day are reported very often (cf.). On the other hand, Richard-Wagner-Platz, Augustusplatz and Leipzig's main station (Hauptbahnhof) turn out to be the central locations of the LEGIDA demonstrations.

LEGIDA rallies: A heat map of day times using ggplot2

LEGIDA is a Leipzig-based offshoot of the right-wing and xenophobic PEGIDA movement.

Since January 2015, LEGIDA has held at least one rally per month against what they call the “Islamisation of the Western world”.

Usually, the Leipzig Police Department publishes online reports describing what happened at these rallies.

In the following blog posts, I will show what kind of information can be derived from these reports and how this information can be visualized. My blog posts will have a technical rather than a political character.

Today, I'm going to show how to find information about the time of day at which the rallies took place and how to visualize these times using a pie chart.

In the first code chunk, we will simply load the concatenated police reports as a character vector named txt. The text of the police reports is rather unstructured. However, times are consistently given in the same format: two digits followed by a colon followed by two digits, e.g. '19:00' for 7 p.m. Thus, all times can be extracted with a simple regular expression. Since we are not particularly interested in to-the-minute precision, we save only the hours as a numeric vector.

time.str <- unlist(regmatches(txt, gregexpr("\\d{2}\\:\\d{2}", txt)))
time.str

[1] “19:00” “17:00” “21:00” “15:00” “18:00” “17:00” “19:00” “18:45”
[9] “18:30” “20:20” “17:44” “21:45” “18:00” “17:30” “19:00” “20:20”
[17] “21:30” “19:00” “20:00” “21:15” “21:30” “22:30” “19:15” “19:45”
[25] “21:00” “18:15” “16:15” “19:30” “20:15” “21:30” “22:00” “16:45”
[33] “19:00” “17:20” “18:00” “19:00” “20:15” “19:00” “21:15” “22:45”
[41] “19:10” “19:40” “21:00” “22:00” “19:30” “21:30” “23:00” “21:45”
[49] “17:30” “19:00” “21:00” “19:00” “20:00” “21:15” “21:45” “17:15”
[57] “17:00” “18:00” “19:10” “21:00” “17:00” “19:15” “20:40” “20:15”
[65] “21:45” “18:00” “22:00” “18:30” “18:00” “18:30” “18:00” “18:45”
[73] “19:50” “20:00” “20:20” “21:15” “21:00” “18:45” “19:00” “20:40”
[81] “20:00” “20:45” “21:15” “19:50” “20:40” “21:00” “20:00” “20:45”
[89] “21:00” “19:00” “20:00” “20:50” “21:45” “21:00” “21:30” “19:00”
[97] “20:00” “20:45” “21:20” “21:30” “19:00” “19:30” “20:20” “21:00”
[105] “21:35” “18:00” “19:00” “18:00” “20:50” “19:00” “20:00” “20:50”
[113] “20:50” “19:00” “19:50” “20:35” “20:55” “17:35” “18:35” “19:00”
[121] “20:45” “21:00” “18:00” “18:45” “19:00” “19:35” “20:20” “20:30”
[129] “20:40” “22:00” “18:00” “19:20” “02:30” “19:05” “21:10” “17:40”
[137] “18:45” “20:00” “21:20” “18:30” “19:20” “20:00” “21:45” “19:00”
[145] “19:30” “20:45” “21:45” “23:00”

time.str <- as.numeric(stringr::str_sub(time.str, 1, 2))

[1] 19 17 21 15 18 17 19 18 18 20 17 21 18 17 19 20 21 19 20 21 21 22 19
[24] 19 21 18 16 19 20 21 22 16 19 17 18 19 20 19 21 22 19 19 21 22 19 21
[47] 23 21 17 19 21 19 20 21 21 17 17 18 19 21 17 19 20 20 21 18 22 18 18
[70] 18 18 18 19 20 20 21 21 18 19 20 20 20 21 19 20 21 20 20 21 19 20 20
[93] 21 21 21 19 20 20 21 21 19 19 20 21 21 18 19 18 20 19 20 20 20 19 19
[116] 20 20 17 18 19 20 21 18 18 19 19 20 20 20 22 18 19 2 19 21 17 18 20
[139] 21 18 19 20 21 19 19 20 21 23
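The extraction step can be reproduced on a short piece of text. The sketch below assumes an invented sentence (txt_demo is not taken from the actual police reports) and uses base R's substr() in place of stringr::str_sub() to cut out the hour.

```r
# Made-up report text (not taken from the actual police reports)
txt_demo <- "The rally started at 19:00. At 20:15 the march began; it ended at 21:30."

# Extract every occurrence of two digits, a colon, and two digits
times <- unlist(regmatches(txt_demo, gregexpr("\\d{2}:\\d{2}", txt_demo)))

# Keep only the hour part (characters 1 and 2) as a number
hours <- as.numeric(substr(times, 1, 2))

times
hours
```

gregexpr() locates all matches in the string, and regmatches() returns the matched substrings; the colon does not need to be escaped in the pattern.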

Since we are only interested in the time span between noon and midnight, we convert our numeric vector time.str into a factor whose levels cover only the hours 13 to 24; all other hours become NA. Afterwards, we tabulate this factor.

time.str <- factor(time.str, levels = c(13:24))
time.tab <- table(time.str)

In the next step, we create a table containing the proportion of each hour and save the resulting percentages in a new vector named time.vecp.

time.tabp <- round(prop.table(table(time.str)), 2)
time.vecp <- as.numeric(as.character((time.tabp)))*100

[1] 0 0 1 1 7 15 24 23 22 4 1 0
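A small sketch with five invented hours shows how the factor levels filter the data and how the percentages come about. The names hours_demo, hours_f, tab_p and pct are made up for this illustration.

```r
# Five invented hours; 2 (a.m.) lies outside the span of interest
hours_demo <- c(18, 19, 19, 20, 2)

# Values that are not among the levels become NA
hours_f <- factor(hours_demo, levels = 13:24)

# table() ignores NA by default, so proportions refer to the four valid hours
tab_p <- round(prop.table(table(hours_f)), 2)

# Percentages per hour, named by hour for readability
pct <- as.numeric(tab_p) * 100
names(pct) <- levels(hours_f)

pct
```

Hours that never occur keep a count of zero instead of disappearing, which is what guarantees twelve values for the twelve segments of the clock.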

Finally, we want to visualize our results. Since the face of a clock can be reproduced very well with a pie chart, we first create a data frame with twelve segments of the same size (time). To this data frame, we add two more variables: our vector of percentages (value) and the labels for the clock face (labs).

df <- data.frame(time = rep(1,12),
                 value = time.vecp,
                 labs = c(1:12))

The pie chart is plotted using ggplot2. The result is a kind of heat map visualizing the times of day at which the LEGIDA rallies usually take place.

library(ggplot2)

  ggplot(df, aes(x = "", y = time, fill = value)) +
    geom_bar(width = 1, stat = "identity", colour = "grey") +
    scale_y_continuous('', limits=c(0, 12), breaks = seq(1,12,1),
                       labels=df$labs) +
    scale_x_discrete('') +
    scale_fill_distiller('Percent', palette = 'Oranges', space = "Lab", direction = 1) +
    coord_polar(theta = "y", start = 0) +
    labs(title = "LEGIDA clock") +
    theme_minimal() +
    theme(axis.text = element_text(size = 18))

Obviously, the rallies usually take place between 6 and 9 p.m.

Last update: 2016-08-30, after the 35th LEGIDA rally.

Web app programming with the Shiny package

Please click on the link.

Combining box and jitter plots with ‘ggplot2’

The ggplot2 package allows different plots to be combined with each other. For example, combining a box plot with a jitter plot makes it possible to show both basic distributional characteristics (median, quartiles etc.) and the distribution of the values themselves in a single graphic. In addition, so-called cut-off values can be displayed.

In this tutorial, I show how to combine box plots with so-called jitter plots using the ggplot2 package.

First, we create three random variables (var.scale, var.group, var.cutoff) with 300 cases each. The variable var.scale is a metric variable with a value range from 0 to 101; the variable var.group is a factor variable containing the groups male and female. The third variable was computed from var.scale using predefined cut-off values: the range 0–40 corresponds to the category low, the range 41–60 to the category medium, and the range > 60 to the category high.

set.seed(1111)

var.scale <- round(rnorm(300, 50, 17))

var.group <- rbinom(300, 1, .5)

var.group <- factor(var.group, 
                     levels = c(0:1), 
                     labels = c("male", "female"))

var.cutoff <- ifelse(var.scale <= 40, 1, 
                     ifelse(var.scale > 40 & var.scale <= 60, 2, 3))

var.cutoff <- factor(var.cutoff, 
                     levels = c(3:1), 
                     labels = c("high", "medium", "low"))

The describe() function from the Hmisc package displays the distributions of these variables.

library(Hmisc)
describe(var.scale)

## var.scale 
##       n missing  unique    Info    Mean     .05     .10     .25     .50 
##     300       0      76       1   52.18   24.00   28.90   41.00   51.00 
##     .75     .90     .95 
##   65.00   75.00   80.05 
## 
## lowest :   0  13  15  16  19, highest:  91  92  93  94 101

describe(var.group)

## var.group 
##       n missing  unique 
##     300       0       2 
## 
## male (142, 47%), female (158, 53%)

describe(var.cutoff)

## var.cutoff 
##       n missing  unique 
##     300       0       3 
## 
## high (98, 33%), medium (129, 43%), low (73, 24%)

In the last preparatory step, we combine the variables just generated into a data frame named df using the data.frame() function. This step is necessary because the graphics package ggplot2 used below only works with data stored in a data frame.

df <- data.frame(var.scale, var.cutoff, var.group)

Next, the functions xlab(), ylab() and ggtitle() are used to assign labels to the X- and Y-axes as well as a title.

The two box plots are created with the function geom_boxplot(); the property width specifies the width of the boxes.

The jitter plots are created with the function geom_jitter(). This function, too, is further specified by various properties (colour, position, size etc.).

library(ggplot2)

ggplot(df) +
  xlab("Group") +
  ylab("Scale") +
  ggtitle("Combination of Box and Jitter Plot") + 
  geom_boxplot(aes(var.group, var.scale), 
               width=0.5) + 
  geom_jitter(aes(var.group, var.scale, colour = var.cutoff), 
              position = position_jitter(width = .15, height=-0.7),
              size=2) +
  scale_y_continuous(limits=c(0, 101), 
                     breaks = seq(0, 110, 10)) +
  scale_color_manual(name="Legend", 
                     values=c("red", "blue3", "green3"))

Finally, the functions scale_y_continuous() and scale_color_manual() are used to format the Y-axis and the legend.

Creating graphics with a background image

Sometimes it can be useful to place the graphical representation of a statistical quantity on top of a background image. In R, this can be done as follows:

First, we read in the chosen background image (called blackboard.jpg in this example) with the readJPEG() function from the jpeg package.

library(jpeg)
imgage <- readJPEG("blackboard.jpg")

Next, we load the packages ggplot2 and grid. The first package creates the graphic, and the second embeds the background image, here a blackboard.

library(ggplot2)
library(grid)

ggplot(diamonds) +
  annotation_custom(rasterGrob(imgage, 
                               width=unit(1,"npc"), 
                               height=unit(1,"npc")), 
                    -Inf, Inf, -Inf, Inf) +
  geom_bar(aes(clarity), fill="white", colour="red") +
  xlab("") +
  ylab("") +
  ggtitle("Säulendiagramm mit Hintergrundbild") +  # "Bar chart with background image"
  scale_fill_continuous(guide = FALSE)

PS: The data frame diamonds and its variable clarity are loaded with the ggplot2 package.

Calling ggplot graphics with the source() function

In R, the source() function can be used to access one or more source files from a so-called MAKE or master file. This is particularly useful when the R code is very extensive. Splitting the code into several source files makes it easier to keep an overview.

source("C:/A_ggplot.R")

The source() function requires the name and location of the source file. The A_ at the beginning of the file name A_ggplot.R used in the example code indicates that the source file is an analysis file. The name ggplot indicates that this file creates a graphic with the R package ggplot2.

library(ggplot2)
data(diamonds)  # 'diamonds' is a data set provided with ggplot2
ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar()

However, calling the source file A_ggplot.R with the source() function produces no output at all, because source() does not print the values of top-level expressions by default.

source("C:/A_ggplot.R")

There are two ways to produce the desired graphic. First, the argument print.eval = TRUE can be added to the source() call:

source("C:/A_ggplot.R", print.eval = TRUE)

Second, the graphic created in the source file can be saved as an object (here: chart) and then passed to the print() function:

library(ggplot2)
data(diamonds)  # 'diamonds' is a data set provided with ggplot2
chart <- ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar()
print(chart)

The source file is called as shown above:

source("C:/A_ggplot.R")

And the desired output is produced.
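The printing behaviour of source() can be checked without ggplot2. The following sketch writes a one-line script to a temporary file (a hypothetical stand-in for A_ggplot.R) and captures what each call prints; the object names are made up for this illustration.

```r
# Write a tiny script to a temporary file (stand-in for A_ggplot.R)
script <- tempfile(fileext = ".R")
writeLines("1 + 1", script)

# Default: the value of the top-level expression is not printed
out_default <- capture.output(source(script))

# With print.eval = TRUE the value is printed, as it would be at the console
out_print <- capture.output(source(script, print.eval = TRUE))

out_default  # character(0)
out_print    # "[1] 2"

# Clean up the temporary file
unlink(script)
```

A ggplot object is only drawn when it is printed, which is why the plot appears as soon as either print.eval = TRUE or an explicit print() call is used.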

The 2013 chancellor debate (“Kanzlerduell”)

I generated this “gradient word cloud” with the GNU R package “qdap”. The analysis of the “Kanzlerduell” between Angela Merkel and Peer Steinbrück shows, first, which meaningful words were used most frequently (indicated by the size of the word) and, second, which of the two candidates for chancellor used the respective word more often than the other. All words coloured blue were used more often by Angela Merkel, and all words coloured red more often by Peer Steinbrück.
