get_most_frequent <- function(babynames, select_sex, from = 1950) {
  most_freq <- babynames |>
    filter(sex == select_sex, year > from) |>
    group_by(name) |>
    summarise(average = mean(prop)) |>
    arrange(desc(average))
  return(list(
    babynames = babynames,
    most_frequent = most_freq,
    sex = select_sex,
    from = from))
}
plot_top <- function(x, top = 10) {
  topx <- x$most_frequent$name[1:top]
  p <- x$babynames |>
    filter(name %in% topx, sex == x$sex, year > x$from) |>
    ggplot(aes(x = year, y = prop, color = name)) +
    geom_line() +
    scale_color_brewer(palette = "Paired") +
    theme_classic()
  return(p)
}
Writing with Quarto
After finishing this chapter you will be able to:
Create a Quarto project with RStudio
Locally render and view a Quarto website
Add markdown content to a Quarto document
Add code chunks to a Quarto document and execute them
Add and modify code-chunk options
Execute code chunks that create figures and cross-reference those figures
Initiate a website project
We’ll start by creating a website project. In RStudio:
Choose File > New Project...
Choose New Directory
Choose Quarto Website
Choose a location and name for the website. Leave Create a git repository checked
Finish by clicking Create Project
We have now created a template for a Quarto website 👏. It doesn’t contain much, but we can already render it and use it as a functional website. To view it locally in your browser you can either:
Open index.qmd in RStudio and click the Render button
Or use the Terminal and type:
quarto preview
In both cases your computer opens a browser and hosts the page locally on a random port.
Adding content
Adding text
In this part, we will use Quarto to display our project goals. In index.qmd, create an introduction that states:
In this project we aim to visualize the trends of the most frequently used babynames from 1880 to 2017 in the United States. We do this by:
- Understanding the different columns of the data set
- Find the top 10 most frequently used baby names in the data for:
  - girls
  - boys
- Plot the yearly trend of the top 10 baby names
Take the above introduction of your babynames project, convert it to markdown text, and add it to index.qmd.
This is what your index.qmd could look like:
---
title: "quarto-tutorial"
---

In this project we aim to visualize the trends of the most frequently used babynames from 1880 to 2017 in the United States. We do this by:

- Understanding the different columns of the data set
- Find the top 10 most frequently used baby names in the data for:
  - girls
  - boys
- Plot the yearly trend of the top 10 baby names
Adding an image
You can add images with the following syntax:
![image_alt_text](path/or/url/to/image.png)
Check out the Quarto figure guide. Find a public-domain image of a baby on Creative Commons. Get the full URL of the image and add it to your index.qmd. Make it 400 pixels wide and align it to the right side of the page.
Your index.qmd could look like this:
---
title: "quarto-tutorial"
---

In this project we aim to visualize the trends of the most frequently used babynames from 1880 to 2017 in the United States. We do this by:

- Understanding the different columns of the data set
- Find the top 10 most frequently used baby names in the data for:
  - girls
  - boys
- Plot the yearly trend of the top 10 baby names

![](https://cdn.pixabay.com/photo/2016/10/02/06/27/baby-1709013_1280.jpg){fig-align="right" width=400}
Adding a page
We will now add a page for the analysis. In RStudio, click File > New File > Quarto Document... Choose a title, e.g. analysis, and press Create. Save the newly created file as analysis.qmd.
To make the new page part of the website, edit _quarto.yml and add the file to the navbar item in the website section:
website:
  title: "quarto-tutorial"
  navbar:
    left:
      - href: index.qmd
        text: Home
      - about.qmd
      - analysis.qmd
Adding code chunks
We can now add some code to analysis.qmd. First, we add a chunk that loads the libraries:
```{r}
library(babynames)
library(knitr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(pheatmap)
```
Plotting and cross-referencing
To display the first 10 rows of the babynames table you can add:
```{r}
head(babynames, 10) |> kable()
```
Add the two code chunks above to analysis.qmd, and do the following:
- add some text to let the reader know what each chunk does
- suppress the package startup messages by adding #| output: false at the top of the chunk that loads the packages. More information about the hash-pipe (#|) syntax is in the Quarto documentation
- re-render the site
---
title: "Analysis"
---
First we load the packages:
```{r}
#| output: false
library(babynames)
library(knitr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(pheatmap)
```
The first ten rows of the babynames dataset look like this:
```{r}
head(babynames, 10) |> kable()
```
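Tables can be captioned and cross-referenced with the same hash-pipe syntax. The chunk below is a sketch layered on top of the exercise, not part of the original solution: the label `tbl-babynames` and the caption text are made-up names. The label needs the `tbl-` prefix for Quarto to treat the output as a referenceable table.

```{r}
#| label: tbl-babynames
#| tbl-cap: "First ten rows of the babynames dataset"
# The label and caption above are illustrative assumptions
library(babynames)
library(knitr)

# Keep the first ten rows so the table matches its caption
first_rows <- head(babynames, 10)
kable(first_rows)
```

You could then write `@tbl-babynames` in the surrounding text to refer to the table.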
Here, I’ve created two functions that do the following:
get_most_frequent: gets the most frequent baby names over a time period.
plot_top: plots the top n most popular names from the output of get_most_frequent.
This is not a coding tutorial, so you can ignore the code itself. Just know what it does.
Plot the names for girls like this:
get_most_frequent(babynames, select_sex = "F") |>
  plot_top()
Plot the names for boys like this:
get_most_frequent(babynames, select_sex = "M") |>
  plot_top()
Add the function definitions and the two plotting calls above to analysis.qmd as three code chunks, and do the following:
- Since the functions are a bit bulky, we want to make their chunk foldable. Do this by adding #| code-fold: true at the top of the chunk
- Describe in the text what you see in the figures. Refer to the individual figures by cross-referencing them.
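Cross-referencing a figure takes two chunk options: a `label` that starts with `fig-` and a `fig-cap`. The chunk below is a minimal sketch on the built-in `mtcars` data rather than the babynames plots. The label `fig-example`, the caption and the figure size are all assumed values chosen for illustration.

```{r}
#| label: fig-example
#| fig-cap: "Fuel efficiency against weight in the built-in mtcars data"
#| fig-width: 6
#| fig-height: 4
# Label, caption and size above are illustrative assumptions
library(ggplot2)

# A simple scatter plot; printing it makes it the chunk's figure
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
p
```

Writing `@fig-example` in the text then produces a numbered reference to this figure.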
To create a visualization of the most popular baby names, we have created two functions. Click the 'Code' link to view:
```{r}
#| code-fold: true
# Rank names of one sex by their average yearly proportion after `from`
get_most_frequent <- function(babynames, select_sex, from = 1950) {
  most_freq <- babynames |>
    filter(sex == select_sex, year > from) |>
    group_by(name) |>
    summarise(average = mean(prop)) |>
    arrange(desc(average))
  # Return the data and the settings so plot_top() can reuse them
  return(list(
    babynames = babynames,
    most_frequent = most_freq,
    sex = select_sex,
    from = from))
}
# Draw yearly proportions for the `top` highest-ranked names
plot_top <- function(x, top = 10) {
  topx <- x$most_frequent$name[1:top]
  p <- x$babynames |>
    filter(name %in% topx, sex == x$sex, year > x$from) |>
    ggplot(aes(x = year, y = prop, color = name)) +
    geom_line() +
    scale_color_brewer(palette = "Paired") +
    theme_classic()
  return(p)
}
```
Here we call the functions to visualize the top 10 most frequent girl names after 1950:
```{r}
#| label: fig-line-girls
#| fig-cap: "Line plot displaying proportion of top 10 girl names by year"
get_most_frequent(babynames, select_sex = "F") |>
plot_top()
```
In @fig-line-girls you can see that there has been a peak in popularity for 'Lisa', 'Jennifer' and 'Jessica'. Interesting! Let's have a look at the boys' names:
```{r}
#| label: fig-line-boys
#| fig-cap: "Line plot displaying proportion of top 10 boy names by year"
get_most_frequent(babynames, select_sex = "M") |>
plot_top()
```
@fig-line-boys shows that names that were popular before 1990 are relatively infrequent after 2000.
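For readers curious about what `get_most_frequent()` computes before anything is plotted, the ranking step can be tried on a tiny table. This is a sketch under stated assumptions: the `toy` data frame and its values are hypothetical, and only the filter, average and sort steps of the function are reproduced.

```{r}
library(dplyr)

# Hypothetical toy data with the columns the functions rely on
toy <- tibble(
  year = c(2000, 2000, 2001, 2001),
  sex  = c("F", "F", "F", "F"),
  name = c("Ann", "Bea", "Ann", "Bea"),
  prop = c(0.02, 0.01, 0.04, 0.01)
)

# Same steps as inside get_most_frequent(): filter, average per name, sort
toy_ranked <- toy |>
  filter(sex == "F", year > 1950) |>
  group_by(name) |>
  summarise(average = mean(prop)) |>
  arrange(desc(average))

toy_ranked
```

Here 'Ann' ranks first because her average proportion across the two years is higher than that of 'Bea'. `plot_top()` takes the first `top` names from a ranking like this one.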