<- function(babynames, select_sex, from = 1950) {
get_most_frequent <- babynames |>
most_freq filter(sex == select_sex, year > from) |>
group_by(name) |>
summarise(average = mean(prop)) |>
arrange(desc(average))
return(list(
babynames = babynames,
most_frequent = most_freq,
sex = select_sex,
from = from))
}
<- function(x, top = 10) {
plot_top <- x$most_frequent$name[1:top]
topx
<- x$babynames |>
p filter(name %in% topx, sex == x$sex, year > x$from) |>
ggplot(aes(x = year, y = prop, color = name)) +
geom_line() +
scale_color_brewer(palette = "Paired") +
theme_classic()
return(p)
}
Writing with Quarto
After finishing this chapter you will be able to:
Create a quarto project with RStudio
Locally render and view a quarto website
Add markdown content to a quarto document
Add and execute code chunks to a quarto document
Add and modify code-chunk options
Execute code chunks that create figures and create cross references to them
Initiate a website project
We’ll start with creating a website project. In Rstudio:
Choose File > New Project..
Choose New Directory
Choose Quarto website
Choose a location and name for the website. Leave Create a git repository checked
Finish by pushing the button Create Project
Now we have created a template for a quarto website 👏 . It doesn’t contain much, but we can already render it and use it as a functional website. In order to view it locally in your browser you can either:
Open
index.qmd
in Rstudio and press the button RenderOr you can use the Terminal and type:
quarto preview
In both cases your computer will open a browser and locally host the page at a random port.
Adding content
Adding text
In this part, we will use quarto to display our project goals. In index.qmd
, create an introduction where you will state:
In this project we aim to visualize the trends of the most frequently used babynames from 1880 to 2017 in the United States. We do this by:
- Understanding the different columns of the data set
- Find the top 10 most frequently used baby names in the data for:
- girls
- boys
- Plot the yearly trend of the top 10 baby names
Throughout this course, do not hesitate to refer to this quarto markdown basic sheet.
Take the above introduction of your babynames
project, convert it to markdown text, and add it to index.qmd
.
This is what your index.qmd
could look like:
---
title: "quarto-tutorial"
---
In this project we aim to visualize the trends of the most frequently used babynames from 1880 to 2017 in the United States. We do this by:
- Understanding the different columns of the data set
- Find the top 10 most frequently used baby names in the data for:
- girls
- boys
- Plot the yearly trend of the top 10 baby names
Adding an image
You can add images with the following syntax:

Check out the quarto figure guide. Find an image of a baby in the public domain on creative commons. Get the full URL of the image and add it to your index.qmd
. Make it 400 pixels in width and align it to the right side of the page.
Your index.qmd
could look like this:
---
title: "quarto-tutorial"
---
In this project we aim to visualize the trends of the most frequently used babynames from 1880 to 2017 in the United States. We do this by:
- Understanding the different columns of the data set
- Find the top 10 most frequently used baby names in the data for:
- girls
- boys
- Plot the yearly trend of the top 10 baby names
{fig-align="right" width=400}
Adding a page
We will now add a page for the analysis. We do this by clicking in Rstudio File > New File.. > Quarto Document... Choose a title, e.g. analysis
and press Create. Save the newly created file as analysis.qmd
.
In order to make our new page part of the website, we have to edit _quarto.yml
. By adding it to the navbar
item in the website
markup:
website:
title: "quarto-tutorial"
navbar:
left:
- href: index.qmd
text: Home
- about.qmd
- analysis.qmd
Adding code chunks
We can add some code to analysis.qmd
. First, we add a chunk that loads the libraries:
```{r}
library(babynames)
library(knitr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(pheatmap)
```
Plotting and cross-referencing
To display the first 10 lines of the babynames table you can add:
```{r}
head(babynames) |> kable()
```
Add the two above code chunks as an R code chunk to analysis.qmd
, and do the following:
- add some text to let the reader know what it does
- suppress the package startup messages with
#| output: false
at the top of the chunk that loads the packages. More info about the hash-pipe here - re-render the site.
---
title: "Analysis"
---
First we load the packages:
```{r}
#| output: false
library(babynames)
library(knitr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(pheatmap)
```
The first ten lines of the babynames dataset looks like:
```{r}
head(babynames) |> kable()
```
Here, I’ve created two functions that do the following:
get_most_frequent
: Gets the most frequent babynames over a time-period.plot_top
: from the output ofget_most_frequent
. Plot the top n most popular names.
This is not a coding tutorial, so you can ignore the code itself. Just know what is does.
Plotting them for girls like this:
get_most_frequent(babynames, select_sex = "F") |>
plot_top()
Plotting them for boys like this:
get_most_frequent(babynames, select_sex = "M") |>
plot_top()
Add the above three code chunks to analysis.qmd
, and do the following:
- Since the functions are a bit bulky we want to make the chunk foldable. Do this by
#| code-fold: true
- Describe in a text what you see in the figures. Refer to the individual figures in text by cross-referencing.
To create a visualization of the most popular baby names, we have created two functions. Click the 'Code' link to view:
```{r}
#| code-fold: true
<- function(babynames, select_sex, from = 1950) {
get_most_frequent <- babynames |>
most_freq filter(sex == select_sex, year > from) |>
group_by(name) |>
summarise(average = mean(prop)) |>
arrange(desc(average))
return(list(
babynames = babynames,
most_frequent = most_freq,
sex = select_sex,
from = from))
}
<- function(x, top = 10) {
plot_top <- x$most_frequent$name[1:top]
topx
<- x$babynames |>
p filter(name %in% topx, sex == x$sex, year > x$from) |>
ggplot(aes(x = year, y = prop, color = name)) +
geom_line() +
scale_color_brewer(palette = "Paired") +
theme_classic()
return(p)
}```
Here we call the code to visualize the top 10 most frequent girl names from 1950 onwards:
```{r}
#| label: fig-line-girls
#| fig-cap: "Line plot displaying proportion of top 10 girl names by year"
get_most_frequent(babynames, select_sex = "F") |>
plot_top()
```
In @fig-line-girls you can see that there has been a peak in popularity for 'Lisa', 'Jennifer' and 'Jessica'. Interesting! Let's have a look at the boys names:
```{r}
#| label: fig-line-boys
#| fig-cap: "Line plot displaying proportion of top 10 boy names by year"
get_most_frequent(babynames, select_sex = "M") |>
plot_top()
```
@fig-line-boys shows that names that were popular before 1990 are relatively infrequent after 2000.