Writing with Quarto

Learning outcomes

After finishing this chapter you will be able to:

  • Create a quarto project with RStudio

  • Locally render and view a quarto website

  • Add markdown content to a quarto document

  • Add and execute code chunks to a quarto document

  • Add and modify code-chunk options

  • Execute code chunks that create figures and create cross references to them

Initiate a website project

We’ll start with creating a website project. In Rstudio:

  • Choose File > New Project..

  • Choose New Directory

  • Choose Quarto website

  • Choose a location and name for the website. Leave Create a git repository checked

  • Finish by pushing the button Create Project

Now we have created a template for a quarto website 👏 . It doesn’t contain much, but we can already render it and use it as a functional website. In order to view it locally in your browser you can either:

  • Open index.qmd in Rstudio and press the button Render

  • Or you can use the Terminal and type:

quarto preview

In both cases your computer will open a browser and locally host the page at a random port.

Adding content

Adding text

In this part, we will use quarto to display our project goals. In index.qmd, create an introduction where you will state:

In this project we aim to visualize the trends of the most frequently used babynames from 1880 to 2017 in the United States. We do this by:

  • Understanding the different columns of the data set
  • Find the top 10 most frequently used baby names in the data for:
    • girls
    • boys
  • Plot the yearly trend of the top 10 baby names
Note

Throughout this course, do not hesitate to refer to this quarto markdown basic sheet.

Exercise

Take the above introduction of your babynames project, convert it to markdown text, and add it to index.qmd.

This is what your index.qmd could look like:

---
title: "quarto-tutorial"
---

In this project we aim to visualize the trends of the most frequently used babynames from 1880 to 2017 in the United States. We do this by:

- Understanding the different columns of the data set
- Find the top 10 most frequently used baby names in the data for:
   - girls
   - boys
- Plot the yearly trend of the top 10 baby names 

Adding an image

You can add images with the following syntax:

![image_alt_text](path/or/url/to/image.png)
Exercise

Check out the quarto figure guide. Find an image of a baby in the public domain on creative commons. Get the full URL of the image and add it to your index.qmd. Make it 400 pixels in width and align it to the right side of the page.

Your index.qmd could look like this:

---
title: "quarto-tutorial"
---

In this project we aim to visualize the trends of the most frequently used babynames from 1880 to 2017 in the United States. We do this by:

- Understanding the different columns of the data set
- Find the top 10 most frequently used baby names in the data for:
  - girls
  - boys
- Plot the yearly trend of the top 10 baby names 

![](https://cdn.pixabay.com/photo/2016/10/02/06/27/baby-1709013_1280.jpg){fig-align="right" width=400}

Adding a page

We will now add a page for the analysis. We do this by clicking in Rstudio File > New File.. > Quarto Document... Choose a title, e.g. analysis and press Create. Save the newly created file as analysis.qmd.

In order to make our new page part of the website, we have to edit _quarto.yml. By adding it to the navbar item in the website markup:

website:
  title: "quarto-tutorial"
  navbar:
    left:
      - href: index.qmd
        text: Home
      - about.qmd
      - analysis.qmd

Adding code chunks

We can add some code to analysis.qmd. First, we add a chunk that loads the libraries:

```{r}
library(babynames)
library(knitr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(pheatmap)
```

Plotting and cross-referencing

To display the first 10 lines of the babynames table you can add:

```{r}
head(babynames) |> kable()
```
Exercise

Add the two above code chunks as an R code chunk to analysis.qmd, and do the following:

  • add some text to let the reader know what it does
  • suppress the package startup messages with #| output: false at the top of the chunk that loads the packages. More info about the hash-pipe here
  • re-render the site.
---
title: "Analysis"
---

First we load the packages:

```{r}
#| output: false
library(babynames)
library(knitr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(pheatmap)
```

The first ten lines of the babynames dataset looks like:


```{r}
head(babynames) |> kable()
```

Here, I’ve created two functions that do the following:

  • get_most_frequent: Gets the most frequent babynames over a time-period.
  • plot_top: from the output of get_most_frequent. Plot the top n most popular names.
Note

This is not a coding tutorial, so you can ignore the code itself. Just know what is does.

get_most_frequent <- function(babynames, select_sex, from = 1950) {
  most_freq <- babynames |>
    filter(sex == select_sex, year > from) |>
    group_by(name) |>
    summarise(average = mean(prop)) |>
    arrange(desc(average))
    
  return(list(
    babynames = babynames,
    most_frequent = most_freq,
    sex = select_sex,
    from = from))
}

plot_top <- function(x, top = 10) {
  topx <- x$most_frequent$name[1:top]
  
  p <- x$babynames |>
    filter(name %in% topx, sex == x$sex, year > x$from) |>
    ggplot(aes(x = year, y = prop, color = name)) +
    geom_line() +
    scale_color_brewer(palette = "Paired") +
    theme_classic()
  
  return(p)
}

Plotting them for girls like this:

get_most_frequent(babynames, select_sex = "F") |>
  plot_top()

Plotting them for boys like this:

get_most_frequent(babynames, select_sex = "M") |>
  plot_top()
Exercise

Add the above three code chunks to analysis.qmd, and do the following:

  • Since the functions are a bit bulky we want to make the chunk foldable. Do this by #| code-fold: true
  • Describe in a text what you see in the figures. Refer to the individual figures in text by cross-referencing.
To create a visualization of the most popular baby names, we have created two functions. Click the 'Code' link to view:

```{r}
#| code-fold: true
get_most_frequent <- function(babynames, select_sex, from = 1950) {
  most_freq <- babynames |>
    filter(sex == select_sex, year > from) |>
    group_by(name) |>
    summarise(average = mean(prop)) |>
    arrange(desc(average))
    
  return(list(
    babynames = babynames,
    most_frequent = most_freq,
    sex = select_sex,
    from = from))
}

plot_top <- function(x, top = 10) {
  topx <- x$most_frequent$name[1:top]
  
  p <- x$babynames |>
    filter(name %in% topx, sex == x$sex, year > x$from) |>
    ggplot(aes(x = year, y = prop, color = name)) +
    geom_line() +
    scale_color_brewer(palette = "Paired") +
    theme_classic()
  
  return(p)
}
```

Here we call the code to visualize the top 10 most frequent girl names from 1950 onwards:


```{r}
#| label: fig-line-girls
#| fig-cap: "Line plot displaying proportion of top 10 girl names by year"
get_most_frequent(babynames, select_sex = "F") |>
  plot_top()
```
    
In @fig-line-girls you can see that there has been a peak in popularity for 'Lisa', 'Jennifer' and 'Jessica'. Interesting! Let's have a look at the boys names:

```{r}
#| label: fig-line-boys
#| fig-cap: "Line plot displaying proportion of top 10 boy names by year"
get_most_frequent(babynames, select_sex = "M") |>
  plot_top()
```

@fig-line-boys shows that names that were popular before 1990 are relatively infrequent after 2000. 

Extra: quarto extension and webassembly

Quarto offers the possiblity to personnalise and extend its base capability with extensions built by the quarto community.

These include extra fonts and emojis, additionnal document template (like scientific journal template), extra graphical components (like the accordion ).

Extensions can be installed with the terminal command:

quarto add EXTENSIONNAME

The extension will be installed in _extensions/ folder.

Note that extension are specific to a quarto project, meaning that they need to be installed separately for each project. This may sound tedious, but in practice it makes sharing and automatically building websites or documents with these extensions easier.

Let’s try it with an extension that extends quarto with webassembly components for python and R: quarto-live.

quarto-live : web assembly for “free” interactive computing

Using web-assembly, we can have python or R code executed on the fly in the browser. In quarto, this can be achieved with the quarto-live extension.

This is very nice as it lets you build fully interactive app without the need to host them with a server architecture (because all the computations happen on the user’s computer), which lets us host it as a static website (potentially for free, as for our github page!).

Of course there is a downside: upon loading the page, the user’s browser will need to install all the R webassembly components and associated libraries. This generally takes around 10 to 30 seconds if you don’t have many complex libraries. Also, the execution of the R or python code is often a bit slower than what you would experience outside the web browser.


We start by installing the extension.

In the terminal, navigate to your quarto project folder and type:

quarto add r-wasm/quarto-live

The command will ask you to type “Y” and press Enter in order to validate the installation.

Then, you can start adding webassembly components in your qmd files.

For example, try adding a new file interactive_compute.qmd

this example is heavily inspired by this page.

We start the page by the yaml preamble:

---
title: Interactive work without a server with web assembly
format: live-html
engine: knitr
---

{{< include ./_extensions/r-wasm/live/_knitr.qmd >}}

Note here the format: live-html, and engine: knitr. The first marks the document as a live document (that uses quarto-live), and the second set the rendering engine (the alternative is engine: jupyter, which requires a python installation).

The last line is required if you use the knitr rendering engine. According to the extension documentation in the future it will not be needed anymore, but for now we have to include it.

Then, you can (finally) start adding some code:


```{webr}
#| output: false
#| echo: FALSE
#| edit: false
#| autorun: true
library(babynames)
library(ggplot2)
library(magrittr)
library(dplyr)
```

Histogram interacting with a javascript [Observable input](https://github.com/observablehq/inputs?tab=readme-ov-file) component:

```{ojs}
//| echo: false
viewof sex_box = Inputs.checkbox(
  ["M", "F"],
  {
    value: ["M", "F"],
    label: "Shown sexes:",
  }
)
```

```{webr}
#| echo: FALSE
#| edit: false
#| autorun: true
#| input:
#|   - sex_box

nbins = 100
data = babynames %>% filter( sex %in% sex_box  )

ggplot( data , aes(x=n , fill = sex) ) + geom_histogram(bins = nbins) + scale_x_log10()
```

You can note that the code cells types are different:

  • {webr} : contains R that will be in web-assembly: rather than being executed once during the html rendering, they will be executed live in the browser.
  • {ojs} : Observable javascript - we use these cells to include interactive javascript input/outputs

Also, in addition to the base quarto chunk options, we see:

  • //| in {ojs} cells, because javascript comments start with // rather than #
  • #| edit: : make a cell editable or not
  • #| autorun: true : when true the cell is automatically executed at startup and whenever its content is changed.

And importantly:

#| input:
#|   - sex_box

This tells R to go grab a javascript variable we defined with viewof in another {ojs} block.

From there, even more components can be added to create the interactive plot of dashboard of your choice:

```{webr}
#| echo: false
#| edit: false
#| autorun: true

get_most_frequent <- function(babynames, select_sex, from = 1950) {
  most_freq <- babynames |>
    filter(sex == select_sex, year > from) |>
    group_by(name) |>
    summarise(average = mean(prop)) |>
    arrange(desc(average))
    
  return(list(
    babynames = babynames,
    most_frequent = most_freq,
    sex = select_sex,
    from = from))
}

plot_top <- function(x, top = 10) {
  topx <- x$most_frequent$name[1:top]
  
  p <- x$babynames |>
    filter(name %in% topx, sex == x$sex, year > x$from) |>
    ggplot(aes(x = year, y = prop, color = name)) +
    geom_line() +
    scale_color_brewer(palette = "Paired") +
    theme_classic()
  
  return(p)
}
```


A more complex examples with more interactive inputs (NB: it seems that when moving the slider the image get re-computed for each intermediate value, leading to some slowness sometimes)  :
```{webr}
#| echo: FALSE
#| edit: false
#| autorun: true
#| input:
#|   - sex_show
#|   - year
#|   - n_names

get_most_frequent(babynames, select_sex = sex_show , from = year ) |>
  plot_top(top=n_names)

```


```{ojs}
//| echo: false
viewof sex_show = Inputs.radio(["M", "F"], {label: "Sex:",value:"F"})

viewof year = Inputs.range([1880, 2010], {step: 10, label: "starting year:"})

viewof n_names = Inputs.range([1, 12], {step: 1, label: "number of top names:"})
```

You can also just have interactive R/python cells where you can write and execute code :

```{webr}
for (x in 1:5) {
  print(x ** 2)
}
```