06_differential_accessibility

In the next step, we will start importing the read counts on consensus peaks (output from featureCounts) into R and perform differential accessibility analysis using DESeq2.

Import read counts and metadata

We start by loading featureCounts output as a matrix of counts

Task 1

Read the output of featurecounts that contains counts for all peaks in all samples: results/05_counts_reads/feature_Counts.txt
Keep the header of the table
Process the table in order to keep: read-counts columns only, peak IDs as row.names, simple column names (sample names)
Convert the table into matrix format

??? success “Solution”

```r
# Read table of counts (output from FeatureCounts)
counts_file <- "results/05_counts_reads/feature_Counts.txt"
cts_table <- read.table(counts_file, header = T)

# convert into matrix, with interval IDs as rownames
cts <- cts_table[,7:10]
row.names(cts) <- cts_table[,1]
colnames(cts) <- gsub(colnames(cts), pattern = "results.01_filtered_bams.", replacement = "") %>% 
                 gsub(., pattern=".qc_bl_filt.sorted.bam", replacement="")
```

** Task 2**

Create metadata table We need to create a data.frame containing metadata information about the samples. Here we have two conditions, PC and RACM, with two replicates each. We need to create a data.frame containing the conditions information for each sample. If we would have a second factor, e.g. treatment (treated vs untreated), we would need to add this information as a second column in the data.frame.

!!! Note:

Important: The order of the rows in the data.frame must match the order of the columns in the counts matrix
           The metadata data.frame must have factor variables (not character variables)

condition <- factor( c(rep("PC",2), rep("RACM",2)) )
colData <- data.frame(condition, row.names = colnames(cts))

DEseq object and analysis

Bring together counts and metadata to create a DESeq object

dds <- DESeqDataSetFromMatrix(
  countData = cts, colData = colData, 
  design = ~ condition)
dim(dds)

Remove lowly exp peaks

idx <- rowSums(counts(dds, normalized=FALSE) >= 50) >= 2
dds.f <- dds[idx, ]
dim(dds.f)

We perform the estimation of dispersions

dds <- DESeq(dds.f)

And plot PCA of the samples

vsd <- varianceStabilizingTransformation(dds.f, blind=TRUE )
pcaData <- plotPCA(vsd, intgroup=c("condition"), ntop="all")
pcaData + geom_label(aes(x=PC1,y=PC2,label=name))
plotPCA(vsd, intgroup=c("sizeFactor"))