In the next step, we will start importing the read counts on consensus peaks (output from featureCounts) into R and perform differential accessibility analysis using DESeq2.
We start by loading featureCounts output as a matrix of counts
Task 1
results/05_counts_reads/feature_Counts.txt??? success “Solution”
```r
# Read table of counts (output from FeatureCounts)
counts_file <- "results/05_counts_reads/feature_Counts.txt"
cts_table <- read.table(counts_file, header = T)
# convert into matrix, with interval IDs as rownames
cts <- cts_table[,7:10]
row.names(cts) <- cts_table[,1]
colnames(cts) <- gsub(colnames(cts), pattern = "results.01_filtered_bams.", replacement = "") %>%
gsub(., pattern=".qc_bl_filt.sorted.bam", replacement="")
```
** Task 2**
Create metadata table We need to create a data.frame containing metadata information about the samples. Here we have two conditions, PC and RACM, with two replicates each. We need to create a data.frame containing the conditions information for each sample. If we would have a second factor, e.g. treatment (treated vs untreated), we would need to add this information as a second column in the data.frame.
!!! Note:
Important: The order of the rows in the data.frame must match the order of the columns in the counts matrix
The metadata data.frame must have factor variables (not character variables)
condition <- factor( c(rep("PC",2), rep("RACM",2)) )
colData <- data.frame(condition, row.names = colnames(cts))
Bring together counts and metadata to create a DESeq object
dds <- DESeqDataSetFromMatrix(
countData = cts, colData = colData,
design = ~ condition)
dim(dds)
Remove lowly exp peaks
idx <- rowSums(counts(dds, normalized=FALSE) >= 50) >= 2
dds.f <- dds[idx, ]
dim(dds.f)
We perform the estimation of dispersions
dds <- DESeq(dds.f)
And plot PCA of the samples
vsd <- varianceStabilizingTransformation(dds.f, blind=TRUE )
pcaData <- plotPCA(vsd, intgroup=c("condition"), ntop="all")
pcaData + geom_label(aes(x=PC1,y=PC2,label=name))
plotPCA(vsd, intgroup=c("sizeFactor"))