Setup and data exploration

As said, we will use the the wild-type data from the Tal1 chimera experiment:

  • Sample 5: E8.5 injected cells (tomato positive), pool 3
  • Sample 6: E8.5 host cells (tomato negative), pool 3
  • Sample 7: E8.5 injected cells (tomato positive), pool 4
  • Sample 8: E8.5 host cells (tomato negative), pool 4
  • Sample 9: E8.5 injected cells (tomato positive), pool 5
  • Sample 10: E8.5 host cells (tomato negative), pool 5

Note that this is a paired design in which for each biological replicate (pool 3, 4, and 5), we have both host and injected cells.

We start by loading the data and doing a quick exploratory analysis, essentially applying the normalization and visualization techniques that we have seen in the previous lectures to all samples.

library(MouseGastrulationData)
sce <- WTChimeraData(type = "processed", samples=5:10)
sce
## class: SingleCellExperiment 
## dim: 29453 20935 
## metadata(0):
## assays(1): counts
## rownames(29453): ENSMUSG00000051951 ENSMUSG00000089699 ...
##   ENSMUSG00000095742 tomato-td
## rowData names(2): ENSEMBL SYMBOL
## colnames(20935): cell_9769 cell_9770 ... cell_30702 cell_30703
## colData names(11): cell barcode ... doub.density sizeFactor
## reducedDimNames(2): pca.corrected.E7.5 pca.corrected.E8.5
## mainExpName: NULL
## altExpNames(0):
colData(sce)
## DataFrame with 20935 rows and 11 columns
##                   cell          barcode    sample       stage    tomato
##            <character>      <character> <integer> <character> <logical>
## cell_9769    cell_9769 AAACCTGAGACTGTAA         5        E8.5      TRUE
## cell_9770    cell_9770 AAACCTGAGATGCCTT         5        E8.5      TRUE
## cell_9771    cell_9771 AAACCTGAGCAGCCTC         5        E8.5      TRUE
## cell_9772    cell_9772 AAACCTGCATACTCTT         5        E8.5      TRUE
## cell_9773    cell_9773 AAACGGGTCAACACCA         5        E8.5      TRUE
## ...                ...              ...       ...         ...       ...
## cell_30699  cell_30699 TTTGTCACAGCTCGCA        10        E8.5     FALSE
## cell_30700  cell_30700 TTTGTCAGTCTAGTCA        10        E8.5     FALSE
## cell_30701  cell_30701 TTTGTCATCATCGGAT        10        E8.5     FALSE
## cell_30702  cell_30702 TTTGTCATCATTATCC        10        E8.5     FALSE
## cell_30703  cell_30703 TTTGTCATCCCATTTA        10        E8.5     FALSE
##                 pool stage.mapped        celltype.mapped closest.cell
##            <integer>  <character>            <character>  <character>
## cell_9769          3        E8.25             Mesenchyme   cell_24159
## cell_9770          3         E8.5            Endothelium   cell_96660
## cell_9771          3         E8.5              Allantois  cell_134982
## cell_9772          3         E8.5             Erythroid3  cell_133892
## cell_9773          3        E8.25             Erythroid1   cell_76296
## ...              ...          ...                    ...          ...
## cell_30699         5         E8.5             Erythroid3   cell_38810
## cell_30700         5         E8.5       Surface ectoderm   cell_38588
## cell_30701         5        E8.25 Forebrain/Midbrain/H..   cell_66082
## cell_30702         5         E8.5             Erythroid3  cell_138114
## cell_30703         5         E8.0                Doublet   cell_92644
##            doub.density sizeFactor
##               <numeric>  <numeric>
## cell_9769    0.02985045    1.41243
## cell_9770    0.00172753    1.22757
## cell_9771    0.01338013    1.15439
## cell_9772    0.00218402    1.28676
## cell_9773    0.00211723    1.78719
## ...                 ...        ...
## cell_30699   0.00146287   0.389311
## cell_30700   0.00374155   0.588784
## cell_30701   0.05651258   0.624455
## cell_30702   0.00108837   0.550807
## cell_30703   0.82369305   1.184919

To speed up computations, after removing doublets, we randomly select 50% cells per sample.

library(scater)
library(ggplot2)
library(scran)

# remove doublets
drop <- sce$celltype.mapped %in% c("stripped", "Doublet")
sce <- sce[,!drop]

set.seed(29482)
idx <- unlist(tapply(colnames(sce), sce$sample, function(x) {
    perc <- round(0.50 * length(x))
    sample(x, perc)
}))

sce <- sce[,idx]

We now normalize the data and visualize them in a tSNE plot.

# normalization
sce <- logNormCounts(sce)

# identify highly variable genes
dec <- modelGeneVar(sce, block=sce$sample)
chosen.hvgs <- dec$bio > 0

# dimensionality reduction
sce <- runPCA(sce, subset_row = chosen.hvgs, ntop = 1000)
sce <- runTSNE(sce, dimred = "PCA")

sce$sample <- as.factor(sce$sample)
plotTSNE(sce, colour_by = "sample")

plotTSNE(sce, colour_by = "celltype.mapped") +
    scale_color_discrete() +
    theme(legend.position = "bottom")

There are evident sample effects. Depending on the analysis that you want to perform you may want to remove or retain the sample effect. For instance, if the goal is to identify cell types with a clustering method, one may want to remove the sample effects with “batch effect” correction methods.

For now, let’s assume that we want to remove this effect.

Correcting batch effects

We correct the effect of samples by aid of the correctExperiment function in the batchelor package and using the sample colData column as batch.

library(batchelor)
set.seed(10102)
merged <- correctExperiments(sce, 
    batch=sce$sample, 
    subset.row=chosen.hvgs,
    PARAM=FastMnnParam(
        merge.order=list(
            list(1,3,5), # WT (3 replicates)
            list(2,4,6)  # td-Tomato (3 replicates)
        )
    )
)

merged <- runTSNE(merged, dimred="corrected")
plotTSNE(merged, colour_by="batch")

Once we removed the sample batch effect, we can proceed with the Differential Expression Analysis.

Differential Expression

In order to perform a Differential Expression Analysis, we need to identify groups of cells across samples/conditions (depending on the experimental design and the final aim of the experiment).

As previously seen, we have two ways of grouping cells, cell clustering and cell labeling. In our case we will focus on this second aspect to group cells according to the already annotated cell types to proceed with the computation of the pseudo-bulk samples.

Pseudo-bulk samples

To compute differences between groups of cells, a possible way is to compute pseudo-bulk samples, where we mediate the gene signal of all the cells for each specific cell type. In this manner, we are then able to detect differences between the same cell type across two different conditions.

To compute pseudo-bulk samples, we use the aggregateAcrossCells function in the scuttle package, which takes as input not only a SingleCellExperiment, but also the id to use for the identification of the group of cells. In our case, we use as id not just the cell type, but also the sample, because we want be able to discern between replicates and conditions during further steps.

# Using 'label' and 'sample' as our two factors; each column of the output
# corresponds to one unique combination of these two factors.
library(scuttle)
summed <- aggregateAcrossCells(merged, 
    id=colData(merged)[,c("celltype.mapped", "sample")])
summed
## class: SingleCellExperiment 
## dim: 13641 179 
## metadata(2): merge.info pca.info
## assays(1): counts
## rownames(13641): ENSMUSG00000051951 ENSMUSG00000025900 ...
##   ENSMUSG00000096730 ENSMUSG00000095742
## rowData names(3): rotation ENSEMBL SYMBOL
## colnames: NULL
## colData names(15): batch cell ... sample ncells
## reducedDimNames(5): corrected pca.corrected.E7.5 pca.corrected.E8.5 PCA
##   TSNE
## mainExpName: NULL
## altExpNames(0):

Differential Expression Analysis

The main advantage of using pseudo-bulk samples is the possibility to use well-tested methods for differential analysis like edgeR and DESeq2, we will focus on the former for this analysis. edgeR uses a Negative Binomial Generalized Linear Model.

First, let’s start with a specific cell type, for instance the “Mesenchymal stem cells”, and look into differences between this cell type across conditions.

label <- "Mesenchyme"
current <- summed[,label==summed$celltype.mapped]

# Creating up a DGEList object for use in edgeR:
library(edgeR)
y <- DGEList(counts(current), samples=colData(current))
y
## An object of class "DGEList"
## $counts
##                    Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
## ENSMUSG00000051951       2       0       0       0       1       0
## ENSMUSG00000025900       0       0       0       0       0       0
## ENSMUSG00000025902       4       0       2       0       3       6
## ENSMUSG00000033845     765     130     508     213     781     305
## ENSMUSG00000002459       2       0       1       0       0       0
## 13636 more rows ...
## 
## $samples
##         group lib.size norm.factors batch cell barcode sample stage tomato pool
## Sample1     1  2478901            1     5 <NA>    <NA>      5  E8.5   TRUE    3
## Sample2     1   548407            1     6 <NA>    <NA>      6  E8.5  FALSE    3
## Sample3     1  1260187            1     7 <NA>    <NA>      7  E8.5   TRUE    4
## Sample4     1   578699            1     8 <NA>    <NA>      8  E8.5  FALSE    4
## Sample5     1  2092329            1     9 <NA>    <NA>      9  E8.5   TRUE    5
## Sample6     1   904929            1    10 <NA>    <NA>     10  E8.5  FALSE    5
##         stage.mapped celltype.mapped closest.cell doub.density sizeFactor
## Sample1         <NA>      Mesenchyme         <NA>           NA         NA
## Sample2         <NA>      Mesenchyme         <NA>           NA         NA
## Sample3         <NA>      Mesenchyme         <NA>           NA         NA
## Sample4         <NA>      Mesenchyme         <NA>           NA         NA
## Sample5         <NA>      Mesenchyme         <NA>           NA         NA
## Sample6         <NA>      Mesenchyme         <NA>           NA         NA
##         celltype.mapped.1 sample.1 ncells
## Sample1        Mesenchyme        5    151
## Sample2        Mesenchyme        6     28
## Sample3        Mesenchyme        7    127
## Sample4        Mesenchyme        8     75
## Sample5        Mesenchyme        9    239
## Sample6        Mesenchyme       10    146

A typical step is to discard low quality samples due to low sequenced library size. We discard these samples because they can affect further steps like normalization and/or DEGs analysis.

We can see that in our case we don’t have low quality samples and we don’t need to filter out any of them.

discarded <- current$ncells < 10
y <- y[,!discarded]
summary(discarded)
##    Mode   FALSE 
## logical       6

The same idea is typically applied to the genes, indeed we need to discard low expressed genes to improve accuracy for the DEGs modeling.

keep <- filterByExpr(y, group=current$tomato)
y <- y[keep,]
summary(keep)
##    Mode   FALSE    TRUE 
## logical    9121    4520

We can now proceed to normalize the data There are several approaches for normalizing bulk, and hence pseudo-bulk data. Here, we use the Trimmed Mean of M-values method, implemented in the edgeR package within the calcNormFactors function. Keep in mind that because we are going to normalize the pseudo-bulk counts, we don’t need to normalize the data in “single cell form”.

y <- calcNormFactors(y)
y$samples
##         group lib.size norm.factors batch cell barcode sample stage tomato pool
## Sample1     1  2478901    1.0506857     5 <NA>    <NA>      5  E8.5   TRUE    3
## Sample2     1   548407    1.0399112     6 <NA>    <NA>      6  E8.5  FALSE    3
## Sample3     1  1260187    0.9700083     7 <NA>    <NA>      7  E8.5   TRUE    4
## Sample4     1   578699    0.9871129     8 <NA>    <NA>      8  E8.5  FALSE    4
## Sample5     1  2092329    0.9695559     9 <NA>    <NA>      9  E8.5   TRUE    5
## Sample6     1   904929    0.9858611    10 <NA>    <NA>     10  E8.5  FALSE    5
##         stage.mapped celltype.mapped closest.cell doub.density sizeFactor
## Sample1         <NA>      Mesenchyme         <NA>           NA         NA
## Sample2         <NA>      Mesenchyme         <NA>           NA         NA
## Sample3         <NA>      Mesenchyme         <NA>           NA         NA
## Sample4         <NA>      Mesenchyme         <NA>           NA         NA
## Sample5         <NA>      Mesenchyme         <NA>           NA         NA
## Sample6         <NA>      Mesenchyme         <NA>           NA         NA
##         celltype.mapped.1 sample.1 ncells
## Sample1        Mesenchyme        5    151
## Sample2        Mesenchyme        6     28
## Sample3        Mesenchyme        7    127
## Sample4        Mesenchyme        8     75
## Sample5        Mesenchyme        9    239
## Sample6        Mesenchyme       10    146

To investigate the effect of our normalization, we use a Mean-Difference (MD) plot for each sample in order to detect possible normalization problems due to insufficient cells/reads/UMIs composing a particular pseudo-bulk profile.

In our case, we verify that all these plots are centered in 0 (on y-axis) and present a trumpet shape, as expected.

par(mfrow=c(2,3))
for (i in seq_len(ncol(y))) {
    plotMD(y, column=i)
}

Furthermore, we want to check if the samples cluster together based on their known factors (like the tomato injection in this case).

To do so, we use the MDS plot, which is very close to a PCA representation.

par(mfrow=c(1,1))
limma::plotMDS(cpm(y, log=TRUE), 
    col=ifelse(y$samples$tomato, "red", "blue"))

We then construct a design matrix by including both the pool and the tomato as factors. This design indicates which samples belong to which pool and condition, so we can use it in the next step of the analysis.

design <- model.matrix(~factor(pool) + factor(tomato), y$samples)
design
##         (Intercept) factor(pool)4 factor(pool)5 factor(tomato)TRUE
## Sample1           1             0             0                  1
## Sample2           1             0             0                  0
## Sample3           1             1             0                  1
## Sample4           1             1             0                  0
## Sample5           1             0             1                  1
## Sample6           1             0             1                  0
## attr(,"assign")
## [1] 0 1 1 2
## attr(,"contrasts")
## attr(,"contrasts")$`factor(pool)`
## [1] "contr.treatment"
## 
## attr(,"contrasts")$`factor(tomato)`
## [1] "contr.treatment"

Now we can estimate the Negative Binomial (NB) overdispersion parameter, to model the mean-variance trend.

y <- estimateDisp(y, design)
summary(y$trended.dispersion)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.009325 0.016271 0.024233 0.021603 0.026868 0.027993

We then fit a Quasi-Likelihood (QL) negative binomial generalized linear model for each gene. The robust=TRUE parameter avoids distortions from highly variable clusters. The QL method includes an additional dispersion parameter, useful to handle the uncertainty and variability of the per-gene variance, which is not well estimated by the NB dispersion, so the two dispersion types complement each other in the final analysis.

fit <- glmQLFit(y, design, robust=TRUE)
summary(fit$var.prior)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2977  0.6640  0.8275  0.7637  0.8798  0.9670
summary(fit$df.prior)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.323   8.177   8.177   8.111   8.177   8.177

We then use an empirical Bayes quasi-likelihood F-test to test for differential expression (due to tomato injection) per each gene at a False Discovery Rate (FDR) of 5%. The low amount of DGEs highlights that the tomato injection effect has a low influence on the mesenchyme cells.

res <- glmQLFTest(fit, coef=ncol(design))
summary(decideTests(res))
##        factor(tomato)TRUE
## Down                    5
## NotSig               4510
## Up                      5
topTags(res)
## Coefficient:  factor(tomato)TRUE 
##                         logFC   logCPM          F       PValue          FDR
## ENSMUSG00000010760 -4.1551264 9.973704 1112.14948 9.905998e-12 4.477511e-08
## ENSMUSG00000096768  1.9992920 8.844258  403.85294 1.594095e-09 3.602655e-06
## ENSMUSG00000035299  1.8001627 6.904163  123.52980 5.130084e-07 7.729327e-04
## ENSMUSG00000101609  1.3708397 7.310009   80.58075 3.745290e-06 4.232177e-03
## ENSMUSG00000019188 -1.0195649 7.545530   61.65538 1.248303e-05 1.128466e-02
## ENSMUSG00000024423  0.9946833 7.391075   58.34967 1.591674e-05 1.199061e-02
## ENSMUSG00000086503 -6.5155131 7.411257  159.33690 2.625600e-05 1.695388e-02
## ENSMUSG00000042607 -0.9567347 7.468203   45.42154 4.690293e-05 2.650016e-02
## ENSMUSG00000036446 -0.8305889 9.401028   42.72058 6.071290e-05 3.049137e-02
## ENSMUSG00000027520  1.5814592 6.952923   40.94715 7.775888e-05 3.514702e-02

All the previous steps can be easily performed with the following function for each cell type, thanks to the pseudoBulkDGE function in the scran package.

library(scran)
summed.filt <- summed[,summed$ncells >= 10]

de.results <- pseudoBulkDGE(summed.filt, 
    label=summed.filt$celltype.mapped,
    design=~factor(pool) + tomato,
    coef="tomatoTRUE",
    condition=summed.filt$tomato 
)

The returned object is a list of DataFrames each with the results for a cell type. Each of these contains also the intermediate results in edgeR format to perform any intermediate plot or diagnostic.

cur.results <- de.results[["Allantois"]]
cur.results[order(cur.results$PValue),]
## DataFrame with 13641 rows and 5 columns
##                        logFC    logCPM         F      PValue         FDR
##                    <numeric> <numeric> <numeric>   <numeric>   <numeric>
## ENSMUSG00000037664 -7.995130  11.55290  3180.990 7.35933e-22 3.09165e-18
## ENSMUSG00000010760 -2.574762  12.40592  1114.529 9.22901e-18 1.93855e-14
## ENSMUSG00000086503 -7.015319   7.49749   703.373 5.57372e-16 7.80507e-13
## ENSMUSG00000096768  1.828480   9.33239   304.769 8.39747e-13 8.81944e-10
## ENSMUSG00000022464  0.969837  10.28302   118.697 2.12502e-09 1.78544e-06
## ...                      ...       ...       ...         ...         ...
## ENSMUSG00000095247        NA        NA        NA          NA          NA
## ENSMUSG00000096808        NA        NA        NA          NA          NA
## ENSMUSG00000079808        NA        NA        NA          NA          NA
## ENSMUSG00000096730        NA        NA        NA          NA          NA
## ENSMUSG00000095742        NA        NA        NA          NA          NA

Differential Abundance

With DA we test for differences between clusters across conditions, to investigate which clusters change accordingly to the treatment (the tomato injection in our case).

We first setup some code and variables for further analysis, like quantifying the number of cells per each cell type and fit a model to catch differences between the injected cells and the background.

The performed steps are very similar to the ones for DEGs analysis, but this time we start our analysis on the computed abundances and without normalizing the data with TMM.

library(RColorBrewer)
n <- 68
qual_col_pals = brewer.pal.info[brewer.pal.info$category == 'qual',]
col_vector = unlist(mapply(brewer.pal, qual_col_pals$maxcolors, rownames(qual_col_pals)))

abundances <- table(merged$celltype.mapped, merged$sample) 
abundances <- unclass(abundances) 
abundances
##                                 
##                                    5  6   7   8   9  10
##   Allantois                       52  7  69  64 154 134
##   Blood progenitors 1              4  1   8   2   3   9
##   Blood progenitors 2             20  4  13   9  21  52
##   Cardiomyocytes                  41 13  44  15  96 101
##   Caudal epiblast                  2  0   0   0   8  18
##   Caudal Mesoderm                  4  4   6   3   4  10
##   Caudal neurectoderm              0  1   0   0   2   7
##   Def. endoderm                    5  2   7   5  14  11
##   Endothelium                     33  7  21  14  50  32
##   Erythroid1                      19  5  61  48  27  45
##   Erythroid2                      34 14 127 200  58 112
##   Erythroid3                     117 83 151 380 290 551
##   ExE ectoderm                     1  2   0  37   0  35
##   ExE endoderm                     0  1   0   1   0   3
##   ExE mesoderm                    62 23 102  75  93 155
##   Forebrain/Midbrain/Hindbrain   130 78  96  84 171 351
##   Gut                             25 29  48  40  98 113
##   Haematoendothelial progenitors  37 10  40  47  81  68
##   Intermediate mesoderm           31  7  28  26  28  87
##   Mesenchyme                     151 28 127  75 239 146
##   Mixed mesoderm                   0  0   0   0   1   2
##   Neural crest                    12 12  37  36  41 178
##   NMP                             63 30  40  34  37  92
##   Notochord                        0  0   0   2   2   2
##   Paraxial mesoderm               77 39  70  52  80 220
##   Parietal endoderm                0  2   0  29   0   3
##   PGC                              0  1   5   3   2   6
##   Pharyngeal mesoderm             58 16  84  76 136 169
##   Rostral neurectoderm            18  9   5   5  31  45
##   Somitic mesoderm                46 19  21  12  14  56
##   Spinal cord                     48 38  45  23  65 115
##   Stripped                        13  8   2   0   1   3
##   Surface ectoderm                45 19 113  55 178 268
##   Visceral endoderm                1  1   0   0   3   1
df <- as.data.frame(abundances)
df <- cbind(rowSums(df[,c(1, 3, 5)]), rowSums(df[, c(2, 4, 6)]))
colnames(df) <- c("tomato", "control")

# Create a df for barplot
ct <- rep(rownames(df), 2)
cond <- c(rep("tomato", dim(df)[1]), rep("control",dim(df)[1]))
value <- c(df[,1], df[,2])

df_barplot <- data.frame(ct, cond, value)


ggplot(df_barplot, aes(x=reorder(ct, -value), y=value, fill=cond))+ 
  geom_bar(width=.5, stat='identity', position=position_dodge(.7)) + 
    theme_classic() +theme(axis.text.x = element_text(angle = 75, vjust = 1, hjust=1))+
    labs(y="#Cells", x="")

# Barplot
ggplot(df_barplot, aes(y=value, x=cond, fill=ct, color=ct)) + 
  geom_bar(position="fill", stat="identity") + 
  scale_color_manual(values=col_vector)+
  scale_fill_manual(values=col_vector) +
    labs(y="Cell proportion(%)", x="", fill=NULL, color=NULL) +theme_classic() 

library(edgeR)
abundances <- table(merged$celltype.mapped, merged$sample) 
abundances <- unclass(abundances) 
# Attaching some column metadata.
extra.info <- colData(merged)[match(colnames(abundances), merged$sample),]
y.ab <- DGEList(abundances, samples=extra.info)
keep <- filterByExpr(y.ab, group=extra.info$tomato)
y.ab <- y.ab[keep,]
design <- model.matrix(~factor(pool) + factor(tomato), y.ab$samples)
y.ab <- estimateDisp(y.ab, design, trend="none")
fit.ab <- glmQLFit(y.ab, design, robust=TRUE, abundance.trend=FALSE)

Background on compositional effect

As mentioned before, in DA we don’t normalize our data with calcNormFactors function, because this approach considers that most of the input features do not vary between conditions. This cannot be applied to this analysis because we have a small number of cell populations that all can change due to the treatment. Leaving us to normalize only for library depth, which in pseudo-bulk data means by the total number of cells in each sample (cell type).

On the other hand, this can lead our data to be susceptible to compositional effect, that means that our conclusions can be biased by the amount of cells present in each cell type. And this amount of cells can be totally unbalanced between cell types.

For example, a specific cell type can be 40% of the total amount of cells present in the experiment, while another just the 3%. The differences in terms of abundance of these cell types are detected between the different conditions, but our final interpretation could be biased if we don’t consider this aspect.

We now look at one of the approaches for handling the compositional effect.

Assuming most labels do not change

We can use a similar approach used during the DEGs analysis, assuming that most labels are not changing, in particular if we think about the low number of DEGs resulted from the previous analysis.

To do so, we first normalize the data with calcNormFactors and then we fit and estimate a QL-model for our abundance data.

y.ab2 <- calcNormFactors(y.ab)
y.ab2$samples$norm.factors
## [1] 1.1510286 0.9866729 1.1042093 0.7722489 1.0457096 0.9874631

We then follow the already seen edgeR analysis steps.

y.ab2 <- estimateDisp(y.ab2, design, trend="none")
fit.ab2 <- glmQLFit(y.ab2, design, robust=TRUE, abundance.trend=FALSE)
res2 <- glmQLFTest(fit.ab2, coef=ncol(design))
summary(decideTests(res2))
##        factor(tomato)TRUE
## Down                    0
## NotSig                 21
## Up                      0
topTags(res2, n=10)
## Coefficient:  factor(tomato)TRUE 
##                                   logFC   logCPM        F      PValue
## Erythroid3                   -0.9956036 17.35279 8.611516 0.005394638
## Neural crest                 -1.0917248 14.85476 7.461485 0.009177889
## Mesenchyme                    0.8901738 16.31463 6.479534 0.014672428
## Endothelium                   0.7921383 14.11509 3.411336 0.071803077
## Erythroid2                   -0.5895162 15.96667 2.591693 0.114916989
## Cardiomyocytes                0.6200248 14.93268 2.545840 0.118083259
## Allantois                     0.5277110 15.54533 1.997632 0.164914229
## Forebrain/Midbrain/Hindbrain -0.4508956 16.56015 1.769367 0.190638748
## Paraxial mesoderm            -0.4472657 15.77907 1.591961 0.214008619
## NMP                          -0.4256713 15.07249 1.306089 0.259577002
##                                     FDR
## Erythroid3                   0.09636783
## Neural crest                 0.09636783
## Mesenchyme                   0.10270700
## Endothelium                  0.37696616
## Erythroid2                   0.41329141
## Cardiomyocytes               0.41329141
## Allantois                    0.49474269
## Forebrain/Midbrain/Hindbrain 0.49935344
## Paraxial mesoderm            0.49935344
## NMP                          0.54511170

Session Info

## R Under development (unstable) (2023-11-22 r85609)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] RColorBrewer_1.1-3           edgeR_4.1.2                 
##  [3] limma_3.59.1                 batchelor_1.19.0            
##  [5] scran_1.31.0                 scater_1.31.1               
##  [7] ggplot2_3.4.4                scuttle_1.13.0              
##  [9] MouseGastrulationData_1.17.0 SpatialExperiment_1.13.0    
## [11] SingleCellExperiment_1.25.0  SummarizedExperiment_1.33.0 
## [13] Biobase_2.63.0               GenomicRanges_1.55.1        
## [15] GenomeInfoDb_1.39.1          IRanges_2.37.0              
## [17] S4Vectors_0.41.2             BiocGenerics_0.49.1         
## [19] MatrixGenerics_1.15.0        matrixStats_1.1.0           
## [21] BiocStyle_2.31.0            
## 
## loaded via a namespace (and not attached):
##   [1] jsonlite_1.8.7                magrittr_2.0.3               
##   [3] ggbeeswarm_0.7.2              magick_2.8.1                 
##   [5] farver_2.1.1                  rmarkdown_2.25               
##   [7] fs_1.6.3                      zlibbioc_1.49.0              
##   [9] ragg_1.2.6                    vctrs_0.6.4                  
##  [11] memoise_2.0.1                 DelayedMatrixStats_1.25.1    
##  [13] RCurl_1.98-1.13               htmltools_0.5.7              
##  [15] S4Arrays_1.3.1                AnnotationHub_3.11.0         
##  [17] curl_5.1.0                    BiocNeighbors_1.21.0         
##  [19] SparseArray_1.3.1             sass_0.4.7                   
##  [21] bslib_0.6.1                   desc_1.4.2                   
##  [23] cachem_1.0.8                  ResidualMatrix_1.13.0        
##  [25] igraph_1.5.1                  mime_0.12                    
##  [27] lifecycle_1.0.4               pkgconfig_2.0.3              
##  [29] rsvd_1.0.5                    Matrix_1.6-3                 
##  [31] R6_2.5.1                      fastmap_1.1.1                
##  [33] GenomeInfoDbData_1.2.11       shiny_1.8.0                  
##  [35] digest_0.6.33                 colorspace_2.1-0             
##  [37] AnnotationDbi_1.65.2          rprojroot_2.0.4              
##  [39] dqrng_0.3.1                   irlba_2.3.5.1                
##  [41] ExperimentHub_2.11.0          textshaping_0.3.7            
##  [43] RSQLite_2.3.3                 beachmat_2.19.0              
##  [45] labeling_0.4.3                filelock_1.0.2               
##  [47] fansi_1.0.5                   httr_1.4.7                   
##  [49] abind_1.4-5                   compiler_4.4.0               
##  [51] bit64_4.0.5                   withr_2.5.2                  
##  [53] BiocParallel_1.37.0           viridis_0.6.4                
##  [55] DBI_1.1.3                     highr_0.10                   
##  [57] rappdirs_0.3.3                DelayedArray_0.29.0          
##  [59] rjson_0.2.21                  bluster_1.13.0               
##  [61] tools_4.4.0                   vipor_0.4.5                  
##  [63] beeswarm_0.4.0                interactiveDisplayBase_1.41.0
##  [65] httpuv_1.6.12                 glue_1.6.2                   
##  [67] promises_1.2.1                grid_4.4.0                   
##  [69] Rtsne_0.16                    cluster_2.1.5                
##  [71] generics_0.1.3                gtable_0.3.4                 
##  [73] metapod_1.11.0                BiocSingular_1.19.0          
##  [75] ScaledMatrix_1.11.0           utf8_1.2.4                   
##  [77] XVector_0.43.0                ggrepel_0.9.4                
##  [79] BiocVersion_3.19.1            pillar_1.9.0                 
##  [81] stringr_1.5.1                 BumpyMatrix_1.11.0           
##  [83] later_1.3.1                   splines_4.4.0                
##  [85] dplyr_1.1.4                   BiocFileCache_2.11.1         
##  [87] lattice_0.22-5                bit_4.0.5                    
##  [89] tidyselect_1.2.0              locfit_1.5-9.8               
##  [91] Biostrings_2.71.1             knitr_1.45                   
##  [93] gridExtra_2.3                 xfun_0.41                    
##  [95] statmod_1.5.0                 stringi_1.8.2                
##  [97] yaml_2.3.7                    evaluate_0.23                
##  [99] codetools_0.2-19              tibble_3.2.1                 
## [101] BiocManager_1.30.22           cli_3.6.1                    
## [103] xtable_1.8-4                  systemfonts_1.0.5            
## [105] munsell_0.5.0                 jquerylib_0.1.4              
## [107] Rcpp_1.0.11                   dbplyr_2.4.0                 
## [109] png_0.1-8                     parallel_4.4.0               
## [111] ellipsis_0.3.2                pkgdown_2.0.7                
## [113] blob_1.2.4                    sparseMatrixStats_1.15.0     
## [115] bitops_1.0-7                  viridisLite_0.4.2            
## [117] scales_1.3.0                  purrr_1.0.2                  
## [119] crayon_1.5.2                  rlang_1.1.2                  
## [121] formatR_1.14                  KEGGREST_1.43.0

Further Reading

Exercises

  • Test differential expressed genes with just 2 replicates per condition and look into the changes in the results and possible emerging issues.

  • Test differential expressed genes computed with muscat package and check for differences in the results.