STdata package

The STdata package provides access to a collection of spatially resolved transcriptomics datasets, which have been formatted into the SpatialExperiment Bioconductor object class.

These datasets have been collected from various publicly available sources, and will cover several technological platforms.

We provide these datasets as SpatialExperiment objects to make them easier to access, so that we and others can use them for examples, demonstrations, tutorials, and other purposes.

Datasets

The current version of the package contains the following datasets, both from the 10x Genomics Visium platform.

The package is under active development, and we will add several additional datasets in the near future (see also Challenges discussed at Birds-of-a-Feather workshop).

  • mouse_coronal (10x Genomics Visium): a single coronal section from the mouse brain, spanning one hemisphere. Previously provided by 10x Genomics from their website.

  • human_DLPFC (10x Genomics Visium): a single sample (sample 151673) from our previously published dataset consisting of 12 samples from the human dorsolateral prefrontal cortex (DLPFC) in the human brain, from 3 neurotypical donors. The original dataset is available via the spatialLIBD Bioconductor package, and is described in our paper Maynard and Collado-Torres et al. 2020.

Install package

The latest versions of the SpatialExperiment and STdata packages can be installed from GitHub. These will be moved to Bioconductor (devel) in the near future.

In the code below, we have commented out the installation code, so that this is not automatically run when we compile the workshop materials. However, on your own system, you can un-comment these lines and run them to install the packages if necessary.

# install SpatialExperiment package (note "devel" branch)
# remotes::install_github("drighelli/SpatialExperiment", ref = "devel")

# install STdata package
# remotes::install_github("lmweber/STdata")

Load data

Next, we run a short example showing how to load one of the datasets from the STdata package. This will be loaded as a SpatialExperiment object. Then, we inspect the object dimensions and structure.

# load packages
library(SpatialExperiment)
library(STdata)
# load data object
spe <- load_data("mouse_coronal")
# inspect data object
spe
## class: SpatialExperiment 
## dim: 32285 2702 
## metadata(0):
## assays(1): counts
## rownames(32285): ENSMUSG00000051951 ENSMUSG00000089699 ...
##   ENSMUSG00000095019 ENSMUSG00000095041
## rowData names(3): gene_id gene_name feature_type
## colnames(2702): AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 ...
##   TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
## colData names(9): barcode_id array_row ... y_coord in_tissue
## reducedDimNames(0):
## altExpNames(0):
## spatialCoordsNames(3) : x_coord y_coord in_tissue
## inTissue(1): 2702
## imgData(6): sample_id image_id ... height scaleFactor
dim(spe)
## [1] 32285  2702
assayNames(spe)
## [1] "counts"
rowData(spe)
## DataFrame with 32285 rows and 3 columns
##                               gene_id   gene_name    feature_type
##                           <character> <character>     <character>
## ENSMUSG00000051951 ENSMUSG00000051951        Xkr4 Gene Expression
## ENSMUSG00000089699 ENSMUSG00000089699      Gm1992 Gene Expression
## ENSMUSG00000102331 ENSMUSG00000102331     Gm19938 Gene Expression
## ENSMUSG00000102343 ENSMUSG00000102343     Gm37381 Gene Expression
## ENSMUSG00000025900 ENSMUSG00000025900         Rp1 Gene Expression
## ...                               ...         ...             ...
## ENSMUSG00000095523 ENSMUSG00000095523  AC124606.1 Gene Expression
## ENSMUSG00000095475 ENSMUSG00000095475  AC133095.2 Gene Expression
## ENSMUSG00000094855 ENSMUSG00000094855  AC133095.1 Gene Expression
## ENSMUSG00000095019 ENSMUSG00000095019  AC234645.1 Gene Expression
## ENSMUSG00000095041 ENSMUSG00000095041  AC149090.1 Gene Expression
colData(spe)
## DataFrame with 2702 rows and 9 columns
##                            barcode_id array_row array_col pxl_col_in_fullres
##                           <character> <integer> <integer>          <integer>
## AAACAAGTATCTCCCA-1 AAACAAGTATCTCCCA-1        50       102               7237
## AAACAATCTACTAGCA-1 AAACAATCTACTAGCA-1         3        43               1611
## AAACACCAATAACTGC-1 AAACACCAATAACTGC-1        59        19               8315
## AAACAGAGCGACTCCT-1 AAACAGAGCGACTCCT-1        14        94               2927
## AAACCGGGTAGGTACC-1 AAACCGGGTAGGTACC-1        42        28               6280
## ...                               ...       ...       ...                ...
## TTGTTGTGTGTCAAGA-1 TTGTTGTGTGTCAAGA-1        31        77               4963
## TTGTTTCACATCCAGG-1 TTGTTTCACATCCAGG-1        58        42               8195
## TTGTTTCATTAGTCTA-1 TTGTTTCATTAGTCTA-1        60        30               8434
## TTGTTTCCATACAACT-1 TTGTTTCCATACAACT-1        45        27               6639
## TTGTTTGTGTAAATTC-1 TTGTTTGTGTAAATTC-1         7        51               2089
##                    pxl_row_in_fullres   sample_id   x_coord   y_coord in_tissue
##                             <integer> <character> <integer> <numeric> <logical>
## AAACAAGTATCTCCCA-1               8229    Sample01      8229      2755      TRUE
## AAACAATCTACTAGCA-1               4169    Sample01      4169      8381      TRUE
## AAACACCAATAACTGC-1               2518    Sample01      2518      1677      TRUE
## AAACAGAGCGACTCCT-1               7678    Sample01      7678      7065      TRUE
## AAACCGGGTAGGTACC-1               3137    Sample01      3137      3712      TRUE
## ...                               ...         ...       ...       ...       ...
## TTGTTGTGTGTCAAGA-1               6508    Sample01      6508      5029      TRUE
## TTGTTTCACATCCAGG-1               4100    Sample01      4100      1797      TRUE
## TTGTTTCATTAGTCTA-1               3275    Sample01      3275      1558      TRUE
## TTGTTTCCATACAACT-1               3068    Sample01      3068      3353      TRUE
## TTGTTTGTGTAAATTC-1               4719    Sample01      4719      7903      TRUE
## DataFrame with 2702 rows and 3 columns
##                      x_coord   y_coord in_tissue
##                    <integer> <numeric> <logical>
## AAACAAGTATCTCCCA-1      8229      2755      TRUE
## AAACAATCTACTAGCA-1      4169      8381      TRUE
## AAACACCAATAACTGC-1      2518      1677      TRUE
## AAACAGAGCGACTCCT-1      7678      7065      TRUE
## AAACCGGGTAGGTACC-1      3137      3712      TRUE
## ...                      ...       ...       ...
## TTGTTGTGTGTCAAGA-1      6508      5029      TRUE
## TTGTTTCACATCCAGG-1      4100      1797      TRUE
## TTGTTTCATTAGTCTA-1      3275      1558      TRUE
## TTGTTTCCATACAACT-1      3068      3353      TRUE
## TTGTTTGTGTAAATTC-1      4719      7903      TRUE
imgData(spe)
## DataFrame with 2 rows and 6 columns
##     sample_id    image_id   data     width    height scaleFactor
##   <character> <character> <list> <integer> <integer>   <numeric>
## 1    Sample01      lowres              576       600   0.0510334
## 2    Sample01       hires             1921      2000   0.1701114

Plot data

We can also create a simple plot of the dataset, to check that it has been loaded correctly. We can use plotting functions available in the ggspavis package.

The ggspavis package is also available from GitHub, so we install this first (if not already installed).

Then, we load the package, and create a plot.

# install ggspavis package
# remotes::install_github("lmweber/ggspavis")
# load package
library(ggspavis)
# create plot
plotSpots(spe)

Exercise

The above plot does not show much – only the layout of the Visium spots under tissue in this sample. This is useful as a check that the dataset has been loaded correctly, but is not a very interesting plot.

Next, we will create a more interesting plot showing ground truth cluster labels for a different dataset, and then continue with an exercise to modify this plot to display different information.

For this exercise, we will use the “human DLPFC” dataset (which contains ground truth cluster labels).

# load data object
spe <- load_data("human_DLPFC")
# create basic plot
plotSpots(spe)

# create plot showing ground truth cluster labels
plotSpots(spe, discrete = "ground_truth", palette = "libd_layer_colors")

For the exercise, try to modify this plot to show a continuous color gradient of total UMI counts per spot (instead of ground truth cluster labels).

Below are some hints for how to do this.

  • inspect the spe object to check how the data is stored (e.g. assays, row data, column data, spatial coordinates)
  • check the help page of the plotting function (with ?plotSpots) to check the required arguments
  • are the values you need (total counts per spot) stored somewhere in the object? e.g. row data or column data? if not, how could you calculate them and store them in the object, so that the plotting function can access them?