Stem cell project, web app

This project integrates the expression profiles from 73 differentiation-affecting knock-out mutants and WT (RC9) in mouse embryonic stem cells. Measurements were taken in two conditions after 24h: 2i medium preserving the naive state of cells, and differentiating medium (N24).

Our goals are to:

- Identify mechanisms of differentiation delay
- Find new key regulators of pluripotency
- Verify findings using targeted double knock-outs to achieve a more precise and more severe differentiation delay

The Shiny web app integrates multiple visualizations and summary statistics to make all information, from the gene level to network modules / pathways / GO terms, directly available for exploratory analysis.

The main view of the app is split into two parts.

The left side consists of several panels, organized in three tabs:
- Genes
- Pre-computed clusters
- Custom genes
The right side shows a t-sne plot, which is always visible and can be used for KO selection.

The content and functions in the different parts of the app are described in the following sections.

1 Genes

Display information on differential expression of single genes.

1.1 Table: single gene statistics

option	description
gene	Name of the gene
avgRPKM	Average RPKM across all conditions (RC9 and KOs in 2i and N24)
p.2i	Adjusted p-value of the changes in 2i (each KO vs. wild type)
p.N24	Adjusted p-value of the changes in N24 (each KO vs. wild type)
logFC.2i	Log-fold-change in 2i
logFC.N24	Log-fold-change in N24
naive-association	(marker converse R2) Fraction of variance in the gene’s expression that is explained by naive marker expression, i.e. how tightly the expression of this gene is associated with the core pluripotency markers (Nanog, Esrrb, Tbx3, Tfcp2l1, Klf4, Prdm14, Zfp42). Note: this measure is static for a given gene, independent of the selected samples.
zX.N24*	Z-value (or mean of z-values) of this gene’s expression across all knockouts in N24. A high magnitude z-value indicates that the respective gene shows extreme behavior in the selected sample(s) compared to all other samples.
zX.2i*	Z-value (or mean of z-values) of this gene’s expression across all knockouts in 2i. A high magnitude z-value indicates that the respective gene shows extreme behavior in the selected sample(s) compared to all other samples.

*only visible if KO samples have been selected and committed.
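
As a rough illustration of how a z-value of this kind can be obtained, the sketch below standardizes one gene's expression across all knockouts in a single condition. The function name `z_scores`, the sample names, the example values, and the use of the sample standard deviation are assumptions made for illustration, not the app's actual implementation.

```python
from statistics import mean, stdev

def z_scores(expression):
    """Standardize one gene's expression values across samples.

    `expression` maps sample name -> log-expression of a single gene.
    Uses the sample standard deviation (an assumption; the app may differ).
    Requires at least two samples with non-identical values.
    """
    values = list(expression.values())
    mu = mean(values)
    sigma = stdev(values)
    return {sample: (value - mu) / sigma for sample, value in expression.items()}

# Hypothetical log-expression values of one gene in N24 for four knockouts.
gene_n24 = {"KO_A": 5.0, "KO_B": 5.2, "KO_C": 4.8, "KO_D": 8.0}
z = z_scores(gene_n24)

# The sample with the largest |z| shows the most extreme behavior.
most_extreme = max(z, key=lambda sample: abs(z[sample]))
```

Here `most_extreme` is the knockout in which the gene deviates most from its behavior in all other samples.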

Important note:

If multiple knockout samples are selected and committed, all p-values, log-fold-changes and z-scores are calculated as the mean over the selected samples.
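
The averaging described in the note can be sketched as follows. The helper `summarize_selection`, the knockout names, and the numbers are hypothetical. The fallback to all knockouts for an empty selection follows the behavior described for the commit button later in this document.

```python
from statistics import mean

def summarize_selection(per_ko_stats, selected):
    """Average each statistic of one gene over the selected knockouts.

    `per_ko_stats` maps KO name -> dict of statistics for one gene.
    An empty selection falls back to the average over all knockouts.
    """
    selected = list(selected) or list(per_ko_stats)
    columns = per_ko_stats[selected[0]].keys()
    return {col: mean(per_ko_stats[ko][col] for ko in selected) for col in columns}

# Hypothetical per-knockout values for a single gene.
stats = {
    "KO_A": {"logFC.N24": 1.0, "p.N24": 0.02},
    "KO_B": {"logFC.N24": 2.0, "p.N24": 0.04},
    "KO_C": {"logFC.N24": -0.5, "p.N24": 0.50},
}
summary = summarize_selection(stats, ["KO_A", "KO_B"])
summary_all = summarize_selection(stats, [])
```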

1.2 Filters and filter presets

Filters may be set on all columns of the table. For numeric fields, a range of values can be set in the header field by slider, or by manual input, e.g.: “0…0.1”

Presets can be set via check-boxes above the table. These are:
- Show only genes significant (adj. p-value <= 0.1) for N24 changes & 2i changes
- Show genes tightly associated with marker genes, i.e. require naive-association >= 0.65
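
The effect of the presets can be sketched as a simple row filter. The function `apply_presets`, its flag names, and the example rows are hypothetical. The column names and cutoffs are taken from the table and preset descriptions above. Treating N24 and 2i significance as two independent switches is an assumption.

```python
def apply_presets(rows, significant_n24=False, significant_2i=False,
                  marker_associated=False, p_cutoff=0.1, association_cutoff=0.65):
    """Return the rows of the gene table that pass all enabled presets."""
    kept = []
    for row in rows:
        if significant_n24 and row["p.N24"] > p_cutoff:
            continue
        if significant_2i and row["p.2i"] > p_cutoff:
            continue
        if marker_associated and row["naive-association"] < association_cutoff:
            continue
        kept.append(row)
    return kept

# Hypothetical table rows (values invented for illustration only).
rows = [
    {"gene": "GeneA", "p.N24": 0.01, "p.2i": 0.05, "naive-association": 0.80},
    {"gene": "GeneB", "p.N24": 0.01, "p.2i": 0.40, "naive-association": 0.70},
    {"gene": "GeneC", "p.N24": 0.30, "p.2i": 0.02, "naive-association": 0.20},
]
both_significant = apply_presets(rows, significant_n24=True, significant_2i=True)
tight = apply_presets(rows, marker_associated=True)
```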

1.3 Visualization options

If no gene is selected, the avg. change of naive marker genes in WT is mapped to the sample T-sne map. If a gene is selected in the table, additional attributes may be mapped:

option	description
avg. naive marker changes (N24)	Average log-fold-change of naive marker genes vs. wild type in N24 for every KO
N24 vs 2i (logFC)	Log-fold-change between 2i and N24 conditions in wild type
N24 vs RC9/N24 (logFC)	Log-fold-change KO vs wild type in N24
2i vs. RC9/2i (logFC)	Log-fold-change KO vs wild type in 2i

1.4 Gene details

If a gene has been selected in the table, a plot of the log-expression values of this gene in all committed samples and the wild type is shown below the table. Additionally, the full annotation associated with this gene as used in the analysis (GO, Reactome) is available in two collapsible panes.

2 Pre-computed clusters

We carried out cluster analysis on specific subsets of the N24 knockout response. The results are accessible in the “Pre-computed clusters” tab. The clusterings, comprising the constitutive knockout response clusters and the induced N24 clusters, can be selected via the drop-down menu (“Select a cluster”). This initializes a heatmap showing the mean of the KO/N24 vs. RC9/N24 log-fold-changes for each cluster and knockout, along with the corresponding naive marker log-fold-changes (“Heatmap, mean logFC of clusters”). A cluster is then selected for further inspection.

- Cluster-averaged log-fold-changes, or their corresponding z-values from row-normalization, are visualized on the knockout t-sne map.
- The results of a GO enrichment analysis of the cluster genes are displayed in the main table.
- Via the “go to cluster genes” button it is furthermore possible to switch to the “Genes” tab and inspect, sort, and filter all cluster genes.
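
The cluster-averaged log-fold-changes and their row-normalized z-values can be sketched as below. The function names, gene and cluster identifiers, and values are hypothetical, and the use of the population standard deviation for row-normalization is an assumption.

```python
from statistics import mean, pstdev

def cluster_mean_logfc(logfc, clusters):
    """Mean log-fold-change per cluster and knockout.

    `logfc` maps gene -> {KO name -> logFC (KO vs. RC9 in N24)}.
    `clusters` maps cluster name -> list of gene names.
    """
    result = {}
    for name, genes in clusters.items():
        kos = list(logfc[genes[0]].keys())
        result[name] = {ko: mean(logfc[g][ko] for g in genes) for ko in kos}
    return result

def row_normalize(row):
    """Convert one cluster's per-knockout means into z-values."""
    values = list(row.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumes the row is not constant
    return {ko: (v - mu) / sigma for ko, v in row.items()}

# Hypothetical log-fold-changes for three genes in two knockouts.
logfc = {
    "g1": {"KO_A": 1.0, "KO_B": -1.0},
    "g2": {"KO_A": 3.0, "KO_B": 1.0},
    "g3": {"KO_A": 0.0, "KO_B": 2.0},
}
clusters = {"c1": ["g1", "g2"], "c2": ["g3"]}
means = cluster_mean_logfc(logfc, clusters)
z_c1 = row_normalize(means["c1"])
```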

3 Select custom geneset

In this tab you can analyze genes of interest by defining a custom geneset. The left column contains the input field where gene names (MGI symbols) can be added. Genes must be separated by new lines; otherwise they are not recognized as multiple genes. The right column contains the ‘Map custom genes’ button, which checks for the occurrence of the custom genes in the KO data and the time course data. After clicking the button, the right column lists the genes that were mapped and the genes that were not found.
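
The mapping step can be pictured roughly as follows. The function `map_custom_genes` and the small set of known genes are hypothetical, and exact, case-sensitive matching is an assumption. The sketch also shows why genes entered on one line are not recognized as separate genes.

```python
def map_custom_genes(text, known_genes):
    """Split newline-separated gene symbols and report mapped / missing ones."""
    mapped, missing = [], []
    for line in text.splitlines():
        symbol = line.strip()
        if not symbol or symbol in mapped or symbol in missing:
            continue  # skip empty lines and duplicates
        (mapped if symbol in known_genes else missing).append(symbol)
    return mapped, missing

# Hypothetical subset of genes present in the data.
known = {"Nanog", "Klf4", "Esrrb"}

mapped, missing = map_custom_genes("Nanog\nKlf4\n\nNotAGene\n", known)

# Two genes on one line are treated as a single, unknown symbol.
mapped_bad, missing_bad = map_custom_genes("Nanog, Klf4", known)
```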

The mapped genes are visualized in three different panes that open upon clicking on their headers:

3.1 Heatmap KOs:

After opening this pane a heatmap consisting of three sections will be shown.

- committed KOs from the t-sne plot (top row)
- log2FCs of mapped custom genes (middle section)
- log2FCs of naive marker genes (bottom section)

The heatmap contains log2FCs between each KO (columns) and RC9. Depending on which of the fields in the top left corner is selected, the log2FCs show the comparison of KO vs. RC9 at either N24 or 2i. A tick box at the top orders the KOs by naive marker expression if ticked; otherwise the KOs are ordered by clustering of the mapped genes across the KOs. A slider at the bottom adjusts the scaling of the color space.
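
One plausible way to order the KO columns by naive marker expression is to sort them by the mean log2FC of the naive markers. This is a sketch under that assumption. The function name, the knockout names, and the values are invented for illustration, and the app may use a different ordering criterion.

```python
from statistics import mean

def order_kos_by_markers(marker_logfc):
    """Return KO names sorted by mean naive marker log2FC (ascending).

    `marker_logfc` maps marker gene -> {KO name -> log2FC vs. RC9}.
    """
    kos = list(next(iter(marker_logfc.values())).keys())
    avg = {ko: mean(row[ko] for row in marker_logfc.values()) for ko in kos}
    return sorted(kos, key=lambda ko: avg[ko])

# Hypothetical log2FCs of two naive markers in three knockouts.
marker_logfc = {
    "Nanog": {"KO_A": 0.5, "KO_B": -1.0, "KO_C": 2.0},
    "Klf4": {"KO_A": 0.1, "KO_B": -0.5, "KO_C": 1.0},
}
column_order = order_kos_by_markers(marker_logfc)
```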

3.2 Heatmap Time Course:

The heatmap in this pane shows the changes of expression over time in relation to 2i (naive state). The order of the columns (time points) cannot be changed, as they are ordered by time. The selection fields at the top change how the expression over time is visualized. Here you can select log2FCs, scaled log2FCs, TPMs, scaled TPMs or log10 TPMs. A slider at the bottom again adjusts the scaling of the color space.
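
The display options can be thought of as different transformations of the same TPM time series. The sketch below is only illustrative: the function names, the pseudocount of 1, the choice of scaling each row by its largest magnitude, and the example values are all assumptions, since the exact definitions used by the app are not stated here.

```python
import math

def log2fc_vs_2i(tpm, pseudocount=1.0):
    """log2 fold-change of every time point relative to the first (2i) value."""
    reference = tpm[0] + pseudocount
    return [math.log2((value + pseudocount) / reference) for value in tpm]

def scale_row(values):
    """Scale one gene's row to the range [-1, 1] by its largest magnitude."""
    largest = max(abs(v) for v in values)
    return [v / largest if largest else 0.0 for v in values]

def log10_tpm(tpm, pseudocount=1.0):
    """log10 of TPM values, shifted by a pseudocount to avoid log(0)."""
    return [math.log10(value + pseudocount) for value in tpm]

# Hypothetical TPMs of one gene, ordered by time, first entry = 2i.
tpm = [7.0, 3.0, 1.0, 0.0]
fc = log2fc_vs_2i(tpm)
fc_scaled = scale_row(fc)
tpm_log10 = log10_tpm(tpm)
```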

3.3 Time course kinetics:

The last pane contains a visualization of the original data points and the results after applying Gaussian process regression for the time course analysis. Each gene is shown in one plot and the plots are positioned on a grid. The number of columns in the grid can be adjusted by the slider on the top (1 to 5 columns). Each plot shows TPMs of original measurements (black dots) and the results from the Gaussian process regression (red line).

Note: If more than 100 custom genes were mapped, this plot is not shown. To plot more than 100 genes, split the gene list into batches of at most 100 genes and repeat the mapping and plotting for each batch.
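
Splitting a long gene list into batches that stay within the limit can be done outside the app, for example with a small helper like the one below. The function name `batches` and the placeholder gene names are hypothetical.

```python
def batches(genes, size=100):
    """Split a gene list into consecutive batches of at most `size` genes."""
    return [genes[i:i + size] for i in range(0, len(genes), size)]

# Hypothetical list of 250 gene symbols.
genes = [f"gene{i}" for i in range(250)]
parts = batches(genes)

# Each batch can be pasted into the input field, one gene per line.
first_batch_text = "\n".join(parts[0])
```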

4 Knockout sample map and clusters

The right column of the app contains additional panes for selection and visualization of data.

The header area shows the current selection of knockout samples, if any.

- Knockout samples may be selected in the knockout sample map.
- To compute summary statistics, the selection needs to be committed (using the “Commit selection” button). Currently committed knockouts are shown in the header of the left column.

4.1 T-sne projection of knock-out samples

The Knock-out sample t-sne pane serves as the central area to select single knockout samples or clusters of knockouts on which summary statistics (both gene-level and geneset-level) are calculated.

T-sne is a state-of-the-art visualization technique for high-dimensional data. It allows the placement of complex gene expression profiles in 2D, similar to PCA. However, in contrast to PCA, relative distances between points do not have a trivial interpretation and should be ignored. Instead, t-sne creates a non-linear embedding of the neighborhood of each point (expression profile), i.e. it preserves neighborhoods of similar knockout gene expression profiles.

At present, there are four subsets of genes that the t-sne projection may be calculated from:

- Naive markers: group knockouts according to the expression patterns of the seven naive pluripotency markers Tbx3, Nanog, Klf4, Tfcp2l1, Zfp42, Esrrb, and Prdm14.
- N24 (diff.): group knockouts according to the N24 condition expression patterns of genes that are differentiation-associated (significantly changed in the wild type between 2i and N24).
- 2i: group knockouts based on their expression patterns in the 2i condition.
- N24 & 2i: group knockouts based on their combined expression patterns in both conditions.

5 How the different parts of the app interact

There are certain actions that trigger interaction between different panes or different fields in the app. Some of those actions are already described in previous sections but will be mentioned here as well. The connection between different tabs and panels helps to analyze specific genes or KOs.

values to map to t-sne:

This option is found on the “Genes” tab and allows the user to change the colors mapped to the samples in the t-sne plot. The default option is the average change of naive marker genes. There are three alternative options when a gene is selected in the main table of the “Genes” tab. Here the user can choose to visualize either the WT change of this gene (N24 vs 2i) or the change between the corresponding KOs and the WT (either in 2i or N24).

selecting a gene:

Selecting a gene from the main table in the “Genes” tab makes the additional color options for the t-sne plot available. Additionally, a plot at the bottom left of the “Genes” tab shows the change of this gene in WT and committed KOs. The panes “GO annotations” and “Reactome pathways” show the GO terms and Reactome pathways that include the selected gene.

commit selection:

This button is used to commit a selection of KOs from the t-sne plot. This will have an effect on different panels:

- The fold changes and p-values in the main table of the “Genes” tab change to the average over all selected KOs (with no selection, the average over all KOs is shown).
- The heatmaps in the “Pre-computed clusters” tab, as well as the heatmap showing the KO data in the “Select custom geneset” tab, highlight the selected KOs in bright blue.

go to cluster genes:

This button in the “Pre-computed clusters” tab will take the user back to the “Genes” tab and show a table that only contains the genes from the previously chosen cluster.

go to custom genes:

Takes the user to the “Genes” tab and shows a table that only contains mapped custom genes.
