WebGeneKFCA

## Dataset Arabidopsis Selenate details

Description:

Gene expression in roots and shoots of plants grown on selenate. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9311

Size:
22810 probesets x 8 experiments
Species:
Arabidopsis thaliana
InputData:
Microarray, ATH1-121501

Several analyses can be run over this dataset, each based on a different preprocessor. The preprocessing stage is needed to normalize the raw input matrix of the dataset.

Within a given experiment, each gene operates at its own level of expression, which makes a direct comparison between genes impossible. The expression of each gene must therefore be normalized so that genes become comparable.
Normalization applies one of the following formulas, where $$x_{ij}$$ is the expression of gene $$i$$ in experiment $$j$$, $$n$$ is the number of experiments, and $$xn_{ij}$$ is the normalized output:

• $$xn_{ij}=\log(x_{ij})$$
• $$xn_{ij}=\log\left(x_{ij}/\frac{1}{n}\sum_{j=1}^{n}x_{ij}\right)$$
• $$xn_{ij}=\log\left(x_{ij}/\sqrt[n]{\prod_{j=1}^{n}x_{ij}}\right)$$
• $$xn_{ij}=\log\left(x_{ij}/\max_{j=1}^{n}(x_{ij})\right)$$
• $$xn_{ij}=\left(\log(x_{ij})-\overline{\log(x_{i\cdot})}\right)/\sum_{j=1}^{n}\log(x_{ij})\cdot\sqrt{m}$$
• Mean 0 and variance 1 in rows and columns of $$\log(x_{ij})$$
• $$xn_{ij}=x_{ij}$$
• $$xn_{ij}=x_{ij}/\frac{1}{n}\sum_{j=1}^{n}x_{ij}$$
• $$xn_{ij}=x_{ij}/\sqrt[n]{\prod_{j=1}^{n}x_{ij}}$$
• $$xn_{ij}=x_{ij}/\max_{j=1}^{n}(x_{ij})$$
• $$xn_{ij}=\left(x_{ij}-\overline{x_{i\cdot}}\right)/\sum_{j=1}^{n}x_{ij}\cdot\sqrt{m}$$
• Mean 0 and variance 1 in rows and columns of $$x_{ij}$$
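
The ratio-based formulas in the list can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code WebGeneKFCA runs: the function name `normalize`, its parameters `reference` and `take_log`, and the toy matrix are hypothetical. It assumes genes are in rows and experiments in columns, that all values are strictly positive, and that `log` is the natural logarithm (the page does not state the base).

```python
import numpy as np

def normalize(x, reference="arithmetic", take_log=True):
    """Row-wise normalization of a genes x experiments matrix (hypothetical helper).

    reference: "none", "arithmetic", "geometric" or "max" picks the per-gene
    value each row is divided by; take_log applies the natural log afterwards.
    """
    x = np.asarray(x, dtype=float)
    if reference == "none":
        ref = np.ones((x.shape[0], 1))
    elif reference == "arithmetic":
        ref = x.mean(axis=1, keepdims=True)
    elif reference == "geometric":
        # exp(mean(log x)) equals the n-th root of the product,
        # but avoids overflow when multiplying many values.
        ref = np.exp(np.log(x).mean(axis=1, keepdims=True))
    elif reference == "max":
        ref = x.max(axis=1, keepdims=True)
    else:
        raise ValueError(f"unknown reference: {reference!r}")
    ratio = x / ref
    return np.log(ratio) if take_log else ratio

# Toy matrix: 2 genes x 3 experiments, on very different expression scales.
x = np.array([[1.0, 2.0, 4.0],
              [10.0, 100.0, 1000.0]])

geo = normalize(x, "geometric")   # log of the ratio to the geometric mean
top = normalize(x, "max")         # log of the ratio to the row maximum
```

With the geometric-mean variant, each row of the result sums to zero, because the log of the ratio to the geometric mean equals the log value minus the row mean of the log values. With the maximum variant, the largest entry of each row becomes zero.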

For this dataset there are 5 different analyses with different preprocessors.

| Preprocessor | Normalization | Input value | Date |
|---|---|---|---|
| Preprocessor 2 | $$xn_{ij}=\log\left(x_{ij}/\sqrt[n]{\prod_{j=1}^{n}x_{ij}}\right)$$ | Raw probeset value | 2014-07-06 13:05:23.0 |
| Preprocessor 4 | $$xn_{ij}=\log\left(x_{ij}/\sqrt[n]{\prod_{j=1}^{n}x_{ij}}\right)$$ | Average | 2014-07-08 19:49:51.0 |
| Preprocessor 1 | $$xn_{ij}=\log\left(x_{ij}/\frac{1}{n}\sum_{j=1}^{n}x_{ij}\right)$$ | Raw probeset value | 2013-05-31 14:22:02.0 |
| Preprocessor 3 | $$xn_{ij}=\log\left(x_{ij}/\frac{1}{n}\sum_{j=1}^{n}x_{ij}\right)$$ | Average | 2014-07-06 13:09:45.0 |