## Dataset Breast_A details

Description:

Microarray data from the Broad Institute “Cancer Program Data Sets” (http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi), produced by van’t Veer et al. (2002). Array S54 was removed because it is an outlier.

Size:
1213 probesets x 97 experiments
Species:
Homo sapiens
InputData:
Microarray, HG-U133 Plus
Density estimation

Different analyses can be run on these datasets, depending on the preprocessor used. The preprocessing stage is necessary to normalize the raw input matrix from the dataset.

Within a given experiment, each gene is expressed at a different level, which makes direct comparison among genes impossible. The expression of each gene must therefore be normalized so that genes become comparable.
Normalization applies one of the following formulas, where $$x_{ij}$$ is the expression of gene i in experiment j, $$xn_{ij}$$ is the normalized output, and n is the number of experiments:

• $$xn_{ij}=\log(x_{ij})$$
• $$xn_{ij}=\log\left(x_{ij}/\left(\frac{1}{n}\sum_{j=1}^{n}x_{ij}\right)\right)$$
• $$xn_{ij}=\log\left(x_{ij}/\sqrt[n]{\prod_{j=1}^{n}x_{ij}}\right)$$
• $$xn_{ij}=\log\left(x_{ij}/\max_{j=1}^{n}(x_{ij})\right)$$
• $$xn_{ij}=(\log(x_{ij})-\overline{\log(x_{i\cdot})})/\sum_{j=1}^{n}\log(x_{ij})\cdot\sqrt{m}$$
• Mean 0 and var 1 in rows and columns of $$\log(x_{ij})$$
• $$xn_{ij}=x_{ij}$$
• $$xn_{ij}=x_{ij}/\left(\frac{1}{n}\sum_{j=1}^{n}x_{ij}\right)$$
• $$xn_{ij}=x_{ij}/\sqrt[n]{\prod_{j=1}^{n}x_{ij}}$$
• $$xn_{ij}=x_{ij}/\max_{j=1}^{n}(x_{ij})$$
• $$xn_{ij}=(x_{ij}-\overline{x_{i\cdot}})/\sum_{j=1}^{n}x_{ij}\cdot\sqrt{m}$$
• Mean 0 and var 1 in rows and columns
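
As a rough illustration, the ratio-based variants in the list above (division by the per-gene arithmetic mean, geometric mean, or maximum, with or without a logarithm) could be sketched with NumPy as follows. The function name `normalize` and its parameters are hypothetical, not part of the site's software. The sketch assumes a genes x experiments matrix of strictly positive values and a natural logarithm, since the base of the log is not stated here. The centered variants and the "mean 0 and var 1" variants are not covered.

```python
import numpy as np


def normalize(x, reference="none", use_log=False):
    """Row-wise (per-gene) normalization of a genes x experiments matrix.

    reference: per-gene value each entry is divided by:
               "none", "mean", "geomean" or "max" (hypothetical names).
    use_log:   if True, return the natural log of the ratio
               (assumption: the log base is not specified in the text).
    """
    x = np.asarray(x, dtype=float)
    if reference == "none":
        ref = np.ones((x.shape[0], 1))
    elif reference == "mean":
        ref = x.mean(axis=1, keepdims=True)
    elif reference == "geomean":
        # Geometric mean computed in log space; requires x > 0.
        ref = np.exp(np.log(x).mean(axis=1, keepdims=True))
    elif reference == "max":
        ref = x.max(axis=1, keepdims=True)
    else:
        raise ValueError(f"unknown reference: {reference!r}")
    ratio = x / ref
    return np.log(ratio) if use_log else ratio


# Toy matrix: 2 genes x 3 experiments, on very different scales.
x = np.array([[1.0, 2.0, 4.0],
              [10.0, 100.0, 1000.0]])
xn = normalize(x, reference="geomean", use_log=True)
```

With the geometric-mean reference and the logarithm enabled, each row of `xn` sums to zero, so genes measured on very different scales end up centered on a common baseline.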

For this dataset there is one analysis with the following preprocessor:

| Name | Algorithm | Gene Expression Type | Creation Date |
| --- | --- | --- | --- |
| Grouped by class | $$xn_{ij}=(x_{ij}-\overline{x_{i\cdot}})/\sum_{j=1}^{n}x_{ij}\cdot\sqrt{m}$$ | Gene expression value | 2015-08-16 12:00:30.0 |

