Exploratory Analysis

Sample Groups

The specified traits were tested based on criteria for defining sample groups. The table below summarizes these traits.

Trait Number of groups
sampleGroup 13
TISSUE_TYPE 4
DONOR_SEX 2

Region Annotations

In addition to CpG sites, there are 7 sets of genomic regions to be covered in the analysis. The table below gives a summary of these annotations.

Annotation Description Regions in the Dataset
cpgislands CpG island track of the UCSC Genome browser 25971
genes Ensembl genes, version Ensembl Genes 78 51562
promoters Promoter regions of Ensembl genes, version Ensembl Genes 78 55461
tiling Genome tiling regions of length 5000 543013
tiling1kb n.a. 2590437
gencode22promoters Reference annotation derived from Gencode 22 gene annotations (gene annotation, promoter annotation). Promoters are defined as 1500 bp upstream to 500 bp downstream of the TSS. 55818
ensembleRegBuildBPall Ensembl Regulatory build from BLUEPRINT data release 20150820 -- all 471283

Region length distributions

The plots below show region size distributions for the region types above.

Figure 1: Distribution of region lengths. Plot option: region type.

Number of sites per region

The plots below show the distributions of the number of sites per region type.

Figure 2: Distribution of the number of sites per region. Plot option: region type.

Region site distributions

The plots below show distributions of sites across the different region types.

Figure 3: Distribution of sites across regions. Relative coordinates of 0 and 1 correspond to the start and end coordinates of a region, respectively. Coordinates smaller than 0 and greater than 1 denote flanking regions, normalized by region length. Plot option: region type.
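
The relative coordinate described in the caption of Figure 3 can be sketched as follows. This is a minimal illustration, not the code that produced the report; the function name and the example region are hypothetical.

```python
def relative_coordinate(site_pos: int, region_start: int, region_end: int) -> float:
    """Map a genomic position to a coordinate relative to a region.

    0 corresponds to the region start and 1 to the region end. Values below 0
    or above 1 lie in the flanks, measured in units of the region length.
    """
    length = region_end - region_start
    if length <= 0:
        raise ValueError("region_end must be greater than region_start")
    return (site_pos - region_start) / length


# Hypothetical region spanning positions 1000 to 2000.
print(relative_coordinate(1000, 1000, 2000))  # 0.0 (region start)
print(relative_coordinate(1500, 1000, 2000))  # 0.5 (region midpoint)
print(relative_coordinate(2500, 1000, 2000))  # 1.5 (downstream flank)
```

Dividing by the region length puts regions of very different sizes on a common axis, which is what allows them to be pooled into a single distribution.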

Analysis of Sample Replicates

Sample replicates were compared. This section shows pairwise scatterplots for each sample replicate group on both site and region level.

Figure 4: Scatterplot for replicate methylation comparison. The transparency corresponds to point density. The 1% of the points in the most sparsely populated plot regions are drawn explicitly. Plot options: replicate, site/region.

The following table contains Pearson correlation coefficients for each pair of replicates; the replicate group is given in parentheses:

Replicate pair (group) sites cpgislands genes promoters tiling tiling1kb gencode22promoters ensembleRegBuildBPall
AML_BM_B_UMCG_2007_229 vs. AML_BM_B_UMCG_2009_052 (AML_BM) 0.8086 0.9306 0.8819 0.9499 0.8398 0.7092 0.9499 0.7796
AML_BM_B_UMCG_2007_229 vs. AML_BM_B_UMCG_2012_082 (AML_BM) 0.8442 0.9161 0.9072 0.9556 0.8737 0.7735 0.9558 0.8488
AML_BM_B_UMCG_2009_052 vs. AML_BM_B_UMCG_2012_082 (AML_BM) 0.7857 0.8513 0.8676 0.9271 0.866 0.7294 0.9272 0.7717
AML_PBMC_V_UMCG_2003_180 vs. AML_PBMC_V_UMCG_2004_152 (AML_PBMC) 0.859 0.9557 0.9244 0.97 0.9073 0.7844 0.97 0.8245
AML_PBMC_V_UMCG_2003_180 vs. AML_PBMC_V_UMCG_2007_282 (AML_PBMC) 0.8529 0.9702 0.916 0.9698 0.8499 0.7419 0.9699 0.8298
AML_PBMC_V_UMCG_2004_152 vs. AML_PBMC_V_UMCG_2007_282 (AML_PBMC) 0.8257 0.9578 0.8975 0.9609 0.7982 0.6765 0.961 0.7924
Bcell_mem_V_C003N3 vs. Bcell_mem_V_NC11.41 (Bcell_mem) 0.9073 0.9801 0.9227 0.9634 0.876 0.8161 0.9634 0.8656
Bcell_mem_V_C003N3 vs. Bcell_mem_V_S004P1 (Bcell_mem) 0.9399 0.9947 0.9648 0.9858 0.9397 0.8637 0.9858 0.9162
Bcell_mem_V_NC11.41 vs. Bcell_mem_V_S004P1 (Bcell_mem) 0.9398 0.9878 0.952 0.9769 0.9264 0.8695 0.977 0.913
Bcell_naive_V_NC11.41 vs. Bcell_naive_V_S001JP (Bcell_naive) 0.9654 0.9954 0.9829 0.9937 0.977 0.929 0.9937 0.957
Bcell_naive_V_NC11.41 vs. Bcell_naive_V_S00DM8 (Bcell_naive) 0.9637 0.9956 0.9829 0.9939 0.9744 0.9196 0.994 0.952
Bcell_naive_V_NC11.41 vs. Bcell_naive_C_C003K9 (Bcell_naive) 0.9562 0.9905 0.9787 0.9923 0.9739 0.9217 0.9922 0.9433
Bcell_naive_V_NC11.41 vs. Bcell_naive_C_C0068L (Bcell_naive) 0.9576 0.9908 0.9794 0.9927 0.9756 0.9268 0.9927 0.9479
Bcell_naive_V_S001JP vs. Bcell_naive_V_S00DM8 (Bcell_naive) 0.9545 0.9951 0.98 0.9929 0.9706 0.9035 0.9929 0.9437
Bcell_naive_V_S001JP vs. Bcell_naive_C_C003K9 (Bcell_naive) 0.9512 0.9903 0.9766 0.9915 0.9708 0.9073 0.9915 0.9346
Bcell_naive_V_S001JP vs. Bcell_naive_C_C0068L (Bcell_naive) 0.9519 0.9903 0.9773 0.9915 0.9713 0.9097 0.9915 0.937
Bcell_naive_V_S00DM8 vs. Bcell_naive_C_C003K9 (Bcell_naive) 0.9466 0.9907 0.9745 0.9911 0.9646 0.8912 0.9911 0.9281
Bcell_naive_V_S00DM8 vs. Bcell_naive_C_C0068L (Bcell_naive) 0.9471 0.9911 0.9745 0.9908 0.9645 0.8924 0.9909 0.9304
Bcell_naive_C_C003K9 vs. Bcell_naive_C_C0068L (Bcell_naive) 0.9645 0.9955 0.9845 0.9952 0.9806 0.9355 0.9953 0.9639
DC_C_S00CP6 vs. DC_C_S00D71 (DC) 0.9594 0.9963 0.9826 0.9936 0.9743 0.9113 0.9936 0.9475
Endo_prol_C_S00BJM vs. Endo_prol_C_S00DCS (Endo_prol) 0.9449 0.9963 0.9748 0.9904 0.9579 0.8821 0.9904 0.9321
Endo_rest_C_S00BJM vs. Endo_rest_C_S00DCS (Endo_rest) 0.9402 0.9961 0.9723 0.9882 0.9502 0.8656 0.9882 0.9233
Eryth_C_S002R5 vs. Eryth_C_S002S3 (Eryth) 0.9207 0.9963 0.9641 0.9868 0.943 0.8264 0.9868 0.9047
Leuk_myel_B_pz284_ATRA vs. Leuk_myel_B_pz284_CTR (Leuk_myel) 0.9628 0.9986 0.9818 0.993 0.969 0.9137 0.993 0.9551
Leuk_myel_B_pz284_ATRA vs. Leuk_myel_B_pz289_CTR (Leuk_myel) 0.8864 0.9776 0.9358 0.9697 0.8687 0.7629 0.9697 0.8434
Leuk_myel_B_pz284_ATRA vs. Leuk_myel_B_pz290_CTR (Leuk_myel) 0.8347 0.9618 0.9103 0.9647 0.8542 0.7281 0.9648 0.8072
Leuk_myel_B_pz284_CTR vs. Leuk_myel_B_pz289_CTR (Leuk_myel) 0.8898 0.9771 0.9374 0.9705 0.875 0.7688 0.9705 0.8467
Leuk_myel_B_pz284_CTR vs. Leuk_myel_B_pz290_CTR (Leuk_myel) 0.838 0.9624 0.9105 0.9651 0.8559 0.7293 0.9652 0.8079
Leuk_myel_B_pz289_CTR vs. Leuk_myel_B_pz290_CTR (Leuk_myel) 0.8452 0.9572 0.9068 0.9571 0.8173 0.6863 0.9572 0.7841
Mac_f0_V_C005VG vs. Mac_f0_V_N000314138902_t6BG (Mac_f0) 0.9528 0.9938 0.9807 0.9928 0.9714 0.9073 0.9929 0.9444
Mac_f0_V_C005VG vs. Mac_f0_V_N000314138902_t6 (Mac_f0) 0.9488 0.9938 0.9789 0.9921 0.9633 0.8924 0.9921 0.9375
Mac_f0_V_C005VG vs. Mac_f0_V_S001S7 (Mac_f0) 0.9468 0.9941 0.9785 0.9922 0.9681 0.9004 0.9922 0.9414
Mac_f0_V_C005VG vs. Mac_f0_V_S0022I (Mac_f0) 0.9498 0.9941 0.9793 0.9927 0.9697 0.9015 0.9926 0.9423
Mac_f0_V_C005VG vs. Mac_f0_V_S00390 (Mac_f0) 0.9477 0.9945 0.9784 0.9923 0.9669 0.8979 0.9923 0.9413
Mac_f0_V_C005VG vs. Mac_f0_C_S00BHQ (Mac_f0) 0.9415 0.9901 0.9732 0.9894 0.9601 0.8772 0.9895 0.9195
Mac_f0_V_C005VG vs. Mac_f0_C_S00DVR (Mac_f0) 0.9353 0.99 0.9678 0.9868 0.9512 0.8568 0.9868 0.9065
Mac_f0_V_N000314138902_t6BG vs. Mac_f0_V_N000314138902_t6 (Mac_f0) 0.9639 0.9991 0.9853 0.9948 0.9727 0.9153 0.9948 0.9556
Mac_f0_V_N000314138902_t6BG vs. Mac_f0_V_S001S7 (Mac_f0) 0.954 0.9958 0.9815 0.9937 0.9723 0.9145 0.9937 0.9509
Mac_f0_V_N000314138902_t6BG vs. Mac_f0_V_S0022I (Mac_f0) 0.9577 0.9962 0.9825 0.9942 0.9743 0.9169 0.9941 0.9533
Mac_f0_V_N000314138902_t6BG vs. Mac_f0_V_S00390 (Mac_f0) 0.955 0.9959 0.9811 0.9936 0.9712 0.9126 0.9937 0.9515
Mac_f0_V_N000314138902_t6BG vs. Mac_f0_C_S00BHQ (Mac_f0) 0.9489 0.9929 0.9755 0.9902 0.9613 0.8846 0.9902 0.9228
Mac_f0_V_N000314138902_t6BG vs. Mac_f0_C_S00DVR (Mac_f0) 0.9413 0.9918 0.9681 0.9866 0.9507 0.8602 0.9866 0.905
Mac_f0_V_N000314138902_t6 vs. Mac_f0_V_S001S7 (Mac_f0) 0.9499 0.9957 0.9783 0.9921 0.9618 0.8937 0.9921 0.9398
Mac_f0_V_N000314138902_t6 vs. Mac_f0_V_S0022I (Mac_f0) 0.9536 0.9961 0.9798 0.9929 0.9646 0.898 0.993 0.9433
Mac_f0_V_N000314138902_t6 vs. Mac_f0_V_S00390 (Mac_f0) 0.9508 0.9959 0.9784 0.9923 0.9612 0.8929 0.9924 0.9411
Mac_f0_V_N000314138902_t6 vs. Mac_f0_C_S00BHQ (Mac_f0) 0.9454 0.9927 0.9752 0.9904 0.9564 0.8771 0.9903 0.9242
Mac_f0_V_N000314138902_t6 vs. Mac_f0_C_S00DVR (Mac_f0) 0.9382 0.9916 0.9685 0.9872 0.9475 0.8554 0.9872 0.9084
Mac_f0_V_S001S7 vs. Mac_f0_V_S0022I (Mac_f0) 0.9528 0.9954 0.981 0.9936 0.9735 0.9119 0.9936 0.9506
Mac_f0_V_S001S7 vs. Mac_f0_V_S00390 (Mac_f0) 0.9506 0.9956 0.9807 0.9933 0.9729 0.9101 0.9934 0.9496
Mac_f0_V_S001S7 vs. Mac_f0_C_S00BHQ (Mac_f0) 0.9434 0.9919 0.9732 0.9893 0.9602 0.8791 0.9893 0.9195
Mac_f0_V_S001S7 vs. Mac_f0_C_S00DVR (Mac_f0) 0.9367 0.9913 0.9666 0.9861 0.9506 0.8564 0.9861 0.9042
Mac_f0_V_S0022I vs. Mac_f0_V_S00390 (Mac_f0) 0.9538 0.9958 0.9819 0.994 0.9735 0.9112 0.994 0.9514
Mac_f0_V_S0022I vs. Mac_f0_C_S00BHQ (Mac_f0) 0.9466 0.9925 0.9747 0.9899 0.9628 0.8828 0.9899 0.9235
Mac_f0_V_S0022I vs. Mac_f0_C_S00DVR (Mac_f0) 0.9399 0.9917 0.9685 0.987 0.953 0.8605 0.987 0.9071
Mac_f0_V_S00390 vs. Mac_f0_C_S00BHQ (Mac_f0) 0.9446 0.9918 0.9739 0.9895 0.9612 0.8799 0.9894 0.9219
Mac_f0_V_S00390 vs. Mac_f0_C_S00DVR (Mac_f0) 0.9386 0.9918 0.9684 0.9868 0.9525 0.8585 0.9868 0.9072
Mac_f0_C_S00BHQ vs. Mac_f0_C_S00DVR (Mac_f0) 0.9463 0.9961 0.9768 0.9907 0.9621 0.875 0.9907 0.9292
Mac_f1_V_S001MJ vs. Mac_f1_V_S001S7 (Mac_f1) 0.9513 0.9952 0.9806 0.9933 0.9719 0.9088 0.9933 0.9479
Mac_f1_V_S001MJ vs. Mac_f1_V_S0022I (Mac_f1) 0.9521 0.996 0.9807 0.9937 0.9725 0.9086 0.9937 0.9484
Mac_f1_V_S001MJ vs. Mac_f1_V_S00H6O (Mac_f1) 0.956 0.9959 0.9823 0.9941 0.974 0.9154 0.9941 0.9523
Mac_f1_V_S001MJ vs. Mac_f1_C_S0018A (Mac_f1) 0.9445 0.9924 0.974 0.9901 0.9623 0.8821 0.9901 0.9235
Mac_f1_V_S001MJ vs. Mac_f1_C_S007SK (Mac_f1) 0.9484 0.9927 0.9762 0.9916 0.9678 0.8945 0.9916 0.9325
Mac_f1_V_S001S7 vs. Mac_f1_V_S0022I (Mac_f1) 0.9529 0.9953 0.981 0.9939 0.9749 0.9183 0.9939 0.9545
Mac_f1_V_S001S7 vs. Mac_f1_V_S00H6O (Mac_f1) 0.9567 0.9953 0.9834 0.9945 0.9773 0.9262 0.9945 0.9592
Mac_f1_V_S001S7 vs. Mac_f1_C_S0018A (Mac_f1) 0.9449 0.9914 0.9736 0.9895 0.9606 0.8834 0.9895 0.922
Mac_f1_V_S001S7 vs. Mac_f1_C_S007SK (Mac_f1) 0.9489 0.992 0.976 0.9911 0.9666 0.8965 0.9912 0.9318
Mac_f1_V_S0022I vs. Mac_f1_V_S00H6O (Mac_f1) 0.9573 0.9954 0.9827 0.9944 0.9768 0.9239 0.9944 0.9577
Mac_f1_V_S0022I vs. Mac_f1_C_S0018A (Mac_f1) 0.9465 0.9921 0.9743 0.9904 0.9634 0.8857 0.9904 0.9241
Mac_f1_V_S0022I vs. Mac_f1_C_S007SK (Mac_f1) 0.9498 0.9924 0.9768 0.9917 0.968 0.8974 0.9917 0.9332
Mac_f1_V_S00H6O vs. Mac_f1_C_S0018A (Mac_f1) 0.9491 0.9917 0.9744 0.9899 0.9627 0.8888 0.9899 0.924
Mac_f1_V_S00H6O vs. Mac_f1_C_S007SK (Mac_f1) 0.9531 0.9926 0.9772 0.9918 0.9687 0.9024 0.9918 0.9344
Mac_f1_C_S0018A vs. Mac_f1_C_S007SK (Mac_f1) 0.9583 0.9962 0.9814 0.9935 0.9722 0.904 0.9936 0.9467
Mac_f2_V_S00622 vs. Mac_f2_V_S006VI (Mac_f2) 0.909 0.9934 0.9613 0.9886 0.9482 0.8376 0.9885 0.9116
Mac_f2_V_S00622 vs. Mac_f2_V_S00BS4 (Mac_f2) 0.9455 0.9939 0.9782 0.9921 0.9678 0.901 0.9921 0.9431
Mac_f2_V_S00622 vs. Mac_f2_V_S00FTN (Mac_f2) 0.9475 0.9939 0.9791 0.9933 0.9714 0.9129 0.9933 0.9486
Mac_f2_V_S00622 vs. Mac_f2_C_S00C1H (Mac_f2) 0.9329 0.9906 0.9668 0.9873 0.9527 0.8633 0.9874 0.9049
Mac_f2_V_S006VI vs. Mac_f2_V_S00BS4 (Mac_f2) 0.9216 0.9957 0.9664 0.9898 0.9533 0.8472 0.9898 0.9178
Mac_f2_V_S006VI vs. Mac_f2_V_S00FTN (Mac_f2) 0.9234 0.9955 0.9676 0.9905 0.9556 0.856 0.9905 0.924
Mac_f2_V_S006VI vs. Mac_f2_C_S00C1H (Mac_f2) 0.918 0.9927 0.9644 0.988 0.9481 0.8364 0.988 0.9026
Mac_f2_V_S00BS4 vs. Mac_f2_V_S00FTN (Mac_f2) 0.9579 0.9963 0.9828 0.994 0.9744 0.9154 0.994 0.9514
Mac_f2_V_S00BS4 vs. Mac_f2_C_S00C1H (Mac_f2) 0.9456 0.9927 0.9733 0.9893 0.9608 0.8749 0.9893 0.9161
Mac_f2_V_S00FTN vs. Mac_f2_C_S00C1H (Mac_f2) 0.9495 0.9936 0.9753 0.9903 0.9631 0.8866 0.9903 0.9242
Mono_V_C000S5 vs. Mono_V_C0010K (Mono) 0.9696 0.9952 0.9883 0.9962 0.9835 0.9531 0.9963 0.9735
Mono_V_C000S5 vs. Mono_V_C001UY (Mono) 0.9738 0.9959 0.9896 0.9965 0.9857 0.9574 0.9966 0.9763
Mono_V_C000S5 vs. Mono_V_C004SQ (Mono) 0.9754 0.996 0.9898 0.9966 0.9861 0.9599 0.9966 0.9772
Mono_V_C000S5 vs. Mono_V_N000314138902_t0 (Mono) 0.9631 0.9961 0.9844 0.9944 0.9763 0.9262 0.9944 0.9567
Mono_V_C000S5 vs. Mono_V_S007G7 (Mono) 0.9053 0.988 0.9453 0.978 0.934 0.824 0.9781 0.8608
Mono_V_C000S5 vs. Mono_C_C005PS (Mono) 0.9687 0.9925 0.9839 0.9944 0.98 0.9478 0.9944 0.9613
Mono_V_C000S5 vs. Mono_C_S000RD (Mono) 0.9644 0.9914 0.979 0.9937 0.9794 0.9435 0.9937 0.9588
Mono_V_C0010K vs. Mono_V_C001UY (Mono) 0.9709 0.9956 0.9887 0.9963 0.9835 0.9534 0.9963 0.9739
Mono_V_C0010K vs. Mono_V_C004SQ (Mono) 0.9726 0.9958 0.9891 0.9963 0.9842 0.9563 0.9963 0.9751
Mono_V_C0010K vs. Mono_V_N000314138902_t0 (Mono) 0.9603 0.9958 0.9835 0.9939 0.974 0.9217 0.994 0.9548
Mono_V_C0010K vs. Mono_V_S007G7 (Mono) 0.9002 0.9877 0.9425 0.9771 0.9296 0.8156 0.9772 0.8536
Mono_V_C0010K vs. Mono_C_C005PS (Mono) 0.9655 0.9921 0.9828 0.9938 0.9764 0.9428 0.9938 0.9596
Mono_V_C0010K vs. Mono_C_S000RD (Mono) 0.9615 0.9906 0.9781 0.993 0.9772 0.9402 0.993 0.9574
Mono_V_C001UY vs. Mono_V_C004SQ (Mono) 0.9776 0.9961 0.9908 0.9968 0.9878 0.9631 0.9968 0.979
Mono_V_C001UY vs. Mono_V_N000314138902_t0 (Mono) 0.9655 0.9959 0.985 0.9945 0.9776 0.9295 0.9946 0.9592
Mono_V_C001UY vs. Mono_V_S007G7 (Mono) 0.9077 0.9883 0.9452 0.9782 0.9356 0.8248 0.9783 0.8618
Mono_V_C001UY vs. Mono_C_C005PS (Mono) 0.9708 0.9922 0.985 0.9945 0.9818 0.9506 0.9945 0.9626
Mono_V_C001UY vs. Mono_C_S000RD (Mono) 0.966 0.991 0.9798 0.9936 0.9807 0.9455 0.9935 0.9595
Mono_V_C004SQ vs. Mono_V_N000314138902_t0 (Mono) 0.9679 0.9964 0.986 0.9949 0.9788 0.9328 0.995 0.9611
Mono_V_C004SQ vs. Mono_V_S007G7 (Mono) 0.907 0.9876 0.9451 0.9783 0.9346 0.8244 0.9783 0.8595
Mono_V_C004SQ vs. Mono_C_C005PS (Mono) 0.973 0.9929 0.9867 0.9949 0.9828 0.9543 0.9949 0.9644
Mono_V_C004SQ vs. Mono_C_S000RD (Mono) 0.9679 0.9915 0.9809 0.9939 0.9814 0.9485 0.9939 0.9609
Mono_V_N000314138902_t0 vs. Mono_V_S007G7 (Mono) 0.9038 0.9884 0.9466 0.9788 0.9346 0.8152 0.9788 0.863
Mono_V_N000314138902_t0 vs. Mono_C_C005PS (Mono) 0.9614 0.9928 0.9811 0.993 0.9734 0.9214 0.9931 0.9462
Mono_V_N000314138902_t0 vs. Mono_C_S000RD (Mono) 0.9562 0.9913 0.9761 0.9922 0.9721 0.9161 0.9922 0.9428
Mono_V_S007G7 vs. Mono_C_C005PS (Mono) 0.9097 0.983 0.9462 0.9786 0.937 0.8286 0.9786 0.8631
Mono_V_S007G7 vs. Mono_C_S000RD (Mono) 0.9015 0.9819 0.9391 0.9767 0.9335 0.8188 0.9767 0.8557
Mono_C_C005PS vs. Mono_C_S000RD (Mono) 0.978 0.9957 0.9873 0.9966 0.9878 0.9658 0.9965 0.9808
NK_V_C002CT vs. NK_V_C006G5 (NK) 0.9539 0.9958 0.982 0.9946 0.975 0.9171 0.9946 0.956
Neut_mat_V_C000S5 vs. Neut_mat_V_C0010K (Neut_mat) 0.9678 0.9947 0.9875 0.9959 0.9835 0.9547 0.9959 0.9741
Neut_mat_V_C000S5 vs. Neut_mat_V_C0011I (Neut_mat) 0.9672 0.9951 0.9864 0.9955 0.9815 0.9472 0.9955 0.9702
Neut_mat_V_C000S5 vs. Neut_mat_V_C001UY (Neut_mat) 0.9711 0.9955 0.9875 0.996 0.9839 0.9546 0.996 0.9742
Neut_mat_V_C000S5 vs. Neut_mat_C_C00184 (Neut_mat) 0.9579 0.992 0.9786 0.9923 0.9699 0.9213 0.9923 0.9436
Neut_mat_V_C000S5 vs. Neut_mat_C_C004GD (Neut_mat) 0.9611 0.9914 0.9809 0.9933 0.9759 0.9363 0.9933 0.9537
Neut_mat_V_C0010K vs. Neut_mat_V_C0011I (Neut_mat) 0.969 0.9958 0.9875 0.996 0.9826 0.9486 0.996 0.9713
Neut_mat_V_C0010K vs. Neut_mat_V_C001UY (Neut_mat) 0.9725 0.9956 0.9887 0.9963 0.9844 0.9552 0.9963 0.9747
Neut_mat_V_C0010K vs. Neut_mat_C_C00184 (Neut_mat) 0.96 0.9918 0.9801 0.9925 0.9706 0.9222 0.9925 0.9453
Neut_mat_V_C0010K vs. Neut_mat_C_C004GD (Neut_mat) 0.9635 0.992 0.9824 0.9937 0.9766 0.9376 0.9937 0.9557
Neut_mat_V_C0011I vs. Neut_mat_V_C001UY (Neut_mat) 0.9724 0.9957 0.9882 0.9962 0.9844 0.9529 0.9962 0.9738
Neut_mat_V_C0011I vs. Neut_mat_C_C00184 (Neut_mat) 0.9603 0.9922 0.9805 0.9929 0.9735 0.9248 0.9929 0.9474
Neut_mat_V_C0011I vs. Neut_mat_C_C004GD (Neut_mat) 0.9635 0.9933 0.9824 0.994 0.9779 0.9367 0.994 0.9556
Neut_mat_V_C001UY vs. Neut_mat_C_C00184 (Neut_mat) 0.9641 0.9923 0.9813 0.9931 0.9745 0.9294 0.9931 0.949
Neut_mat_V_C001UY vs. Neut_mat_C_C004GD (Neut_mat) 0.9668 0.9918 0.9837 0.994 0.9791 0.9425 0.994 0.9581
Neut_mat_C_C00184 vs. Neut_mat_C_C004GD (Neut_mat) 0.9733 0.996 0.9879 0.9956 0.9828 0.9465 0.9957 0.9698
Plas_B_MO7071 vs. Plas_B_V156 (Plas) 0.9487 0.992 0.9659 0.9837 0.9635 0.9429 0.9837 0.9554
TCD4_V_S007DD vs. TCD4_V_S008H1 (TCD4) 0.9613 0.9961 0.983 0.993 0.9723 0.9035 0.993 0.9446
TCD4_V_S007DD vs. TCD4_V_S009W4 (TCD4) 0.9617 0.9967 0.9827 0.9936 0.9743 0.9068 0.9936 0.9463
TCD4_V_S008H1 vs. TCD4_V_S009W4 (TCD4) 0.9618 0.9964 0.9825 0.9931 0.9734 0.9052 0.9931 0.9457
TCD8_V_C00256 vs. TCD8_V_C003VO (TCD8) 0.9579 0.9959 0.9829 0.995 0.9769 0.9263 0.995 0.9613
TCD8_V_C00256 vs. TCD8_C_C0066P (TCD8) 0.9421 0.9903 0.9742 0.9912 0.9623 0.8959 0.9912 0.9366
TCD8_V_C00256 vs. TCD8_C_S00C2F (TCD8) 0.9465 0.9914 0.9714 0.9884 0.9549 0.8769 0.9885 0.92
TCD8_V_C003VO vs. TCD8_C_C0066P (TCD8) 0.9457 0.9893 0.9761 0.9916 0.9703 0.9076 0.9916 0.9395
TCD8_V_C003VO vs. TCD8_C_S00C2F (TCD8) 0.9501 0.99 0.9735 0.9891 0.9629 0.8891 0.9891 0.9232
TCD8_C_C0066P vs. TCD8_C_S00C2F (TCD8) 0.9592 0.9961 0.9805 0.9923 0.9722 0.9086 0.9923 0.9425
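
For reference, a Pearson correlation coefficient between two replicates can be computed as sketched below. The report does not state how missing values were handled for this table, so the pairwise skipping of missing entries is an assumption, and the methylation values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences.

    Positions where either value is None are skipped (an assumption; the
    report does not describe its treatment of missing values here).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical methylation values of two replicates at five sites.
replicate_a = [0.10, 0.80, None, 0.45, 0.95]
replicate_b = [0.15, 0.75, 0.50, None, 0.90]
print(round(pearson(replicate_a, replicate_b), 4))
```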

Low-dimensional Representation

Dimension reduction is used to visually inspect the dataset for a strong signal in the methylation values that is related to the samples' clinical or batch processing annotation. RnBeads implements two methods for dimension reduction: principal component analysis (PCA) and multidimensional scaling (MDS).

At least one of the methylation matrices contains missing values and was therefore augmented before the dimension reduction techniques were applied. The column Missing lists the number of dimensions ignored due to missing values. For MDS, a dimension is ignored only if it has missing values in all samples. For PCA, in contrast, any site or region with a missing value in at least one sample is ignored.

Sites/regions Technique Dimensions Missing Selected
sites MDS 23607313 0 23607313
sites PCA 23607313 12137673 11469640
cpgislands MDS 25971 0 25971
cpgislands PCA 25971 1559 24412
genes MDS 51562 0 51562
genes PCA 51562 6540 45022
promoters MDS 55461 0 55461
promoters PCA 55461 4214 51247
tiling MDS 543013 0 543013
tiling PCA 543013 32272 510741
tiling1kb MDS 2590437 0 2590437
tiling1kb PCA 2590437 571989 2018448
gencode22promoters MDS 55818 0 55818
gencode22promoters PCA 55818 4225 51593
ensembleRegBuildBPall MDS 471283 0 471283
ensembleRegBuildBPall PCA 471283 99209 372074
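
The two filtering rules can be illustrated with a small sketch. The function names and the toy matrix are hypothetical, and missing values are represented as None; this is not the RnBeads implementation.

```python
def columns_kept_for_mds(matrix):
    """Indices of columns (sites or regions) measured in at least one sample."""
    n_cols = len(matrix[0])
    return [j for j in range(n_cols) if any(row[j] is not None for row in matrix)]


def columns_kept_for_pca(matrix):
    """Indices of columns (sites or regions) measured in every sample."""
    n_cols = len(matrix[0])
    return [j for j in range(n_cols) if all(row[j] is not None for row in matrix)]


# Rows are samples, columns are sites or regions.
toy = [
    [0.1, None, 0.7, None],
    [0.2, 0.5, None, None],
    [0.3, 0.6, 0.9, None],
]
print(columns_kept_for_mds(toy))  # [0, 1, 2]
print(columns_kept_for_pca(toy))  # [0]
```

The stricter PCA rule explains why the Missing column in the table above is nonzero only in the PCA rows.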

Multidimensional Scaling

The scatter plot below visualizes the samples transformed into a two-dimensional space using MDS.

Figure 5: Scatter plot showing samples after performing Kruskal's non-metric multidimensional scaling. Plot options: location type, distance, sample representation, sample color.

Principal Component Analysis

Similarly, the figure below shows the values of selected principal components in a scatter plot.

Figure 6: Scatter plot showing the samples' coordinates on principal components. Plot options: location type, principal components, sample representation, sample color.

The figure below shows the cumulative distribution functions of variance explained by the principal components.

Figure 7: Cumulative distribution function of percentage of variance explained. Plot option: location type.

The table below gives, for each location type, the number of principal components needed to explain at least 95 percent of the total variance. The full tables of variances explained by all components are available in comma-separated values files accompanying this report.

Location Type Number of Components Full Table File
sites 57 csv
cpgislands 31 csv
genes 42 csv
promoters 43 csv
tiling 30 csv
tiling1kb 48 csv
gencode22promoters 43 csv
ensembleRegBuildBPall 49 csv
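
Counts like those in the table above can be derived from the per-component variances. The sketch below assumes the variances are already sorted in decreasing order; the function name and the example numbers are hypothetical.

```python
def components_needed(variances, threshold=0.95):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches the threshold."""
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    cumulative = 0.0
    for k, variance in enumerate(variances, start=1):
        cumulative += variance
        if cumulative / total >= threshold:
            return k
    return len(variances)


# Hypothetical variances of six principal components (total = 100).
print(components_needed([50, 30, 10, 6, 3, 1]))  # 4, since 50 + 30 + 10 + 6 = 96
```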

Batch Effects

In this section, different properties of the dataset are tested for significant associations. The properties can include sample coordinates in the principal component space, phenotype traits and intensities of control probes. The test used to calculate a p-value for a pair of properties depends on the type of data each property contains.

Note that the p-values presented in this report are not corrected for multiple testing.

Associations between Principal Components and Traits

The computed sample coordinates in the principal component space were tested for association with the specified traits. Below is a list of the traits and the tests performed.

Trait Test
sampleGroup Kruskal-Wallis
TISSUE_TYPE Kruskal-Wallis
DONOR_SEX Wilcoxon
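
The test assignments above are consistent with a simple rule based on the number of groups a trait defines, as listed in the Sample Groups table. The sketch below encodes that rule as inferred from the two tables in this report; it is an assumption, not the documented selection logic of the software.

```python
def choose_test(n_groups: int) -> str:
    """Pick a rank-based test for a categorical trait by its number of groups.

    Inferred rule: two groups -> Wilcoxon rank sum test,
    three or more groups -> Kruskal-Wallis one-way analysis of variance.
    """
    if n_groups < 2:
        raise ValueError("a trait must define at least two groups")
    return "Wilcoxon" if n_groups == 2 else "Kruskal-Wallis"


# Group counts taken from the Sample Groups table of this report.
group_counts = {"sampleGroup": 13, "TISSUE_TYPE": 4, "DONOR_SEX": 2}
for trait, n_groups in group_counts.items():
    print(trait, choose_test(n_groups))
```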

The heatmap below summarizes the results of the tests performed for associations. Significant p-values (values less than 0.01) are displayed on a pink background.

Figure 8: Heatmap presenting a table of p-values. Significant p-values (less than 0.01) are printed in pink boxes. Non-significant values are represented by blue boxes. Bright grey cells, if present, denote missing values. Plot option: region type.

The full tables of p-values for each location type are available in CSV (comma-separated value) files below.

Location Type File Name
sites csv
cpgislands csv
genes csv
promoters csv
tiling csv
tiling1kb csv
gencode22promoters csv
ensembleRegBuildBPall csv

Associations between Traits

This section summarizes the associations between pairs of traits.

The figure below visualizes the tests that were performed on trait pairs based on the description provided above. In addition, the calculated p-values for associations between traits are shown. Significant p-values (values less than 0.01) are displayed on a pink background. The full table of p-values is available in a dedicated file that accompanies this report.

Figure 9: Heatmaps of (1) the tests performed on pairs of traits and (2) the resulting p-values.
(1) Table of performed tests on pairs of traits. Test names (Correlation + permutation test, Fisher's exact test, Wilcoxon rank sum test and/or Kruskal-Wallis one-way analysis of variance) are color-coded according to the legend given above.
(2) Table of resulting p-values from the performed tests on pairs of traits. Significant p-values (less than 0.01) are printed in pink boxes. Non-significant values are represented by blue boxes. White cells, if present, denote missing values.

Methylation Value Distributions

Methylation value distributions were assessed based on selected sample groups. This was done on site and region levels. This section contains the generated density plots.

Methylation Value Densities of Sample Groups

The plots below compare the distributions of methylation values in different sample groups, as defined by the traits listed above.

Figure 10: Beta value density estimation according to sample grouping. Plot options: sample trait, methylation level (sites or regions).

Methylation Value Densities of Site Categories

In a similar fashion, the plot below compares the distributions of beta values in different site types.

Figure 11: Methylation value density estimation according to sample grouping and site category. Plot options: sample group, site category.

Clustering

The figure below shows clustering of samples using several algorithms and distance metrics.

Figure 12: Hierarchical clustering of samples based on all methylation values. The heatmap displays methylation percentiles per sample. The legend for sample coloring can be found in the figure below. Plot options: site/region level, dissimilarity metric, agglomeration strategy (linkage), sample color.

Figure 13: Hierarchical clustering of samples based on all methylation values. The heatmap displays only selected sites/regions with the highest variance across all samples. The legend for locus and sample coloring can be found in the figure below. Plot options: site/region level, dissimilarity metric, agglomeration strategy (linkage), sample color, site/region color, visualize.
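
Selecting the most variable sites or regions for such a heatmap can be sketched as below. The number of loci the report actually selects is not stated, so `k` is a hypothetical parameter, as are the function name and the toy values.

```python
from statistics import pvariance


def top_variable_rows(matrix, k):
    """Indices of the k rows (sites or regions) with the highest variance
    across samples, most variable first."""
    variances = [pvariance(row) for row in matrix]
    order = sorted(range(len(matrix)), key=lambda i: variances[i], reverse=True)
    return order[:k]


# Rows are sites or regions, columns are samples (made-up methylation values).
toy = [
    [0.5, 0.5, 0.5],  # constant across samples
    [0.1, 0.9, 0.5],  # highly variable
    [0.4, 0.6, 0.5],  # mildly variable
]
print(top_variable_rows(toy, 2))  # [1, 2]
```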

Figure 14: Probe and sample colors used in the heatmaps in the previous figures. Plot options: site/region level, sample color, site/region color.

Identified Clusters

Using the average silhouette value as a measure of cluster assignment [1], it is possible to infer the number of clusters produced by each of the studied methods. The figure below shows the corresponding mean silhouette value for every observed separation into clusters.

Figure 15: Line plot visualizing mean silhouette values of the clustering algorithm outcomes for each applicable value of K (number of clusters). Plot options: site/region level, dissimilarity metric.
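
The mean silhouette value [1] for a given cluster assignment can be computed from a distance matrix as sketched below. This is an illustrative stdlib implementation with made-up one-dimensional data, not the code used to generate the report.

```python
def mean_silhouette(dist, labels):
    """Average silhouette value of a clustering.

    dist is a symmetric matrix of pairwise distances and labels[i] is the
    cluster of sample i. Samples in singleton clusters contribute 0.
    """
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(dist[i][j] for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


# Four hypothetical samples on a line, forming two obvious pairs.
positions = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in positions] for p in positions]
print(mean_silhouette(dist, [0, 0, 1, 1]))  # natural grouping, close to 1
print(mean_silhouette(dist, [0, 1, 0, 1]))  # mismatched grouping, negative
```

Evaluating this score for each candidate K and choosing the K with the highest mean is one common way to read a plot like Figure 15.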

The table below summarizes the number of clusters identified by the algorithms at the selected site/region level.

Metric Algorithm Clusters
correlation-based hierarchical (average linkage) 3
correlation-based hierarchical (complete linkage) 3
correlation-based hierarchical (median linkage) 2
Manhattan distance hierarchical (average linkage) 2
Manhattan distance hierarchical (complete linkage) 2
Manhattan distance hierarchical (median linkage) 2
Euclidean distance hierarchical (average linkage) 2
Euclidean distance hierarchical (complete linkage) 2
Euclidean distance hierarchical (median linkage) 2

Clusters and Traits

The figure below shows associations between clusterings and the examined traits. Associations are quantified using the adjusted Rand index [2]. Adjusted Rand indices near 1 indicate high agreement, while values close to -1 indicate separation. The full table of all computed indices is stored in comma-separated files accompanying this report.

Figure 16: Heatmap visualizing Rand indices computed between sample traits (rows) and clustering algorithm outcomes (columns). Plot options: site/region level, dissimilarity metric.
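
The adjusted Rand index [2] between a clustering and a trait can be computed from pair counts, as in the following sketch. The function name and the toy labels are hypothetical; this is not the code used to generate the report.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same samples")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two samples")

    # Pairs of samples grouped together in both, in A only, and in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


trait = ["blood", "blood", "marrow", "marrow"]  # hypothetical trait values
print(adjusted_rand_index(trait, [1, 1, 2, 2]))  # clusters match the trait
print(adjusted_rand_index(trait, [1, 2, 1, 2]))  # clusters cut across the trait
```

Because of the chance correction, an index near 0 is what two unrelated partitions would be expected to produce.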

References

  1. Rousseeuw, P. J. (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65
  2. Hubert, L. and Arabie, P. (1985) Comparing partitions. Journal of Classification, 2(1), 193-218