"""All possible correlation methods."""
= 1
= 2
= 3
"""All possible p-value adjustment methods."""
= 1
= 2
= 3
:
:
:
:
:
:
"""
Represents a correlation analysis result. Includes the Gene, the GEM, the CpG Site ID (if specified), the correlation statistic, the p-value and the adjusted p-value.
:param gene: Gene name
:param gem: Gene Expression Modulator (GEM) name
:param cpg_site_id: CpG Site ID
:param correlation: Correlation statistic (Pearson, Spearman or Kendall, as selected)
:param p_value: P-value
:param adjusted_p_value: Adjusted p-value (Benjamini-Hochberg, Benjamini-Yekutieli or Bonferroni, as selected)
"""
=
=
=
=
=
=
"""
Computes the correlation between the rows of the mRNA (Gene) file and the rows of the GEM file.
:param gene_file_path: Gene file's path
:param gem_file_path: Gene Expression Modulator (GEM) file's path
:param correlation_method: Correlation method to compute (Spearman = 1, Kendall = 2 or Pearson = 3)
:param correlation_threshold: Threshold for the correlation statistic. Results whose correlation statistic is below this value are discarded
:param sort_buf_size: Number of elements to sort per block on disk during the p-value adjustment process. Larger blocks are faster but consume more memory
:param adjustment_method: P-value adjustment method (Benjamini-Hochberg = 1, Benjamini-Yekutieli = 2 or Bonferroni = 3)
:param is_all_vs_all: True if all Genes must be evaluated against all GEMs. Otherwise, only matching Gene/GEM pairs will be evaluated (useful for CNA or Methylation analysis)
:param gem_contains_cpg: Set to True if your GEM data contains CpG Site IDs as the second column to preserve the GEM/CpG Site reference
:param collect_gem_dataset: True to make the GEM dataset available in memory. This has a HUGE impact on analysis performance. Specify a boolean value to force the behavior, or use None to allocate in memory automatically when the GEM dataset is small (<= 100MB)
:param keep_top_n: Specify a number of results to keep or None to return all the resulting combinations
:return: A tuple with a list of CorResult, the number of combinations before truncating by the 'keep_top_n' parameter and the number of combinations evaluated
"""
...
"""Raised when a general error occurs, such as a read error or a file that does not exist, among others."""
...
"""Raised when the number of samples in the two datasets differs."""
...
"""Raised when the samples in the two datasets are different even though there are the same number of them (they may be in a different order)."""
...
"""Raised when an invalid correlation method is provided. Only values 1 (Spearman), 2 (Kendall) or 3 (Pearson) are valid."""
...
"""Raised when an invalid adjustment method is provided. Only values 1 (Benjamini-Hochberg), 2 (Benjamini-Yekutieli) or 3 (Bonferroni) are valid."""
...
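
The docstrings above name the p-value adjustment methods but do not show what an adjustment does. The following is a minimal in-memory sketch of two of them, Benjamini-Hochberg and Bonferroni. It is not the library's implementation, which according to the `sort_buf_size` description sorts on disk in blocks. The function names `benjamini_hochberg` and `bonferroni` are hypothetical and exist only for illustration.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Hypothetical helper for illustration; not part of the documented API.
    """
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the rank decreases (monotonicity step).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def bonferroni(p_values: List[float]) -> List[float]:
    """Return Bonferroni adjusted p-values (p * m, capped at 1.0)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(raw))
    print(bonferroni(raw))
```

Bonferroni multiplies every p-value by the number of tests, so it is the more conservative of the two. Benjamini-Hochberg scales each p-value by the number of tests divided by its rank, which controls the false discovery rate instead.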