metaclean3.segments
Module Contents
Functions
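| _check_chpts0M_segments | |
| get_ref_ranges | Get ranges (quantiles) for reference values. |
| get_test_ranges | Get ranges (quantiles) for test values. |
| apply_ranges | Test whether test ranges fall within reference ranges. |
| in_percentiles | Test whether test and reference values have a similar distribution. |
| get_ref_updown | Extracts values for reference segment. |
| go_updown | Tests whether reference segment is similar to its adjacent segment(s). |
| get_ref_func2 | Extracts values for reference segment; this function is an alternative to get_ref_updown(). |
| merge_segments_test | Merge segments that contain values that are not significantly different based on some significance test. |
| merge_segments_gain | Merge segments with minimum gain at their changepoints until one segment is at least min_ref_percent percent of the maximum length. |
| merge_segments_quant | Merge segments if their quantiles are within a range of each other. |
| refine_reference_segment | Check if there are sporadic segments within the reference segments that should be merged with the reference segment. |
| refine_reference_segment_inbetween | Same as refine_reference_segment() but intended for when there are sporadic parts of the reference segment between other segments. |
| remove_reference_ends | Check the ends of the reference segment for sporadic values. |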
-
metaclean3.segments._check_chpts0M_segments(chpts0M=None, segments=None, r=None, rl=None)
-
metaclean3.segments.get_ref_ranges(ref_values: list, ignore_no: int = 5, percent_diff: float = 0.05, percent_shift: float = 0.1)
Get ranges (quantiles) for reference values. Intended for internal use; made public for external QA.
- Args:
- ref_values (list): A list of numeric vector(s) that the function will create ranges for.
- ignore_no (int, optional): See in_percentiles(). Defaults to 5.
- percent_diff (float, optional): See in_percentiles(). Defaults to 0.05.
- percent_shift (float, optional): See in_percentiles(). Defaults to 0.1.
- Returns:
- tuple:
- list: a list of four floats containing upper and lower bounds of
maximum and minimum ranges.
- list: A list of floats representing the shift for each element in
ref_values.
-
metaclean3.segments.get_test_ranges(ref_values: int, test_value: list | numpy.ndarray | pandas.Series, small_no_per_seg: int = 50, percent_diff: float = 0.05, percent_shift: float = 0.1, lenient: bool = True)
Get ranges (quantiles) for test values. Intended for internal use only; made public for external QA.
- Args:
- ref_lens (int): The maximum length of value vectors in the reference segment; most of the time, this is the length of the reference segment, but an arbitrary length can be given.
- test_value (list | numpy.ndarray | pandas.Series): A numeric vector that the function will create ranges for.
- small_no_per_seg (int, optional): Vectors shorter than this value are considered “small” vectors; small vectors are treated differently because smaller segments can contain more sporadic values. Defaults to 50.
- percent_diff (float, optional): See in_percentiles(). Defaults to 0.05.
- percent_shift (float, optional): See in_percentiles(). Defaults to 0.1.
- lenient (bool, optional): See in_percentiles(). Defaults to True.
- Returns:
- list: a list of four floats containing upper and lower bounds of
maximum and minimum ranges.
-
metaclean3.segments.apply_ranges(ref_range: list | numpy.ndarray, test_range: list | numpy.ndarray)
Test whether test ranges fall within reference ranges. Intended for internal use only; made public for external QA.
- Args:
- ref_range (list | numpy.ndarray): A vector of reference ranges of length 4.
- test_range (list | numpy.ndarray): A vector of test ranges of length 4.
- Returns:
bool: Whether or not test ranges fall within reference ranges.
-
metaclean3.segments.in_percentiles(ref_values: list, test_value: list | numpy.ndarray | pandas.Series, check_mean: bool = False, check_var: bool = False, switch: bool = True, ignore_no: int = 5, small_no_per_seg: int = 50, percent_diff: float = 0.05, percent_shift: float = 0.1, lenient: bool = True)
Test whether test and reference values have a similar distribution. Intended for internal use only; made public for external QA.
- Args:
- ref_values (list): A vector of reference values.
- test_value (list | numpy.ndarray | pandas.Series): A vector of test values.
- check_mean (bool, optional): Whether or not to check similarity of reference and test means. Defaults to False.
- check_var (bool, optional): Whether or not to check similarity of reference and test variance. Defaults to False.
- switch (bool, optional): Whether or not to check similarity of reference and test mean and variance if test ranges do not fall within reference ranges. Defaults to True.
- ignore_no (int, optional): Indicates the minimum length of an acceptable vector. Defaults to 5.
- small_no_per_seg (int, optional): Vectors shorter than this value are considered “small” vectors; small vectors are treated differently because smaller segments can contain more sporadic values. Defaults to 50.
- percent_diff (float, optional): Quantile to calculate, measured from both 0 and 1. Defaults to 0.05.
- percent_shift (float, optional): Percent above and below the mean of the values. Defaults to 0.1.
- lenient (bool, optional): If test_value is a small vector, users can choose whether the ranges should be lenient. Defaults to True.
- Returns:
bool: Indicates whether or not test ranges are within reference ranges.
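The quantile comparison described above can be illustrated with a small standard-library sketch. It does not call metaclean3 and is not the library's implementation: the helper names (`quantile`, `simple_range`, `within_reference`) are hypothetical, and the way `percent_diff` and `percent_shift` are combined here (quantiles at `percent_diff` and `1 - percent_diff`, widened by `percent_shift` times the reference mean) is an assumption made for illustration only.

```python
import statistics


def quantile(values, q):
    """Linearly interpolated quantile of a non-empty sequence, 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    frac = pos - low
    return ordered[low] * (1 - frac) + ordered[high] * frac


def simple_range(values, percent_diff=0.05):
    """Return the (lower, upper) quantile pair for values."""
    return quantile(values, percent_diff), quantile(values, 1 - percent_diff)


def within_reference(ref_values, test_values, percent_diff=0.05, percent_shift=0.1):
    """True if the test quantile range sits inside the widened reference range.

    Assumption: the reference range is widened on both sides by
    percent_shift times the absolute reference mean.
    """
    ref_low, ref_high = simple_range(ref_values, percent_diff)
    test_low, test_high = simple_range(test_values, percent_diff)
    shift = percent_shift * abs(statistics.fmean(ref_values))
    return ref_low - shift <= test_low and test_high <= ref_high + shift


reference = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.1, 9.9, 10.0]
similar = [10.0, 10.1, 9.9, 10.2, 10.0]
shifted = [14.0, 14.2, 13.9, 14.1, 14.0]

print(within_reference(reference, similar))
print(within_reference(reference, shifted))
```

Here the `similar` vector stays inside the widened reference range, while `shifted` lies well above it and is rejected.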
-
metaclean3.segments.get_ref_updown(r: int, chpts0M: list | numpy.ndarray | pandas.Series, vlist: list, ref_percents: list = [1, 0.5], ref_percent: float = 0.4)
Extracts values for reference segment. Intended for internal use only; made public for external QA.
- Args:
- r (int): Index of the reference segment in chpts0M.
- chpts0M (list | numpy.ndarray | pandas.Series): A vector containing the first index of each segment.
- vlist (list): List of numeric vectors of values.
- ref_percents (list, optional): List of proportions of reference values next to the test values to extract. Defaults to [1, 0.5].
- ref_percent (float, optional): Proportion of the length of vlist; the minimum of this value and the length of the reference values is used as an alternative for how many reference values next to the test values to extract. Defaults to 0.4.
- Returns:
list: List of reference values extracted.
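To make the relationship between a segment label vector and a vector of first indices (as described for chpts0M) concrete, here is a minimal standalone sketch. The helpers `segment_starts` and `segment_slice` are hypothetical names and are not part of metaclean3; whether the library's chpts0M also stores a trailing end index is not stated in this page and is not assumed here.

```python
def segment_starts(segments):
    """Return the first index of each run of identical labels."""
    starts = []
    previous = object()  # sentinel that never equals a real label
    for index, label in enumerate(segments):
        if label != previous:
            starts.append(index)
            previous = label
    return starts


def segment_slice(values, starts, r):
    """Return the values of the r-th segment given the start indices."""
    end = starts[r + 1] if r + 1 < len(starts) else len(values)
    return values[starts[r]:end]


labels = [0, 0, 0, 1, 1, 2, 2, 2, 2]
starts = segment_starts(labels)
print(starts)  # [0, 3, 5]
print(segment_slice(list("abcdefghi"), starts, 1))
```

With the start indices in hand, a reference segment index such as `r` simply selects one slice of the value vector.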
-
metaclean3.segments.go_updown(r: int, chpts0M: list | numpy.ndarray, lim: int, refs: list, vlist: list, ip_func: collections.abc.Callable = in_percentiles, exception_no: int = 2, ignore_no: int = 5)
Tests whether reference segment is similar to its adjacent segment(s). Intended for internal use only; made public for external QA.
- Args:
- r (int): Index of the reference segment in chpts0M.
- chpts0M (list | numpy.ndarray | pandas.Series): A vector containing the first index of each segment.
- lim (int): Upper limit for the index of segments to test, i.e. the index of the end or of the next reference segment to test.
- refs (list): List of reference vectors.
- vlist (list): List of all value vectors.
- ip_func (Callable, optional): Function used to determine whether test and reference values are similar. Defaults to in_percentiles().
- exception_no (int, optional): Number of segments away from the reference segment at which test values may still be checked (i.e. non-adjacent test segments). Defaults to 2.
- ignore_no (int, optional): See in_percentiles(). Defaults to 5.
- Returns:
- tuple:
- bool: Whether test values and adjacent reference values have similar
distribution.
- numpy.ndarray: Indices of segments that were skipped in between test
and similar reference segment.
-
metaclean3.segments.get_ref_func2(vlist: list, ref_inds: numpy.ndarray, test_i1: int, test_i2: int, small_no_per_seg: int = 50, small_ref_prop: float = 0.5, sorted_index: numpy.ndarray | None = None)
Extracts values for reference segment; this function is an alternative to get_ref_updown() for when the user wants to subset values in the reference segment that are closest to some test segment. The reference segment indices do not need to be continuous. Intended for internal use only; made public for external QA.
- Args:
- vlist (list): List of value vectors.
- ref_inds (numpy.ndarray): Indices of reference values.
- test_i1 (int): Starting index of test segment.
- test_i2 (int): Ending index of test segment.
- small_no_per_seg (int, optional): Vectors shorter than this value are considered “small” vectors; small vectors are treated differently because smaller segments can contain more sporadic values. Defaults to 50.
- small_ref_prop (float, optional): Proportion of reference segment to test. Defaults to 0.5.
- sorted_index (numpy.ndarray | None, optional): Indices sorted based on segment quantiles; see utils.order_value_bins(). Defaults to None.
- Returns:
list: List of reference value vectors.
-
metaclean3.segments.merge_segments_test(values: list | numpy.ndarray | pandas.Series, segments: list | numpy.ndarray | pandas.Series, merge_signif_test: Literal['t'] | Literal['wilcox'] = 'wilcox', p_thres: float = 0.05, ignore_no: int = 5, strict: bool = True)
Merge segments that contain values that are not significantly different based on some significance test. Intended for internal use only; made public for external QA.
- Args:
- values (list | numpy.ndarray | pandas.Series): Vector of values.
- segments (list | numpy.ndarray | pandas.Series): Segment label vector.
- merge_signif_test (str, optional): Type of significance test to use, i.e. ‘t’ (T-test) or ‘wilcox’ (Wilcoxon). Defaults to ‘wilcox’.
- p_thres (float, optional): P-value threshold. Defaults to 0.05.
- ignore_no (int, optional): See in_percentiles(). Defaults to 5.
- strict (bool, optional): If set to True, function will only test adjacent segments; otherwise, . Defaults to True.
- Returns:
numpy.ndarray: an updated version of segments after merging.
-
metaclean3.segments.merge_segments_gain(values: list | numpy.ndarray | pandas.Series, segments: list | numpy.ndarray, cost_model: str = 'rank', binseg_jump: int = 2, binseg_min_size: int = 2, knee: bool = False, min_ref_percent: float = 0.4)
Merge segments with minimum gain at their changepoints until one segment is at least min_ref_percent percent of the maximum length. Intended for internal use only; made public for external QA.
- Args:
- values (list | numpy.ndarray | pandas.Series): Vector of values.
- segments (list | numpy.ndarray | pandas.Series): Segment label vector.
- cost_model (str, optional): Type of cost calculation method to use; see ruptures.Binseg. Defaults to ‘rank’.
- binseg_jump (int, optional): See argument jump in ruptures.Binseg. Defaults to 2.
- binseg_min_size (int, optional): See argument min_size in ruptures.Binseg. Defaults to 2.
- knee (bool, optional): Use the knee of the gain values as the stopping criterion. For testing purposes only, do not change. Defaults to False.
- min_ref_percent (float, optional): Stopping criterion; indicates the minimum length (proportion of total) of any segment. Defaults to 0.4.
- Returns:
numpy.ndarray: an updated version of segments after merging segments.
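The greedy idea of removing the weakest changepoint until one segment is long enough can be sketched as follows. This is an illustration only: it uses a simple sum-of-squares (l2) cost rather than the 'rank' cost model, it does not call ruptures or metaclean3, and the function names `sse` and `merge_by_min_gain` are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (an l2 cost)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def merge_by_min_gain(values, starts, min_ref_percent=0.4):
    """Greedily drop the changepoint with the smallest gain.

    starts holds the first index of each segment, beginning with 0.
    Stops once the longest segment covers at least min_ref_percent
    of all values, or when only one segment is left.
    """
    n = len(values)
    bounds = list(starts) + [n]
    while len(bounds) > 2:
        lengths = [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]
        if max(lengths) >= min_ref_percent * n:
            break
        gains = []
        for i in range(1, len(bounds) - 1):
            left = values[bounds[i - 1]:bounds[i]]
            right = values[bounds[i]:bounds[i + 1]]
            both = values[bounds[i - 1]:bounds[i + 1]]
            # Gain: how much the cost drops by keeping this changepoint.
            gains.append((sse(both) - sse(left) - sse(right), i))
        _, weakest = min(gains)
        del bounds[weakest]
    return bounds[:-1]


values = [1, 1, 1.1, 1, 0.9, 1, 5, 5, 5.1, 9, 9, 9.2]
print(merge_by_min_gain(values, [0, 3, 6, 9]))
```

In this toy input the first two segments have nearly identical values, so their shared changepoint has the smallest gain and is removed first; the merged segment then covers half of the data and the loop stops.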
-
metaclean3.segments.merge_segments_quant(vlist: list, segments: list | numpy.ndarray | pandas.Series, ip_func: collections.abc.Callable = in_percentiles, get_ref_fun: collections.abc.Callable = get_ref_updown, small_no_per_seg: int = 50, keep_skipped: str = 'all')
Merge segments if their quantiles are within a range of each other. Intended for internal use only; made public for external QA.
- Args:
- vlist (list): List of value vectors.
- segments (list | numpy.ndarray | pandas.Series): Segment label vector.
- ip_func (Callable, optional): Function used to determine whether test and reference values are similar. Defaults to in_percentiles().
- get_ref_fun (Callable, optional): A function that extracts a list of reference values. Defaults to get_ref_updown().
- small_no_per_seg (int, optional): Vectors shorter than this value are considered “small” vectors; small vectors are treated differently because smaller segments can contain more sporadic values. Defaults to 50.
- keep_skipped (str, optional): Indicates whether to keep all, some, or no skipped segments, i.e. ‘all’, ‘some’, or ‘none’. Defaults to ‘all’.
- Returns:
numpy.ndarray: an updated version of segments after merging segments.
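The role of a pluggable similarity function such as ip_func can be illustrated with a simplified merge loop. This sketch is not the library's algorithm: it only compares each segment with the one directly before it, `merge_adjacent` and `close_means` are hypothetical names, and the mean-difference predicate stands in for a quantile-based check.

```python
import statistics


def close_means(left, right, tolerance=0.5):
    """Stand-in similarity test: the two means differ by at most tolerance."""
    return abs(statistics.fmean(left) - statistics.fmean(right)) <= tolerance


def merge_adjacent(values, segments, similar=close_means):
    """Relabel a segment with the previous segment's label when similar."""
    merged = list(segments)
    starts = [i for i in range(len(merged)) if i == 0 or merged[i] != merged[i - 1]]
    bounds = starts + [len(merged)]
    for k in range(1, len(starts)):
        prev_label = merged[bounds[k] - 1]
        # The previous segment may have grown through earlier merges,
        # so walk back to where its label starts.
        prev_start = bounds[k]
        while prev_start > 0 and merged[prev_start - 1] == prev_label:
            prev_start -= 1
        left = values[prev_start:bounds[k]]
        right = values[bounds[k]:bounds[k + 1]]
        if similar(left, right):
            for i in range(bounds[k], bounds[k + 1]):
                merged[i] = prev_label
    return merged


values = [1.0, 1.1, 0.9, 1.05, 0.95, 5.0, 5.1, 4.9]
segments = [0, 0, 0, 1, 1, 2, 2, 2]
print(merge_adjacent(values, segments))
```

Passing the predicate as an argument mirrors how the signature above accepts ip_func, so a stricter or more lenient test can be swapped in without changing the merge loop.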
-
metaclean3.segments.refine_reference_segment(vlist: list, segments: list | numpy.ndarray, rl: int | None = None, sorted_index: numpy.ndarray | None = None, ip_func: collections.abc.Callable = in_percentiles, small_no_per_seg: int = 50, overlap_no: int = 0, split_by_segment: bool = True, multi_refine: bool = True)
Check if there are sporadic segments within the reference segments that should be merged with the reference segment. Intended for internal use only; made public for external QA.
- Args:
- vlist (list): List of value vectors.
- segments (list | numpy.ndarray | pandas.Series): Segment label vector.
- rl (int | None, optional): Reference segment label; if set to None, sets the reference segment as the largest segment. Defaults to None.
- sorted_index (numpy.ndarray | None, optional): Indices sorted based on segment quantiles; see utils.order_value_bins(). Defaults to None.
- ip_func (Callable, optional): Function used to determine whether test and reference values are similar. Defaults to in_percentiles().
- small_no_per_seg (int, optional): Vectors shorter than this value are considered “small” vectors; small vectors are treated differently because smaller segments can contain more sporadic values. Defaults to 50.
- overlap_no (int, optional): Number of values allowed to overlap. Defaults to 0.
- split_by_segment (bool, optional): If there are multiple segments in between the reference segment, choose whether to test them separately. Defaults to True.
- multi_refine (bool, optional): If set to True, the function will refine repeatedly. Defaults to True.
- Returns:
numpy.ndarray: an updated version of segments after merging segments.
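The default behaviour described for rl (falling back to the largest segment when rl is None) can be pictured with a short sketch. The helper `largest_segment_label` is a hypothetical name, and how the library breaks ties between equally large segments is not stated in this page; this version simply keeps the label seen first.

```python
from collections import Counter


def largest_segment_label(segments):
    """Return the label that covers the most positions."""
    counts = Counter(segments)
    label, _ = counts.most_common(1)[0]
    return label


segments = [0, 0, 1, 1, 1, 1, 2]
rl = None
if rl is None:
    rl = largest_segment_label(segments)

# Boolean mask marking which positions belong to the reference segment.
reference_mask = [label == rl for label in segments]
print(rl, reference_mask)
```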
-
metaclean3.segments.refine_reference_segment_inbetween(segments: list | numpy.ndarray, rl: int | None = None, small_no_per_seg: int = 50)
Same as refine_reference_segment() but intended for when there are sporadic parts of the reference segment between other segments. Intended for internal use only; made public for external QA.
- Args:
- segments (list | numpy.ndarray | pandas.Series): Segment label vector.
- rl (int | None, optional): Reference segment label; if set to None, sets the reference segment as the largest segment. Defaults to None.
- small_no_per_seg (int, optional): Vectors shorter than this value are considered “small” vectors; small vectors are treated differently because smaller segments can contain more sporadic values. Defaults to 50.
- Returns:
numpy.ndarray: an updated version of segments after merging segments.
-
metaclean3.segments.remove_reference_ends(values: list | numpy.ndarray | pandas.Series, segments: list | numpy.ndarray | pandas.Series, rl: int | None = None, ip_func: collections.abc.Callable = in_percentiles, small_no_per_seg: int = 50)
Check the ends of the reference segment for sporadic values and merge the ends if applicable. Intended for internal use only; made public for external QA.
- Args:
- values (list | numpy.ndarray | pandas.Series): Vector of values.
- segments (list | numpy.ndarray | pandas.Series): Segment label vector.
- rl (int | None, optional): Reference segment label; if set to None, sets the reference segment as the largest segment. Defaults to None.
- ip_func (Callable, optional): Function used to determine whether test and reference values are similar. Defaults to in_percentiles().
- small_no_per_seg (int, optional): Vectors shorter than this value are considered “small” vectors; small vectors are treated differently because smaller segments can contain more sporadic values. Defaults to 50.
- Returns:
numpy.ndarray: an updated version of segments after merging segments.