metaclean3.segments

Module Contents

Functions

_check_chpts0M_segments([chpts0M, segments, r, rl])

get_ref_ranges(ref_values[, ignore_no, percent_diff, ...])

Get ranges (quantiles) for reference values.

get_test_ranges(ref_values, test_value[, ...])

Get ranges (quantiles) for test values.

apply_ranges(ref_range, test_range)

Test whether test ranges fall within reference ranges.

in_percentiles(ref_values, test_value[, check_mean, ...])

Test whether test and reference values have a similar distribution.

get_ref_updown(r, chpts0M, vlist[, ref_percents, ...])

Extracts values for reference segment.

go_updown(r, chpts0M, lim, refs, vlist[, ip_func, ...])

Tests whether reference segment is similar to its adjacent segment(s).

get_ref_func2(vlist, ref_inds, test_i1, test_i2[, ...])

Extracts values for reference segment; this function is an alternative to get_ref_updown().

merge_segments_test(values, segments[, ...])

Merge segments that contain values that are not significantly different based on some significance test.

merge_segments_gain(values, segments[, cost_model, ...])

Merge segments with minimum gain at their changepoints until one segment is at least min_ref_percent percent of the maximum length.

merge_segments_quant(vlist, segments[, ip_func, ...])

Merge segments if their quantiles are within a range of each other.

refine_reference_segment(vlist, segments[, rl, ...])

Check if there are sporadic segments within the reference segments that should be merged with the reference segment.

refine_reference_segment_inbetween(segments[, rl, ...])

Same as refine_reference_segment() but intended for when there are sporadic parts of the reference segment between other segments.

remove_reference_ends(values, segments[, rl, ip_func, ...])

Check the ends of the reference segment to see if there is any sporadicity.

metaclean3.segments._check_chpts0M_segments(chpts0M=None, segments=None, r=None, rl=None)
metaclean3.segments.get_ref_ranges(ref_values: list, ignore_no: int = 5, percent_diff: float = 0.05, percent_shift: float = 0.1)

Get ranges (quantiles) for reference values. Intended for internal use; made public for external QA.

Args:
ref_values (list): A list of numeric vector(s) that the function will create ranges for.

ignore_no (int, optional): See in_percentiles(). Defaults to 5.

percent_diff (float, optional): See in_percentiles(). Defaults to 0.05.

percent_shift (float, optional): See in_percentiles(). Defaults to 0.1.

Returns:
tuple:
list: A list of four floats containing upper and lower bounds of maximum and minimum ranges.

list: A list of floats representing the shift for each element in ref_values.

metaclean3.segments.get_test_ranges(ref_values: int, test_value: list | numpy.ndarray | pandas.Series, small_no_per_seg: int = 50, percent_diff: float = 0.05, percent_shift: float = 0.1, lenient: bool = True)

Get ranges (quantiles) for test values. Intended for internal use only; made public for external QA.

Args:
ref_lens (int): The maximum length of value vectors in the reference segment; most of the time, this is the length of the reference segment, but an arbitrary length can be given.

test_value (list | numpy.ndarray | pandas.Series): A numeric vector that the function will create ranges for.

small_no_per_seg (int, optional): Vectors shorter than this value are considered “small” vectors; small vectors are treated differently because smaller segments can contain more sporadic values. Defaults to 50.

percent_diff (float, optional): See in_percentiles(). Defaults to 0.05.

percent_shift (float, optional): See in_percentiles(). Defaults to 0.1.

lenient (bool, optional): See in_percentiles(). Defaults to True.

Returns:
list: A list of four floats containing upper and lower bounds of maximum and minimum ranges.

metaclean3.segments.apply_ranges(ref_range: list | numpy.ndarray, test_range: list | numpy.ndarray)

Test whether test ranges fall within reference ranges. Intended for internal use only; made public for external QA.

Args:

ref_range (list | numpy.ndarray): A reference ranges vector of length 4.

test_range (list | numpy.ndarray): A test ranges vector of length 4.

Returns:

bool: Whether or not test ranges fall within reference ranges.

metaclean3.segments.in_percentiles(ref_values: list, test_value: list | numpy.ndarray | pandas.Series, check_mean: bool = False, check_var: bool = False, switch: bool = True, ignore_no: int = 5, small_no_per_seg: int = 50, percent_diff: float = 0.05, percent_shift: float = 0.1, lenient: bool = True)

Test whether test and reference values have a similar distribution. Intended for internal use only; made public for external QA.

Args:

ref_values (list): A vector of reference values.

test_value (list | numpy.ndarray | pandas.Series): A vector of test values.

check_mean (bool, optional): Whether or not to check similarity of reference and test means. Defaults to False.

check_var (bool, optional): Whether or not to check similarity of reference and test variance. Defaults to False.

switch (bool, optional): Whether or not to check similarity of reference and test mean and variance if test ranges do not fall within reference ranges. Defaults to True.

ignore_no (int, optional): Indicates the minimum length of an acceptable vector. Defaults to 5.

small_no_per_seg (int, optional): Vectors shorter than this value are considered “small” vectors; small vectors are treated differently because smaller segments can contain more sporadic values. Defaults to 50.

percent_diff (float, optional): Quantile to be calculated, between 0 and 1. Defaults to 0.05.

percent_shift (float, optional): Percent above and below mean of values. Defaults to 0.1.

lenient (bool, optional): If test_value is a small vector, users can choose whether or not the ranges should be lenient. Defaults to True.

Returns:

bool: Indicates whether or not test ranges are within reference ranges.
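
The quantile-and-shift comparison described by percent_diff and percent_shift can be pictured with the standard-library sketch below. It is not the metaclean3 implementation: the helper names (quantile, quantile_bands, similar_by_quantiles), the layout of the four-element range, the definition of the shift as percent_shift times the absolute mean, and the handling of vectors shorter than ignore_no are all assumptions made for illustration. It also takes a single reference vector rather than a list of vectors.

```python
from statistics import fmean


def quantile(values, q):
    """Linearly interpolated quantile of a non-empty sequence, 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def quantile_bands(values, percent_diff=0.05, percent_shift=0.1):
    """Return [low_min, low_max, high_min, high_max] (assumed layout)."""
    # Assumption: the shift is a fraction of the absolute mean of the values.
    shift = percent_shift * abs(fmean(values))
    q_low = quantile(values, percent_diff)
    q_high = quantile(values, 1 - percent_diff)
    return [q_low - shift, q_low + shift, q_high - shift, q_high + shift]


def similar_by_quantiles(ref, test, percent_diff=0.05, percent_shift=0.1,
                         ignore_no=5):
    """True if both test quantiles fall inside the reference bands."""
    # Assumption: vectors shorter than ignore_no are never called similar.
    if len(ref) < ignore_no or len(test) < ignore_no:
        return False
    low_min, low_max, high_min, high_max = quantile_bands(
        ref, percent_diff, percent_shift
    )
    t_low = quantile(test, percent_diff)
    t_high = quantile(test, 1 - percent_diff)
    return low_min <= t_low <= low_max and high_min <= t_high <= high_max


reference = [10.0 + 0.1 * (i % 10) for i in range(100)]
close = [10.05 + 0.1 * (i % 10) for i in range(60)]   # same level as reference
far = [20.0 + 0.1 * (i % 10) for i in range(60)]      # clearly shifted level

close_is_similar = similar_by_quantiles(reference, close)
far_is_similar = similar_by_quantiles(reference, far)
```

In this sketch a larger percent_shift widens the bands around each reference quantile, so more test vectors are accepted as similar.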

metaclean3.segments.get_ref_updown(r: int, chpts0M: list | numpy.ndarray | pandas.Series, vlist: list, ref_percents: list = [1, 0.5], ref_percent: float = 0.4)

Extracts values for reference segment. Intended for internal use only; made public for external QA.

Args:

r (int): Index of the reference segment in chpts0M.

chpts0M (list | numpy.ndarray | pandas.Series): A vector containing the first index of each segment.

vlist (list): List of numeric vectors of values.

ref_percents (list, optional): List of percentages of reference values next to the test values to extract. Defaults to [1, 0.5].

ref_percent (float, optional): Percent of the length of vlist; the minimum of this value and the length of the reference values is taken as another alternative for how many reference values next to the test values to extract. Defaults to 0.4.

Returns:

list: List of reference values extracted.
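
To make the chpts0M convention concrete, the sketch below derives the first index of each segment from a segment label vector and uses a segment index r to pick out one segment. The helper name segment_starts is hypothetical and not part of metaclean3. Whether the library's own chpts0M also carries a trailing end index is not stated here, so the sketch handles the last segment explicitly.

```python
def segment_starts(segments):
    """Return the first index of each run of identical labels."""
    starts = []
    previous = object()  # sentinel that compares unequal to every label
    for index, label in enumerate(segments):
        if label != previous:
            starts.append(index)
            previous = label
    return starts


labels = [0, 0, 0, 1, 1, 2, 2, 2, 2]
starts = segment_starts(labels)

# r indexes into the vector of segment starts, not into the values.
r = 2
end = starts[r + 1] if r + 1 < len(starts) else len(labels)
reference_indices = list(range(starts[r], end))
```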

metaclean3.segments.go_updown(r: int, chpts0M: list | numpy.ndarray, lim: int, refs: list, vlist: list, ip_func: collections.abc.Callable = in_percentiles, exception_no: int = 2, ignore_no: int = 5)

Tests whether reference segment is similar to its adjacent segment(s). Intended for internal use only; made public for external QA.

Args:

r (int): Index of the reference segment in chpts0M.

chpts0M (list | numpy.ndarray | pandas.Series): A vector containing the first index of each segment.

lim (int): Upper limit for the index of segments to test, i.e. the index of the end or the next reference segment to test.

refs (list): List of reference vectors.

vlist (list): List of all value vectors.

ip_func (Callable, optional): Function used to determine whether test and reference values are similar. Defaults to in_percentiles().

exception_no (int, optional): Users can choose to check test values exception_no segments away from reference segment (i.e. non-adjacent test segments). Defaults to 2.

ignore_no (int, optional): See in_percentiles(). Defaults to 5.

Returns:
tuple:
bool: Whether test values and adjacent reference values have similar distribution.

numpy.ndarray: Indices of segments that were skipped in between test and similar reference segment.

metaclean3.segments.get_ref_func2(vlist: list, ref_inds: numpy.ndarray, test_i1: int, test_i2: int, small_no_per_seg: int = 50, small_ref_prop: float = 0.5, sorted_index: numpy.ndarray | None = None)

Extracts values for reference segment; this function is an alternative to get_ref_updown() for when the user wants to subset values in the reference segment that are closest to some test segment. The reference segment indices do not need to be continuous. Intended for internal use only; made public for external QA.

Args:

vlist (list): List of value vectors.

ref_inds (numpy.ndarray): Indices of reference values.

test_i1 (int): Starting index of test segment.

test_i2 (int): Ending index of test segment.

small_no_per_seg (int, optional): Vectors shorter than this value are considered “small” vectors; small vectors are treated differently because smaller segments can contain more sporadic values. Defaults to 50.

small_ref_prop (float, optional): Proportion of reference segment to test. Defaults to 0.5.

sorted_index (numpy.ndarray | None, optional): Indices sorted based on segment quantiles; see utils.order_value_bins(). Defaults to None.

Returns:

list: List of reference value vectors.

metaclean3.segments.merge_segments_test(values: list | numpy.ndarray | pandas.Series, segments: list | numpy.ndarray | pandas.Series, merge_signif_test: Literal[t] | Literal[wilcox] = 'wilcox', p_thres: float = 0.05, ignore_no: int = 5, strict: bool = True)

Merge segments that contain values that are not significantly different based on some significance test. Intended for internal use only; made public for external QA.

Args:

values (list | numpy.ndarray | pandas.Series): Vector of values.

segments (list | numpy.ndarray | pandas.Series): Segment label vector.

merge_signif_test (str, optional): Type of significance test to use, i.e. ‘t’ (t-test) or ‘wilcox’ (Wilcoxon). Defaults to ‘wilcox’.

p_thres (float, optional): P-value threshold. Defaults to 0.05.

ignore_no (int, optional): See in_percentiles(). Defaults to 5.

strict (bool, optional): If set to True, function will only test adjacent segments; otherwise, . Defaults to True.

Returns:

numpy.ndarray: an updated version of segments after merging.
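
The merging rule can be sketched with the standard library alone, assuming adjacent-only testing as with strict=True. This is not the metaclean3 implementation. It swaps in a normal approximation to the rank-sum test (average ranks for ties, no tie or continuity correction) in place of whatever test routine the library calls, and the helper names rank_sum_p_value and merge_adjacent are hypothetical.

```python
from statistics import NormalDist


def rank_sum_p_value(a, b):
    """Two-sided p-value from a normal approximation to the rank-sum test."""
    combined = [(v, 0) for v in a] + [(v, 1) for v in b]
    combined.sort(key=lambda pair: pair[0])
    rank_sum_a = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the tied ranks i+1 .. j
        rank_sum_a += average_rank * sum(
            1 for k in range(i, j) if combined[k][1] == 0
        )
        i = j
    n_a, n_b = len(a), len(b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = (n_a * n_b * (n_a + n_b + 1) / 12) ** 0.5
    z = (u_a - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def merge_adjacent(values, segments, p_thres=0.05, ignore_no=5):
    """Give a segment its left neighbour's label when the two do not differ."""
    merged = list(segments)
    if not merged:
        return merged
    # Split positions into runs of identical labels: (start, end) pairs.
    runs, start = [], 0
    for i in range(1, len(merged) + 1):
        if i == len(merged) or merged[i] != merged[start]:
            runs.append((start, i))
            start = i
    cur_start, cur_end = runs[0]
    for nxt_start, nxt_end in runs[1:]:
        left = values[cur_start:cur_end]
        right = values[nxt_start:nxt_end]
        # Assumption: segments shorter than ignore_no are never merged.
        long_enough = len(left) >= ignore_no and len(right) >= ignore_no
        if long_enough and rank_sum_p_value(left, right) > p_thres:
            for k in range(nxt_start, nxt_end):
                merged[k] = merged[cur_start]
            cur_end = nxt_end  # the merged segment keeps growing
        else:
            cur_start, cur_end = nxt_start, nxt_end
    return merged


values = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3,
          1.1, 0.95, 1.25, 1.05, 1.15, 1.0,
          5.0, 5.2, 4.9, 5.1, 5.3, 5.05]
segments = [0] * 6 + [1] * 6 + [2] * 6
merged_segments = merge_adjacent(values, segments)
```

Here the first two segments share a level and are merged, while the third sits at a clearly different level and keeps its own label.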

metaclean3.segments.merge_segments_gain(values: list | numpy.ndarray | pandas.Series, segments: list | numpy.ndarray, cost_model: str = 'rank', binseg_jump: int = 2, binseg_min_size: int = 2, knee: bool = False, min_ref_percent: float = 0.4)

Merge segments with minimum gain at their changepoints until one segment is at least min_ref_percent percent of the maximum length. Intended for internal use only; made public for external QA.

Args:

values (list | numpy.ndarray | pandas.Series): Vector of values.

segments (list | numpy.ndarray | pandas.Series): Segment label vector.

cost_model (str, optional): Type of cost calculation method to use; see ruptures.Binseg. Defaults to ‘rank’.

binseg_jump (int, optional): See argument jump in ruptures.Binseg. Defaults to 2.

binseg_min_size (int, optional): See argument min_size in ruptures.Binseg. Defaults to 2.

knee (bool, optional): Use gain value knee as stopping criteria. For testing purposes only, do not change. Defaults to False.

min_ref_percent (float, optional): Stopping criteria; indicates the minimum length (proportion of total) of any segment. Defaults to 0.4.

Returns:

numpy.ndarray: an updated version of segments after merging segments.
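
The stopping rule tied to min_ref_percent can be illustrated with the sketch below. It does not call ruptures and is not the metaclean3 implementation. It assumes a plain L2 cost (sum of squared deviations from the segment mean) in place of the default ‘rank’ cost, represents segments as a list of boundaries rather than a label vector, and uses the hypothetical names l2_cost and merge_by_min_gain.

```python
def l2_cost(chunk):
    """Sum of squared deviations from the mean of a non-empty chunk."""
    mean = sum(chunk) / len(chunk)
    return sum((x - mean) ** 2 for x in chunk)


def merge_by_min_gain(values, bounds, min_ref_percent=0.4):
    """Remove the weakest changepoint until one segment is long enough.

    bounds holds the sorted start index of every segment followed by
    len(values), so consecutive pairs delimit the segments.
    """
    bounds = list(bounds)
    target = min_ref_percent * len(values)
    while len(bounds) > 2:
        longest = max(b - a for a, b in zip(bounds, bounds[1:]))
        if longest >= target:
            break
        gains = []
        for k in range(1, len(bounds) - 1):
            left = values[bounds[k - 1]:bounds[k]]
            right = values[bounds[k]:bounds[k + 1]]
            both = values[bounds[k - 1]:bounds[k + 1]]
            # Gain of keeping the changepoint: cost saved by splitting here.
            gains.append((l2_cost(both) - l2_cost(left) - l2_cost(right), k))
        _, weakest = min(gains)
        del bounds[weakest]
    return bounds


values = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95,
          5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
new_bounds = merge_by_min_gain(values, [0, 3, 6, 9, 12])
```

With twelve values and min_ref_percent of 0.4, merging stops as soon as one segment spans at least 4.8 values, which here happens after the first two segments are joined.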

metaclean3.segments.merge_segments_quant(vlist: list, segments: list | numpy.ndarray | pandas.Series, ip_func: collections.abc.Callable = in_percentiles, get_ref_fun: collections.abc.Callable = get_ref_updown, small_no_per_seg: int = 50, keep_skipped: str = 'all')

Merge segments if their quantiles are within a range of each other. Intended for internal use only; made public for external QA.

Args:

vlist (list): List of value vectors.

segments (list | numpy.ndarray | pandas.Series): Segment label vector.

ip_func (Callable, optional): Function used to determine whether test and reference values are similar. Defaults to in_percentiles().

get_ref_fun (Callable, optional): A function that extracts a list of reference values. Defaults to get_ref_updown().

small_no_per_seg (int, optional): Vectors shorter than this value are considered “small” vectors; small vectors are treated differently because smaller segments can contain more sporadic values. Defaults to 50.

keep_skipped (str, optional): Indicates whether to keep all, some, or no skipped segments, i.e. ‘all’, ‘some’, ‘none’. Defaults to ‘all’.

Returns:

numpy.ndarray: an updated version of segments after merging segments.

metaclean3.segments.refine_reference_segment(vlist: list, segments: list | numpy.ndarray, rl: int | None = None, sorted_index: numpy.ndarray | None = None, ip_func: collections.abc.Callable = in_percentiles, small_no_per_seg: int = 50, overlap_no: int = 0, split_by_segment: bool = True, multi_refine: bool = True)

Check if there are sporadic segments within the reference segments that should be merged with the reference segment. Intended for internal use only; made public for external QA.

Args:

vlist (list): List of value vectors.

segments (list | numpy.ndarray | pandas.Series): Segment label vector.

rl (int | None, optional): Reference segment label; if set to None, sets the reference segment as the largest segment. Defaults to None.

sorted_index (numpy.ndarray | None, optional): Indices sorted based on segment quantiles; see utils.order_value_bins(). Defaults to None.

ip_func (Callable, optional): Function used to determine whether test and reference values are similar. Defaults to in_percentiles().

small_no_per_seg (int, optional): Vectors shorter than this value are considered “small” vectors; small vectors are treated differently because smaller segments can contain more sporadic values. Defaults to 50.

overlap_no (int, optional): Number of values allowed to overlap. Defaults to 0.

split_by_segment (bool, optional): If there are multiple segments in between the reference segment, choose whether to test them separately. Defaults to True.

multi_refine (bool, optional): If set to True, the function will refine repeatedly. Defaults to True.

Returns:

numpy.ndarray: an updated version of segments after merging segments.
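
The rl=None default, where the largest segment becomes the reference, can be pictured as picking the most frequent label in the segment label vector. The sketch below is an illustration under that reading, not the library's code. The name default_reference_label is hypothetical, and the tie-breaking behaviour (first label encountered wins) is a property of collections.Counter, not something the source specifies.

```python
from collections import Counter


def default_reference_label(segments):
    """Return the label that covers the most positions in segments."""
    counts = Counter(segments)
    label, _ = counts.most_common(1)[0]
    return label


segments = [0, 0, 1, 1, 1, 1, 2]
rl = default_reference_label(segments)
```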

metaclean3.segments.refine_reference_segment_inbetween(segments: list | numpy.ndarray, rl: int | None = None, small_no_per_seg: int = 50)

Same as refine_reference_segment() but intended for when there are sporadic parts of the reference segment between other segments. Intended for internal use only; made public for external QA.

Args:

segments (list | numpy.ndarray | pandas.Series): Segment label vector.

rl (int | None, optional): Reference segment label; if set to None, sets the reference segment as the largest segment. Defaults to None.

small_no_per_seg (int, optional): Vectors shorter than this value are considered “small” vectors; small vectors are treated differently because smaller segments can contain more sporadic values. Defaults to 50.

Returns:

numpy.ndarray: an updated version of segments after merging segments.

metaclean3.segments.remove_reference_ends(values: list | numpy.ndarray | pandas.Series, segments: list | numpy.ndarray | pandas.Series, rl: int | None = None, ip_func: collections.abc.Callable = in_percentiles, small_no_per_seg: int = 50)

Check the ends of the reference segment to see if there is any sporadicity. Merge ends if applicable. Intended for internal use only; made public for external QA.

Args:

values (list | numpy.ndarray | pandas.Series): Vector of values.

segments (list | numpy.ndarray | pandas.Series): Segment label vector.

rl (int | None, optional): Reference segment label; if set to None, sets the reference segment as the largest segment. Defaults to None.

ip_func (Callable, optional): Function used to determine whether test and reference values are similar. Defaults to in_percentiles().

small_no_per_seg (int, optional): Vectors shorter than this value are considered “small” vectors; small vectors are treated differently because smaller segments can contain more sporadic values. Defaults to 50.

Returns:

numpy.ndarray: an updated version of segments after merging segments.