fsds package
Top-level package for fsds.
Submodules
fsds.default module
Main module.
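fsds.default.Cohen_d(group1, group2, correction=False)
Compute Cohen's d, the difference in group means divided by the pooled standard deviation:
d = (group1.mean() - group2.mean()) / sqrt(pooled_variance)
pooled_variance = (n1 * var1 + n2 * var2) / (n1 + n2)
where n1, n2 are the group sizes and var1, var2 are the group variances.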
- Args:
  group1 (Series or NumPy array): group 1 for calculating d.
  group2 (Series or NumPy array): group 2 for calculating d.
  correction (bool): Apply a small-sample correction if N < 50. Default is False.
- URL with the small-N correction equation:
- Returns:
  d (float): calculated d value.
INTERPRETATION OF COHEN'S D:
> Small effect = 0.2
> Medium effect = 0.5
> Large effect = 0.8
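As an illustration of the formula, here is a minimal sketch. It divides the mean difference by the pooled standard deviation, which is the square root of the size-weighted pooled variance. That is the standard definition of Cohen's d. The helper name `cohens_d_sketch` is hypothetical and is not `fsds.default.Cohen_d` itself. The sketch assumes sample variances with ddof=1 (the pandas `Series.var()` default), and it does not implement the optional small-sample correction.

```python
import numpy as np

def cohens_d_sketch(group1, group2):
    """Hypothetical sketch of Cohen's d using the size-weighted pooled variance.

    Not the fsds implementation; the small-sample correction is omitted.
    """
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    n1, n2 = len(g1), len(g2)
    # Assumption: sample variances (ddof=1), as pandas Series.var() computes
    var1, var2 = g1.var(ddof=1), g2.var(ddof=1)
    pooled_variance = (n1 * var1 + n2 * var2) / (n1 + n2)
    # Divide the mean difference by the pooled standard deviation
    return (g1.mean() - g2.mean()) / np.sqrt(pooled_variance)

d = cohens_d_sketch([2, 4, 6], [1, 3, 5])
print(d)  # 0.5
```

In this example both groups have variance 4, so the pooled standard deviation is 2. A mean difference of 1 therefore gives d = 0.5, a medium effect on the scale above.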
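fsds.default.evaluate_PDF(rv, x=4)
Input: a random variable object (rv) and the number of standard deviations (x) to span on either side of its mean.
Output: x and y values for plotting the distribution's probability density function (PDF), for example a normal distribution.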
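The sketch below shows the idea behind evaluating a PDF over a range. It builds a grid spanning mean ± x standard deviations and evaluates the frozen `scipy.stats` distribution's `pdf` on that grid. The helper name `evaluate_pdf_sketch` and its `num` parameter (the number of grid points) are hypothetical assumptions, not part of the fsds API.

```python
import numpy as np
from scipy import stats

def evaluate_pdf_sketch(rv, x=4, num=100):
    """Hypothetical sketch: grid over mean +/- x*std and the PDF values on it."""
    mean, std = rv.mean(), rv.std()
    # Evenly spaced points covering x standard deviations on each side of the mean
    xs = np.linspace(mean - x * std, mean + x * std, num)
    # Probability density of the random variable at each grid point
    ys = rv.pdf(xs)
    return xs, ys

xs, ys = evaluate_pdf_sketch(stats.norm(loc=10, scale=2), x=3)
print(xs[0], xs[-1])  # 4.0 16.0
```

The returned pair can be passed directly to a plotting call such as `plt.plot(xs, ys)`.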
fsds.default.find_outliers_IQR(data, col=None)
Use Tukey's method of outlier removal, also known as the interquartile range (IQR) rule, and return a boolean Series where True indicates an outlier.
- Calculates the range (IQR) between the 75% and 25% quartiles.
- Outliers fall outside the upper and lower limits, which lie 1.5*IQR above the 75% quartile and 1.5*IQR below the 25% quartile.
- IQR range calculation:
    res = df.describe()
    IQR = res['75%'] - res['25%']
    lower_limit = res['25%'] - 1.5*IQR
    upper_limit = res['75%'] + 1.5*IQR
- Args:
  data (DataFrame, Series, or ndarray): data to test for outliers.
  col (str): If passing a DataFrame, must specify the column to use.
- Returns:
  [boolean Series]: A True/False value for each row, used to slice out outliers.
EXAMPLE USE:
>> idx_outs = find_outliers_IQR(df, col='AdjustedCompensation')
>> good_data = df[~idx_outs].copy()
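Here is a minimal sketch of the IQR rule, following the `describe()`-based calculation for the 1.5*IQR limits. The helper name `find_outliers_iqr_sketch` is hypothetical. Treating values exactly on a limit as inliers (strict inequalities) is also an assumption.

```python
import numpy as np
import pandas as pd

def find_outliers_iqr_sketch(data, col=None):
    """Hypothetical sketch of Tukey's IQR rule; True marks an outlier."""
    if isinstance(data, pd.DataFrame):
        if col is None:
            raise ValueError("col must be given when passing a DataFrame")
        data = data[col]
    elif not isinstance(data, pd.Series):
        data = pd.Series(np.asarray(data))
    res = data.describe()
    iqr = res["75%"] - res["25%"]
    lower_limit = res["25%"] - 1.5 * iqr
    upper_limit = res["75%"] + 1.5 * iqr
    # Assumption: values exactly on a limit are not flagged
    return (data < lower_limit) | (data > upper_limit)

df = pd.DataFrame({"value": [1, 2, 3, 4, 5, 100]})
idx_outs = find_outliers_iqr_sketch(df, col="value")
good_data = df[~idx_outs].copy()
print(idx_outs.tolist())  # [False, False, False, False, False, True]
```

For this data the quartiles are 2.25 and 4.75, so the limits are -1.5 and 8.5, and only 100 is flagged.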
fsds.default.find_outliers_Z(data, col=None)
Use scipy to calculate absolute Z-scores and return a boolean Series where True indicates an outlier.
- Args:
  data (DataFrame, Series, or ndarray): data to test for outliers.
  col (str): If passing a DataFrame, must specify the column to use.
- Returns:
  [boolean Series]: A True/False value for each row, used to slice out outliers.
EXAMPLE USE:
>> idx_outs = find_outliers_Z(df, col='AdjustedCompensation')
>> good_data = df[~idx_outs].copy()
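Here is a minimal sketch of Z-score outlier flagging with `scipy.stats.zscore`. The documentation does not state a cutoff, so the `threshold=3.0` default is an assumption (a common rule of thumb). The helper name `find_outliers_z_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

def find_outliers_z_sketch(data, col=None, threshold=3.0):
    """Hypothetical sketch: flag values whose absolute Z-score exceeds threshold."""
    if isinstance(data, pd.DataFrame):
        if col is None:
            raise ValueError("col must be given when passing a DataFrame")
        data = data[col]
    elif not isinstance(data, pd.Series):
        data = pd.Series(np.asarray(data))
    # zscore uses the population standard deviation (ddof=0) by default
    z = np.abs(stats.zscore(data.to_numpy(dtype=float)))
    # Assumption: threshold of 3 standard deviations
    return pd.Series(z > threshold, index=data.index)

values = [1.0] * 19 + [50.0]
idx_outs = find_outliers_z_sketch(values)
print(int(idx_outs.sum()))  # 1
```

Note that with few observations a single extreme value cannot reach |z| > 3. With n values, the largest attainable Z-score is (n - 1)/sqrt(n), so small samples may flag nothing.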
fsds.default.overlap_superiority(group1, group2, n=1000)
Estimates the overlap and the probability of superiority of two distributions from random samples of size n drawn from each.
- Args:
  group1: scipy.stats rv object
  group2: scipy.stats rv object
  n: sample size
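Here is a minimal sketch of one common sample-based way to estimate these two quantities. It is not necessarily what fsds computes. It assumes group2 has the higher mean and places a threshold midway between the two means. Overlap is then the fraction of group1 above the threshold plus the fraction of group2 below it. Superiority is the fraction of paired draws where the group2 value exceeds the group1 value. The helper name `overlap_superiority_sketch` and its `seed` parameter are hypothetical.

```python
import numpy as np
from scipy import stats

def overlap_superiority_sketch(group1, group2, n=1000, seed=None):
    """Hypothetical sketch; assumes group2 has the higher mean."""
    rng = np.random.default_rng(seed)
    sample1 = group1.rvs(size=n, random_state=rng)
    sample2 = group2.rvs(size=n, random_state=rng)
    # Threshold halfway between the two distribution means
    thresh = (group1.mean() + group2.mean()) / 2
    # Misclassified fractions on each side of the threshold
    overlap = np.mean(sample1 > thresh) + np.mean(sample2 < thresh)
    # Probability that a random group2 draw beats a random group1 draw
    superiority = np.mean(sample2 > sample1)
    return float(overlap), float(superiority)

overlap, superiority = overlap_superiority_sketch(
    stats.norm(0, 1), stats.norm(1, 1), n=10_000, seed=0
)
print(overlap, superiority)
```

Identical distributions give an overlap near 1 and a superiority near 0.5. Well-separated distributions push the overlap toward 0 and the superiority toward 1.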
fsds.default.p_value_welch_ttest(a, b, two_sided=False)
Calculates the p-value for Welch's t-test given two samples. By default, the returned p-value is for a one-sided t-test. Set two_sided=True to perform a two-sided t-test instead.
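Here is a minimal sketch of the Welch calculation. The t statistic uses unequal sample variances, and the degrees of freedom come from the Welch–Satterthwaite equation. The one-sided p-value is taken in the direction of the observed difference (using |t|), which is an assumption about how the one-sided case is defined. The helper name `welch_p_value_sketch` is hypothetical. For a cross-check, the two-sided result should match `scipy.stats.ttest_ind(a, b, equal_var=False).pvalue`.

```python
import numpy as np
from scipy import stats

def welch_p_value_sketch(a, b, two_sided=False):
    """Hypothetical sketch of Welch's t-test p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard errors of each mean (sample variances, ddof=1)
    va = a.var(ddof=1) / len(a)
    vb = b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Assumption: one-sided p-value in the direction of the observed difference
    p = stats.t.sf(abs(t), df)
    return 2 * p if two_sided else p

a = [1, 2, 3, 4, 5, 6.5]
b = [2, 4, 5, 7, 8, 9, 10]
print(welch_p_value_sketch(a, b), welch_p_value_sketch(a, b, two_sided=True))
```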