fsds package

Top-level package for fsds.

Submodules

fsds.default module

Main module.

fsds.default.Cohen_d(group1, group2, correction=False)

Compute Cohen’s d, the difference between the group means divided by the pooled standard deviation.

    d = (group1.mean() - group2.mean()) / sqrt(pooled_variance)
    pooled_variance = (n1 * var1 + n2 * var2) / (n1 + n2)

Args:

group1 (Series or NumPy array): group 1 for calculating d
group2 (Series or NumPy array): group 2 for calculating d
correction (bool): Apply equation correction if N<50. Default is False.

URL with the small-N correction equation:

https://www.statisticshowto.datasciencecentral.com/cohens-d/

Returns:

d (float): calculated d value

INTERPRETATION OF COHEN’S D:

> Small effect = 0.2
> Medium effect = 0.5
> Large effect = 0.8
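
The calculation can be sketched with the standard library alone. This is an illustration, not the package's implementation: the name `cohen_d_sketch` is hypothetical, the sample variance (ddof=1) is an assumption, the small-sample correction is omitted, and the mean difference is divided by the pooled standard deviation (the square root of the pooled variance in the docstring), which is the usual definition of Cohen's d.

```python
import math
import statistics


def cohen_d_sketch(group1, group2):
    """Cohen's d with the size-weighted pooled variance from the docstring."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.variance(group1)  # sample variance (ddof=1), an assumption
    var2 = statistics.variance(group2)
    pooled_variance = (n1 * var1 + n2 * var2) / (n1 + n2)
    diff = statistics.mean(group1) - statistics.mean(group2)
    # Divide by the pooled standard deviation, not the pooled variance
    return diff / math.sqrt(pooled_variance)


group_a = [2, 4, 6, 8]
group_b = [1, 3, 5, 7]
d = cohen_d_sketch(group_a, group_b)
print(f"d = {d:.3f}")  # d = 0.387
```

On the scale above, a value of about 0.39 sits between a small and a medium effect.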

fsds.default.evaluate_PDF(rv, x=4)

Input: a random variable object, standard deviation
Output: x and y values for the normal distribution
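
A rough sketch of what such a helper might compute, under assumptions the docstring does not confirm: that the second argument is a number of standard deviations to cover on either side of the mean, and that 100 evenly spaced points are returned. The function name and its `mean`, `std`, `n_std`, and `num` parameters are hypothetical, and `statistics.NormalDist` stands in for a scipy.stats random variable object.

```python
from statistics import NormalDist


def evaluate_pdf_sketch(mean, std, n_std=4, num=100):
    """Return x and y values of a normal PDF over mean +/- n_std * std."""
    dist = NormalDist(mu=mean, sigma=std)
    low = mean - n_std * std
    step = (2 * n_std * std) / (num - 1)
    xs = [low + i * step for i in range(num)]  # evenly spaced grid
    ys = [dist.pdf(x) for x in xs]             # density at each grid point
    return xs, ys


xs, ys = evaluate_pdf_sketch(mean=0.0, std=1.0)
print(len(xs), round(xs[0], 2), round(xs[-1], 2))
```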

fsds.default.find_outliers_IQR(data, col=None)

Use Tukey’s method of outlier detection (the interquartile-range rule) and return a boolean Series where True indicates an outlier.

- Calculates the IQR, the range between the 75% and 25% quartiles.
- Outliers fall outside the lower and upper limits, which lie 1.5*IQR below the 25% quartile and 1.5*IQR above the 75% quartile.

IQR Range Calculation:

    res = df.describe()
    IQR = res['75%'] - res['25%']
    lower_limit = res['25%'] - 1.5*IQR
    upper_limit = res['75%'] + 1.5*IQR

Args:

data (DataFrame, Series, or ndarray): data to test for outliers.
col (str): If passing a DataFrame, must specify column to use.

Returns:

[boolean Series]: A True/False value for each row, used to slice outliers.

EXAMPLE USE:

    >> idx_outs = find_outliers_IQR(df, col='AdjustedCompensation')
    >> good_data = df[~idx_outs].copy()
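
The IQR rule itself can be sketched without pandas. The snippet below is a standard-library illustration rather than the package's code: `find_outliers_iqr_sketch` is a hypothetical name, it takes a plain list instead of a DataFrame column, and it assumes linearly interpolated quartiles.

```python
import statistics


def find_outliers_iqr_sketch(values):
    """Return a list of booleans; True marks a value outside Tukey's fences."""
    # method="inclusive" interpolates linearly between data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return [v < lower_limit or v > upper_limit for v in values]


salaries = [10, 12, 11, 13, 12, 11, 95]
iqr_mask = find_outliers_iqr_sketch(salaries)
kept = [v for v, is_out in zip(salaries, iqr_mask) if not is_out]
print(kept)  # [10, 12, 11, 13, 12, 11]
```

Here the quartiles are 11 and 12.5, so the limits are 8.75 and 14.75 and only the value 95 is flagged.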

fsds.default.find_outliers_Z(data, col=None)

Use scipy to calculate absolute Z-scores and return a boolean Series where True indicates an outlier.

Args:

data (DataFrame, Series, or ndarray): data to test for outliers.
col (str): If passing a DataFrame, must specify column to use.

Returns:

[boolean Series]: A True/False value for each row, used to slice outliers.

EXAMPLE USE:

    >> idx_outs = find_outliers_Z(df, col='AdjustedCompensation')
    >> good_data = df[~idx_outs].copy()
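
A minimal sketch of the Z-score rule follows. The docstring does not state the cutoff, so the `threshold=3.0` default here is an assumption, as is the hypothetical name `find_outliers_z_sketch`. The population standard deviation is used because `scipy.stats.zscore` defaults to `ddof=0`.

```python
import statistics


def find_outliers_z_sketch(values, threshold=3.0):
    """Return a list of booleans; True marks |z| above the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population std (ddof=0)
    if std == 0:
        # All values identical: nothing can be an outlier
        return [False] * len(values)
    return [abs((v - mean) / std) > threshold for v in values]


readings = [9, 10, 11, 10, 9, 11, 10, 10, 9, 11,
            10, 10, 11, 9, 10, 10, 11, 9, 10, 100]
z_mask = find_outliers_z_sketch(readings)
print(sum(z_mask))  # 1
```

With few observations the largest possible absolute Z-score is bounded by the sample size, so a cutoff of 3 cannot flag anything in very small samples.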

fsds.default.overlap_superiority(group1, group2, n=1000)

Estimates overlap and superiority based on a sample.

group1: scipy.stats rv object
group2: scipy.stats rv object
n: sample size
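
As a rough illustration of the sampling idea, the sketch below estimates the probability of superiority, taken here to mean the chance that a random draw from the first distribution exceeds a random draw from the second. That definition, the function name, and the use of normal distributions described by `mean` and `std` arguments in place of scipy.stats rv objects are all assumptions. Overlap is left out because the docstring does not say how it is defined.

```python
import random


def superiority_sketch(mean1, std1, mean2, std2, n=1000, seed=0):
    """Estimate P(X1 > X2) from n paired draws of two normal distributions."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    sample1 = [rng.gauss(mean1, std1) for _ in range(n)]
    sample2 = [rng.gauss(mean2, std2) for _ in range(n)]
    wins = sum(x > y for x, y in zip(sample1, sample2))
    return wins / n


sup = superiority_sketch(1.0, 1.0, 0.0, 1.0)
print(f"estimated superiority: {sup:.2f}")
```

For two unit-variance normals whose means differ by one standard deviation, the exact value is about 0.76, so the estimate should land close to that.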

fsds.default.p_value_welch_ttest(a, b, two_sided=False)

Calculates the p-value for Welch’s t-test given two samples. By default, the returned p-value is for a one-sided t-test. Set the two_sided parameter to True to perform a two-sided t-test instead.
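
For reference, the textbook quantities behind Welch's t-test are written out below. The symbols are not from the docstring: \bar{a} and \bar{b} are the sample means, s_a^2 and s_b^2 the sample variances, n_a and n_b the sample sizes, and T_\nu a Student t variable with \nu degrees of freedom. Taking the one-sided p-value in the direction of the observed difference is an assumption about how the function behaves.

```latex
t = \frac{\bar{a} - \bar{b}}{\sqrt{\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}}},
\qquad
\nu = \frac{\left(\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}\right)^2}
           {\dfrac{(s_a^2/n_a)^2}{n_a - 1} + \dfrac{(s_b^2/n_b)^2}{n_b - 1}}

p_{\text{one-sided}} = P\left(T_\nu \ge |t|\right),
\qquad
p_{\text{two-sided}} = 2\,P\left(T_\nu \ge |t|\right)
```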

fsds.default.plot_pdfs(cohen_d=2)

Plot PDFs for distributions whose means differ by some number of standard deviations.

cohen_d: number of standard deviations between the means

fsds.default.welch_df(a, b)

fsds.default.welch_t(a, b)
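
Neither function carries a docstring. Judging by the names, they presumably compute Welch's t statistic and the Welch-Satterthwaite degrees of freedom. The standard-library sketch below shows those two textbook formulas; the `_sketch` names are hypothetical, and details such as sample variance (ddof=1) and a signed rather than absolute statistic are assumptions, not the package's confirmed behavior.

```python
import math
import statistics


def welch_t_sketch(a, b):
    """Welch's t statistic: mean difference over the unpooled standard error."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def welch_df_sketch(a, b):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    s_a = statistics.variance(a) / len(a)
    s_b = statistics.variance(b) / len(b)
    return (s_a + s_b) ** 2 / (s_a ** 2 / (len(a) - 1) + s_b ** 2 / (len(b) - 1))


sample_a = [1, 2, 3, 4, 5]
sample_b = [2, 4, 6, 8, 10]
t_stat = welch_t_sketch(sample_a, sample_b)
dof = welch_df_sketch(sample_a, sample_b)
print(f"t = {t_stat:.3f}, df = {dof:.3f}")  # t = -1.897, df = 5.882
```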

fsds.fsds module

fsds.imports module
