gstools.normalizer.YeoJohnson

class gstools.normalizer.YeoJohnson(data=None, **parameter)

Bases: Normalizer

Yeo-Johnson (2000) transformed fields.

Parameters:
  • data (array_like, optional) – Input data to fit the transformation to in order to gain normality. The default is None.

  • lmbda (float, optional) – Shape parameter λ of the transformation. Default: 1

Notes

This transformation is given by [Yeo2000]:

\[\begin{split}y=\begin{cases} \frac{(x+1)^{\lambda} - 1}{\lambda} & x\geq 0,\, \lambda\neq 0 \\ \log(x+1) & x\geq 0,\, \lambda = 0 \\ -\frac{(|x|+1)^{2-\lambda} - 1}{2-\lambda} & x<0,\, \lambda\neq 2 \\ -\log(|x|+1) & x<0,\, \lambda = 2 \end{cases}\end{split}\]
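The four cases of the formula can be written out directly. The following is a minimal scalar sketch in plain Python, not the gstools implementation; the function name `yeo_johnson` is hypothetical and only illustrates the piecewise definition above.

```python
import math


def yeo_johnson(x, lmbda=1.0):
    """Evaluate the piecewise Yeo-Johnson formula for a single float.

    Hypothetical helper for illustration only (not part of gstools).
    """
    if x >= 0:
        if lmbda == 0:
            return math.log1p(x)  # log(x + 1)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if lmbda == 2:
        return -math.log1p(-x)  # -log(|x| + 1)
    return -((1.0 - x) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)


values = [-2.0, -0.5, 0.0, 0.5, 2.0]
# With the default lmbda = 1 both power branches reduce to y = x.
identity = [yeo_johnson(v, lmbda=1.0) for v in values]
# With lmbda = 0 the non-negative branch is a shifted logarithm.
logged = [yeo_johnson(v, lmbda=0.0) for v in values]
```

For the default `lmbda = 1` the transformation is the identity, so values of `lmbda` different from 1 are what actually reshape the data.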

References

[Yeo2000]

I.K. Yeo and R.A. Johnson, “A new family of power transformations to improve normality or symmetry.” Biometrika, 87(4), pp. 954-959 (2000).

Attributes:
name

str: The name of the normalizer class.

Methods

denormalize(data)

Transform to input distribution.

derivative(data)

Factor by which the normal PDF is multiplied to obtain the target PDF.

fit(data[, skip])

Fit the transformation to data by maximizing the log-likelihood.

kernel_loglikelihood(data)

Kernel Log-Likelihood for given data with current parameters.

likelihood(data)

Likelihood for given data with current parameters.

loglikelihood(data)

Log-Likelihood for given data with current parameters.

normalize(data)

Transform to normal distribution.

denormalize(data)

Transform to input distribution.

Parameters:

data (array_like) – Input data (normally distributed).

Returns:

Denormalized data.

Return type:

numpy.ndarray
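Solving each branch of the formula in the Notes for x gives the inverse transformation. The sketch below uses plain Python and hypothetical function names; it assumes the input lies in the image of the forward transformation and is not taken from the gstools source.

```python
import math


def yeo_johnson(x, lmbda):
    """Forward transformation (hypothetical helper, not gstools code)."""
    if x >= 0:
        if lmbda == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if lmbda == 2:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)


def yeo_johnson_inverse(y, lmbda):
    """Invert the forward formula branch by branch.

    The sign of y always equals the sign of x, so it selects the branch.
    Assumes y is a value the forward transformation can actually produce.
    """
    if y >= 0:
        if lmbda == 0:
            return math.expm1(y)  # exp(y) - 1
        return (lmbda * y + 1.0) ** (1.0 / lmbda) - 1.0
    if lmbda == 2:
        return -math.expm1(-y)  # 1 - exp(-y)
    return 1.0 - (1.0 - (2.0 - lmbda) * y) ** (1.0 / (2.0 - lmbda))


# Round trip: transform a few values and map them back.
samples = [-3.0, -0.25, 0.0, 0.75, 4.0]
round_trip = [yeo_johnson_inverse(yeo_johnson(v, 0.5), 0.5) for v in samples]
```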

derivative(data)

Factor by which the normal PDF is multiplied to obtain the target PDF.

Parameters:

data (array_like) – Input data (not normally distributed).

Returns:

Derivative of the normalization transformation function.

Return type:

numpy.ndarray
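Differentiating the formula in the Notes with respect to x gives (x+1)^(λ-1) for x ≥ 0 and (|x|+1)^(1-λ) for x < 0, which also covers the logarithmic cases λ = 0 and λ = 2. The following plain-Python sketch, with hypothetical function names and not taken from the gstools source, compares this closed form with a central finite difference.

```python
import math


def yeo_johnson(x, lmbda):
    """Forward transformation (hypothetical helper, not gstools code)."""
    if x >= 0:
        if lmbda == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if lmbda == 2:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)


def yeo_johnson_derivative(x, lmbda):
    """Closed-form dy/dx obtained by differentiating each branch."""
    exponent = lmbda - 1.0 if x >= 0 else 1.0 - lmbda
    return (abs(x) + 1.0) ** exponent


def finite_difference(x, lmbda, h=1e-6):
    """Central difference approximation of dy/dx, used only as a check."""
    return (yeo_johnson(x + h, lmbda) - yeo_johnson(x - h, lmbda)) / (2.0 * h)


points = [-2.0, -0.5, 0.5, 2.0]
# Largest gap between the closed form and the numerical derivative.
max_gap = max(
    abs(yeo_johnson_derivative(p, 0.3) - finite_difference(p, 0.3))
    for p in points
)
```

The derivative is strictly positive for every x and λ, so the transformation is monotonically increasing and its logarithm is always defined.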

fit(data, skip=None, **kwargs)

Fit the transformation to data by maximizing the log-likelihood.

Parameters:
  • data (array_like) – Input data to fit the transformation to in order to gain normality.

  • skip (list of str or None, optional) – Names of parameters to be skipped in fitting. The default is None.

  • **kwargs – Keyword arguments passed to scipy.optimize.minimize_scalar when only one parameter is present, or to scipy.optimize.minimize otherwise.

Returns:

Optimal parameters given by names.

Return type:

dict

kernel_loglikelihood(data)

Kernel Log-Likelihood for given data with current parameters.

Parameters:

data (array_like) – Input data to fit the transformation to in order to gain normality.

Returns:

Kernel Log-Likelihood of the given data.

Return type:

float

Notes

This log-likelihood function neglects additive constants that are not needed for optimization.
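As an illustration of what such a kernel can look like, the sketch below uses the standard profile log-likelihood of a power transformation: the transformed data are assumed Gaussian with maximum-likelihood mean and variance, and the log-derivatives of the transformation enter as the change-of-variables term. Whether gstools evaluates exactly this expression is an assumption; all function names are hypothetical.

```python
import math


def yeo_johnson(x, lmbda):
    """Forward transformation (hypothetical helper, not gstools code)."""
    if x >= 0:
        if lmbda == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if lmbda == 2:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)


def yeo_johnson_derivative(x, lmbda):
    """Closed-form dy/dx of the forward transformation."""
    return (abs(x) + 1.0) ** (lmbda - 1.0 if x >= 0 else 1.0 - lmbda)


def kernel_loglikelihood(data, lmbda):
    """Profile log-likelihood without the additive constant."""
    n = len(data)
    y = [yeo_johnson(v, lmbda) for v in data]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n  # ML estimate (divide by n)
    log_jacobian = sum(math.log(yeo_johnson_derivative(v, lmbda)) for v in data)
    return -0.5 * n * math.log(var) + log_jacobian


def loglikelihood(data, lmbda):
    """Kernel plus the constant -n/2 * (log(2*pi) + 1)."""
    n = len(data)
    return kernel_loglikelihood(data, lmbda) - 0.5 * n * (math.log(2.0 * math.pi) + 1.0)


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = kernel_loglikelihood(sample, 1.0)
full = loglikelihood(sample, 1.0)
```

The constant depends only on the sample size, so the kernel and the full log-likelihood are maximized by the same parameter values.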

likelihood(data)

Likelihood for given data with current parameters.

Parameters:

data (array_like) – Input data to fit the transformation to in order to gain normality.

Returns:

Likelihood of the given data.

Return type:

float

loglikelihood(data)

Log-Likelihood for given data with current parameters.

Parameters:

data (array_like) – Input data to fit the transformation to in order to gain normality.

Returns:

Log-Likelihood of the given data.

Return type:

float

normalize(data)

Transform to normal distribution.

Parameters:

data (array_like) – Input data (not normally distributed).

Returns:

Normalized data.

Return type:

numpy.ndarray

default_parameter = {'lmbda': 1}

Default parameter of the YeoJohnson-Normalizer.

Type:

dict

denormalize_range = (-inf, inf)

Valid range for output/normal data.

Type:

tuple

property name

The name of the normalizer class.

Type:

str

normalize_range = (-inf, inf)

Valid range for input data.

Type:

tuple