artan.mixture package
Submodules
artan.mixture.mixture_params module
-
class artan.mixture.mixture_params.HasBatchTrainEnabled
Bases: pyspark.ml.param.Params
Mixin for enabling batch EM train mode.
-
batchTrainEnabled = Param(parent='undefined', name='batchTrainEnabled', doc='Flag to enable batch EM. Unless enabled, the transformer will do online EM. Online EM can be done with both streaming and batch dataframes, whereas batch EM can only be done with batch dataframes. Default is false')
-
-
class artan.mixture.mixture_params.HasBatchTrainMaxIter
Bases: pyspark.ml.param.Params
Mixin for batch train max iterations.
-
batchTrainMaxIter = Param(parent='undefined', name='batchTrainMaxIter', doc='Maximum iterations in batch train mode, default is 30')
-
-
class artan.mixture.mixture_params.HasBatchTrainTol
Bases: pyspark.ml.param.Params
Mixin for batch train iteration stop tolerance.
-
batchTrainTol = Param(parent='undefined', name='batchTrainTol', doc='Min change in loglikelihood to stop iterations in batch EM mode. Default is 0.1')
-
-
class artan.mixture.mixture_params.HasDecayRate
Bases: pyspark.ml.param.Params
Mixin for decaying step size parameter.
-
decayRate = Param(parent='undefined', name='decayRate', doc='Step size as a decaying function rather than a constant, which might be preferred in batch training. If set, the step size will be replaced with the output of the function stepSize = (2 + kIter)**(-decayRate)')
-
-
class artan.mixture.mixture_params.HasInitialMixtureModelCol
Bases: pyspark.ml.param.Params
Mixin for initial mixture model parameter.
-
getInitialMixtureModelCol()
Gets the value of initialMixtureModelCol or its default value.
-
initialMixtureModelCol = Param(parent='undefined', name='initialMixtureModelCol', doc='Sets the initial mixture model from a struct column conforming to the mixture distribution')
-
-
class artan.mixture.mixture_params.HasInitialWeights
Bases: pyspark.ml.param.Params
Mixin for initial mixture weights parameter.
-
initialWeights = Param(parent='undefined', name='initialWeights', doc='Initial weights of the mixtures. The weights should sum up to 1.0.')
-
-
class artan.mixture.mixture_params.HasInitialWeightsCol
Bases: pyspark.ml.param.Params
Mixin for initial mixture weights column parameter.
-
initialWeightsCol = Param(parent='undefined', name='initialWeightsCol', doc='Initial weights of mixtures from dataframe column')
-
-
class artan.mixture.mixture_params.HasMinibatchSize
Bases: pyspark.ml.param.Params
Mixin for mini-batch size parameter.
-
minibatchSize = Param(parent='undefined', name='minibatchSize', doc='Size for batching samples together in online EM algorithm. Estimate will be produced once per batch. Having larger batches increases stability with increased memory footprint. Each minibatch is stored in mixture transformer state independently from spark minibatches.')
-
-
class artan.mixture.mixture_params.HasMinibatchSizeCol
Bases: pyspark.ml.param.Params
Mixin for mini-batch size column parameter.
-
minibatchSizeCol = Param(parent='undefined', name='minibatchSizeCol', doc='Set minibatch size from dataframe column')
-
-
class artan.mixture.mixture_params.HasMixtureCount
Bases: pyspark.ml.param.Params
Mixin for number of components in the mixture.
-
mixtureCount = Param(parent='undefined', name='mixtureCount', doc='Number of finite mixture components, must be > 0')
-
-
class artan.mixture.mixture_params.HasSampleCol
Bases: pyspark.ml.param.Params
Mixin for sample column parameter.
-
sampleCol = Param(parent='undefined', name='sampleCol', doc='Column name for input to mixture models')
-
-
class artan.mixture.mixture_params.HasStepSize
Bases: pyspark.ml.param.Params
Mixin for controlling the inertia of the current parameter.
-
stepSize = Param(parent='undefined', name='stepSize', doc='Weights the current parameter of the model against the old parameter. A step size of 1.0 means ignore the old parameter, whereas a step size of 0 means ignore the current parameter. Values closer to 1.0 will increase speed of convergence, but might have adverse effects on stability. In an online setting, it is advised to set small values close to 0.0. Default is 0.01')
-
-
class artan.mixture.mixture_params.HasStepSizeCol
Bases: pyspark.ml.param.Params
Mixin for step size column parameter.
-
stepSizeCol = Param(parent='undefined', name='stepSizeCol', doc='stepSize parameter from dataframe column instead of a constant value across all samples')
-
-
class artan.mixture.mixture_params.HasUpdateHoldout
Bases: pyspark.ml.param.Params
Mixin for update holdout parameter.
-
updateHoldout = Param(parent='undefined', name='updateHoldout', doc='Controls after how many samples the mixture will start calculating estimates. Preventing updates in the first few samples might be preferred for stability.')
-
-
class artan.mixture.mixture_params.HasUpdateHoldoutCol
Bases: pyspark.ml.param.Params
Mixin for update holdout column parameter.
-
updateHoldoutCol = Param(parent='undefined', name='updateHoldoutCol', doc='updateHoldout from dataframe column rather than a constant value across all states')
-
-
class artan.mixture.mixture_params.MixtureParams
Bases: artan.mixture.mixture_params.HasSampleCol,
artan.mixture.mixture_params.HasStepSize,
artan.mixture.mixture_params.HasStepSizeCol,
artan.mixture.mixture_params.HasInitialWeights,
artan.mixture.mixture_params.HasInitialWeightsCol,
artan.mixture.mixture_params.HasMinibatchSize,
artan.mixture.mixture_params.HasUpdateHoldout,
artan.mixture.mixture_params.HasDecayRate,
artan.mixture.mixture_params.HasInitialMixtureModelCol,
artan.mixture.mixture_params.HasMinibatchSizeCol,
artan.mixture.mixture_params.HasUpdateHoldoutCol,
artan.mixture.mixture_params.HasBatchTrainEnabled,
artan.mixture.mixture_params.HasBatchTrainMaxIter,
artan.mixture.mixture_params.HasBatchTrainTol,
artan.mixture.mixture_params.HasMixtureCount
-
setBatchTrainMaxIter(value)
Sets the maximum number of iterations in batch train mode.
Default is 30.
Parameters: value – Int
Returns: MixtureTransformer
-
setBatchTrainTol(value)
Sets the minimum log-likelihood improvement between iterations in batch EM train mode; iterations stop once the improvement falls below this value.
Default is 0.1.
Parameters: value – Float
Returns: MixtureTransformer
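In batch EM mode, batchTrainMaxIter and batchTrainTol together decide when iteration stops. The rule can be sketched as follows, assuming that iteration stops as soon as one EM pass improves the log-likelihood by less than the tolerance. `run_batch_em` and `em_step` are hypothetical names; `em_step` stands in for one full E-step plus M-step over the batch and returns the new log-likelihood.

```python
def run_batch_em(em_step, max_iter=30, tol=0.1):
    """Repeat EM passes until the log-likelihood gain drops below tol.

    em_step is a hypothetical callable: one E-step + M-step over the whole
    batch, returning the resulting log-likelihood. Defaults mirror the
    documented batchTrainMaxIter (30) and batchTrainTol (0.1).
    """
    history = []
    prev_ll = float("-inf")
    for _ in range(max_iter):  # hard cap, as with batchTrainMaxIter
        ll = em_step()
        history.append(ll)
        if ll - prev_ll < tol:  # assumed stopping test, as with batchTrainTol
            break
        prev_ll = ll
    return history

# Toy log-likelihoods that improve by smaller and smaller amounts.
fake_lls = iter([-100.0, -60.0, -50.0, -49.5, -49.45, -49.44])
print(run_batch_em(lambda: next(fake_lls), max_iter=30, tol=0.1))
```

Here the run stops after the fifth pass, because the gain from -49.5 to -49.45 is below 0.1.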
-
setDecayRate(value)
Sets the step size as a decaying function rather than a constant, which might be preferred for batch training. If set, the step size will be replaced with the output of the following function:
stepSize = (2 + kIter)**(-decayRate)
where kIter is incremented by 1 on each minibatch.
Returns: MixtureTransformer
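The schedule can be tabulated directly. This minimal sketch assumes kIter starts at 0 for the first minibatch, which the description above does not state. `decayed_step_sizes` is an illustrative helper, not part of artan.

```python
def decayed_step_sizes(decay_rate, num_minibatches):
    """Step sizes from stepSize = (2 + kIter)**(-decayRate).

    Assumption: kIter starts at 0 and grows by 1 per minibatch.
    """
    return [(2 + k_iter) ** (-decay_rate) for k_iter in range(num_minibatches)]

steps = decayed_step_sizes(0.6, 5)
print([round(s, 4) for s in steps])  # strictly decreasing step sizes
```

Larger decayRate values shrink the step size faster. In stochastic approximation, exponents in (0.5, 1] are the usual choice, because the step sizes then sum to infinity while their squares sum to a finite value.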
-
setInitialMixtureModelCol(value)
Sets the initial mixture model directly from a dataframe column.
Parameters: value – String
Returns: MixtureTransformer
-
setInitialWeights(value)
Sets the initial weights of the mixtures. The weights should sum up to 1.0.
Parameters: value – List[Float]
Returns: MixtureTransformer
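Because the initial weights must sum to 1.0, it can help to normalize raw non-negative scores before passing them to setInitialWeights. The following helper is a sketch. `normalize_weights` is illustrative and not part of artan. The length check assumes one weight per mixture component, presumably matching mixtureCount.

```python
import math

def normalize_weights(raw, expected_count=None):
    """Scale non-negative scores so that they sum to 1.0."""
    if expected_count is not None and len(raw) != expected_count:
        raise ValueError("need one weight per mixture component")
    if any(w < 0 for w in raw):
        raise ValueError("weights must be non-negative")
    total = math.fsum(raw)  # accurate floating-point sum
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in raw]

weights = normalize_weights([2, 1, 1], expected_count=3)
print(weights)  # [0.5, 0.25, 0.25]
```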
-
setInitialWeightsCol(value)
Sets the initial mixture weights parameter from a dataframe column.
Parameters: value – String
Returns: MixtureTransformer
-
setMinibatchSize(value)
Sets the minibatch size for batching samples together in the online EM algorithm. An estimate will be produced once per batch. Larger batches increase stability at the cost of a larger memory footprint.
Default is 1.
Parameters: value – Int
Returns: MixtureTransformer
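The sketch below shows how an online learner might group a stream of samples into fixed-size minibatches, producing one estimate per full batch. The `update` callable is hypothetical and stands in for one online EM step. The sketch also does not model how artan keeps its minibatch buffer in transformer state.

```python
def process_in_minibatches(samples, minibatch_size, update):
    """Buffer samples and call update() once per full minibatch.

    Returns one estimate per completed minibatch; samples in a trailing,
    incomplete minibatch stay buffered and produce no estimate.
    """
    if minibatch_size < 1:
        raise ValueError("minibatch_size must be >= 1")
    estimates, buffer = [], []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == minibatch_size:
            estimates.append(update(buffer))  # one estimate per batch
            buffer = []
    return estimates

# Hypothetical update: the "estimate" is simply the minibatch mean.
print(process_in_minibatches(range(10), 3, lambda b: sum(b) / len(b)))
```

With the default size of 1, this reduces to one estimate per sample.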
-
setMinibatchSizeCol(value)
Sets the minibatch size from a dataframe column rather than a constant minibatch size across all states. Overrides the setMinibatchSize setting.
Parameters: value – String
Returns: MixtureTransformer
-
setMixtureCount(value)
Sets the number of components in the finite mixture.
Parameters: value – Int
Returns: MixtureTransformer
-
setSampleCol(value)
Sets the sample column for the mixture model inputs. The expected sample type depends on the mixture distribution:
Bernoulli => Boolean
Poisson => Long
MultivariateGaussian => Vector
Parameters: value – String
Returns: MixtureTransformer
-
setStepSize(value)
Sets the step size parameter, which weights the current parameter of the model against the old parameter. A step size of 1.0 means ignore the old parameter, whereas a step size of 0 means ignore the current parameter. Values closer to 1.0 will increase speed of convergence, but might have adverse effects on stability. For online EM, it is advised to set it close to 0.0.
Default is 0.1
Parameters: value – Float
Returns: MixtureTransformer
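The description above reads as a convex combination of the old and current parameters. This minimal sketch rests on that interpretation; the exact update inside artan may differ. `blend` is an illustrative name.

```python
def blend(old_param, current_param, step_size):
    """Assumed update: (1 - step_size) * old + step_size * current.

    step_size = 1.0 keeps only the current parameter,
    step_size = 0.0 keeps only the old parameter.
    """
    if not 0.0 <= step_size <= 1.0:
        raise ValueError("step_size must be in [0, 1]")
    return [(1.0 - step_size) * o + step_size * c
            for o, c in zip(old_param, current_param)]

old = [0.2, 0.8]
current = [0.6, 0.4]
print(blend(old, current, 1.0))   # only the current parameter
print(blend(old, current, 0.0))   # only the old parameter
print(blend(old, current, 0.01))  # small, online-style step
```

A small step size makes each new estimate move only slightly away from the previous one. This is why small values are suggested for online EM, where individual samples are noisy.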
-
setStepSizeCol(value)
Sets the step size from a dataframe column, which allows different step sizes across measurements. Overrides the value set by setStepSize.
Parameters: value – String
Returns: MixtureTransformer
-
artan.mixture.bernoulli_mixture module
-
class artan.mixture.bernoulli_mixture.BernoulliMixture
Bases: artan.state.stateful_transformer.StatefulTransformer,
artan.mixture.mixture_params.MixtureParams,
artan.mixture.bernoulli_mixture._HasInitialProbabilities,
artan.mixture.bernoulli_mixture._HasInitialProbabilitiesCol,
artan.mixture.bernoulli_mixture._HasBernoulliMixtureModelCol,
artan.utils.ArtanJavaMLReadable,
pyspark.ml.util.JavaMLWritable
Online Bernoulli mixture estimator with a stateful transformer, based on the Online Expectation-Maximisation paper by Cappe (2011).
Outputs an estimate for each input sample in a single pass, by replacing the E-step in EM with a recursively averaged stochastic E-step.
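For a Bernoulli mixture, the recursively averaged E-step can be sketched in pure Python. This is a minimal illustration of Cappé-style online EM, not artan's Scala implementation. It assumes a constant step size (no decayRate), no minibatching or update holdout, and sufficient statistics initialized from the starting parameters. The class name is hypothetical.

```python
class OnlineBernoulliMixture:
    """Sketch of online EM with a stochastic E-step for a Bernoulli mixture."""

    def __init__(self, weights, probabilities, step_size=0.01):
        self.weights = list(weights)
        self.probs = list(probabilities)
        self.step_size = step_size
        # Averaged sufficient statistics, initialized from the starting parameters.
        self.s0 = list(weights)                                    # E[z_k]
        self.s1 = [w * p for w, p in zip(weights, probabilities)]  # E[z_k * x]

    def responsibilities(self, x):
        """E-step for one boolean sample: posterior probability of each component."""
        likes = [w * (p if x else 1.0 - p) for w, p in zip(self.weights, self.probs)]
        total = sum(likes)
        return [lk / total for lk in likes]

    def update(self, x):
        """Stochastic E-step followed by a closed-form M-step; returns the estimate."""
        resp = self.responsibilities(x)
        g = self.step_size
        xv = 1.0 if x else 0.0
        # Recursive averaging: s <- (1 - g) * s + g * (statistic of this sample).
        self.s0 = [(1 - g) * s + g * r for s, r in zip(self.s0, resp)]
        self.s1 = [(1 - g) * s + g * r * xv for s, r in zip(self.s1, resp)]
        # M-step from the averaged statistics.
        total = sum(self.s0)
        self.weights = [s / total for s in self.s0]
        self.probs = [b / a for a, b in zip(self.s0, self.s1)]
        return self.weights, self.probs


model = OnlineBernoulliMixture(weights=[0.5, 0.5], probabilities=[0.3, 0.7], step_size=0.01)
for x in [True, True, True, False] * 500:
    weights, probs = model.update(x)
marginal = sum(w * p for w, p in zip(weights, probs))
print(round(marginal, 3))  # close to 0.75, the fraction of True samples
```

In this sketch the implied marginal probability sum(w_k * p_k) is exactly an exponential moving average of the samples with rate step_size, which makes for an easy sanity check.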
-
setInitialProbabilities(value)
Sets the initial Bernoulli probabilities of the mixtures. The length of the array should equal the mixture count, and each element should be between 0 and 1.
Default is equally spaced probabilities between 0 and 1.
Parameters: value – List[Float]
Returns: BernoulliMixture
-
artan.mixture.multivariate_gaussian_mixture module
-
class artan.mixture.multivariate_gaussian_mixture.MultivariateGaussianMixture
Bases: artan.state.stateful_transformer.StatefulTransformer,
artan.mixture.mixture_params.MixtureParams,
artan.mixture.multivariate_gaussian_mixture._HasInitialMeans,
artan.mixture.multivariate_gaussian_mixture._HasInitialMeansCol,
artan.mixture.multivariate_gaussian_mixture._HasInitialCovariances,
artan.mixture.multivariate_gaussian_mixture._HasInitialCovariancesCol,
artan.utils.ArtanJavaMLReadable,
pyspark.ml.util.JavaMLWritable
Online Gaussian mixture estimator with a stateful transformer, based on the Online Expectation-Maximisation paper by Cappe (2011).
Outputs an estimate for each input sample in a single pass, by replacing the E-step in EM with a recursively averaged stochastic E-step.
-
setInitialCovariances(value)
Sets the initial covariance matrices of the mixtures as a nested array of doubles. The array should have dimensions mixtureCount x sampleSize**2, that is, one flattened sampleSize x sampleSize covariance matrix per mixture component.
Parameters: value – List[List[Float]]
Returns: MultivariateGaussianMixture
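The following sketch builds that nested list from per-component square matrices. `flatten_covariances` is a hypothetical helper. Covariance matrices are symmetric, so row-major and column-major flattening produce the same list, and the sketch does not depend on which order artan expects.

```python
def flatten_covariances(matrices):
    """Turn k square, symmetric d x d matrices into a k x (d*d) nested list."""
    flat = []
    for cov in matrices:
        d = len(cov)
        if any(len(row) != d for row in cov):
            raise ValueError("covariance matrix must be square")
        if any(cov[i][j] != cov[j][i] for i in range(d) for j in range(d)):
            raise ValueError("covariance matrix must be symmetric")
        flat.append([float(v) for row in cov for v in row])  # row-major
    return flat

initial_covariances = flatten_covariances([
    [[1.0, 0.0], [0.0, 1.0]],  # component 1: identity
    [[2.0, 0.5], [0.5, 1.0]],  # component 2: correlated
])
print(initial_covariances)  # [[1.0, 0.0, 0.0, 1.0], [2.0, 0.5, 0.5, 1.0]]
```

The resulting list has the shape described above. It could then be passed to setInitialCovariances on a MultivariateGaussianMixture instance.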
-
setInitialCovariancesCol(value)
Sets the initial covariance matrices of the mixtures from a dataframe column. Overrides the value set by setInitialCovariances.
Parameters: value – String
Returns: MultivariateGaussianMixture
-
artan.mixture.poisson_mixture module
-
class artan.mixture.poisson_mixture.PoissonMixture
Bases: artan.state.stateful_transformer.StatefulTransformer,
artan.mixture.mixture_params.MixtureParams,
artan.mixture.poisson_mixture._HasInitialRates,
artan.mixture.poisson_mixture._HasInitialRatesCol,
artan.utils.ArtanJavaMLReadable,
pyspark.ml.util.JavaMLWritable
Online Poisson mixture estimator with a stateful transformer, based on the Online Expectation-Maximisation paper by Cappe (2011).
Outputs an estimate for each input sample in a single pass, by replacing the E-step in EM with a recursively averaged stochastic E-step.
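The distribution-specific part of each stochastic E-step is the posterior probability of every component given a sample. For a Poisson mixture with weights w_k and rates lambda_k, the posterior can be computed in log space for numerical stability. The helper name below is illustrative and not artan's API.

```python
import math

def poisson_responsibilities(x, weights, rates):
    """Posterior P(component k | x) for a Poisson mixture, via log-sum-exp."""
    if x < 0:
        raise ValueError("Poisson samples must be non-negative integers")
    # log w_k + log Poisson(x; rate_k) = log w_k + x*log(rate_k) - rate_k - log(x!)
    logs = [math.log(w) + x * math.log(lam) - lam - math.lgamma(x + 1)
            for w, lam in zip(weights, rates)]
    top = max(logs)  # subtract the maximum before exponentiating
    exps = [math.exp(lg - top) for lg in logs]
    total = sum(exps)
    return [e / total for e in exps]

resp = poisson_responsibilities(12, weights=[0.5, 0.5], rates=[2.0, 10.0])
print([round(r, 4) for r in resp])  # the rate-10 component dominates
```

In a Cappé-style online EM, these responsibilities would feed the recursive averages of E[z_k] and E[z_k * x]. The M-step would then set each rate to the ratio of the two averages.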