artan.state package

Submodules

artan.state.stateful_transformer module

class artan.state.stateful_transformer.HasEventTimeCol[source]

Bases: pyspark.ml.param.Params

Mixin for param for event time column.

eventTimeCol = Param(parent='undefined', name='eventTimeCol', doc='Column marking the event time of the received measurements')
getEventTimeCol()[source]

Gets the value of event time column or its default value.

class artan.state.stateful_transformer.HasStateKeyCol[source]

Bases: pyspark.ml.param.Params

Mixin for param for state key column.

getStateKeyCol()[source]

Gets the value of state key column or its default value.

stateKeyCol = Param(parent='undefined', name='stateKeyCol', doc='State key column. State keys uniquely identify the each state in stateful transformers,thus controlling the number of states and the degree of parallelization')
class artan.state.stateful_transformer.HasStateTimeoutDuration[source]

Bases: pyspark.ml.param.Params

Mixin for param for state timeout duration.

getStateTimeoutDuration()[source]

Gets the value of state timeout duration or its default value.

stateTimeoutDuration = Param(parent='undefined', name='stateTimeoutDuration', doc='Duration to wait before timing out the state')
class artan.state.stateful_transformer.HasStateTimeoutMode[source]

Bases: pyspark.ml.param.Params

Mixin for param for state timeout mode for clearing states without updates, one of “none”, “process” or “event”.

getTimeoutMode()[source]

Gets the value of timeout mode or its default value.

timeoutMode = Param(parent='undefined', name='timeoutMode', doc="Timeout mode for clearing the states that didn't receive measurements.")
class artan.state.stateful_transformer.HasWatermarkDuration[source]

Bases: pyspark.ml.param.Params

Mixin for param for watermark duration.

getWatermarkDuration()[source]

Gets the value of watermark duration or its default value.

watermarkDuration = Param(parent='undefined', name='watermarkDuration', doc='Watermark duration')
class artan.state.stateful_transformer.StatefulTransformer(java_obj=None)[source]

Bases: pyspark.ml.wrapper.JavaTransformer, artan.state.stateful_transformer.HasStateKeyCol, artan.state.stateful_transformer.HasEventTimeCol, artan.state.stateful_transformer.HasWatermarkDuration, artan.state.stateful_transformer.HasStateTimeoutDuration, artan.state.stateful_transformer.HasStateTimeoutMode

Base mixin for stateful transformations

setEventTimeCol(value)[source]

Sets the event time column in the input DataFrame.

Parameters:value – String, column name of the eventTime timestamp
Returns:StatefulTransformer
setStateKeyCol(value)[source]

Sets the state key column. Each value in the column should uniquely identify a stateful transformer. Each unique value will result in a separate state.

setStateTimeoutDuration(value)[source]

Sets the state timeout duration for all states, only valid when state timeout mode is not ‘none’.

Must be a valid duration string, such as ‘10 minutes’.

Parameters:value – String, duration specifying state timeout
Returns:StatefulTransformer
setStateTimeoutMode(value)[source]

Sets the state timeout mode. Supported values are ‘none’, ‘process’ and ‘event’. Enabling state timeout will clear the state after a certain timeout duration which can be set. If a state receives measurements after it times out, the state will be initialized as if it received no measurements.

  • ‘none’: No state timeout, state is kept indefinitely.
  • ‘process’: Process time based state timeout, state will be cleared if no measurements are received for
    a duration based on processing time. Effects all states. Timeout duration must be set with setStateTimeoutDuration.
  • ‘event’: Event time based state timeout, state will be cleared if no measurements are recieved for a duration
    based on event time determined by watermark. Effects all states. Timeout duration must be set with setStateTimeoutDuration. Additionally, event time column and it’s watermark duration must be set with setEventTimeCol and setWatermarkDuration. Note that this will result in dropping measurements occuring later than the watermark.

Default is ‘none’

Parameters:value – String, one of ‘none’, ‘process’ or ‘event’
Returns:StatefulTransformer
setWatermarkDuration(value)[source]

Set the watermark duration for all states, only valid when state timeout mode is ‘event’. Must be a valid duration string, such as ‘10 minutes’.

Parameters:value – String, duration specifying watermark from eventTime timestamp
Returns:StatefulTransformer