artan.state package¶
Submodules¶
artan.state.stateful_transformer module¶
-
class
artan.state.stateful_transformer.
HasEventTimeCol
[source]¶ Bases:
pyspark.ml.param.Params
Mixin for param for event time column.
-
eventTimeCol
= Param(parent='undefined', name='eventTimeCol', doc='Column marking the event time of the received measurements')¶
-
-
class
artan.state.stateful_transformer.
HasStateKeyCol
[source]¶ Bases:
pyspark.ml.param.Params
Mixin for param for state key column.
-
stateKeyCol
= Param(parent='undefined', name='stateKeyCol', doc='State key column. State keys uniquely identify the each state in stateful transformers,thus controlling the number of states and the degree of parallelization')¶
-
-
class
artan.state.stateful_transformer.
HasStateTimeoutDuration
[source]¶ Bases:
pyspark.ml.param.Params
Mixin for param for state timeout duration.
-
stateTimeoutDuration
= Param(parent='undefined', name='stateTimeoutDuration', doc='Duration to wait before timing out the state')¶
-
-
class
artan.state.stateful_transformer.
HasStateTimeoutMode
[source]¶ Bases:
pyspark.ml.param.Params
Mixin for param for state timeout mode for clearing states without updates, one of “none”, “process” or “event”.
-
timeoutMode
= Param(parent='undefined', name='timeoutMode', doc="Timeout mode for clearing the states that didn't receive measurements.")¶
-
-
class
artan.state.stateful_transformer.
HasWatermarkDuration
[source]¶ Bases:
pyspark.ml.param.Params
Mixin for param for watermark duration.
-
watermarkDuration
= Param(parent='undefined', name='watermarkDuration', doc='Watermark duration')¶
-
-
class
artan.state.stateful_transformer.
StatefulTransformer
(java_obj=None)[source]¶ Bases:
pyspark.ml.wrapper.JavaTransformer
,artan.state.stateful_transformer.HasStateKeyCol
,artan.state.stateful_transformer.HasEventTimeCol
,artan.state.stateful_transformer.HasWatermarkDuration
,artan.state.stateful_transformer.HasStateTimeoutDuration
,artan.state.stateful_transformer.HasStateTimeoutMode
Base mixin for stateful transformations
-
setEventTimeCol
(value)[source]¶ Sets the event time column in the input DataFrame.
Parameters: value – String, column name of the eventTime timestamp Returns: StatefulTransformer
-
setStateKeyCol
(value)[source]¶ Sets the state key column. Each value in the column should uniquely identify a stateful transformer. Each unique value will result in a separate state.
-
setStateTimeoutDuration
(value)[source]¶ Sets the state timeout duration for all states, only valid when state timeout mode is not ‘none’.
Must be a valid duration string, such as ‘10 minutes’.
Parameters: value – String, duration specifying state timeout Returns: StatefulTransformer
-
setStateTimeoutMode
(value)[source]¶ Sets the state timeout mode. Supported values are ‘none’, ‘process’ and ‘event’. Enabling state timeout will clear the state after a certain timeout duration which can be set. If a state receives measurements after it times out, the state will be initialized as if it received no measurements.
- ‘none’: No state timeout, state is kept indefinitely.
- ‘process’: Process time based state timeout, state will be cleared if no measurements are received for
- a duration based on processing time. Effects all states. Timeout duration must be set with setStateTimeoutDuration.
- ‘event’: Event time based state timeout, state will be cleared if no measurements are recieved for a duration
- based on event time determined by watermark. Effects all states. Timeout duration must be set with setStateTimeoutDuration. Additionally, event time column and it’s watermark duration must be set with setEventTimeCol and setWatermarkDuration. Note that this will result in dropping measurements occuring later than the watermark.
Default is ‘none’
Parameters: value – String, one of ‘none’, ‘process’ or ‘event’ Returns: StatefulTransformer
-