Streams module

StreamGenerator([n_chunks, chunk_size, ...])

Data stream generator for both stationary and drifting data streams.

ARFFParser(path[, chunk_size, n_chunks])

Stream-aware parser of datasets in ARFF format.

CSVParser(path[, chunk_size, n_chunks])

Stream-aware parser of datasets in CSV format.

NPYParser(path[, chunk_size, n_chunks])

Stream-aware parser of datasets in NumPy format.

class strlearn.streams.ARFFParser(path, chunk_size=200, n_chunks=250)

Bases: object

Stream-aware parser of datasets in ARFF format.

Parameters
  • path (string) – Path to the ARFF file.

  • chunk_size (integer, optional (default=200)) – The number of instances in each data chunk.

  • n_chunks (integer, optional (default=250)) – The number of data chunks that the stream is composed of.

Example

>>> import strlearn as sl
>>> stream = sl.streams.ARFFParser("Agrawal.arff")
>>> clf = sl.classifiers.AccumulatedSamplesClassifier()
>>> evaluator = sl.evaluators.PrequentialEvaluator()
>>> evaluator.process(clf, stream)
>>> stream.reset()
>>> print(evaluator.scores_)
[[0.855      0.80815508 0.79478582 0.80815508 0.89679715]
 [0.795      0.75827674 0.7426779  0.75827674 0.84644195]
 [0.8        0.75313899 0.73559983 0.75313899 0.85507246]
 ...
 [0.885      0.86181169 0.85534199 0.86181169 0.91119691]
 [0.895      0.86935764 0.86452058 0.86935764 0.92134831]
 [0.87       0.85104088 0.84813907 0.85104088 0.9       ]]
get_chunk()

Generating a data chunk of a stream.

Used by all evaluators, but also accessible for custom evaluation.

Returns

Generated samples and target values.

Return type

tuple (X, y), where X is array-like of shape (n_samples, n_features) and y is array-like of shape (n_samples,)

is_dry()

Checking if we have reached the end of the stream.

Returns

Flag showing whether the stream has ended.

Return type

boolean

reset()

Reset the processed stream and close the ARFF file.
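
All stream parsers (and the StreamGenerator described below) expose the same chunk interface, so get_chunk() and is_dry() can drive a custom evaluation loop directly. A minimal sketch, assuming the Agrawal.arff file from the example above:

>>> import strlearn as sl
>>> stream = sl.streams.ARFFParser("Agrawal.arff")
>>> while not stream.is_dry():
...     X, y = stream.get_chunk()
...     # process each chunk here, e.g. partial_fit or predict
>>> stream.reset()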

class strlearn.streams.CSVParser(path, chunk_size=200, n_chunks=250)

Bases: object

Stream-aware parser of datasets in CSV format.

Parameters
  • path (string) – Path to the CSV file.

  • chunk_size (integer, optional (default=200)) – The number of instances in each data chunk.

  • n_chunks (integer, optional (default=250)) – The number of data chunks that the stream is composed of.

Example

>>> import strlearn as sl
>>> stream = sl.streams.CSVParser("Agrawal.csv")
>>> clf = sl.classifiers.AccumulatedSamplesClassifier()
>>> evaluator = sl.evaluators.PrequentialEvaluator()
>>> evaluator.process(clf, stream)
>>> stream.reset()
>>> print(evaluator.scores_)
[[0.855      0.80815508 0.79478582 0.80815508 0.89679715]
 [0.795      0.75827674 0.7426779  0.75827674 0.84644195]
 [0.8        0.75313899 0.73559983 0.75313899 0.85507246]
 ...
 [0.885      0.86181169 0.85534199 0.86181169 0.91119691]
 [0.895      0.86935764 0.86452058 0.86935764 0.92134831]
 [0.87       0.85104088 0.84813907 0.85104088 0.9       ]]
get_chunk()

Generating a data chunk of a stream.

Used by all evaluators, but also accessible for custom evaluation.

Returns

Generated samples and target values.

Return type

tuple (X, y), where X is array-like of shape (n_samples, n_features) and y is array-like of shape (n_samples,)

is_dry()

Checking if we have reached the end of the stream.

Returns

Flag showing whether the stream has ended.

Return type

boolean

reset()

Reset the stream to the beginning.

class strlearn.streams.NPYParser(path, chunk_size=200, n_chunks=250)

Bases: object

Stream-aware parser of datasets in NumPy format.

Parameters
  • path (string) – Path to the NumPy (.npy) file.

  • chunk_size (integer, optional (default=200)) – The number of instances in each data chunk.

  • n_chunks (integer, optional (default=250)) – The number of data chunks that the stream is composed of.

Example

>>> import strlearn as sl
>>> stream = sl.streams.NPYParser("Agrawal.npy")
>>> clf = sl.classifiers.AccumulatedSamplesClassifier()
>>> evaluator = sl.evaluators.PrequentialEvaluator()
>>> evaluator.process(clf, stream)
>>> stream.reset()
>>> print(evaluator.scores_)
[[0.855      0.80815508 0.79478582 0.80815508 0.89679715]
 [0.795      0.75827674 0.7426779  0.75827674 0.84644195]
 [0.8        0.75313899 0.73559983 0.75313899 0.85507246]
 ...
 [0.885      0.86181169 0.85534199 0.86181169 0.91119691]
 [0.895      0.86935764 0.86452058 0.86935764 0.92134831]
 [0.87       0.85104088 0.84813907 0.85104088 0.9       ]]
get_chunk()

Generating a data chunk of a stream.

Used by all evaluators, but also accessible for custom evaluation.

Returns

Generated samples and target values.

Return type

tuple (X, y), where X is array-like of shape (n_samples, n_features) and y is array-like of shape (n_samples,)

is_dry()

Checking if we have reached the end of the stream.

Returns

Flag showing whether the stream has ended.

Return type

boolean

reset()

Reset the stream to the beginning.

class strlearn.streams.StreamGenerator(n_chunks=250, chunk_size=200, random_state=1410, n_drifts=0, concept_sigmoid_spacing=None, n_classes=2, n_features=20, n_informative=2, n_redundant=2, n_repeated=0, n_clusters_per_class=2, recurring=False, weights=None, incremental=False, y_flip=0.01, **kwargs)

Bases: object

Data stream generator for both stationary and drifting data streams.

A key element of the stream-learn package is a generator that allows one to prepare a replicable classification dataset (according to the given random_state value) whose class distribution changes over the course of the stream, with base concepts built on the default class distributions of the scikit-learn make_classification() function. These distributions try to reproduce the rules for generating the Madelon dataset. The StreamGenerator is capable of preparing any variation of data stream known in the general taxonomy of data streams.
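
The main drift variants are requested through n_drifts, concept_sigmoid_spacing, incremental and recurring. A minimal sketch (all parameter values are illustrative):

>>> import strlearn as sl
>>> stationary = sl.streams.StreamGenerator(n_drifts=0)
>>> sudden = sl.streams.StreamGenerator(n_drifts=3, concept_sigmoid_spacing=999)
>>> gradual = sl.streams.StreamGenerator(n_drifts=3, concept_sigmoid_spacing=5)
>>> incremental = sl.streams.StreamGenerator(n_drifts=3, concept_sigmoid_spacing=5, incremental=True)
>>> recurring = sl.streams.StreamGenerator(n_drifts=3, recurring=True)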

Parameters
  • n_chunks (integer, optional (default=250)) – The number of data chunks that the stream is composed of.

  • chunk_size (integer, optional (default=200)) – The number of instances in each data chunk.

  • random_state (integer, optional (default=1410)) – The seed used by the random number generator.

  • n_drifts (integer, optional (default=0)) – The number of concept changes in the data stream.

  • concept_sigmoid_spacing (float, optional (default=None)) – Value that determines the shape of the sigmoid function and how sudden the change of concept is. The higher the value, the more sudden the drift.

  • n_classes (integer, optional (default=2)) – The number of classes in the generated data stream.

  • y_flip (float or tuple (default=0.01)) – Label noise probability, given either for the whole dataset (float) or separately for each class (tuple).

  • recurring (boolean, optional (default=False)) – Determines whether the stream can return to previously encountered concepts.

  • weights (array-like, shape (n_classes,) or tuple (only for 2 classes)) – If array: class weights for static imbalance. If 3-valued tuple: (n_drifts, concept_sigmoid_spacing, IR amplitude [0-1]) for generation of continuous, dynamically imbalanced streams. If 2-valued tuple: (mean value, standard deviation) for generation of discrete, dynamically imbalanced streams. See the sketch below.
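
A minimal sketch of the three accepted forms of weights, together with per-class y_flip (all values are illustrative):

>>> import strlearn as sl
>>> static = sl.streams.StreamGenerator(weights=[0.2, 0.8])
>>> dynamic_continuous = sl.streams.StreamGenerator(weights=(2, 5, 0.9))
>>> dynamic_discrete = sl.streams.StreamGenerator(weights=(0.5, 0.1))
>>> noisy = sl.streams.StreamGenerator(y_flip=(0.01, 0.1))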

Example

>>> import strlearn as sl
>>> stream = sl.streams.StreamGenerator(n_drifts=2, weights=[0.2, 0.8], concept_sigmoid_spacing=5)
>>> clf = sl.classifiers.AccumulatedSamplesClassifier()
>>> evaluator = sl.evaluators.PrequentialEvaluator()
>>> evaluator.process(clf, stream)
>>> print(evaluator.scores_)
[[0.955      0.93655817 0.93601827 0.93655817 0.97142857]
 [0.94       0.91397849 0.91275313 0.91397849 0.96129032]
 [0.9        0.85565271 0.85234488 0.85565271 0.93670886]
 ...
 [0.815      0.72584133 0.70447376 0.72584133 0.8802589 ]
 [0.83       0.69522145 0.65223303 0.69522145 0.89570552]
 [0.845      0.67267706 0.61257135 0.67267706 0.90855457]]
get_chunk()

Generating a data chunk of a stream.

Used by all evaluators but also accesible for custom evaluation.

Returns

Generated samples and target values.

Return type

tuple {array-like, shape (n_samples, n_features), array-like, shape (n_samples, )}

save_to_arff(filepath)

Save the generated stream to a file in ARFF format.

Parameters

filepath (string) – Path to the file where the data will be saved in ARFF format.

save_to_csv(filepath)

Save the generated stream to a file in CSV format.

Parameters

filepath (string) – Path to the file where the data will be saved in CSV format.

save_to_npy(filepath)

Save the generated stream to a file in NumPy format.

Parameters

filepath (string) – Path to the file where the data will be saved in NumPy (.npy) format.
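
The save methods pair naturally with the parsers above, so a generated stream can be persisted and replayed later. A minimal sketch (the file name is illustrative):

>>> import strlearn as sl
>>> stream = sl.streams.StreamGenerator(n_chunks=100, chunk_size=200)
>>> stream.save_to_csv("stream.csv")
>>> replayed = sl.streams.CSVParser("stream.csv", chunk_size=200, n_chunks=100)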