SHOGUN  6.1.3
DataManager Class Reference

Detailed Description

Class DataManager fetches or streams test data block-wise. It can handle data coming from multiple sources; the number of data sources is given by the num_distributions parameter of the constructor. It can handle heterogeneous data sources, and it can stream multiple blocks per burst, as the computation requires. The size of the blocks and the number of blocks to be fetched per burst can be set externally.

This class is designed to be used on the stack. An instance of DataManager should not be serialized, copied, or moved around. In Shogun, it is most useful as part of the implementation class inside a PIMPL.

Definition at line 63 of file DataManager.h.

Public Member Functions

 DataManager (index_t num_distributions)
 
 DataManager (const DataManager &other)=delete
 
DataManager & operator= (const DataManager &other)=delete
 
 ~DataManager ()
 
void set_blocksize (index_t blocksize)
 
void set_num_blocks_per_burst (index_t num_blocks_per_burst)
 
InitPerFeature samples_at (index_t i)
 
CFeatures * samples_at (index_t i) const
 
index_t & num_samples_at (index_t i)
 
const index_t num_samples_at (index_t i) const
 
const index_t blocksize_at (index_t i) const
 
index_t get_num_samples () const
 
index_t get_min_blocksize () const
 

Constructor & Destructor Documentation

◆ DataManager() [1/2]

DataManager ( index_t  num_distributions)

Default constructor.

Parameters
num_distributions    Number of data sources (i.e. CFeatures objects)

Definition at line 43 of file DataManager.cpp.

◆ DataManager() [2/2]

DataManager ( const DataManager &  other)
delete

Disabled copy constructor

Parameters
other    Other instance

◆ ~DataManager()

Destructor

Definition at line 55 of file DataManager.cpp.

Member Function Documentation

◆ blocksize_at()

const index_t blocksize_at ( index_t  i) const

Getter for the number of samples from a specified data source in a block.

Parameters
i    The data source index.
Returns
The number of samples from i-th data source in a block.

Definition at line 192 of file DataManager.cpp.

◆ get_min_blocksize()

index_t get_min_blocksize ( ) const
Returns
The minimum block size that can be fetched from the specified data sources. For example, if there are two data sources with 20 and 30 samples, respectively, then the minimum block size is 5 (2 samples from the 1st data source, 3 from the 2nd), and there are then 10 such blocks.

Definition at line 72 of file DataManager.cpp.
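
The arithmetic in the example above can be reproduced with a greatest common divisor. The following is a minimal sketch, not the Shogun implementation: the helper names gcd_int and min_blocksize are hypothetical, and it assumes that the minimum block is the smallest one that keeps every per-source share a whole number.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of Shogun): Euclid's algorithm.
inline int gcd_int(int a, int b)
{
    while (b != 0)
    {
        int r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Hypothetical helper (not part of Shogun): smallest block size for which
// each source contributes a whole number of samples in proportion to its
// total sample count.
inline int min_blocksize(const std::vector<int>& num_samples)
{
    assert(!num_samples.empty());
    int g = 0;      // gcd_int(0, x) == x, so 0 is a neutral start value
    int total = 0;
    for (int n : num_samples)
    {
        assert(n > 0);
        g = gcd_int(g, n);
        total += n;
    }
    // Dividing every count by the common divisor g gives the per-source
    // shares; their sum is the minimum block size.
    return total / g;
}
```

For sample counts 20 and 30 the common divisor is 10, so the shares are 2 and 3, the minimum block size is 5, and 10 such blocks cover all 50 samples.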

◆ get_num_samples()

index_t get_num_samples ( ) const
Returns
Total number of samples that can be fetched from all the data sources.

Definition at line 59 of file DataManager.cpp.

◆ num_samples_at() [1/2]

index_t & num_samples_at ( index_t  i)

Setter for the number of samples. Setting this number is mandatory for streaming features. For other types of feature objects, this number equals the number of vectors and is set internally.

Example usage:

DataManager data_mgr(2); // two data sources
data_mgr.num_samples_at(0) = 10;
data_mgr.num_samples_at(1) = 15;
Parameters
i    The data source index at which the number of samples is to be set.
Returns
A reference for the number of samples for the specified data source to be used as lvalue.

Definition at line 169 of file DataManager.cpp.
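
The setter and getter overloads of num_samples_at() differ only in constness and return type. The following sketch shows that idiom in isolation. The class SampleCounts is a hypothetical stand-in that uses int in place of index_t; it is not the Shogun class and makes no claim about how DataManager stores its counts.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for illustration only; not the Shogun DataManager.
class SampleCounts
{
public:
    explicit SampleCounts(int num_sources) : m_counts(num_sources, 0) {}

    // Non-const overload: returns a reference, so the call expression can
    // appear on the left-hand side of an assignment.
    int& num_samples_at(int i)
    {
        assert(i >= 0 && i < static_cast<int>(m_counts.size()));
        return m_counts[i];
    }

    // Const overload: returns a copy, so it can only be used as a getter.
    int num_samples_at(int i) const
    {
        assert(i >= 0 && i < static_cast<int>(m_counts.size()));
        return m_counts[i];
    }

private:
    std::vector<int> m_counts;
};
```

With this shape, counts.num_samples_at(0) = 10 writes through the returned reference, while the same call on a const object only reads the value.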

◆ num_samples_at() [2/2]

const index_t num_samples_at ( index_t  i) const

Getter for the number of samples.

Parameters
i    The data source index from which the number of samples is to be obtained.
Returns
The number of samples for the specified data source.

Definition at line 179 of file DataManager.cpp.

◆ operator=()

DataManager& operator= ( const DataManager &  other)
delete

Disabled assignment operator

Parameters
other    Other instance

◆ samples_at() [1/2]

InitPerFeature samples_at ( index_t  i)

Setter for a feature object as a data source. Since multiple data sources are supported, this method takes the index at which the feature object is set. Internally, it initializes a data fetcher object for the provided feature object.

Example usage:

DataManager data_mgr(2); // two data sources
// feats_0 = some CFeatures instance
// feats_1 = some CFeatures instance
data_mgr.samples_at(0) = feats_0;
data_mgr.samples_at(1) = feats_1;
Parameters
i    The data source index at which the feature object is to be set as a data source.
Returns
An initializer for the specified data source (that sets up a fetcher for this feature), to be used as lvalue.

Definition at line 146 of file DataManager.cpp.
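
Returning an initializer "to be used as lvalue" is an instance of the proxy-object idiom: the accessor returns a small object whose assignment operator performs the setup. The sketch below illustrates the idiom only. Every name in it (Fetcher, InitProxy, Manager, fetcher_at) is hypothetical, a std::string stands in for a feature object, and it does not reproduce Shogun's InitPerFeature.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical slot holding the state for one data source.
struct Fetcher
{
    std::string source;        // stands in for a feature object
    bool initialized = false;
};

// Hypothetical proxy returned by samples_at(i); assigning to it
// configures the slot it refers to.
class InitProxy
{
public:
    explicit InitProxy(Fetcher& slot) : m_slot(slot) {}

    InitProxy& operator=(const std::string& source)
    {
        m_slot.source = source;
        m_slot.initialized = true;   // the "set up a fetcher" step
        return *this;
    }

private:
    Fetcher& m_slot;
};

// Hypothetical manager owning one slot per data source.
class Manager
{
public:
    explicit Manager(int num_sources) : m_fetchers(num_sources) {}

    // Setter-style accessor: hands out a proxy bound to slot i.
    InitProxy samples_at(int i)
    {
        assert(i >= 0 && i < static_cast<int>(m_fetchers.size()));
        return InitProxy(m_fetchers[i]);
    }

    // Read-only view of slot i, used here to observe the effect.
    const Fetcher& fetcher_at(int i) const
    {
        assert(i >= 0 && i < static_cast<int>(m_fetchers.size()));
        return m_fetchers[i];
    }

private:
    std::vector<Fetcher> m_fetchers;
};
```

The design choice is that the assignment itself runs code, so a single statement both registers the source and prepares whatever is needed to read from it.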

◆ samples_at() [2/2]

CFeatures * samples_at ( index_t  i) const

Getter for the feature object at a given data source index.

Parameters
i    The data source index from which the feature object is to be obtained.
Returns
The underlying CFeatures object at the specified data source.

Definition at line 156 of file DataManager.cpp.

◆ set_blocksize()

void set_blocksize ( index_t  blocksize)

Sets the blocksize for block-wise data fetching. It divides the block size among the data sources according to the total number of feature vectors available from each source. More formally, if there are \(K\) data sources \(X_k\), \(k=1,\ldots,K\), with \(n_{X_k}\) feature vectors from each, then setting a block size of \(B\) means that in each next() call of the data manager instance, it will fetch \(\rho_{X_k} B\) samples from each \(X_k\), where \(\rho_{X_k}=n_{X_k}/n\) and \(n=\sum_k n_{X_k}\).

Parameters
blocksize    The size of the block consisting of data from all the sources.

Definition at line 91 of file DataManager.cpp.
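
The proportional split described above can be worked through numerically. This is a minimal sketch under stated assumptions: split_block is a hypothetical helper, not a Shogun function, and it assumes the block size is chosen so that every share \(\rho_{X_k} B\) is a whole number.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of Shogun): share of a block of size
// `blocksize` taken from each source, i.e. rho_k * B with rho_k = n_k / n.
inline std::vector<int> split_block(const std::vector<int>& num_samples,
                                    int blocksize)
{
    long long total = 0;                 // n = sum_k n_k
    for (int n : num_samples)
        total += n;
    assert(total > 0);

    std::vector<int> per_source;
    for (int n : num_samples)
    {
        long long scaled = static_cast<long long>(blocksize) * n;
        // Assumption of this sketch: the share must be integral.
        assert(scaled % total == 0);
        per_source.push_back(static_cast<int>(scaled / total));
    }
    return per_source;
}
```

With sample counts 20 and 30, \(n = 50\), so a block size of 5 yields shares of 2 and 3, and a block size of 10 yields 4 and 6.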

◆ set_num_blocks_per_burst()

void set_num_blocks_per_burst ( index_t  num_blocks_per_burst)

In order to speed up the computation, usually a number of blocks are fetched at once per next() call. This method sets that number.

Parameters
num_blocks_per_burst    The number of blocks to be fetched in a burst.

Definition at line 117 of file DataManager.cpp.

