Container Designs for CSV data

Usually, when processing CSV data, you would put the data through a number of stages.

Filtering by row on text input
Setting a data-type for each column
More filtering by row if data-type incorrect
Enumeration of data by row and standardisation
More filtering by row for the application
Pivoting the table
Getting row-names, getting column names, getting a matrix of values
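
The first few stages above can be sketched with nothing but the standard library. This is a minimal illustration, not the library's implementation: the names TextRow, TypedRow, keep_width, to_double and type_rows are hypothetical, and each row is assumed to hold a name and a numeric value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

namespace sketch {

using TextRow = std::vector<std::string>;  // one CSV line already split into cells

// Stage 1: filter by row on the text input (drop rows with the wrong cell count).
inline std::vector<TextRow> keep_width(const std::vector<TextRow>& rows, std::size_t width) {
    assert(width > 0);
    std::vector<TextRow> out;
    for (const auto& r : rows)
        if (r.size() == width) out.push_back(r);
    return out;
}

// Hypothetical typed row: a name column and a numeric column.
struct TypedRow {
    std::string name;
    double value;
};

// True only if the whole string parses as a double.
inline bool to_double(const std::string& s, double& out) {
    char* end = nullptr;
    out = std::strtod(s.c_str(), &end);
    return !s.empty() && end == s.c_str() + s.size();
}

// Stages 2 and 3: set a data type for the second column and
// drop the rows whose text does not convert.
inline std::vector<TypedRow> type_rows(const std::vector<TextRow>& rows) {
    std::vector<TypedRow> out;
    for (const auto& r : rows) {
        double v = 0.0;
        if (r.size() == 2 && to_double(r[1], v)) out.push_back({r[0], v});
    }
    return out;
}

}  // namespace sketch
```

Each stage takes a container and returns a new, smaller or better-typed one, which is the pattern the containers described below are meant to support.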

To support all of these stages, the following design has been adopted: the CSV worksheet has its own container, and each row and each column can be put into a container of its own.

The header also has a container, and a payload is defined as a container for the rows without the header row. The row type is known as a record.

There are associated factory classes for the header, the columns, the payloads and the rows.

Containers

The worksheet is contained in eepgwde::detail::DataFrame::ma_any_t.

The worksheet is associated with a eepgwde::detail::DataFrame. It is decomposed into a header eepgwde::detail::DataFrame::attr_pos_t and a payload eepgwde::detail::DataFrame::payload_t.

Any column can be obtained as a eepgwde::detail::DataFrame::column_t. The rows can be obtained as records eepgwde::detail::DataFrame::record_t.

See also: eepgwde::detail::DataFrame::ma_any_t, eepgwde::detail::DataFrame, eepgwde::detail::DataFrame::column_t, eepgwde::detail::record_t

Frame: array container for boost::any cells

The data is loaded into an array of boost::any. The boost::any type has a number of rendering functions available; see Input Data Types for QuantLib. The array data type is from boost::multi_array and has a local typedef, eepgwde::detail::DataFrame::ma_any_t. Arrays of this type are known as frames and are stored in a eepgwde::detail::DataFrame by reference.
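
The idea of a frame of type-erased cells can be illustrated as follows. This sketch deliberately swaps in C++17 std::any for boost::any and a flat std::vector for boost::multi_array, so that it needs no external library; the Frame class and render_double function are hypothetical names, not part of eepgwde.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

// Row-major 2D grid of type-erased cells. std::any stands in for boost::any
// and the flat vector for boost::multi_array (both are assumptions).
class Frame {
public:
    Frame(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), cells_(rows * cols) {}

    std::any& at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return cells_[r * cols_ + c];
    }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<std::any> cells_;
};

// "Render" a text cell as a double in place. Returns false, leaving the
// cell untouched, if the cell is not a string or does not fully parse.
inline bool render_double(std::any& cell) {
    const std::string* s = std::any_cast<std::string>(&cell);
    if (!s) return false;
    try {
        std::size_t used = 0;
        const double v = std::stod(*s, &used);
        if (used != s->size()) return false;
        cell = v;  // the cell now holds a double instead of text
        return true;
    } catch (const std::exception&) {
        return false;  // std::stod throws on non-numeric or out-of-range text
    }
}

}  // namespace sketch
```

Because every cell carries its own type, a column can start as text and be converted cell by cell without changing the container.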

DataFrame: container for the frame and header information

The DataFrame contains a boost::bimap for the header (the column names) of the CSV data. eepgwde::detail::DataFrame::attr_pos_t is a local typedef for this map.
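
A bidirectional header lookup of this kind can be approximated with two std::map objects. The sketch below is only a stand-in for boost::bimap, and the Header class with its position() and name() members is hypothetical; it assumes the column names are unique.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

namespace sketch {

// Two-way lookup between column names and column positions.
// boost::bimap offers this in one container; two maps show the idea.
class Header {
public:
    explicit Header(const std::vector<std::string>& names) {
        for (std::size_t i = 0; i < names.size(); ++i) {
            assert(by_name_.count(names[i]) == 0);  // names must be unique
            by_name_[names[i]] = i;
            by_pos_[i] = names[i];
        }
    }

    // Looks up the position of a named column; returns false if absent.
    bool position(const std::string& name, std::size_t& pos) const {
        const auto it = by_name_.find(name);
        if (it == by_name_.end()) return false;
        pos = it->second;
        return true;
    }

    // Name of the column at a position; throws std::out_of_range if absent.
    const std::string& name(std::size_t pos) const { return by_pos_.at(pos); }

private:
    std::map<std::string, std::size_t> by_name_;
    std::map<std::size_t, std::string> by_pos_;
};

}  // namespace sketch
```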

DataFrame: containers for rows and columns

You can get a container for a column by name, and you can change the type of each column using eepgwde::DataFrame::render().

The eepgwde::detail::column_t data type can be used with eepgwde::detail::columnar to get different containers (list and vector) of the underlying column of data.

Each row of the eepgwde::detail::DataFrame::ma_any_t can be accessed through an iterator, eepgwde::detail::DataFrame::iterator0. The iterator can be used with eepgwde::detail::DataFrame::rowar to get different containers (list and map) of the underlying row of data.

The row is eepgwde::detail::record_t. This is a useful map structure: each cell in the record can be accessed by its column name.
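
Column and row access of this kind might look as follows. The sketch uses plain std::string cells in place of boost::any and keeps the header in row 0 of a vector of vectors; the names Table, Record, column and record are hypothetical and do not reproduce the columnar or rowar interfaces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

namespace sketch {

using Cell = std::string;                       // stand-in for a boost::any cell
using Table = std::vector<std::vector<Cell>>;   // row 0 holds the column names
using Record = std::map<std::string, Cell>;     // column name -> cell

// Collect one named column, excluding the header row, into a vector.
// Returns an empty vector if the column name is not in the header.
inline std::vector<Cell> column(const Table& t, const std::string& name) {
    std::vector<Cell> out;
    if (t.empty()) return out;
    const auto& head = t.front();
    const auto it = std::find(head.begin(), head.end(), name);
    if (it == head.end()) return out;
    const std::size_t c = static_cast<std::size_t>(it - head.begin());
    for (std::size_t r = 1; r < t.size(); ++r) out.push_back(t[r].at(c));
    return out;
}

// Turn row r (counted from 1, because row 0 is the header) into a map
// so that each cell can be looked up by its column name.
inline Record record(const Table& t, std::size_t r) {
    assert(r >= 1 && r < t.size());
    Record rec;
    for (std::size_t c = 0; c < t[0].size() && c < t[r].size(); ++c)
        rec[t[0][c]] = t[r][c];
    return rec;
}

}  // namespace sketch
```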

DataFrame: containers for records and payloads

A payload is the data of the data-frame without the header. It is implemented as a view on the boost::multi_array; this view has the type eepgwde::detail::DataFrame::payload_t and is stored as a shared pointer.
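
The notion of a payload as a view that skips the header can be sketched without boost::multi_array. Here a hypothetical PayloadView class holds a shared pointer to the whole table and offsets every index by one; this illustrates the idea only and is not how payload_t is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

using Row = std::vector<std::string>;
using Table = std::vector<Row>;  // row 0 is the header

// Window over the data rows of a table. It shares ownership of the
// table but copies no cells; the header row (index 0) is skipped.
class PayloadView {
public:
    explicit PayloadView(std::shared_ptr<const Table> t) : table_(std::move(t)) {
        assert(table_ != nullptr);
    }

    // Number of data rows, not counting the header.
    std::size_t size() const { return table_->empty() ? 0 : table_->size() - 1; }

    // Data row i, where i == 0 is the first row after the header.
    const Row& operator[](std::size_t i) const { return table_->at(i + 1); }

private:
    std::shared_ptr<const Table> table_;
};

}  // namespace sketch
```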

A eepgwde::detail::DataFrame::payloader class provides a means of converting the payload to a vector of iterators to each record.

This can then be converted to a vector of eepgwde::detail::record_t.

Application developers need to extend record_t. Because it is only a typedef, an encapsulating class has been put around it:

eepgwde::detail::DataFrame::Record

This class has been designed to be extended and should be used for the data-filtering.

A templated cast operator within eepgwde::detail::DataFrame::payloader allows any class to encapsulate a record_t object.

Usually the encapsulating object's class would be a derivative of Record.
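
The wrapping pattern described here can be sketched as below. Everything in the sketch namespace is a stand-in: record_t is assumed to be a map of strings, and Record, Quote and Payloader are hypothetical classes that mimic the roles described above rather than the library's actual declarations.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

using record_t = std::map<std::string, std::string>;  // stand-in for the typedef

// Thin wrapper around record_t, intended to be derived from.
class Record {
public:
    explicit Record(record_t r) : rec_(std::move(r)) {
        assert(!rec_.empty());  // this sketch treats an empty record as a bug
    }
    virtual ~Record() = default;
    const std::string& get(const std::string& col) const { return rec_.at(col); }

private:
    record_t rec_;
};

// Hypothetical application class: adds a typed accessor for filtering.
class Quote : public Record {
public:
    explicit Quote(record_t r) : Record(std::move(r)) {}
    double price() const { return std::stod(get("price")); }
};

// Holder with a templated conversion operator: any class T that can be
// constructed from a record_t can be produced from the stored rows.
class Payloader {
public:
    explicit Payloader(std::vector<record_t> rows) : rows_(std::move(rows)) {}

    template <class T>
    operator std::vector<T>() const {
        std::vector<T> out;
        out.reserve(rows_.size());
        for (const auto& r : rows_) out.emplace_back(r);
        return out;
    }

private:
    std::vector<record_t> rows_;
};

}  // namespace sketch
```

With these stand-ins, writing `std::vector<sketch::Quote> quotes = payloader;` selects the conversion for Quote, so the application code chooses the wrapper type at the point of use.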

DataFrame: containers for records and payloads

A vector of Record is convenient to work with. It can be used with the STL sort() and partition() utilities.

After using those, the vector can be given back to the eepgwde::detail::DataFrame::payloader::as() method and converted back to a frame eepgwde::detail::DataFrame::ma_any_t.

Once that frame is available, it can be passed to eepgwde::detail::DataFrame and you can process it again.
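
The round trip from records to a filtered, sorted frame might be sketched as follows. The Rec struct, the Frame alias and filter_sort_to_frame are hypothetical; the function only plays the role that payloader::as() is described as having, and a vector of string rows stands in for ma_any_t.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

using Row = std::vector<std::string>;
using Frame = std::vector<Row>;  // row 0 is the header

// Hypothetical application record with one text and one numeric field.
struct Rec {
    std::string sym;
    double price;
};

// Keep the records priced at or above min_price, order them by price,
// then rebuild a frame with a header row from what is left.
inline Frame filter_sort_to_frame(std::vector<Rec> recs, double min_price) {
    // std::partition moves the records to keep to the front.
    const auto mid = std::partition(recs.begin(), recs.end(),
        [min_price](const Rec& r) { return r.price >= min_price; });
    recs.erase(mid, recs.end());

    std::sort(recs.begin(), recs.end(),
        [](const Rec& a, const Rec& b) { return a.price < b.price; });

    Frame out;
    out.push_back({"sym", "price"});
    for (const auto& r : recs) out.push_back({r.sym, std::to_string(r.price)});
    assert(out.size() == recs.size() + 1);  // header plus one row per record
    return out;
}

}  // namespace sketch
```

The returned frame has the same shape as the input to the first stage, so it can be fed through the same processing again.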