Performance Validation Testing

This class is used to test GAIuS Agents thoroughly. Performance Validation Testing (PVT) has three test types:

  • Classification

  • Emotive Value

  • Emotive Polarity

Performance Validation Test

class ia.gaius.pvt.PerformanceValidationTest(agent: AgentClient, ingress_nodes: list, query_nodes: list, num_of_tests: int, pct_of_ds: float, pct_res_4_train: float, test_type: str, dataset_location: str = 'filepath', results_filepath=None, ds_filepath: str | None = None, test_prediction_strategy='continuous', clear_all_memory_before_training: bool = True, turn_prediction_off_during_training: bool = False, shuffle: bool = False, sio=None, task=None, user_id: str | None = None, mongo_db=None, dataset_info: dict | None = None, test_id=None, test_configuration: dict = {}, socket_channel: str = 'pvt_status', QUIET: bool = False)

Bases: object

Performance Validation Test (PVT) - Splits a GDF folder into training and testing sets. Depending on the test type, certain visualizations are produced.

Test types:

  • Classification

  • Emotive Value

  • Emotive Polarity

__init__(agent: AgentClient, ingress_nodes: list, query_nodes: list, num_of_tests: int, pct_of_ds: float, pct_res_4_train: float, test_type: str, dataset_location: str = 'filepath', results_filepath=None, ds_filepath: str | None = None, test_prediction_strategy='continuous', clear_all_memory_before_training: bool = True, turn_prediction_off_during_training: bool = False, shuffle: bool = False, sio=None, task=None, user_id: str | None = None, mongo_db=None, dataset_info: dict | None = None, test_id=None, test_configuration: dict = {}, socket_channel: str = 'pvt_status', QUIET: bool = False)

Initialize the PVT object with all required parameters for execution

Parameters:
  • agent (AgentClient) – GAIuS Agent to use for training

  • ingress_nodes (list) – Ingress nodes for the GAIuS Agent (see ia.gaius.agent_client.AgentClient.set_ingress_nodes())

  • query_nodes (list) – Query nodes for the GAIuS Agent (see ia.gaius.agent_client.AgentClient.set_query_nodes())

  • num_of_tests (int) – Number of test iterations to complete

  • pct_of_ds (float) – Percent of the dataset to use for PVT (overall)

  • pct_res_4_train (float) – Percent of the dataset to be reserved for training

  • test_type (str) – classification, emotives_value, or emotives_polarity

  • dataset_location (str) – Location of the dataset to use: “mongodb” or “filepath”. Defaults to “filepath”.

  • results_filepath (_type_) – Where to store PVT results

  • ds_filepath (str) – Path to the directory containing training GDFs

  • test_prediction_strategy (str, optional) – Prediction strategy to use during testing. Defaults to “continuous”.

  • clear_all_memory_before_training (bool, optional) – Whether the GAIuS agent’s memory should be cleared before each training. Defaults to True.

  • turn_prediction_off_during_training (bool, optional) – Whether predictions should be disabled during training to reduce computational load. Defaults to False.

  • shuffle (bool, optional) – Whether dataset should be shuffled before each test iteration. Defaults to False.

  • sio (_type_, optional) – SocketIO object to emit information on. Defaults to None.

  • task (_type_, optional) – Celery details to emit information about. Defaults to None.

  • user_id (str, optional) – user_id to emit information to on SocketIO. Defaults to None.

  • mongo_db (pymongo.MongoClient, optional) – MongoDB where dataset should be retrieved from

  • dataset_info (dict, optional) – information about how to retrieve dataset, used for MongoDB query. If dataset_location is mongodb, this must have the user_id, dataset_id, results_collection, logs_collection, and data_files_collection_name keys

  • test_id (str, optional) – unique identifier to be sent with messages about this test. Also used for storing to mongodb

  • test_configuration (dict, optional) – dictionary storing additional metadata about test configuration, to be saved in mongodb with test results

  • socket_channel (str, optional) – SocketIO channel to broadcast results on. Defaults to ‘pvt_status’

  • QUIET (bool, optional) – flag used to disable log output during PVT. Defaults to False
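
The constructor arguments above can be assembled and sanity-checked before a test object is created. The following is a minimal sketch and not part of the library: build_pvt_kwargs, REQUIRED_MONGO_KEYS and TEST_TYPES are hypothetical names, the node names and path are placeholders, and the 0 to 100 percentage scale and the ds_filepath check are assumptions. Only the keyword names, the test type strings, and the required dataset_info keys are taken from the parameter list above.

```python
# Hypothetical helper (not part of ia.gaius): collects and checks the keyword
# arguments documented above before they are passed to PerformanceValidationTest.
REQUIRED_MONGO_KEYS = {"user_id", "dataset_id", "results_collection",
                       "logs_collection", "data_files_collection_name"}
TEST_TYPES = {"classification", "emotives_value", "emotives_polarity"}


def build_pvt_kwargs(agent, test_type, dataset_location="filepath",
                     ds_filepath=None, dataset_info=None, **overrides):
    """Return a kwargs dict for PerformanceValidationTest, or raise ValueError."""
    if test_type not in TEST_TYPES:
        raise ValueError(f"unsupported test_type: {test_type!r}")
    if dataset_location == "mongodb":
        missing = REQUIRED_MONGO_KEYS - set(dataset_info or {})
        if missing:
            raise ValueError(f"dataset_info is missing keys: {sorted(missing)}")
    elif dataset_location == "filepath":
        # Assumption: a directory of GDFs must be given when reading from disk.
        if not ds_filepath:
            raise ValueError("ds_filepath is required for dataset_location='filepath'")
    else:
        raise ValueError(f"unsupported dataset_location: {dataset_location!r}")

    kwargs = {
        "agent": agent,
        "ingress_nodes": ["P1"],   # placeholder node names
        "query_nodes": ["P1"],
        "num_of_tests": 1,
        "pct_of_ds": 100.0,        # assumed to be on a 0-100 scale
        "pct_res_4_train": 80.0,
        "test_type": test_type,
        "dataset_location": dataset_location,
        "ds_filepath": ds_filepath,
        "dataset_info": dataset_info,
    }
    kwargs.update(overrides)       # e.g. shuffle=True, QUIET=True
    return kwargs


# With a connected AgentClient named `agent`, the result would be used roughly as:
#   from ia.gaius.pvt import PerformanceValidationTest
#   pvt = PerformanceValidationTest(**build_pvt_kwargs(
#       agent, "classification", ds_filepath="/path/to/gdfs"))
#   pvt.conduct_pvt()
```

Validating the dictionary first keeps configuration mistakes, such as a missing dataset_info key, from surfacing only after training has started.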

compute_incidental_probabilities(test_step_info: dict)

Keep track of how well each node is doing during the testing phase. To be used for live visualizations

Parameters:

test_step_info (dict, required) – Dictionary containing information about the current predicted and actual answers, and other related metrics (e.g. precision, unknowns, residuals, response rate, etc.)

Returns:

updated test_step_info with the current running accuracy

Return type:

dict

conduct_pvt()

Function called to execute the PVT session. Determines test to run based on ‘test_type’ attribute

Results from PVT are stored in the ‘pvt_results’ attribute

Note

A complete example is shown in the __init__() function above. Please see that documentation for further information about how to conduct a PVT test

get_classification_metrics()

Builds classification data structures for each node

get_emotives_polarity_metrics()

Builds emotives polarity data structures for each node

get_emotives_value_metrics()

Builds emotives value data structures for each node

sum_sequence_emotives(sequence)

Sums all emotive values

test_agent()

Test agent on dataset test sequences provided in self.dataset.test_sequences

train_agent()

Takes a training set of GDF files and trains an agent on those records. The user can turn prediction off if the topology doesn’t have abstractions where prediction is needed to propagate data through the topology.

update_test_results_w_hive_classification_metrics(pvt_test_result)

Update PVT test result metrics with hive classification metrics

update_test_results_w_hive_emotives_polarity_metrics(pvt_test_result)

Update pvt test result metrics with hive emotives polarity metrics

update_test_results_w_hive_emotives_value_metrics(pvt_test_result)

Update PVT test result metrics with hive emotives value metrics

MongoData

class ia.gaius.pvt.mongo_interface.MongoData(mongo_dataset_details: dict, data_files_collection_name: str, mongo_db: MongoClient, dataset_collection_name: str = 'datasets')

Bases: object

Analogous object to the Data class, but utilizes a MongoDB cursor instead of a directory to reference data records

Starts with a MongoDB document containing the names of all dataset files (which are stored separately).

Actual data files are retrieved only when retrieveDataRecord is called. The iterator functions are overloaded so that the object can be treated as a list.

Example

>>> import pymongo
>>> from ia.gaius.pvt.mongo_interface import MongoData
>>> mongo = pymongo.MongoClient('mongodb://mongodb:27017/')
>>> mongo_db = mongo.db['main_database']
>>> dataset_details = {"user_id": "ABCD1",
...                    "dataset_id": "iris_0_0_13"}
>>> md = MongoData(mongo_dataset_details=dataset_details,
...                mongo_db=mongo_db,
...                data_files_collection_name='dataset_files')
>>> md.prep(percent_of_dataset_chosen=50,
...         percent_reserved_for_training=50,
...         shuffle=True)
>>> md.setIterMode('testing')
>>> for record in md:
...     ...

__init__(mongo_dataset_details: dict, data_files_collection_name: str, mongo_db: MongoClient, dataset_collection_name: str = 'datasets')

Initialize dataset object from MongoDB

Parameters:
  • mongo_dataset_details (dict) – contains info about the user_id and dataset_id of the dataset. Used to query MongoDB

  • data_files_collection_name (str) – Collection in MongoDB where individual data records are stored

  • dataset_collection_name (str) – Collection in MongoDB where master dataset info record is stored

  • mongo_db (pymongo.MongoClient) – MongoDB client object to use for dataset lookups

Raises:
  • Exception – user_id field missing from mongo_dataset_details

  • Exception – dataset_id field missing from mongo_dataset_details

  • Exception – multiple datasets found pertaining to same user_id and dataset_id field

convertBinaryStringtoSequence(record)

Convert a binary string of multiple GDFs (delimited by newlines) into a sequence of JSON objects

Parameters:

record (str, required) – binary string of GDFs to convert

Returns:

list of GDFs in JSON format

Return type:

list
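
The conversion described above amounts to parsing newline-delimited JSON. The following is a minimal sketch of that idea, not the library's implementation: binary_string_to_sequence is a hypothetical name, UTF-8 encoding is assumed, and the strings, vectors and emotives fields in the sample record are illustrative.

```python
import json


def binary_string_to_sequence(record):
    """Decode newline-delimited JSON (bytes or str) into a list of dicts."""
    if isinstance(record, bytes):
        record = record.decode("utf-8")  # assumption: records are UTF-8 encoded
    # Skip blank lines, e.g. the one produced by a trailing newline.
    return [json.loads(line) for line in record.splitlines() if line.strip()]


# Illustrative record: two GDF-like events, one per line.
raw = (b'{"strings": ["hello"], "vectors": [], "emotives": {}}\n'
       b'{"strings": ["world"], "vectors": [], "emotives": {"utility": 1}}\n')
sequence = binary_string_to_sequence(raw)
print(len(sequence))            # 2
print(sequence[1]["emotives"])  # {'utility': 1}
```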

classmethod delete_dataset(mongo_db, dataset_details: dict)

Delete a dataset from MongoDB

Parameters:
  • mongo_db (pymongo.MongoClient) – MongoDB database object

  • dataset_details (dict) – Dictionary containing details about dataset (e.g. name, user_id, dataset_id, collection names)

Returns:

String depicting action that was taken

Return type:

str

Example

from ia.gaius.pvt.mongo_interface import MongoData
...
dataset_details = {"user_id": "user-1234",
                   "dataset_name": "MNIST",
                   "dataset_id": "abba12",
                   "data_files_collection_name": "dataset_files",
                   "dataset_collection_name": "datasets"}
MongoData.delete_dataset(mongo_db=mongo_db,
                         dataset_details=dataset_details)

getSequence(record)

Wrapper function to retrieve a record from MongoDB and convert it into a sequence

Parameters:

record (ObjectId) – The MongoDB ObjectId of the data record to retrieve

Returns:

GDF sequence retrieved from MongoDB

Return type:

list

prep(percent_of_dataset_chosen: float, percent_reserved_for_training: float, shuffle: bool = False)

Prepare the dataset

Parameters:
  • percent_of_dataset_chosen (float) – The percent of the dataset to utilize

  • percent_reserved_for_training (float) – The training/testing split for the dataset (e.g. set to 80 for 80/20 training/testing split)

  • shuffle (bool, optional) – Whether to shuffle the data. Defaults to False.
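
The two percentages interact as follows: the first selects how much of the dataset is used at all, and the second divides that portion into training and testing sets. The following is a minimal sketch under stated assumptions, not the library's implementation: split_records and its seed argument are hypothetical, and truncating sizes with int() is an assumed rounding rule.

```python
import random


def split_records(records, percent_of_dataset_chosen,
                  percent_reserved_for_training, shuffle=False, seed=None):
    """Return (train, test) lists; both percentages are on a 0-100 scale."""
    records = list(records)
    if shuffle:
        random.Random(seed).shuffle(records)
    # Assumption: sizes are truncated to whole records with int().
    num_used = int(len(records) * percent_of_dataset_chosen / 100)
    num_train = int(num_used * percent_reserved_for_training / 100)
    used = records[:num_used]
    return used[:num_train], used[num_train:]


# 50% of 20 records are used; 80% of those 10 go to training.
train, test = split_records(range(20), percent_of_dataset_chosen=50,
                            percent_reserved_for_training=80)
print(train)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(test)   # [8, 9]
```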

retrieveDataRecord(document_id: ObjectId)

Retrieve a data record from MongoDB, pertaining to the ObjectId specified

Parameters:

document_id (ObjectId, required) – data record to retrieve from mongo, located in the collection specified when calling __init__()

Raises:

Exception – Raised when MongoDB document is not found. Shows query performed that failed

Returns:

binary string depicting data sequence stored in MongoDB Document

Return type:

str

setIterMode(mode: str) → None

Set mode to be used for iterating across dataset

Parameters:

mode (str) – set to “training” or “testing” depending on what set of sequences is to be iterated across

Raises:
  • Exception – When train_sequences or test_sequences contains no data; prep() should be called first

  • Exception – When invalid mode specified in mode argument
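
Mode-dependent iteration of this kind can be illustrated with a small stand-in class. The following is a sketch only: IterableDataset is a hypothetical toy and not the MongoData class, it holds plain lists instead of MongoDB ObjectIds, it assumes "training" as the initial mode, and it treats "no data" as both lists being empty, which is one possible reading of the first exception above.

```python
class IterableDataset:
    """Toy stand-in showing mode-dependent iteration (not the MongoData class)."""

    def __init__(self):
        self.train_sequences = []
        self.test_sequences = []
        self._mode = "training"  # assumed initial mode

    def setIterMode(self, mode):
        if mode not in ("training", "testing"):
            raise Exception(f"invalid iteration mode: {mode!r}")
        if not self.train_sequences and not self.test_sequences:
            raise Exception("no sequences available; call prep() first")
        self._mode = mode

    def _active(self):
        # Select the list that matches the current iteration mode.
        if self._mode == "training":
            return self.train_sequences
        return self.test_sequences

    # Delegating these methods lets the object be used like a list.
    def __iter__(self):
        return iter(self._active())

    def __len__(self):
        return len(self._active())

    def __getitem__(self, index):
        return self._active()[index]


ds = IterableDataset()
ds.train_sequences, ds.test_sequences = ["a", "b", "c"], ["d"]
ds.setIterMode("testing")
print(list(ds), len(ds))  # ['d'] 1
```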

classmethod upload_dataset(mongo_db, dataset_details: dict, filepath: str)

Upload a dataset to MongoDB from a local filepath

Parameters:
  • mongo_db (pymongo.MongoClient) – MongoDB Database object

  • dataset_details (dict) – Dictionary containing details about dataset (e.g. name, user_id, dataset_id, collection names, etc.)

  • filepath (str) – filepath of zip folder containing dataset (GDF records)

Returns:

_description_

Return type:

_type_

Example

from ia.gaius.pvt.mongo_interface import MongoData
...
dataset_details = {"user_id": "user-1234",
                   "dataset_name": "MNIST",
                   "dataset_id": "abba12",
                   "data_files_collection_name": "dataset_files",
                   "dataset_collection_name": "datasets"}
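# filepath is required by the signature; the path below is a placeholder
MongoData.upload_dataset(mongo_db=mongo_db,
                         dataset_details=dataset_details,
                         filepath="/path/to/dataset.zip")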

MongoDataRecords

class ia.gaius.pvt.mongo_interface.MongoDataRecords(dataset_records, DR: float, DF: float, shuffle: bool)

Bases: object

__init__(dataset_records, DR: float, DF: float, shuffle: bool)
Parameters:
  • dataset_records (str or list, required) – List of mongo ObjectIds to use as data records

  • DR (float, required) – percentage of total data to use for testing and training. 0 < DR < 100

  • DF (float, required) – percentage of the DR to use for training. The rest of the DR is used for testing. 0 < DF < 100

  • shuffle (bool, required) – whether to shuffle the data when creating sets

After creating the class, utilize the member variables train_sequences and test_sequences for the data sets.

Variables:
  • train_sequences – the mongo documents to use for training

  • test_sequences – the mongo documents to use for testing

MongoResults

class ia.gaius.pvt.mongo_interface.MongoResults(mongo_db, result_collection_name: str, log_collection_name: str, test_id: str, user_id: str, dataset_id: str, test_configuration: dict | None = None)

Bases: object

Class to handle saving and linking result data inside MongoDB.

Provides functions to insert a single log record during training or testing, to save the final result after test completion, and to delete results for remediation and database cleanup after an aborted test

__init__(mongo_db, result_collection_name: str, log_collection_name: str, test_id: str, user_id: str, dataset_id: str, test_configuration: dict | None = None)

Initialize MongoResults object

Parameters:
  • mongo_db (pymongo.MongoClient) – Database where the results are to be stored

  • result_collection_name (str) – collection name to save final test results

  • log_collection_name (str) – collection name to save testing log documents

  • test_id (str) – unique-id for the test being conducted

  • user_id (str) – unique-id for the user conducting the test

  • dataset_id (str) – unique-id for the dataset being used in the test

  • test_configuration (dict) – object showing all of the options used for configuring pvt

addLogRecord(type: str, record: dict)

Called during the testing loop to insert a pvt status record into MongoDB

Parameters:
  • type (str) – Whether the record should be appended to the training or testing logs

  • record (dict) – the record to insert

Raises:

Exception – Thrown if the type provided is not supported

deleteResults()

Function used to remediate database in the event of a failed/aborted test

Returns:

dict showing the deleted result record, if any

Return type:

dict

reset()

Reset start time and testing/training logs in result_obj

retrieveResults()

Retrieve test results from MongoDB based on user_id, dataset_id, and test_id

Raises:

Exception – If dataset master record is not found in database

Returns:

Entire test result object

Return type:

dict

saveResults(final_results: dict)

Save a document in MongoDB, linking the result doc to the logs documents

Parameters:

final_results (dict) – Information pertaining to the results of the test, to be stored in the results object for future use

Returns:

string of the ObjectId saved in MongoDB

Return type:

str

Example

uid = mongo_results.saveResults(final_state)

PVT Utils

Utilities for PVT computations

ia.gaius.pvt.pvt_utils.classification_metrics_builder(lst_of_labels: list) → dict

Create Metrics Data Structure for a classification problem where labels are tracked and used.

Parameters:

lst_of_labels (list) – list of class labels

Returns:

Classification data structure

Return type:

dict

ia.gaius.pvt.pvt_utils.emotives_polarity_metrics_builder(lst_of_emotives: list) → dict

Create Metrics Data Structure for each emotive in testset

ia.gaius.pvt.pvt_utils.emotives_value_metrics_builder(lst_of_emotives: list) → dict

Create Metrics Data Structure for each emotive in testset

Parameters:

lst_of_emotives (list) – emotives list to populate data structure

Returns:

emotive metrics data structure

Return type:

dict

ia.gaius.pvt.pvt_utils.init_emotive_on_node(emotive: str, node: str, test_step_info: dict)

Helper function to initialize emotive information for live messages. Used if new emotive is encountered during testing (emotive only seen in specific records, not consistently across all)

Parameters:
  • emotive (str) – emotive name

  • node (str) – node to initialize emotive on

  • test_step_info (dict) – dictionary of live information, which should be initialized with new emotive

ia.gaius.pvt.pvt_utils.is_notebook() → bool

ia.gaius.pvt.pvt_utils.make_modeled_emotives_(ensemble)

The emotives in the ensemble are of type: ‘emotives’:[{‘e1’: 4, ‘e2’: 5}, {‘e2’: 6}, {‘e1’: 5, ‘e3’: -4}]

ia.gaius.pvt.pvt_utils.model_per_emotive_(ensemble: dict, emotive: str, potential_normalization_factor: float) → float

Models an emotive value using a weighted moving average, where the ‘moving’ part refers to the prediction index.

Parameters:
  • ensemble (dict) – prediction ensemble used to model

  • emotive (str) – emotive name to model

  • potential_normalization_factor (float) – normalization factor

Returns:

final emotive modelled value

Return type:

float
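
As an illustration of weighting emotive values across a prediction ensemble, the sketch below computes a plain potential-weighted average. It is not the formula used by model_per_emotive_, which is not given here: weighted_emotive_average is a hypothetical name, the ensemble is assumed to be a list of predictions that each carry a "potential" weight and an "emotives" dictionary, and no separate normalization factor is applied.

```python
def weighted_emotive_average(ensemble, emotive):
    """Average one emotive over predictions, weighting each by its potential."""
    weighted_sum = 0.0
    total_weight = 0.0
    for prediction in ensemble:
        value = prediction["emotives"].get(emotive)
        if value is None:
            continue  # this prediction carries no value for the emotive
        weighted_sum += value * prediction["potential"]
        total_weight += prediction["potential"]
    # Return 0.0 when no prediction mentions the emotive (assumed convention).
    return weighted_sum / total_weight if total_weight else 0.0


# Assumed ensemble shape; the emotive values reuse the example above.
ensemble = [
    {"potential": 3.0, "emotives": {"e1": 4, "e2": 5}},
    {"potential": 1.0, "emotives": {"e2": 6}},
    {"potential": 1.0, "emotives": {"e1": 5, "e3": -4}},
]
print(weighted_emotive_average(ensemble, "e1"))  # 4.25
```

Predictions with a higher potential pull the modelled value toward their own emotive value, which is the general effect a weighted average is meant to capture.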

ia.gaius.pvt.pvt_utils.plot_confusion_matrix(test_num: int, class_metrics_data_structures: dict)

Takes a node classification test to create a confusion matrix. This version includes the i_dont_know or unknown label.

ia.gaius.pvt.pvt_utils.retrieve_emotive_val(emotive_name, actual)

Function to parse out emotive value from “actual” response. If emotive not present, return NaN

Parameters:
  • emotive_name (str) – name of emotive to retrieve

  • actual (dict) – dictionary of actual emotive values from test record

Returns:

value of the specified emotive, or NaN if not present

Return type:

float
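
The NaN fallback described above can be sketched in a few lines. This is an illustration of the documented behaviour, not the library's code: emotive_or_nan is a hypothetical name and the sample emotive names are made up.

```python
import math


def emotive_or_nan(emotive_name, actual):
    """Return actual[emotive_name] as a float, or NaN when the key is absent."""
    if emotive_name in actual:
        return float(actual[emotive_name])
    return math.nan


actual = {"utility": 10}
print(emotive_or_nan("utility", actual))           # 10.0
print(math.isnan(emotive_or_nan("fear", actual)))  # True
```

Returning NaN instead of 0 lets later aggregation distinguish an emotive that was absent from a record from one whose value was actually zero.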