Performance Validation Testing
This class is used to perform thorough testing of GAIuS Agents. Performance Validation Testing (PVT) has three test types:
Classification
Emotive Value
Emotive Polarity
Performance Validation Test
- class ia.gaius.pvt.PerformanceValidationTest(agent: AgentClient, ingress_nodes: list, query_nodes: list, num_of_tests: int, pct_of_ds: float, pct_res_4_train: float, test_type: str, dataset_location: str = 'filepath', results_filepath=None, ds_filepath: str | None = None, test_prediction_strategy='continuous', clear_all_memory_before_training: bool = True, turn_prediction_off_during_training: bool = False, shuffle: bool = False, sio=None, task=None, user_id: str | None = None, mongo_db=None, dataset_info: dict | None = None, test_id=None, test_configuration: dict = {}, socket_channel: str = 'pvt_status', QUIET: bool = False)
Bases: object
Performance Validation Test (PVT): splits a GDF dataset (from a folder or from MongoDB) into training and testing sets. Depending on the test type, different visualizations are produced.
Test types:
Classification
Emotive Value
Emotive Polarity
- __init__(agent: AgentClient, ingress_nodes: list, query_nodes: list, num_of_tests: int, pct_of_ds: float, pct_res_4_train: float, test_type: str, dataset_location: str = 'filepath', results_filepath=None, ds_filepath: str | None = None, test_prediction_strategy='continuous', clear_all_memory_before_training: bool = True, turn_prediction_off_during_training: bool = False, shuffle: bool = False, sio=None, task=None, user_id: str | None = None, mongo_db=None, dataset_info: dict | None = None, test_id=None, test_configuration: dict = {}, socket_channel: str = 'pvt_status', QUIET: bool = False)
Initialize the PVT object with all required parameters for execution
- Parameters:
agent (AgentClient) – GAIuS Agent to use for trainings
ingress_nodes (list) – Ingress nodes for the GAIuS Agent (see ia.gaius.agent_client.AgentClient.set_ingress_nodes())
query_nodes (list) – Query nodes for the GAIuS Agent (see ia.gaius.agent_client.AgentClient.set_query_nodes())
num_of_tests (int) – Number of test iterations to complete
pct_of_ds (float) – Percent of the dataset to use for PVT (overall)
pct_res_4_train (float) – Percent of the selected portion of the dataset (see pct_of_ds) to reserve for training; the remainder is used for testing
test_type (str) – “classification”, “emotives_value”, or “emotives_polarity”
dataset_location (str) – Where to load the dataset from: “mongodb” or “filepath”. Defaults to “filepath”.
results_filepath (optional) – Where to store PVT results. Defaults to None.
ds_filepath (str) – Path to the directory containing training GDFs
test_prediction_strategy (str, optional) – Strategy used when querying predictions during testing. Defaults to “continuous”.
clear_all_memory_before_training (bool, optional) – Whether the GAIuS agent’s memory should be cleared before each training. Defaults to True.
turn_prediction_off_during_training (bool, optional) – Whether predictions should be disabled during training to reduce computational load. Defaults to False.
shuffle (bool, optional) – Whether dataset should be shuffled before each test iteration. Defaults to False.
sio (optional) – SocketIO object on which to emit status information. Defaults to None.
task (optional) – Celery task details to emit information about. Defaults to None.
user_id (str, optional) – user_id to which information is emitted over SocketIO. Defaults to None.
mongo_db (pymongo.MongoClient, optional) – MongoDB from which the dataset should be retrieved. Defaults to None.
dataset_info (dict, optional) – Information about how to retrieve the dataset, used for the MongoDB query. If dataset_location is “mongodb”, this must contain the user_id, dataset_id, results_collection, logs_collection, and data_files_collection_name keys. Defaults to None.
test_id (str, optional) – Unique identifier sent with messages about this test; also used when storing results to MongoDB. Defaults to None.
test_configuration (dict, optional) – Dictionary of additional metadata about the test configuration, saved in MongoDB with the test results. Defaults to {}.
socket_channel (str, optional) – SocketIO channel on which to broadcast results. Defaults to ‘pvt_status’.
QUIET (bool, optional) – Flag used to disable log output during PVT. Defaults to False.
- compute_incidental_probabilities(test_step_info: dict)
Tracks how well each node is performing during the testing phase, for use in live visualizations.
- Parameters:
test_step_info (dict, required) – Dictionary containing information about the current predicted, actual answers, and other related metrics (e.g. precision, unknowns, residuals, response rate, etc.)
- Returns:
updated test_step_info with the current running accuracy
- Return type:
dict
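The structure of test_step_info is not spelled out above, so the following is only a sketch of how per-node running metrics might be derived from answer counts. The helper name running_node_metrics and its count arguments are hypothetical, and the definitions used (accuracy over all records, precision over answered records only, response rate as the share of records answered) are assumptions rather than the library's documented formulas.

```python
def running_node_metrics(correct: int, incorrect: int, unknown: int) -> dict:
    """Hypothetical helper: running metrics for one node from answer counts.

    "unknown" counts records where the node answered "i_dont_know".
    """
    total = correct + incorrect + unknown
    answered = correct + incorrect
    return {
        # Share of all test records answered correctly so far
        "accuracy": correct / total if total else 0.0,
        # Share of answered records that were correct (unknowns excluded)
        "precision": correct / answered if answered else 0.0,
        # Share of test records for which the node gave any answer
        "response_rate": answered / total if total else 0.0,
    }


# A node with 6 correct, 2 incorrect, and 2 unknown answers so far
print(running_node_metrics(correct=6, incorrect=2, unknown=2))
# {'accuracy': 0.6, 'precision': 0.75, 'response_rate': 0.8}
```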
- conduct_pvt()
Executes the PVT session. The test to run is determined by the ‘test_type’ attribute.
Results from PVT are stored in the ‘pvt_results’ attribute.
Note
See the __init__() documentation above for the parameters used to configure and conduct a PVT test.
- get_classification_metrics()
Builds classification data structures for each node
- get_emotives_polarity_metrics()
Builds emotives polarity data structures for each node
- get_emotives_value_metrics()
Builds emotives value data structures for each node
- sum_sequence_emotives(sequence)
Sums all emotive values
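sum_sequence_emotives() is documented only as summing emotive values. The sketch below assumes it totals each emotive separately across the GDF events of a sequence, where every event may carry an "emotives" mapping of emotive name to number. sum_sequence_emotives_sketch is a hypothetical stand-in, not the library's implementation.

```python
from collections import defaultdict


def sum_sequence_emotives_sketch(sequence: list) -> dict:
    """Sum each emotive's values across all GDF events in a sequence.

    Assumes every event is a dict that may carry an "emotives" mapping
    of emotive name -> numeric value.
    """
    totals = defaultdict(float)
    for event in sequence:
        for name, value in event.get("emotives", {}).items():
            totals[name] += value
    return dict(totals)


sequence = [
    {"strings": ["hello"], "vectors": [], "emotives": {"utility": 2.0}},
    {"strings": ["world"], "vectors": [], "emotives": {"utility": 3.5, "risk": -1}},
    {"strings": ["!"], "vectors": [], "emotives": {}},
]
print(sum_sequence_emotives_sketch(sequence))  # {'utility': 5.5, 'risk': -1.0}
```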
- test_agent()
Test agent on dataset test sequences provided in self.dataset.test_sequences
- train_agent()
Trains the agent on the training set of GDF records. Prediction can be turned off during training (see turn_prediction_off_during_training) if the topology has no abstractions that need predictions to propagate data through it.
- update_test_results_w_hive_classification_metrics(pvt_test_result)
Update pvt test result metrics with hive classifications metrics
- update_test_results_w_hive_emotives_polarity_metrics(pvt_test_result)
Update pvt test result metrics with hive emotives polarity metrics
- update_test_results_w_hive_emotives_value_metrics(pvt_test_result)
Update pvt test result metrics with hive emotives value metrics
MongoData
- class ia.gaius.pvt.mongo_interface.MongoData(mongo_dataset_details: dict, data_files_collection_name: str, mongo_db: MongoClient, dataset_collection_name: str = 'datasets')
Bases: object
Analogous to the Data class, but references data records through MongoDB instead of a directory.
It starts from a MongoDB document listing the names of all dataset files (which are stored separately) and only retrieves the actual data files when retrieveDataRecord() is called. The iterator methods are overloaded so the object can be treated like a list.
Example
>>> import pymongo
>>> from ia.gaius.pvt.mongo_interface import MongoData
>>> mongo = pymongo.MongoClient('mongodb://mongodb:27017/')
>>> mongo_db = mongo.db['main_database']
>>> dataset_details = {"user_id": "ABCD1", "dataset_id": "iris_0_0_13"}
>>> md = MongoData(mongo_dataset_details=dataset_details, mongo_db=mongo_db, data_files_collection_name='dataset_files')
>>> md.prep(percent_of_dataset_chosen=50, percent_reserved_for_training=50, shuffle=True)
>>> md.setIterMode('testing')
>>> for record in md:
...     ...
- __init__(mongo_dataset_details: dict, data_files_collection_name: str, mongo_db: MongoClient, dataset_collection_name: str = 'datasets')
Initialize the dataset object from MongoDB
- Parameters:
mongo_dataset_details (dict) – contains info about the user_id and dataset_id of the dataset. Used to query MongoDB
data_files_collection_name (str) – Collection in MongoDB where individual data records are stored
dataset_collection_name (str) – Collection in MongoDB where master dataset info record is stored
mongo_db (pymongo.MongoClient) – MongoDB client object to use for dataset lookups
- Raises:
Exception – user_id field missing from mongo_dataset_details
Exception – dataset_id field missing from mongo_dataset_details
Exception – multiple datasets found for the same user_id and dataset_id
- convertBinaryStringtoSequence(record)
Convert a binary string of multiple GDFs (newline-delimited) into a sequence of JSON objects
- Parameters:
record (str, required) – binary string of GDFs to convert
- Returns:
list of GDFs in json format
- Return type:
list
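A rough sketch of the conversion described above, assuming the record holds UTF-8 text with exactly one JSON-encoded GDF event per line. The function name convert_binary_string_to_sequence is hypothetical, and skipping blank lines is a choice made for the sketch.

```python
import json


def convert_binary_string_to_sequence(record) -> list:
    """Split newline-delimited GDF JSON (bytes or str) into a list of dicts."""
    if isinstance(record, bytes):
        record = record.decode("utf-8")
    # Skip blank lines, e.g. a trailing newline at the end of the record
    return [json.loads(line) for line in record.splitlines() if line.strip()]


raw = (
    b'{"strings": ["a"], "vectors": [], "emotives": {}}\n'
    b'{"strings": ["b"], "vectors": [], "emotives": {"x": 1}}\n'
)
events = convert_binary_string_to_sequence(raw)
print(len(events), events[1]["emotives"])  # 2 {'x': 1}
```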
- classmethod delete_dataset(mongo_db, dataset_details: dict)
Delete a dataset from MongoDB
- Parameters:
mongo_db (pymongo.MongoClient) – MongoDB database object
dataset_details (dict) – Dictionary containing details about dataset (e.g. name, user_id, dataset_id, collection names)
- Returns:
String depicting action that was taken
- Return type:
str
Example
from ia.gaius.pvt.mongo_interface import MongoData
...
dataset_details = {"user_id": "user-1234", "dataset_name": "MNIST", "dataset_id": "abba12", "data_files_collection_name": "dataset_files", "dataset_collection_name": "datasets"}
MongoData.delete_dataset(mongo_db=mongo_db, dataset_details=dataset_details)
- getSequence(record)
Wrapper function to retrieve a record from MongoDB and convert it into a sequence
- Parameters:
record (ObjectId) – The MongoDB ObjectId of the data record to retrieve
- Returns:
GDF sequence retrieved from MongoDB
- Return type:
list
- prep(percent_of_dataset_chosen: float, percent_reserved_for_training: float, shuffle: bool = False)
Prepare the dataset by splitting its records into training and testing sequences
- Parameters:
percent_of_dataset_chosen (float) – The percent of the dataset to utilize
percent_reserved_for_training (float) – The training/testing split for the dataset (e.g. set to 80 for 80/20 training/testing split)
shuffle (bool, optional) – Whether to shuffle the data. Defaults to False.
- retrieveDataRecord(document_id: ObjectId)
Retrieve a data record from MongoDB, pertaining to the ObjectId specified
- Parameters:
document_id (ObjectId, required) – data record to retrieve from MongoDB, located in the collection specified when calling __init__()
- Raises:
Exception – Raised when MongoDB document is not found. Shows query performed that failed
- Returns:
binary string depicting data sequence stored in MongoDB Document
- Return type:
str
- setIterMode(mode: str) → None
Set mode to be used for iterating across dataset
- Parameters:
mode (str) – set to “training” or “testing” depending on what set of sequences is to be iterated across
- Raises:
Exception – When no data is in train_sequences or test_sequences, and prep() should be called first
Exception – When an invalid mode is specified in the mode argument
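MongoData overloads the iterator methods so the object can be looped over like a list, with setIterMode() choosing which split is active. The hypothetical class below sketches that pattern in plain Python, raising the two exceptions listed above; the internals of the real class are not documented here and may differ.

```python
class IterModeSketch:
    """Hypothetical stand-in showing list-like iteration over one data split."""

    def __init__(self, train_sequences: list, test_sequences: list):
        self.train_sequences = train_sequences
        self.test_sequences = test_sequences
        self.iter_mode = "training"

    def setIterMode(self, mode: str) -> None:
        # Refuse to iterate before any split exists (prep() not yet called)
        if not self.train_sequences and not self.test_sequences:
            raise Exception("No data in train/test sequences; call prep() first")
        if mode not in ("training", "testing"):
            raise Exception(f"Invalid mode: {mode!r}")
        self.iter_mode = mode

    def _active(self) -> list:
        if self.iter_mode == "training":
            return self.train_sequences
        return self.test_sequences

    def __iter__(self):
        return iter(self._active())

    def __len__(self) -> int:
        return len(self._active())


ds = IterModeSketch(train_sequences=["id1", "id2", "id3"], test_sequences=["id4"])
ds.setIterMode("testing")
print(list(ds))  # ['id4']
```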
- classmethod upload_dataset(mongo_db, dataset_details: dict, filepath: str)
Upload a dataset to MongoDB from a local filepath
- Parameters:
mongo_db (pymongo.MongoClient) – MongoDB Database object
dataset_details (dict) – Dictionary containing details about the dataset (e.g. name, user_id, dataset_id, collection names, etc.)
filepath (str) – filepath of the zip archive containing the dataset (GDF records)
Example
from ia.gaius.pvt.mongo_interface import MongoData
...
dataset_details = {"user_id": "user-1234", "dataset_name": "MNIST", "dataset_id": "abba12", "data_files_collection_name": "dataset_files", "dataset_collection_name": "datasets"}
MongoData.upload_dataset(mongo_db=mongo_db, dataset_details=dataset_details, filepath="mnist.zip")  # hypothetical path to a zip of GDF records
MongoDataRecords
- class ia.gaius.pvt.mongo_interface.MongoDataRecords(dataset_records, DR: float, DF: float, shuffle: bool)
Bases: object
- __init__(dataset_records, DR: float, DF: float, shuffle: bool)
- Parameters:
dataset_records (str or list, required) – List of mongo ObjectIds to use as data records
DR (float, required) – percent of the total data to use for testing and training. 0 < DR < 100
DF (float, required) – percent of the DR to use for training. The rest of the DR is used for testing. 0 < DF < 100
shuffle (bool, required) – whether to shuffle the data when creating sets
After creating the class, use the member variables train_sequences and test_sequences to access the data sets.
- Variables:
train_sequences – the mongo documents to use for training
test_sequences – the mongo documents to use for testing
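The DR/DF parameters describe a two-stage split: DR percent of all records is selected, then DF percent of that selection goes to training and the rest to testing. The sketch below assumes counts are rounded down and that shuffling happens before selection; split_records and its seed argument are hypothetical and not part of MongoDataRecords.

```python
import random


def split_records(records: list, DR: float, DF: float, shuffle: bool, seed=None):
    """Hypothetical sketch of a DR/DF split into training and testing sets.

    DR: percent of all records to use for training and testing combined
    DF: percent of the selected records to use for training
    """
    records = list(records)  # copy so the caller's list is not reordered
    if shuffle:
        random.Random(seed).shuffle(records)
    n_used = int(len(records) * DR / 100)  # rounds down
    n_train = int(n_used * DF / 100)       # rounds down
    selected = records[:n_used]
    return selected[:n_train], selected[n_train:]


train, test = split_records(list(range(20)), DR=50, DF=80, shuffle=False)
print(train, test)  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

With 20 records, DR=50 selects 10 of them, and DF=80 assigns 8 of those to training and 2 to testing.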
MongoResults
- class ia.gaius.pvt.mongo_interface.MongoResults(mongo_db, result_collection_name: str, log_collection_name: str, test_id: str, user_id: str, dataset_id: str, test_configuration: dict | None = None)
Bases: object
Class to handle saving and linking result data inside MongoDB.
Provides functions to insert a single log record during training/testing, save the final result after test completion, and remediate/delete records when a test is aborted or the database needs cleanup.
- __init__(mongo_db, result_collection_name: str, log_collection_name: str, test_id: str, user_id: str, dataset_id: str, test_configuration: dict | None = None)
Initialize MongoResults object
- Parameters:
mongo_db (pymongo.MongoClient) – Database where the results are to be stored
result_collection_name (str) – collection name to save final test results
log_collection_name (str) – collection name to save testing log documents
test_id (str) – unique-id for the test being conducted
user_id (str) – unique-id for the user conducting the test
dataset_id (str) – unique-id for the dataset being used in the test
test_configuration (dict, optional) – object showing all of the options used to configure PVT. Defaults to None.
- addLogRecord(type: str, record: dict)
Called during the testing loop to insert a pvt status record into MongoDB
- Parameters:
type (str) – Whether the record should be appended to the training or testing logs
record (dict) – the record to insert
- Raises:
Exception – Thrown if the type provided is not supported
- deleteResults()
Function used to remediate database in the event of a failed/aborted test
- Returns:
dict showing the deleted result record, if any
- Return type:
dict
- reset()
Reset start time and testing/training logs in result_obj
- retrieveResults()
Retrieve test results from MongoDB based on user_id, dataset_id, and test_id
- Raises:
Exception – If dataset master record is not found in database
- Returns:
Entire test result object
- Return type:
dict
- saveResults(final_results: dict)
Save a document in MongoDB, linking the result document to the log documents
- Parameters:
final_results (dict) – Information pertaining to the results of the test, to be stored in the results object for future use
- Returns:
string of the ObjectId saved in MongoDB
- Return type:
str
Example
uid = mongo_results.saveResults(final_state)
PVT Utils
Utilities for PVT computations
- ia.gaius.pvt.pvt_utils.classification_metrics_builder(lst_of_labels: list) → dict
Create Metrics Data Structure for a classification problem where labels are tracked and used.
- Parameters:
lst_of_labels (list) – list of class labels
- Returns:
Classification data structure
- Return type:
dict
- ia.gaius.pvt.pvt_utils.emotives_polarity_metrics_builder(lst_of_emotives: list) → dict
Create Metrics Data Structure for each emotive in the test set
- ia.gaius.pvt.pvt_utils.emotives_value_metrics_builder(lst_of_emotives: list) → dict
Create Metrics Data Structure for each emotive in the test set
- Parameters:
lst_of_emotives (list) – emotives list to populate data structure
- Returns:
emotive metrics data structure
- Return type:
dict
- ia.gaius.pvt.pvt_utils.init_emotive_on_node(emotive: str, node: str, test_step_info: dict)
Helper function to initialize emotive information for live messages. Used when a new emotive is encountered during testing (an emotive seen only in specific records, not consistently across all of them)
- Parameters:
emotive (str) – emotive name
node (str) – node to initialize emotive on
test_step_info (dict) – dictionary of live information, which should be initialized with new emotive
- ia.gaius.pvt.pvt_utils.is_notebook() → bool
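is_notebook() carries no description above; judging only by its name, it reports whether the code is running inside a Jupyter notebook. A common idiom for that check is sketched below under that assumption. is_notebook_sketch is a hypothetical name, and the library's actual check may differ.

```python
def is_notebook_sketch() -> bool:
    """Return True when running inside a Jupyter notebook kernel (sketch)."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython is not installed, so this cannot be a notebook
    shell = get_ipython()  # None when not running under IPython at all
    # Jupyter kernels use ZMQInteractiveShell; terminal IPython uses
    # TerminalInteractiveShell
    return shell is not None and type(shell).__name__ == "ZMQInteractiveShell"


print(is_notebook_sketch())  # False when run as a plain Python script
```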
- ia.gaius.pvt.pvt_utils.make_modeled_emotives_(ensemble)
The emotives in the ensemble are of type: ‘emotives’:[{‘e1’: 4, ‘e2’: 5}, {‘e2’: 6}, {‘e1’: 5, ‘e3’: -4}]
- ia.gaius.pvt.pvt_utils.model_per_emotive_(ensemble: dict, emotive: str, potential_normalization_factor: float) → float
Models a single emotive using a weighted moving average, where the ‘moving’ part refers to the prediction index.
- Parameters:
ensemble (dict) – prediction ensemble used to model
emotive (str) – emotive name to model
potential_normalization_factor (float) – normalization factor
- Returns:
final emotive modelled value
- Return type:
float
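make_modeled_emotives_() and model_per_emotive_() together turn a prediction ensemble into one modeled value per emotive. The sketch below assumes each prediction is a dict with an "emotives" mapping (like one of the dictionaries in the example above) and a numeric "potential", that each emotive is modeled as a potential-weighted average, and that the normalization factor is the ensemble's total potential. Both functions ending in _sketch are hypothetical stand-ins, not the library's code.

```python
def model_per_emotive_sketch(ensemble: list, emotive: str,
                             potential_normalization_factor: float) -> float:
    """Potential-weighted average of one emotive across an ensemble (sketch).

    The normalization factor must be non-zero (ZeroDivisionError otherwise).
    """
    weighted_sum = 0.0
    for prediction in ensemble:
        emotives = prediction.get("emotives", {})
        if emotive in emotives:
            weighted_sum += emotives[emotive] * prediction["potential"]
    return weighted_sum / potential_normalization_factor


def make_modeled_emotives_sketch(ensemble: list) -> dict:
    """Model every emotive that appears anywhere in the ensemble (sketch)."""
    # Normalize by the total potential of the whole ensemble
    total_potential = sum(p["potential"] for p in ensemble)
    names = sorted({name for p in ensemble for name in p.get("emotives", {})})
    return {name: model_per_emotive_sketch(ensemble, name, total_potential)
            for name in names}


ensemble = [
    {"emotives": {"e1": 4, "e2": 5}, "potential": 2.0},
    {"emotives": {"e2": 6}, "potential": 1.0},
    {"emotives": {"e1": 5, "e3": -4}, "potential": 1.0},
]
print(make_modeled_emotives_sketch(ensemble))
# {'e1': 3.25, 'e2': 4.0, 'e3': -1.0}
```

In this sketch a prediction that lacks an emotive adds nothing to that emotive's weighted sum, but its potential still counts in the denominator, which pulls rarely seen emotives toward zero.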
- ia.gaius.pvt.pvt_utils.plot_confusion_matrix(test_num: int, class_metrics_data_structures: dict)
Takes a node classification test to create a confusion matrix. This version includes the i_dont_know or unknown label.
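A confusion matrix that keeps the unknown label can be assembled from parallel lists of actual and predicted labels. The sketch below only builds the count matrix (plotting is left out); confusion_counts is a hypothetical helper, and the "i_dont_know" default mirrors the label named above.

```python
from collections import Counter


def confusion_counts(actual: list, predicted: list, unknown_label: str = "i_dont_know"):
    """Count (actual, predicted) pairs, keeping the unknown label as its own column."""
    labels = sorted(set(actual) | (set(predicted) - {unknown_label}))
    columns = labels + [unknown_label]
    counts = Counter(zip(actual, predicted))
    # Rows: actual labels; columns: predicted labels plus the unknown column
    return columns, [[counts[(a, p)] for p in columns] for a in labels]


actual = ["cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "i_dont_know", "dog", "cat", "dog"]
cols, matrix = confusion_counts(actual, predicted)
print(cols)    # ['cat', 'dog', 'i_dont_know']
print(matrix)  # [[1, 0, 1], [1, 2, 0]]
```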
- ia.gaius.pvt.pvt_utils.retrieve_emotive_val(emotive_name, actual)
Function to parse out emotive value from “actual” response. If emotive not present, return NaN
- Parameters:
emotive_name (str) – name of emotive to retrieve
actual (dict) – dictionary of actual emotive values from test record
- Returns:
value of the specified emotive, or NaN if not present
- Return type:
float
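Assuming actual is a flat dict of emotive name to value, as the parameter description suggests, the lookup can be sketched as follows; retrieve_emotive_val_sketch is a hypothetical name.

```python
import math


def retrieve_emotive_val_sketch(emotive_name: str, actual: dict) -> float:
    """Return the emotive's value from the actual record, or NaN if absent."""
    value = actual.get(emotive_name)
    return math.nan if value is None else float(value)


actual = {"utility": 12, "risk": -3.5}
print(retrieve_emotive_val_sketch("utility", actual))  # 12.0
print(retrieve_emotive_val_sketch("reward", actual))   # nan
```

Because NaN compares unequal to everything, including itself, callers should test for it with math.isnan() rather than ==.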