Performance Validation Testing 

This class is used to perform thorough testing of GAIuS Agents. Performance Validation Testing (PVT) supports three test types:

Classification
Emotive Value
Emotive Polarity

MongoData 

class ia.gaius.pvt.mongo_interface.MongoData(mongo_dataset_details: dict, data_files_collection_name: str, mongo_db: MongoClient, dataset_collection_name: str = 'datasets')

Bases: object

Analogous to the Data class, but uses a MongoDB cursor instead of a directory to reference data records.

Starts with a MongoDB document containing the names of all dataset files (which are stored separately).

Actual data files are retrieved only when retrieveDataRecord is called. The iterator functions are overloaded so the object can be treated as a list.

Example

>>> import pymongo
>>> from ia.gaius.pvt.mongo_interface import MongoData
>>> mongo = pymongo.MongoClient('mongodb://mongodb:27017/')
>>> mongo_db = mongo['main_database']
>>> dataset_details = {"user_id": "ABCD1",
...                    "dataset_id": "iris_0_0_13"}
>>> md = MongoData(mongo_dataset_details=dataset_details,
...                mongo_db=mongo_db,
...                data_files_collection_name='dataset_files')
>>> md.prep(percent_of_dataset_chosen=50,
...         percent_reserved_for_training=50,
...         shuffle=True)
>>> md.setIterMode('testing')
>>> for record in md:
...     pass  # process each testing record here

__init__(mongo_dataset_details: dict, data_files_collection_name: str, mongo_db: MongoClient, dataset_collection_name: str = 'datasets')

Initialize the dataset object from MongoDB

Parameters:

mongo_dataset_details (dict) – contains info about the user_id and dataset_id of the dataset. Used to query MongoDB
data_files_collection_name (str) – Collection in MongoDB where individual data records are stored
dataset_collection_name (str) – Collection in MongoDB where master dataset info record is stored
mongo_db (pymongo.MongoClient) – MongoDB client object to use for dataset lookups

Raises:

Exception – user_id field missing from mongo_dataset_details
Exception – dataset_id field missing from mongo_dataset_details
Exception – multiple datasets found for the same user_id and dataset_id fields
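The three checks listed under Raises can be pictured with the following sketch. It is not the library's implementation: a plain list of dicts (`records`) stands in for the dataset collection, the helper name `find_dataset_record` is hypothetical, and returning None when nothing matches is an assumption, since the behaviour for a missing dataset is not documented here.

```python
from typing import List, Optional


def find_dataset_record(mongo_dataset_details: dict,
                        records: List[dict]) -> Optional[dict]:
    """Hypothetical stand-in for the dataset lookup performed in __init__."""
    # Both identifying fields must be present before a query is built
    for field in ("user_id", "dataset_id"):
        if field not in mongo_dataset_details:
            raise Exception(f"{field} field missing from mongo_dataset_details")

    query = {"user_id": mongo_dataset_details["user_id"],
             "dataset_id": mongo_dataset_details["dataset_id"]}

    # Emulate find(query) on the dataset collection with an in-memory filter
    matches = [r for r in records
               if all(r.get(key) == value for key, value in query.items())]
    if len(matches) > 1:
        raise Exception(f"multiple datasets found for query {query}")
    # Assumption: no match yields None rather than an exception
    return matches[0] if matches else None
```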

convertBinaryStringtoSequence(record)

Convert a binary string of multiple GDFs (delimited by newlines) into a sequence of JSON objects

Parameters:: record (str, required) – binary string of GDFs to convert
Returns:: list of GDFs in json format
Return type:: list
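A minimal sketch of the conversion described above, assuming each non-empty line of the record is one JSON-encoded GDF and the bytes are UTF-8. The function name `convert_binary_string_to_sequence` is hypothetical, and the GDF keys used in the usage example (`strings`, `vectors`, `emotives`) are illustrative.

```python
import json
from typing import List, Union


def convert_binary_string_to_sequence(record: Union[bytes, str]) -> List[dict]:
    """Hypothetical sketch: split newline-delimited GDFs and parse each one."""
    # Accept raw bytes as stored in MongoDB, or an already decoded string
    if isinstance(record, bytes):
        record = record.decode("utf-8")
    # Skip blank lines, e.g. a trailing newline at the end of the record
    return [json.loads(line) for line in record.splitlines() if line.strip()]


# Usage: two GDFs, one per line
raw = (b'{"strings": ["hello"], "vectors": [], "emotives": {}}\n'
       b'{"strings": ["world"], "vectors": [], "emotives": {}}\n')
sequence = convert_binary_string_to_sequence(raw)
```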

classmethod delete_dataset(mongo_db, dataset_details: dict)

Delete a dataset from MongoDB

Parameters:

mongo_db (pymongo.MongoClient) – MongoDB database object
dataset_details (dict) – Dictionary containing details about dataset (e.g. name, user_id, dataset_id, collection names)

Returns:

String depicting action that was taken

Return type:

str

Example

from ia.gaius.pvt.mongo_interface import MongoData
...
dataset_details = {"user_id": "user-1234",
                   "dataset_name": "MNIST",
                   "dataset_id": "abba12",
                   "data_files_collection_name": "dataset_files",
                   "dataset_collection_name": "datasets"}
MongoData.delete_dataset(mongo_db=mongo_db,
                         dataset_details=dataset_details)

getSequence(record)

Wrapper function to retrieve a record from MongoDB and convert it into a sequence

Parameters:: record (ObjectId) – The MongoDB ObjectId of the data record to retrieve
Returns:: GDF sequence retrieved from MongoDB
Return type:: list

prep(percent_of_dataset_chosen: float, percent_reserved_for_training: float, shuffle: bool = False)

Prepare the dataset by selecting the portion to use and splitting it into training and testing sets

Parameters:

percent_of_dataset_chosen (float) – The percent of the dataset to utilize
percent_reserved_for_training (float) – The training/testing split for the dataset (e.g. set to 80 for 80/20 training/testing split)
shuffle (bool, optional) – Whether to shuffle the data. Defaults to False.
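As a worked example of how the two percentages combine, assuming they are applied one after the other and that fractional counts are truncated (the library's rounding may differ): choosing 50% of a 150-record dataset and reserving 80% of that for training yields 75 records, split 60 for training and 15 for testing. The helper `split_counts` is hypothetical.

```python
def split_counts(total_records: int, percent_of_dataset_chosen: float,
                 percent_reserved_for_training: float) -> tuple:
    """Hypothetical helper: how many records land in each set."""
    # First keep only the chosen portion of the dataset
    chosen = int(total_records * percent_of_dataset_chosen / 100)
    # Then split the chosen records into training and testing
    training = int(chosen * percent_reserved_for_training / 100)
    testing = chosen - training
    return chosen, training, testing
```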

retrieveDataRecord(document_id: ObjectId)

Retrieve the data record with the specified ObjectId from MongoDB

Parameters:: document_id (ObjectId, required) – data record to retrieve from mongo, located in the collection specified when calling __init__()
Raises:: Exception – Raised when MongoDB document is not found. Shows query performed that failed
Returns:: binary string depicting data sequence stored in MongoDB Document
Return type:: str

setIterMode(mode: str) → None

Set mode to be used for iterating across dataset

Parameters:

mode (str) – set to “training” or “testing” depending on what set of sequences is to be iterated across

Raises:

Exception – When there is no data in train_sequences or test_sequences (prep() must be called first)
Exception – When an invalid mode is specified in the mode argument
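The sketch below illustrates how an iteration mode can make one object behave like whichever list is currently selected, which is the list-like behaviour described for MongoData. `IterModeSketch` is a hypothetical class, not the library's; it holds plain lists rather than MongoDB ObjectIds and treats "no data" as both sets being empty, which is one reading of the documented condition.

```python
class IterModeSketch:
    """Hypothetical stand-in showing how an iteration mode selects a set."""

    def __init__(self, train_sequences, test_sequences):
        self.train_sequences = list(train_sequences)
        self.test_sequences = list(test_sequences)
        self.iter_mode = None

    def setIterMode(self, mode: str) -> None:
        # Refuse to iterate before either set has been populated
        if not self.train_sequences and not self.test_sequences:
            raise Exception("no data in train_sequences or test_sequences; call prep() first")
        if mode not in ("training", "testing"):
            raise Exception(f"invalid mode: {mode!r}")
        self.iter_mode = mode

    def _active(self) -> list:
        # Return the list selected by the current mode
        if self.iter_mode == "training":
            return self.train_sequences
        if self.iter_mode == "testing":
            return self.test_sequences
        raise Exception("call setIterMode() first")

    # List-like behaviour: iteration, length, and indexing follow the mode
    def __iter__(self):
        return iter(self._active())

    def __len__(self) -> int:
        return len(self._active())

    def __getitem__(self, index):
        return self._active()[index]
```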

classmethod upload_dataset(mongo_db, dataset_details: dict, filepath: str)

Upload a dataset to MongoDB from a local filepath

Parameters:

mongo_db (pymongo.MongoClient) – MongoDB Database object
dataset_details (dict) – Dictionary containing details about the dataset (e.g. name, user_id, dataset_id, collection names, etc.)
filepath (str) – filepath of zip folder containing dataset (GDF records)

Example

from ia.gaius.pvt.mongo_interface import MongoData
...
dataset_details = {"user_id": "user-1234",
                   "dataset_name": "MNIST",
                   "dataset_id": "abba12",
                   "data_files_collection_name": "dataset_files",
                   "dataset_collection_name": "datasets"}
MongoData.upload_dataset(mongo_db=mongo_db,
                         dataset_details=dataset_details,
                         filepath="/path/to/dataset.zip")  # placeholder: zip folder of GDF records
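upload_dataset takes the path of a zip folder containing GDF records. The sketch below shows one way such an archive could be read with the standard library before its contents are written to MongoDB. The helper name `read_gdf_zip`, the one-file-per-record layout, and skipping directory entries are assumptions, not documented behaviour of the library.

```python
import zipfile
from typing import Dict


def read_gdf_zip(filepath: str) -> Dict[str, bytes]:
    """Hypothetical helper: map each file in the archive to its raw bytes."""
    records = {}
    with zipfile.ZipFile(filepath) as archive:
        for info in archive.infolist():
            # Directory entries carry no data, so skip them
            if info.is_dir():
                continue
            records[info.filename] = archive.read(info.filename)
    return records
```

Keeping the raw bytes per file matches the description of data records elsewhere in this class as binary strings of newline-delimited GDFs.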

MongoDataRecords 

class ia.gaius.pvt.mongo_interface.MongoDataRecords(dataset_records, DR: float, DF: float, shuffle: bool)

Bases: object

__init__(dataset_records, DR: float, DF: float, shuffle: bool)

Parameters:

dataset_records (str or list, required) – List of mongo ObjectIds to use as data records
DR (float, required) – percentage of the total data to use for training and testing. 0 < DR < 100
DF (float, required) – percentage of the DR to use for training. The rest of the DR is used for testing. 0 < DF < 100
shuffle (bool, required) – whether to shuffle the data when creating sets

After creating the class, use the member variables train_sequences and test_sequences to access the data sets.

Variables:

train_sequences – the mongo documents to use for training
test_sequences – the mongo documents to use for testing
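The relationship between DR, DF, and the two member variables can be sketched as follows. `DataRecordsSketch` is a hypothetical illustration, not MongoDataRecords itself: it truncates fractional counts and takes records from the front of the (optionally shuffled) list, either of which may differ from the library.

```python
import random
from typing import List


class DataRecordsSketch:
    """Hypothetical illustration of DR/DF splitting into two member lists."""

    def __init__(self, dataset_records: List[str], DR: float, DF: float,
                 shuffle: bool):
        records = list(dataset_records)
        if shuffle:
            random.shuffle(records)
        # DR: percentage of all records used for training and testing combined
        used = records[:int(len(records) * DR / 100)]
        # DF: percentage of the used records reserved for training
        n_train = int(len(used) * DF / 100)
        self.train_sequences = used[:n_train]
        self.test_sequences = used[n_train:]
```

With 10 records, DR=50 and DF=80 give 5 records in use, 4 for training and 1 for testing.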

MongoResults

class ia.gaius.pvt.mongo_interface.MongoResults(mongo_db, result_collection_name: str, log_collection_name: str, test_id: str, user_id: str, dataset_id: str, test_configuration: dict | None = None)

Bases: object

Class to handle saving and linking result data inside MongoDB.

Provides functions to insert a single log record during training or testing, save the final result after test completion, and delete results to clean up the database when a test is aborted.

__init__(mongo_db, result_collection_name: str, log_collection_name: str, test_id: str, user_id: str, dataset_id: str, test_configuration: dict | None = None)

Initialize MongoResults object

Parameters:

mongo_db (pymongo.MongoClient) – Database where the results are to be stored
result_collection_name (str) – collection name to save final test results
log_collection_name (str) – collection name to save testing log documents
test_id (str) – unique-id for the test being conducted
user_id (str) – unique-id for the user conducting the test
dataset_id (str) – unique-id for the dataset being used in the test
test_configuration (dict, optional) – dictionary of all options used to configure PVT

addLogRecord(type: str, record: dict)

Called during the testing loop to insert a pvt status record into MongoDB

Parameters:

type (str) – Whether the record should be appended to the training or testing logs
record (dict) – the record to insert

Raises:

Exception – Thrown if the type provided is not supported
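A minimal sketch of the type check in addLogRecord, using in-memory lists in place of the log collection. The accepted values "training" and "testing" and the tagging of each record with test_id are assumptions based on the parameter descriptions; `LogSketch` is a hypothetical class.

```python
from typing import Dict, List


class LogSketch:
    """Hypothetical stand-in for the log-insertion side of MongoResults."""

    def __init__(self, test_id: str):
        self.test_id = test_id
        # Assumed log types; the real accepted values are not listed here
        self.logs: Dict[str, List[dict]] = {"training": [], "testing": []}

    # The parameter is named `type` to mirror the documented signature
    def addLogRecord(self, type: str, record: dict) -> None:
        if type not in self.logs:
            raise Exception(f"unsupported log type: {type!r}")
        # Copy the record and tag it so it can be linked back to this test
        self.logs[type].append({**record, "test_id": self.test_id})
```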

deleteResults()

Remediate the database in the event of a failed or aborted test

Returns:: dict showing the deleted result record, if any
Return type:: dict

reset(): Reset start time and testing/training logs in result_obj

retrieveResults()

Retrieve test results from MongoDB based on user_id, dataset_id, and test_id

Raises:: Exception – If dataset master record is not found in database
Returns:: Entire test result object
Return type:: dict

saveResults(final_results: dict)

Save a document in MongoDB, linking the result document to the log documents

Parameters:: final_results (dict) – Information pertaining to the results of the test, to be stored in the results object for future use
Returns:: string of the ObjectId saved in MongoDB
Return type:: str

Example

uid = mongo_results.saveResults(final_state)
