Performance Validation Testing

This class is used to thoroughly test GAIuS Agents. Performance Validation Testing (PVT) supports three test types:

  • Classification

  • Emotive Value

  • Emotive Polarity

Performance Validation Test

MongoData

class ia.gaius.pvt.mongo_interface.MongoData(mongo_dataset_details: dict, data_files_collection_name: str, mongo_db: MongoClient, dataset_collection_name: str = 'datasets')

Bases: object

Analogous to the Data class, but uses a MongoDB cursor instead of a directory to reference data records.

Starts from a MongoDB document containing the names of all dataset files (which are stored separately).

Actual data files are retrieved only when retrieveDataRecord is called. The iterator functions are overloaded so that the object can be treated as a list.

Example

>>> import pymongo
>>> from ia.gaius.pvt.mongo_interface import MongoData
>>> mongo = pymongo.MongoClient('mongodb://mongodb:27017/')
>>> mongo_db = mongo.db['main_database']
>>> dataset_details = {"user_id": "ABCD1",
...                    "dataset_id": "iris_0_0_13"}
>>> md = MongoData(mongo_dataset_details=dataset_details,
...                mongo_db=mongo_db,
...                data_files_collection_name='dataset_files')
>>> md.prep(percent_of_dataset_chosen=50,
...         percent_reserved_for_training=50,
...         shuffle=True)
>>> md.setIterMode('testing')
>>> for record in md:
...
__init__(mongo_dataset_details: dict, data_files_collection_name: str, mongo_db: MongoClient, dataset_collection_name: str = 'datasets')

Initialize dataset object from MongoDB

Parameters:
  • mongo_dataset_details (dict) – contains info about the user_id and dataset_id of the dataset. Used to query MongoDB

  • data_files_collection_name (str) – Collection in MongoDB where individual data records are stored

  • dataset_collection_name (str) – Collection in MongoDB where master dataset info record is stored

  • mongo_db (pymongo.MongoClient) – MongoDB client object to use for dataset lookups

Raises:
  • Exception – user_id field missing from mongo_dataset_details

  • Exception – dataset_id field missing from mongo_dataset_details

  • Exception – multiple datasets found pertaining to same user_id and dataset_id field
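
The checks listed under Raises can be pictured with the following minimal sketch. It is not the library's code: the helper name `find_dataset_record` is hypothetical, a plain list of dicts stands in for the dataset collection, and the "no dataset found" branch is an assumption that the documentation above does not describe.

```python
def find_dataset_record(mongo_dataset_details: dict, dataset_documents: list) -> dict:
    """Illustrative lookup of a single master dataset record (hypothetical helper)."""
    # Both identifying fields must be present before any lookup is attempted
    for field in ("user_id", "dataset_id"):
        if field not in mongo_dataset_details:
            raise Exception(f"{field} field missing from mongo_dataset_details")

    # In-memory stand-in for a query against the dataset collection
    matches = [
        doc for doc in dataset_documents
        if doc.get("user_id") == mongo_dataset_details["user_id"]
        and doc.get("dataset_id") == mongo_dataset_details["dataset_id"]
    ]
    if len(matches) > 1:
        raise Exception("multiple datasets found for the same user_id and dataset_id")
    if not matches:
        # Assumption: behaviour for a missing dataset is not documented above
        raise Exception("no dataset found for the given user_id and dataset_id")
    return matches[0]


documents = [{"user_id": "ABCD1", "dataset_id": "iris_0_0_13", "files": []}]
record = find_dataset_record({"user_id": "ABCD1", "dataset_id": "iris_0_0_13"}, documents)

try:
    find_dataset_record({"user_id": "ABCD1"}, documents)
except Exception as error:
    missing_field_message = str(error)
```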

convertBinaryStringtoSequence(record)

Convert a binary string of multiple GDFs (delimited by newlines) into a sequence of JSON objects

Parameters:

record (str, required) – binary string of GDFs to convert

Returns:

list of GDFs in json format

Return type:

list
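
As a rough illustration of the conversion described above, the sketch below splits a newline-delimited byte string and parses each line as JSON. It is not the library's implementation; the function name `binary_gdfs_to_sequence` and the sample GDF fields (`strings`, `vectors`, `emotives`) are assumptions made for the example.

```python
import json


def binary_gdfs_to_sequence(record: bytes) -> list:
    """Parse newline-delimited JSON objects from a byte string (illustrative helper)."""
    sequence = []
    for line in record.decode("utf-8").splitlines():
        line = line.strip()
        if line:  # skip blank lines, e.g. one produced by a trailing newline
            sequence.append(json.loads(line))
    return sequence


# Two hypothetical GDF events, one per line
raw = (
    b'{"strings": ["a"], "vectors": [], "emotives": {}}\n'
    b'{"strings": ["b"], "vectors": [], "emotives": {"utility": 1}}\n'
)
sequence = binary_gdfs_to_sequence(raw)
```

Each element of `sequence` is a plain dict, so the result can be iterated event by event.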

classmethod delete_dataset(mongo_db, dataset_details: dict)

Delete a dataset from MongoDB

Parameters:
  • mongo_db (pymongo.MongoClient) – MongoDB database object

  • dataset_details (dict) – Dictionary containing details about dataset (e.g. name, user_id, dataset_id, collection names)

Returns:

String depicting action that was taken

Return type:

str

Example

from ia.gaius.pvt.mongo_interface import MongoData
...
dataset_details = {"user_id": "user-1234",
                   "dataset_name": "MNIST",
                   "dataset_id": "abba12",
                   "data_files_collection_name": "dataset_files",
                   "dataset_collection_name": "datasets"}
MongoData.delete_dataset(mongo_db=mongo_db,
                         dataset_details=dataset_details)
getSequence(record)

Wrapper function to retrieve a record from MongoDB and convert it into a sequence

Parameters:

record (ObjectId) – The MongoDB ObjectId of the data record to retrieve

Returns:

GDF sequence retrieved from MongoDB

Return type:

list

prep(percent_of_dataset_chosen: float, percent_reserved_for_training: float, shuffle: bool = False)

Prepare the dataset

Parameters:
  • percent_of_dataset_chosen (float) – The percent of the dataset to utilize

  • percent_reserved_for_training (float) – The training/testing split for the dataset (e.g. set to 80 for 80/20 training/testing split)

  • shuffle (bool, optional) – Whether to shuffle the data. Defaults to False.

retrieveDataRecord(document_id: ObjectId)

Retrieve the data record from MongoDB that corresponds to the specified ObjectId

Parameters:

document_id (ObjectId, required) – data record to retrieve from mongo, located in the collection specified when calling __init__()

Raises:

Exception – Raised when MongoDB document is not found. Shows query performed that failed

Returns:

binary string depicting data sequence stored in MongoDB Document

Return type:

str

setIterMode(mode: str) → None

Set mode to be used for iterating across dataset

Parameters:

mode (str) – set to “training” or “testing” depending on what set of sequences is to be iterated across

Raises:
  • Exception – When no data is in train_sequences or test_sequences, and prep() should be called first

  • Exception – When invalid mode specified in mode argument
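
The interplay between setIterMode and the overloaded iterator functions can be sketched with a toy class. This is a simplified stand-in, not MongoData itself: the class name `IterModeDemo` and its internals are hypothetical, and the real class presumably fetches each record from MongoDB while iterating, which is omitted here.

```python
class IterModeDemo:
    """Toy dataset object with switchable iteration (hypothetical, not MongoData)."""

    def __init__(self, train_sequences, test_sequences):
        self.train_sequences = train_sequences
        self.test_sequences = test_sequences
        self._active = []  # the list currently exposed through iteration

    def setIterMode(self, mode: str) -> None:
        # Mirrors the documented errors: empty sets, or an unknown mode
        if not self.train_sequences and not self.test_sequences:
            raise Exception("no sequences available; call prep() first")
        if mode == "training":
            self._active = self.train_sequences
        elif mode == "testing":
            self._active = self.test_sequences
        else:
            raise Exception(f"invalid mode: {mode!r}")

    # List-like behaviour delegates to whichever set is active
    def __iter__(self):
        return iter(self._active)

    def __len__(self):
        return len(self._active)

    def __getitem__(self, index):
        return self._active[index]


demo = IterModeDemo(train_sequences=["r1", "r2", "r3"], test_sequences=["r4"])
demo.setIterMode("testing")
testing_records = list(demo)
demo.setIterMode("training")
training_records = list(demo)
```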

classmethod upload_dataset(mongo_db, dataset_details: dict, filepath: str)

Upload a dataset to MongoDB from a local filepath

Parameters:
  • mongo_db (pymongo.MongoClient) – MongoDB Database object

  • dataset_details (dict) – Dictionary containing details about dataset (e.g. name, user_id, dataset_id, collection names, etc.)

  • filepath (str) – filepath of zip folder containing dataset (GDF records)

Returns:

_description_

Return type:

_type_

Example

from ia.gaius.pvt.mongo_interface import MongoData
...
dataset_details = {"user_id": "user-1234",
                   "dataset_name": "MNIST",
                   "dataset_id": "abba12",
                   "data_files_collection_name": "dataset_files",
                   "dataset_collection_name": "datasets"}
MongoData.upload_dataset(mongo_db=mongo_db,
                         dataset_details=dataset_details,
                         filepath=filepath)  # path to the zip archive of GDF records

MongoDataRecords

class ia.gaius.pvt.mongo_interface.MongoDataRecords(dataset_records, DR: float, DF: float, shuffle: bool)

Bases: object

__init__(dataset_records, DR: float, DF: float, shuffle: bool)
Parameters:
  • dataset_records (str or list, required) – List of mongo ObjectIds to use as data records

  • DR (float, required) – percentage of the total data to use for testing and training. 0 < DR < 100

  • DF (float, required) – percentage of the DR to use for training. The rest of the DR is used for testing. 0 < DF < 100

  • shuffle (bool, required) – whether to shuffle the data when creating sets

After creating the class, utilize the member variables train_sequences and test_sequences for the data sets.

Variables:
  • train_sequences – the mongo documents to use for training

  • test_sequences – the mongo documents to use for testing
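
To make the roles of DR and DF concrete, here is a minimal sketch of one way such a split could be computed. It is an illustration only: the function name `split_records`, the optional `seed` argument, and the choice to truncate fractional counts with `int()` are assumptions, and the library may round or order records differently.

```python
import random


def split_records(dataset_records, DR, DF, shuffle, seed=None):
    """Split records into (train, test) lists using percentage values DR and DF."""
    records = list(dataset_records)
    if shuffle:
        # A seeded generator keeps the shuffle reproducible (assumption for the example)
        random.Random(seed).shuffle(records)

    # DR: percentage of all records that take part in the test at all
    n_chosen = int(len(records) * DR / 100)
    chosen = records[:n_chosen]

    # DF: percentage of the chosen records reserved for training
    n_train = int(n_chosen * DF / 100)
    return chosen[:n_train], chosen[n_train:]


# 20 records, half of them used, 80/20 training/testing split
train_sequences, test_sequences = split_records(
    list(range(20)), DR=50, DF=80, shuffle=False
)
```

With 20 records, DR=50 selects 10 of them, and DF=80 assigns 8 to training and the remaining 2 to testing.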

MongoResults

class ia.gaius.pvt.mongo_interface.MongoResults(mongo_db, result_collection_name: str, log_collection_name: str, test_id: str, user_id: str, dataset_id: str, test_configuration: dict | None = None)

Bases: object

Class to handle saving and linking result data inside MongoDB.

Provides functions to insert a single log record during training/testing, to save the final result after test completion, and to remediate or delete results when a test is aborted or the database needs cleanup.

__init__(mongo_db, result_collection_name: str, log_collection_name: str, test_id: str, user_id: str, dataset_id: str, test_configuration: dict | None = None)

Initialize MongoResults object

Parameters:
  • mongo_db (pymongo.MongoClient) – Database where the results are to be stored

  • result_collection_name (str) – collection name to save final test results

  • log_collection_name (str) – collection name to save testing log documents

  • test_id (str) – unique-id for the test being conducted

  • user_id (str) – unique-id for the user conducting the test

  • dataset_id (str) – unique-id for the dataset being used in the test

  • test_configuration (dict, optional) – object showing all of the options used for configuring PVT. Defaults to None.

addLogRecord(type: str, record: dict)

Called during the testing loop to insert a pvt status record into MongoDB

Parameters:
  • type (str) – Whether the record should be appended to the training or testing logs

  • record (dict) – the record to insert

Raises:

Exception – Thrown if the type provided is not supported

deleteResults()

Function used to remediate database in the event of a failed/aborted test

Returns:

dict showing the deleted result record, if any

Return type:

dict

reset()

Reset start time and testing/training logs in result_obj

retrieveResults()

Retrieve test results from MongoDB based on user_id, dataset_id, and test_id

Raises:

Exception – If dataset master record is not found in database

Returns:

Entire test result object

Return type:

dict

saveResults(final_results: dict)

Save a document in MongoDB, linking the result doc to the logs documents

Parameters:

final_results (dict) – Information pertaining to the results of the test, to be stored in the results object for future use

Returns:

string of the ObjectId saved in MongoDB

Return type:

str

Example

uid = mongo_results.saveResults(final_state)

PVT Utils