Performance Validation Testing
This class is used to perform thorough testing of GAIuS Agents. Performance Validation Testing (PVT) has three test types:
Classification
Emotive Value
Emotive Polarity
Performance Validation Test
MongoData
- class ia.gaius.pvt.mongo_interface.MongoData(mongo_dataset_details: dict, data_files_collection_name: str, mongo_db: MongoClient, dataset_collection_name: str = 'datasets')
Bases:
object
Analogous to the Data class, but uses a MongoDB cursor instead of a directory to reference data records.
Starts from a MongoDB document containing the names of all dataset files (which are stored separately).
Actual data files are retrieved only when retrieveDataRecord is called. The iterator functions are overloaded so that the object can be treated like a list.
Example
>>> mongo = pymongo.MongoClient('mongodb://mongodb:27017/')
>>> mongo_db = mongo.db['main_database']
>>> dataset_details = {"user_id": "ABCD1", "dataset_id": "iris_0_0_13"}
>>> md = MongoData(mongo_dataset_details=dataset_details, mongo_db=mongo_db, data_files_collection_name='dataset_files')
>>> md.prep(percent_of_dataset_chosen=50, percent_reserved_for_training=50, shuffle=True)
>>> md.setIterMode('testing')
>>> for record in md:
...
- __init__(mongo_dataset_details: dict, data_files_collection_name: str, mongo_db: MongoClient, dataset_collection_name: str = 'datasets')
Initialize dataset object from MongoDB
- Parameters:
mongo_dataset_details (dict) – contains info about the user_id and dataset_id of the dataset. Used to query MongoDB
data_files_collection_name (str) – Collection in MongoDB where individual data records are stored
dataset_collection_name (str) – Collection in MongoDB where master dataset info record is stored
mongo_db (pymongo.MongoClient) – MongoDB client object to use for dataset lookups
- Raises:
Exception – user_id field missing from mongo_dataset_details
Exception – dataset_id field missing from mongo_dataset_details
Exception – multiple datasets found pertaining to same user_id and dataset_id field
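The three error conditions listed above can be illustrated with a small standalone sketch. This is not the library's implementation: the helper names validate_dataset_details and pick_single_dataset are hypothetical, and a plain list stands in for the documents a MongoDB query would return.

```python
def validate_dataset_details(details: dict) -> None:
    """Raise if a key needed for the dataset lookup is absent (hypothetical helper)."""
    for key in ("user_id", "dataset_id"):
        if key not in details:
            raise Exception(f"{key} field missing from mongo_dataset_details")


def pick_single_dataset(matches: list):
    """Return the only matching master record, rejecting ambiguous lookups.

    `matches` stands in for the documents a MongoDB query would return.
    """
    if len(matches) > 1:
        raise Exception("multiple datasets found for the same user_id and dataset_id")
    return matches[0] if matches else None


details = {"user_id": "ABCD1", "dataset_id": "iris_0_0_13"}
validate_dataset_details(details)          # passes silently
master = pick_single_dataset([details])    # exactly one match is accepted
```

How the real class behaves when no dataset matches is not stated in this reference; the sketch simply returns None in that case.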
- convertBinaryStringtoSequence(record)
Convert Binary string of multiple GDFs (delimited by newline) into a sequence of JSON objects
- Parameters:
record (str, required) – binary string of GDFs to convert
- Returns:
list of GDFs in json format
- Return type:
list
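As a rough illustration of the conversion described above, the sketch below parses a newline-delimited byte string into a list of JSON objects. It is an assumption about the approach, not the library's code: the function name is hypothetical, and the sample records assume a GDF carries "strings", "vectors", and "emotives" fields.

```python
import json


def convert_binary_string_to_sequence(record):
    """Split a newline-delimited string of GDFs into a list of dicts."""
    if isinstance(record, bytes):
        record = record.decode("utf-8")
    # Skip blank lines, such as the one left by a trailing newline.
    return [json.loads(line) for line in record.splitlines() if line.strip()]


# Sample data: two GDF-like events (field names are assumed for illustration).
raw = (
    b'{"strings": ["hello"], "vectors": [], "emotives": {}}\n'
    b'{"strings": ["world"], "vectors": [], "emotives": {"utility": 1}}\n'
)
sequence = convert_binary_string_to_sequence(raw)
```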
- classmethod delete_dataset(mongo_db, dataset_details: dict)
Delete a dataset from MongoDB
- Parameters:
mongo_db (pymongo.MongoClient) – MongoDB database object
dataset_details (dict) – Dictionary containing details about dataset (e.g. name, user_id, dataset_id, collection names)
- Returns:
String depicting action that was taken
- Return type:
str
Example
from ia.gaius.pvt.mongo_interface import MongoData
...
dataset_details = {"user_id": "user-1234",
                   "dataset_name": "MNIST",
                   "dataset_id": "abba12",
                   "data_files_collection_name": "dataset_files",
                   "dataset_collection_name": "datasets"}
MongoData.delete_dataset(mongo_db=mongo_db, dataset_details=dataset_details)
- getSequence(record)
Wrapper function to retrieve a record from MongoDB and convert it into a sequence
- Parameters:
record (ObjectId) – The MongoDB ObjectId of the data record to retrieve
- Returns:
GDF sequence retrieved from MongoDB
- Return type:
list
- prep(percent_of_dataset_chosen: float, percent_reserved_for_training: float, shuffle: bool = False)
Prepare the dataset
- Parameters:
percent_of_dataset_chosen (float) – The percent of the dataset to utilize
percent_reserved_for_training (float) – The training/testing split for the dataset (e.g. set to 80 for 80/20 training/testing split)
shuffle (bool, optional) – Whether to shuffle the data. Defaults to False.
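The two percentages interact as follows: the first selects a portion of the whole dataset, and the second divides that portion into training and testing sets. The sketch below shows one way to compute such a split. The function name split_records, the seed argument, and the truncating rounding are assumptions made for illustration; the library may round or order records differently.

```python
import random


def split_records(records, percent_of_dataset_chosen, percent_reserved_for_training,
                  shuffle=False, seed=None):
    """Return (train, test) lists from a percent-based selection and split."""
    records = list(records)
    if shuffle:
        # A seeded generator keeps the illustration reproducible.
        random.Random(seed).shuffle(records)
    n_chosen = int(len(records) * percent_of_dataset_chosen / 100)
    chosen = records[:n_chosen]
    n_train = int(n_chosen * percent_reserved_for_training / 100)
    return chosen[:n_train], chosen[n_train:]


# 50% of 100 records are used; 80% of those go to training.
train, test = split_records(range(100), 50, 80)
```

With these inputs, 50 records are chosen, of which 40 are used for training and 10 for testing.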
- retrieveDataRecord(document_id: ObjectId)
Retrieve a data record from MongoDB, pertaining to the ObjectId specified
- Parameters:
document_id (ObjectId, required) – data record to retrieve from mongo, located in the collection specified when calling __init__()
- Raises:
Exception – Raised when MongoDB document is not found. Shows query performed that failed
- Returns:
binary string depicting data sequence stored in MongoDB Document
- Return type:
str
- setIterMode(mode: str) None
Set mode to be used for iterating across dataset
- Parameters:
mode (str) – set to “training” or “testing” depending on what set of sequences is to be iterated across
- Raises:
Exception – When no data is in train_sequences or test_sequences; prep() should be called first
Exception – When an invalid mode is specified in the mode argument
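The iteration mode and the list-like behaviour described for this class can be pictured with a small in-memory stand-in. This is only a sketch under assumptions: the class IterableDataset is hypothetical, it holds plain Python lists instead of MongoDB ObjectIds, and the error messages are invented.

```python
class IterableDataset:
    """In-memory stand-in showing mode-dependent iteration (hypothetical)."""

    def __init__(self, train_sequences, test_sequences):
        self.train_sequences = list(train_sequences)
        self.test_sequences = list(test_sequences)
        self._active = []  # nothing is iterated until a mode is chosen

    def setIterMode(self, mode: str) -> None:
        if not self.train_sequences and not self.test_sequences:
            raise Exception("no data in train_sequences or test_sequences; call prep() first")
        if mode == "training":
            self._active = self.train_sequences
        elif mode == "testing":
            self._active = self.test_sequences
        else:
            raise Exception(f"invalid mode: {mode!r}")

    # Overloaded container methods let the object behave like a list.
    def __iter__(self):
        return iter(self._active)

    def __len__(self):
        return len(self._active)

    def __getitem__(self, index):
        return self._active[index]


ds = IterableDataset(train_sequences=["a", "b"], test_sequences=["c"])
ds.setIterMode("testing")
testing_records = list(ds)
```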
- classmethod upload_dataset(mongo_db, dataset_details: dict, filepath: str)
Upload a dataset to MongoDB from a local filepath
- Parameters:
mongo_db (pymongo.MongoClient) – MongoDB Database object
dataset_details (dict) – Dictionary containing details about dataset (e.g. name, user_id, dataset_id, collection names, etc.)
filepath (str) – filepath of zip folder containing dataset (GDF records)
- Returns:
_description_
- Return type:
_type_
Example
from ia.gaius.pvt.mongo_interface import MongoData
...
dataset_details = {"user_id": "user-1234",
                   "dataset_name": "MNIST",
                   "dataset_id": "abba12",
                   "data_files_collection_name": "dataset_files",
                   "dataset_collection_name": "datasets"}
# filepath: path of the zip folder containing the dataset (GDF records)
MongoData.upload_dataset(mongo_db=mongo_db, dataset_details=dataset_details, filepath=filepath)
MongoDataRecords
- class ia.gaius.pvt.mongo_interface.MongoDataRecords(dataset_records, DR: float, DF: float, shuffle: bool)
Bases:
object
- __init__(dataset_records, DR: float, DF: float, shuffle: bool)
- Parameters:
dataset_records (str or list, required) – List of mongo ObjectIds to use as data records
DR (float, required) – percentage of total data to use for testing and training. 0 < DR < 100
DF (float, required) – percentage of the DR to use for training. The rest of the DR is used for testing. 0 < DF < 100
shuffle (bool, required) – whether to shuffle the data when creating sets
After creating the class, utilize the member variables train_sequences and test_sequences for the data sets.
- Variables:
train_sequences – the mongo documents to use for training
test_sequences – the mongo documents to use for testing
MongoResults
- class ia.gaius.pvt.mongo_interface.MongoResults(mongo_db, result_collection_name: str, log_collection_name: str, test_id: str, user_id: str, dataset_id: str, test_configuration: dict | None = None)
Bases:
object
Class to handle saving and linking result data inside MongoDB.
Provides functions to insert a single log record during training or testing, to save the final result after test completion, and to delete results for database cleanup when a test fails or is aborted.
- __init__(mongo_db, result_collection_name: str, log_collection_name: str, test_id: str, user_id: str, dataset_id: str, test_configuration: dict | None = None)
Initialize MongoResults object
- Parameters:
mongo_db (pymongo.MongoClient) – Database where the results are to be stored
result_collection_name (str) – collection name to save final test results
log_collection_name (str) – collection name to save testing log documents
test_id (str) – unique-id for the test being conducted
user_id (str) – unique-id for the user conducting the test
dataset_id (str) – unique-id for the dataset being used in the test
test_configuration (dict) – object showing all of the options used for configuring pvt
- addLogRecord(type: str, record: dict)
Called during the testing loop to insert a pvt status record into MongoDB
- Parameters:
type (str) – Whether the record should be appended to the training or testing logs
record (dict) – the record to insert
- Raises:
Exception – Thrown if the type provided is not supported
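The type check described above can be sketched with an in-memory stand-in. The accepted values "training" and "testing" are an assumption drawn from the parameter description, the class InMemoryResults is hypothetical, and Python lists replace the MongoDB log collection.

```python
class InMemoryResults:
    """Keeps training and testing logs in lists instead of MongoDB (hypothetical)."""

    def __init__(self):
        self.logs = {"training": [], "testing": []}

    def addLogRecord(self, type: str, record: dict) -> None:
        # Reject any log type that has no matching log list.
        if type not in self.logs:
            raise Exception(f"unsupported log type: {type!r}")
        self.logs[type].append(record)


results = InMemoryResults()
results.addLogRecord("training", {"status": "training", "current_record": 1})
results.addLogRecord("testing", {"status": "testing", "current_record": 1})
```

The record contents shown here are placeholders; the fields of a real PVT status record are not listed in this reference.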
- deleteResults()
Function used to remediate database in the event of a failed/aborted test
- Returns:
dict showing the deleted result record, if any
- Return type:
dict
- reset()
Reset start time and testing/training logs in result_obj
- retrieveResults()
Retrieve test results from MongoDB based on user_id, dataset_id, and test_id
- Raises:
Exception – If dataset master record is not found in database
- Returns:
Entire test result object
- Return type:
dict
- saveResults(final_results: dict)
Save a document in MongoDB, linking the result doc to the logs documents
- Parameters:
final_results (dict) – Information pertaining to the results of the test, to be stored in the results object for future use
- Returns:
string of the ObjectId saved in MongoDB
- Return type:
str
Example
uid = mongo_results.saveResults(final_state)