Defining a Benchmark
We have defined each individual piece of our machine learning experiment.
We now need to define the logic that combines them to train and evaluate our SVM classifier on the IMDB dataset.
SVCBenchmark
We define a Component that wraps up data loading, data processing, model definition, model training, and model evaluation.
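import logging
from typing import Optional

import cinnamon.configuration

# RunnableComponent is provided by cinnamon; IMDBLoader, SVCModel, TfIdfProcessor,
# and LabelProcessor are the components defined in the previous sections.


class SVCBenchmark(RunnableComponent):

    def __init__(
            self,
            data_loader: IMDBLoader,
            model: SVCModel,
            text_processor: TfIdfProcessor,
            label_processor: LabelProcessor
    ):
        self.data_loader = data_loader
        self.model = model
        self.text_processor = text_processor
        self.label_processor = label_processor

    def run(
            self,
            config: Optional[cinnamon.configuration.Configuration] = None
    ):
        # The root logger defaults to WARNING, so INFO is enabled explicitly;
        # otherwise the logging.info calls below would print nothing.
        logging.basicConfig(level=logging.INFO)

        # Load the three data splits.
        train_df, val_df, test_df = self.data_loader.get_splits()

        # Only the training split is processed with is_training_data=True.
        x_train = self.text_processor.process(data=train_df, is_training_data=True)
        y_train = self.label_processor.process(data=train_df, is_training_data=True)

        x_val = self.text_processor.process(data=val_df)
        y_val = self.label_processor.process(data=val_df)

        x_test = self.text_processor.process(data=test_df)
        y_test = self.label_processor.process(data=test_df)

        # Train on the training split, then evaluate on the held-out test split.
        train_info, val_info = self.model.fit(x_train=x_train, y_train=y_train,
                                              x_val=x_val, y_val=y_val)
        test_info = self.model.evaluate(x=x_test, y=y_test)

        logging.info(f'Train info:\n{train_info}')
        logging.info(f'Val info:\n{val_info}')
        logging.info(f'Test info:\n{test_info}')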
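In run, only the training split is processed with is_training_data=True. A plausible reading is that a processor fits its internal state (for example, a vocabulary) on the training data and then reuses that state unchanged for the validation and test splits. The following is a minimal, framework-free sketch of that pattern under this assumption. ToyVocabProcessor is a hypothetical stand-in written for illustration; it is not part of cinnamon and is not the TfIdfProcessor defined earlier.

```python
from typing import Dict, List, Optional


class ToyVocabProcessor:
    """Hypothetical stand-in for a text processor (not part of cinnamon)."""

    def __init__(self) -> None:
        self.vocab: Optional[Dict[str, int]] = None

    def process(self, data: List[str], is_training_data: bool = False) -> List[List[int]]:
        if is_training_data:
            # Fit step: build the vocabulary from the training texts only.
            tokens = sorted({tok for text in data for tok in text.split()})
            self.vocab = {tok: idx for idx, tok in enumerate(tokens)}
        if self.vocab is None:
            raise RuntimeError('process() must first be called with is_training_data=True')

        # Transform step: count known tokens; tokens unseen at fit time are ignored.
        rows = []
        for text in data:
            row = [0] * len(self.vocab)
            for tok in text.split():
                if tok in self.vocab:
                    row[self.vocab[tok]] += 1
            rows.append(row)
        return rows


processor = ToyVocabProcessor()
x_train = processor.process(['good movie', 'bad movie'], is_training_data=True)
x_test = processor.process(['good plot'])  # 'plot' was never seen during fitting
```

Because the vocabulary is fixed after the training call, the test features have the same dimensionality as the training features, and no information from the test split leaks into the fitted state.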
Note
The __init__ of SVCBenchmark takes built Component instances. This is automatically handled by cinnamon.
If you want to work with RegistrationKey instead (e.g., because some components require additional attributes to initialize), set build_recursively=False in register and register_method.
SVCBenchmarkConfig
We then define the corresponding SVCBenchmarkConfig.
Note that this is an example of a nested configuration, where some Param values point to a RegistrationKey.
class SVCBenchmarkConfig(Configuration):

    @classmethod
    @register_method(name='benchmark',
                     tags={'svc'},
                     namespace='examples',
                     component_class=SVCBenchmark)
    def default(
            cls
    ):
        config = super().default()

        config.add(name='data_loader',
                   value=RegistrationKey(name='data_loader',
                                         tags={'imdb'},
                                         namespace='examples'))
        config.add(name='text_processor',
                   value=RegistrationKey(name='processor',
                                         tags={'tf-idf'},
                                         namespace='examples'))
        config.add(name='label_processor',
                   value=RegistrationKey(name='processor',
                                         tags={'label'},
                                         namespace='examples'))
        config.add(name='model',
                   value=RegistrationKey(name='model',
                                         tags={'svc'},
                                         namespace='examples'))

        return config
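To make the idea of a nested configuration more concrete, here is a small, framework-free sketch of how a registry could resolve parameters that point to other keys before building the outer component. This is only an illustration of the general pattern and not cinnamon's actual implementation: Key, REGISTRY, build, ToyModel, and ToyBenchmark are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Key:
    """Hypothetical, simplified analogue of a registration key."""
    name: str
    tags: FrozenSet[str]
    namespace: str


# Maps a key to (component factory, default parameters).
REGISTRY: Dict[Key, Tuple[Callable[..., Any], Dict[str, Any]]] = {}


def build(key: Key) -> Any:
    """Build the component registered under `key`, resolving nested keys first."""
    factory, params = REGISTRY[key]
    resolved = {
        name: build(value) if isinstance(value, Key) else value
        for name, value in params.items()
    }
    return factory(**resolved)


class ToyModel:
    def __init__(self, C: float):
        self.C = C


class ToyBenchmark:
    def __init__(self, model: ToyModel):
        # Receives an already built ToyModel instance, not a Key.
        self.model = model


model_key = Key('model', frozenset({'svc'}), 'examples')
benchmark_key = Key('benchmark', frozenset({'svc'}), 'examples')

REGISTRY[model_key] = (ToyModel, {'C': 1.0})
REGISTRY[benchmark_key] = (ToyBenchmark, {'model': model_key})

benchmark = build(benchmark_key)
```

In this sketch the benchmark's parameters only store a key, so swapping the model means registering a different component under that key or pointing the parameter to another key; the benchmark class itself does not change.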
Running SVCBenchmark
We can now write a script to test SVCBenchmark.
from pathlib import Path

from cinnamon.registry import Registry
from components.benchmark import SVCBenchmark

if __name__ == '__main__':
    """
    In this demo script, we retrieve and build our SVC pipeline.
    The pipeline covers data loading, data processing, and model evaluation.
    """

    directory = Path(__file__).parent.parent.resolve()
    Registry.setup(directory=directory)

    benchmark = SVCBenchmark.build_component(name='benchmark',
                                             tags={'svc'},
                                             namespace='examples')
    benchmark.run()
Congratulations!
That’s it! We have successfully defined a customizable, plug-and-play, and re-usable machine-learning pipeline.
Feel free to download this repository and play with Component and Configuration.
Cheers!