Defining a Benchmark

We have defined each individual piece of our machine learning experiment.

We now need to define the logic that combines them to train and evaluate our SVM classifier on the IMDB dataset.

SVCBenchmark

We define a Component that wraps up data loading, data processing, model definition, model training, and model evaluation.

import logging
from typing import Optional


class SVCBenchmark(RunnableComponent):

    def __init__(
            self,
            data_loader: IMDBLoader,
            model: SVCModel,
            text_processor: TfIdfProcessor,
            label_processor: LabelProcessor
    ):
        self.data_loader = data_loader
        self.model = model
        self.text_processor = text_processor
        self.label_processor = label_processor

    def run(
            self,
            config: Optional[cinnamon.configuration.Configuration] = None
    ):
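        # Without an explicit level, the root logger defaults to WARNING
        # and the logging.info calls below would print nothing.
        logging.basicConfig(level=logging.INFO)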

        train_df, val_df, test_df = self.data_loader.get_splits()

        x_train = self.text_processor.process(data=train_df, is_training_data=True)
        y_train = self.label_processor.process(data=train_df, is_training_data=True)

        x_val = self.text_processor.process(data=val_df)
        y_val = self.label_processor.process(data=val_df)

        x_test = self.text_processor.process(data=test_df)
        y_test = self.label_processor.process(data=test_df)

        train_info, val_info = self.model.fit(x_train=x_train, y_train=y_train,
                                              x_val=x_val, y_val=y_val)
        test_info = self.model.evaluate(x=x_test, y=y_test)

        logging.info(f'Train info:\n{train_info}')
        logging.info(f'Val info:\n{val_info}')
        logging.info(f'Test info:\n{test_info}')

Note

The __init__ of SVCBenchmark receives already-built Component instances; cinnamon handles this automatically. If you would rather work with the RegistrationKey itself (e.g., because some components require additional attributes to initialize), set build_recursively=False in register and register_method.
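
Because run only relies on the interfaces of the injected instances, its logic can be tried out with lightweight stand-ins. The sketch below is illustrative only: it mirrors the flow of SVCBenchmark.run in plain Python, without cinnamon. Every name in it (StubLoader, StubTextProcessor, StubLabelProcessor, StubModel, run_pipeline) is a hypothetical placeholder rather than part of the library or of this tutorial's components. It also assumes that is_training_data=True means "fit the processor on this split", which the actual processors may implement differently.

```python
# Hypothetical stand-ins that mimic the interfaces used by SVCBenchmark.run.
# None of these classes come from cinnamon or from the tutorial's components.


class StubLoader:
    def get_splits(self):
        # Each split is a list of (text, label) pairs instead of a DataFrame.
        train = [("good movie", "pos"), ("great film", "pos"), ("bad movie", "neg")]
        val = [("good plot", "pos")]
        test = [("bad acting", "neg")]
        return train, val, test


class StubTextProcessor:
    def __init__(self):
        self.vocabulary = None

    def process(self, data, is_training_data=False):
        texts = [text for text, _ in data]
        if is_training_data:
            # Assumption: the vocabulary is fitted on the training split only.
            self.vocabulary = sorted({tok for t in texts for tok in t.split()})
        # Bag-of-words counts over the fitted vocabulary; unseen tokens are dropped.
        return [[t.split().count(tok) for tok in self.vocabulary] for t in texts]


class StubLabelProcessor:
    def __init__(self):
        self.label_to_id = None

    def process(self, data, is_training_data=False):
        labels = [label for _, label in data]
        if is_training_data:
            self.label_to_id = {l: i for i, l in enumerate(sorted(set(labels)))}
        return [self.label_to_id[l] for l in labels]


class StubModel:
    """Predicts the most frequent training label, whatever the input."""

    def fit(self, x_train, y_train, x_val, y_val):
        self.majority = max(set(y_train), key=y_train.count)
        return self.evaluate(x_train, y_train), self.evaluate(x_val, y_val)

    def evaluate(self, x, y):
        correct = sum(1 for label in y if label == self.majority)
        return {"accuracy": correct / len(y)}


def run_pipeline(data_loader, text_processor, label_processor, model):
    # Same order of operations as SVCBenchmark.run: load, process, fit, evaluate.
    train, val, test = data_loader.get_splits()

    x_train = text_processor.process(data=train, is_training_data=True)
    y_train = label_processor.process(data=train, is_training_data=True)

    x_val = text_processor.process(data=val)
    y_val = label_processor.process(data=val)

    x_test = text_processor.process(data=test)
    y_test = label_processor.process(data=test)

    train_info, val_info = model.fit(x_train=x_train, y_train=y_train,
                                     x_val=x_val, y_val=y_val)
    test_info = model.evaluate(x=x_test, y=y_test)
    return train_info, val_info, test_info


text_processor = StubTextProcessor()
train_info, val_info, test_info = run_pipeline(
    StubLoader(), text_processor, StubLabelProcessor(), StubModel()
)
```

Since the validation and test splits are processed without is_training_data=True, they reuse whatever state was fitted on the training split; in this sketch, tokens such as "plot" that never appear in training are simply ignored.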

SVCBenchmarkConfig

We then define the corresponding SVCBenchmarkConfig.

Notice that this is an example of a nested configuration, in which some Param values point to a RegistrationKey.

class SVCBenchmarkConfig(Configuration):

    @classmethod
    @register_method(name='benchmark',
                     tags={'svc'},
                     namespace='examples',
                     component_class=SVCBenchmark)
    def default(
            cls
    ):
        config = super().default()

        config.add(name='data_loader',
                   value=RegistrationKey(name='data_loader',
                                         tags={'imdb'},
                                         namespace='examples'))

        config.add(name='text_processor',
                   value=RegistrationKey(name='processor',
                                         tags={'tf-idf'},
                                         namespace='examples'))
        config.add(name='label_processor',
                   value=RegistrationKey(name='processor',
                                         tags={'label'},
                                         namespace='examples'))

        config.add(name='model',
                   value=RegistrationKey(name='model',
                                         tags={'svc'},
                                         namespace='examples'))

        return config
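
To make the nesting concrete, the following sketch shows one way a registry could resolve a configuration whose values are keys pointing to other registered configurations. It is a simplified illustration in plain Python and not cinnamon's implementation: Key, REGISTRY, register, build, Loader, Benchmark, and the samples parameter are all hypothetical names, and the real library offers features that are not modelled here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Key:
    """Hypothetical stand-in for a registration key: name + tags + namespace."""
    name: str
    tags: FrozenSet[str]
    namespace: str


# Maps a key to a (config factory, component class) pair.
REGISTRY: Dict[Key, Any] = {}


def register(key: Key, config_factory: Callable[[], dict], component_class: type) -> None:
    REGISTRY[key] = (config_factory, component_class)


def build(key: Key) -> Any:
    """Build the component bound to `key`, first building any nested keys."""
    config_factory, component_class = REGISTRY[key]
    params = {
        name: build(value) if isinstance(value, Key) else value
        for name, value in config_factory().items()
    }
    return component_class(**params)


class Loader:
    def __init__(self, samples: int):
        self.samples = samples


class Benchmark:
    def __init__(self, data_loader: Loader):
        self.data_loader = data_loader


loader_key = Key("data_loader", frozenset({"imdb"}), "examples")
benchmark_key = Key("benchmark", frozenset({"svc"}), "examples")

register(loader_key, lambda: {"samples": 100}, Loader)
# The benchmark configuration stores a key, not a Loader instance.
register(benchmark_key, lambda: {"data_loader": loader_key}, Benchmark)

# Building the benchmark resolves the nested key into a Loader first.
benchmark = build(benchmark_key)
```

In this sketch the benchmark configuration never references the Loader class directly, so swapping the loader only requires pointing the data_loader parameter at a different key.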

Running SVCBenchmark

We can now write a script to test SVCBenchmark.

from pathlib import Path

from cinnamon.registry import Registry
from components.benchmark import SVCBenchmark

if __name__ == '__main__':
    # In this demo script, we retrieve and build our SVC pipeline.
    # The pipeline covers data loading, data processing, and model evaluation.

    directory = Path(__file__).parent.parent.resolve()
    Registry.setup(directory=directory)

    benchmark = SVCBenchmark.build_component(name='benchmark',
                                             tags={'svc'},
                                             namespace='examples')
    benchmark.run()

Congratulations!

That’s it! We have successfully defined a customizable, plug-and-play, and re-usable machine-learning pipeline.

Feel free to download this repository and play with Component and Configuration.

Cheers!