The goal of this work is to develop a tool, named learningOrchestra, to
facilitate and streamline the iterative data science process of:
- Gathering data;
- Cleaning/preparing the datasets;
- Building models;
- Validating their predictions; and
- Deploying the results.
The architecture of learningOrchestra is a collection of microservices deployed
in a cluster.
A dataset (in CSV format) can be loaded from a URL using one of these
microservices, which converts the dataset to JSON and then stores it in MongoDB.
It is also possible to perform several preprocessing and analytical tasks using
learningOrchestra’s collection of microservices.
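To make the CSV-to-JSON step concrete, here is a minimal sketch of that kind of conversion using only the Python standard library. It is not learningOrchestra's own code: the function name `csv_to_documents`, the sample data, and the database and collection names in the comment are all assumptions made for illustration.

```python
import csv
import io
import json


def csv_to_documents(csv_text):
    """Turn CSV text into a list of dicts, one per row, keyed by the header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Hypothetical sample data standing in for a file downloaded from a URL.
sample_csv = "name,age\nAna,34\nLuis,27\n"
documents = csv_to_documents(sample_csv)
print(json.dumps(documents))

# With a running MongoDB and the pymongo package, the documents could then be
# stored with something like (names below are assumptions):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   client["datasets"]["sample"].insert_many(documents)
```

Every value is kept as a string here; any type conversion would belong to a later preprocessing step.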
With learningOrchestra, you can use the Model Builder microservice to build
prediction models with several classifiers simultaneously from stored and
preprocessed datasets. This microservice uses a Spark cluster to build the
models with distributed processing. You can compare the results of the
different classifiers over time to tune the models and improve prediction
accuracy.
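The idea of training several classifiers at once and comparing their results can be sketched in plain Python. This is only an illustration under stated assumptions: it uses a local thread pool rather than a Spark cluster, and the two toy classifiers, the function names, and the data are all made up for the example rather than taken from the Model Builder microservice.

```python
from concurrent.futures import ThreadPoolExecutor


def train_majority(rows):
    """Return a model that always predicts the most common training label."""
    labels = [label for _, label in rows]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def train_threshold(rows):
    """Return a model that predicts 1 when x exceeds the midpoint of the class means."""
    zeros = [x for x, label in rows if label == 0]
    ones = [x for x, label in rows if label == 1]
    cut = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x > cut else 0


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == label for x, label in rows) / len(rows)


def build_and_score(trainers, train_rows, test_rows):
    """Train every classifier concurrently and return {name: test accuracy}."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, train_rows) for name, fn in trainers.items()}
        return {name: accuracy(f.result(), test_rows) for name, f in futures.items()}


# Toy (feature, label) pairs; made up for illustration only.
train_rows = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1)]
test_rows = [(1.5, 0), (2.5, 0), (7.5, 1), (9.0, 1)]
scores = build_and_score(
    {"majority": train_majority, "threshold": train_threshold}, train_rows, test_rows
)
print(scores)
```

Returning the scores keyed by classifier name makes it straightforward to compare runs side by side as the dataset or preprocessing changes.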
By providing their own preprocessing code, users can tailor the models to a
specific dataset, which can further improve prediction accuracy.
With that in mind, the possibilities are endless!
To make learningOrchestra more accessible, we provide the
learning_orchestra_client Python package. This package exposes all of
learningOrchestra’s functionality through a Python API.
To improve your user experience, you can export and analyse the results using a MongoDB GUI.
The learningOrchestra documentation has more details on how to install and use it. We also provide documentation and examples for each microservice and for the Python package.