14.11.2022
These days, practically every modern software system incorporates machine learning algorithms in one form or another, particularly systems that deal with extensive customer data in areas like e-commerce, finance and medicine. Production systems use these algorithms for quality control, and even our cars rely on them for any autonomous detection or driving functionality.
The good news is that the successful practical application of ML allowed it to transform very quickly from a complex academic topic (with custom-made Python libraries) into standardized solutions that are available “out of the box” as part of frameworks. Classic examples are the TensorFlow and PyTorch frameworks in Python, whose depth and breadth satisfy a wide range of users from academia to industry, as well as a great development from Microsoft: the ML.NET framework. It contains powerful, high-performing libraries yet remains concise and intuitive to use. Moreover, ML.NET integrates with Azure, where data, models or both can be stored (storage for real-life model development can reach the petabyte level).
To recap the fundamentals of machine learning: there is a data set (usually split into train, validation and test sets) and a model (neural network, linear regression, k-means, SVM, decision trees and many more). The parameters of the model are fitted using the training set; the validation set is used to avoid overfitting, and the final evaluation is done on the test set to estimate how well the model performs. A very short summary of the types of algorithms can be found at https://www.sas.com/en_gb/insights/articles/analytics/machine-learning-algorithms.html
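As a minimal sketch of that workflow in ML.NET (the framework used below), assuming a hypothetical tabular file iris.csv and an illustrative IrisData input class, a data set can be split and a fitted model evaluated on the held-out portion roughly like this:
using Microsoft.ML;
using Microsoft.ML.Data;
var mlContext = new MLContext();
// load the tabular data set from a CSV file
IDataView data = mlContext.Data.LoadFromTextFile<IrisData>("iris.csv", hasHeader: true, separatorChar: ',');
// hold out 20% of the rows as a test set; the remaining 80% is used for fitting
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);
// fit the model parameters on the training portion only
var pipeline = mlContext.Transforms.Concatenate("Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
    .Append(mlContext.Transforms.Conversion.MapValueToKey("Label"))
    .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy());
var model = pipeline.Fit(split.TrainSet);
// estimate how well the model generalizes using the unseen test portion
var metrics = mlContext.MulticlassClassification.Evaluate(model.Transform(split.TestSet));
Console.WriteLine($"Macro accuracy: {metrics.MacroAccuracy:0.###}");
// illustrative input schema (could live in its own file)
public class IrisData
{
    [LoadColumn(0)] public float SepalLength;
    [LoadColumn(1)] public float SepalWidth;
    [LoadColumn(2)] public float PetalLength;
    [LoadColumn(3)] public float PetalWidth;
    [LoadColumn(4)] public string Label;
}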
Let’s consider the classic problem of image classification: given an image, a model predicts its category (for example, ‘food’, ‘car’, ‘house’, etc.). To use ML.NET, one simply has to install the NuGet package and then create the MLContext (similar to the Entity Framework context):
using MachineLearningTest;
using Microsoft.ML;
// 1. create context
var mlContext = new MLContext();
// 2. create pipeline to load->transform->train from data
var folder = Path.Combine(Environment.CurrentDirectory, "assets");
IEstimator<ITransformer> pipeline = mlContext.Transforms.LoadImages(outputColumnName: "input", imageFolder: Path.Combine(folder, "images"), inputColumnName: nameof(ImageData.ImagePath))
    .Append(mlContext.Transforms.ResizeImages(outputColumnName: "input", imageWidth: 200, imageHeight: 200, inputColumnName: "input"))
    .Append(mlContext.Transforms.ExtractPixels(outputColumnName: "input"))
    .Append(mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "LabelKey", inputColumnName: "Label")) // category -> integer
    .Append(mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(labelColumnName: "LabelKey", featureColumnName: "input"))
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabelValue", "PredictedLabel")) // integer -> category
    .AppendCacheCheckpoint(mlContext);
// 3. load training data
IDataView trainingData = mlContext.Data.LoadFromTextFile<ImageData>(path: Path.Combine(folder, "images\\tags.tsv"), hasHeader: false);
// 4. train the model
ITransformer model = pipeline.Fit(trainingData);
// 5. predict an image
var predictor = mlContext.Model.CreatePredictionEngine<ImageData, ImagePrediction>(model);
var prediction = predictor.Predict(new ImageData { ImagePath = Path.Combine(folder, "images\\broccoli2.jpg") });
Console.WriteLine(prediction.PredictedLabelValue); // displays 'food'
namespace MachineLearningTest
{
    using Microsoft.ML.Data;

    public class ImageData
    {
        [LoadColumn(0)]
        public string ImagePath;

        [LoadColumn(1)]
        public string Label;
    }

    public class ImagePrediction : ImageData
    {
        public float[] Score;

        public string PredictedLabelValue;
    }
}
This is a very simplified example taken from the extensive documentation on ML.NET from Microsoft: https://dotnet.microsoft.com/en-us/apps/machinelearning-ai/ml-dotnet. For real-life applications, well-trained models are very valuable because training eats up a lot of resources and time. Choosing the right algorithm for a problem and tuning it is an art in itself, but once that is accomplished, models can be saved and loaded within ML.NET, as can models trained in TensorFlow or provided in the standard ONNX format.
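For instance, here is a rough sketch of persisting and reloading the model from the example above (the file and column names are illustrative, and the ONNX step additionally requires the Microsoft.ML.OnnxTransformer package):
// persist the fitted pipeline together with its input schema
mlContext.Model.Save(model, trainingData.Schema, "imageClassifier.zip");
// later, e.g. inside a web service: reload the model and predict without retraining
ITransformer loadedModel = mlContext.Model.Load("imageClassifier.zip", out DataViewSchema inputSchema);
var loadedPredictor = mlContext.Model.CreatePredictionEngine<ImageData, ImagePrediction>(loadedModel);
// a model exported in the standard ONNX format can be consumed via a transform as well
var onnxPipeline = mlContext.Transforms.ApplyOnnxModel(
    outputColumnNames: new[] { "output" },
    inputColumnNames: new[] { "input" },
    modelFile: "model.onnx");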
With these tools, complex predictions or automation can be embedded into any software solution, whether for price/sales forecasting, customer classification, recommendations or any other application inferred from data.