Introduction to Tribuo – A Java Machine Learning Library

Last updated: July 14, 2025

Written by: Olayemi Michael

Reviewed by: Saajan Nagendra

1. Overview

Machine Learning (ML) and Artificial Intelligence (AI) are reshaping software development by enabling systems to learn from data and make intelligent predictions.

Tribuo is a production-ready, open-source machine learning library developed by Oracle. It simplifies building and deploying robust ML models. Like Weka and Deeplearning4j, Tribuo supports various machine learning tasks and integrates easily with Java applications.

In this tutorial, we’ll learn about the different machine learning algorithms available in Tribuo. Also, we’ll build a regression model to predict wine quality using the UCI Red Wine Quality dataset.

2. What’s Tribuo?

Tribuo is a Java-centric machine learning library that supports:

Supervised learning: Regression, Classification, etc.
Unsupervised learning: Clustering

Furthermore, it’s strongly typed: models, trainers, and datasets are parameterized by their output type (for example, Regressor for regression), so type mismatches are caught at compile time rather than at runtime, which keeps model development consistent.

It supports importing and exporting models in Open Neural Network eXchange (ONNX) format, allowing integration with other ML frameworks such as TensorFlow and PyTorch.

Another standout feature of Tribuo is provenance tracking. This feature logs metadata about datasets, model parameters, and training configurations, promoting transparency and reproducibility.

As AI continues to find its place in enterprise Java applications, Tribuo provides a practical toolkit for embedding intelligent behaviors directly into Java-based systems.

3. Supported Machine Learning Algorithms

Tribuo supports a variety of ML tasks, including:

Classification: it predicts discrete categories or labels. For example, predicting whether a football team will win or lose, or classifying wine as good or bad based on a quality threshold.
Regression: it predicts continuous values such as a wine quality score or a patient’s cholesterol level.
Clustering: it identifies groups in unlabeled data. For instance, it could group wines based on chemical properties such as acidity and alcohol content without knowing their quality scores.
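To make the difference between the first two tasks concrete, here’s a minimal plain-Java sketch (not Tribuo code) that turns a numeric quality score into a classification label. The QualityLabels class and the threshold of 7 are hypothetical choices for illustration; the dataset itself doesn’t define what counts as a good wine.

```java
// Plain-Java illustration (not Tribuo API): derives a classification label from a regression target
class QualityLabels {
    // Hypothetical cut-off chosen for illustration only
    static final double GOOD_THRESHOLD = 7.0;

    static String toLabel(double qualityScore) {
        return qualityScore >= GOOD_THRESHOLD ? "good" : "bad";
    }

    public static void main(String[] args) {
        double[] scores = {5.0, 6.0, 7.0, 8.0};
        for (double score : scores) {
            // A regression model predicts the score; a classifier would predict the label
            System.out.println(score + " -> " + toLabel(score));
        }
    }
}
```

A regression model keeps the full numeric score, while a classifier trained on these labels only learns which side of the threshold a wine falls on.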

4. Setting up a Tribuo Project

Let’s see Tribuo in action by building a regression model to predict wine quality.

First, let’s add the Tribuo dependency to our pom.xml:

<dependency>
    <groupId>org.tribuo</groupId>
    <artifactId>tribuo-all</artifactId>
    <version>4.3.2</version>
    <type>pom</type>
</dependency>

The tribuo-all artifact is a POM that aggregates all of Tribuo’s modules, so it brings in the data loaders, trainers, and evaluators we use below. That’s why we declare its type as pom.

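Also, let’s download the red wine file (winequality-red.csv) from the UCI Wine Quality dataset and place it in the src/main/resources/dataset directory. The file is semicolon-delimited and contains 11 physicochemical features, such as fixed acidity, pH, and alcohol, plus a quality column.

The quality column holds a sensory score between 0 and 10. Although the scores are whole numbers, we treat them as a numeric target, which makes this a regression problem.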

Finally, let’s create a class named WineQualityRegression:

public class WineQualityRegression {
}

In the following sections, we’ll add methods to this class that load the data, train and evaluate a model, and save it for future use.

5. Class-Level Variables

Next, let’s define the following class-level variables:

// SLF4J logger used by the evaluation methods
private static final Logger log = LoggerFactory.getLogger(WineQualityRegression.class);

public static final String DATASET_PATH = "src/main/resources/dataset/winequality-red.csv";
public static final String MODEL_PATH = "src/main/resources/model/winequality-red-regressor.ser";
public Model<Regressor> model;
public Trainer<Regressor> trainer;
public Dataset<Regressor> trainSet;
public Dataset<Regressor> testSet;

In the code above, we define the path to the dataset and where the trained model will be saved or loaded from.

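Next, we declare fields for the trained model, the trainer, and the training and test datasets, using three Tribuo types:

Model – an abstract class representing a trained predictive model
Trainer – an interface for algorithms that train models
Dataset – an abstract class holding the examples used for training or evaluation

Each of them is parameterized with Regressor, Tribuo’s output type for regression, so the compiler ensures they’re only used with regression outputs.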

6. Loading and Splitting the Dataset

Let’s define a method to load and split the dataset:

void createDatasets() throws Exception {
    RegressionFactory regressionFactory = new RegressionFactory();
    CSVLoader<Regressor> csvLoader = new CSVLoader<>(';', CSVIterator.QUOTE, regressionFactory);
    DataSource<Regressor> dataSource = csvLoader.loadDataSource(Paths.get(DATASET_PATH), "quality");

    TrainTestSplitter<Regressor> dataSplitter = new TrainTestSplitter<>(dataSource, 0.7, 1L);
    trainSet = new MutableDataset<>(dataSplitter.getTrain());
    testSet = new MutableDataset<>(dataSplitter.getTest());
}

Here, we use CSVLoader to parse the semicolon-delimited CSV file and prepare it for regression. RegressionFactory creates regression outputs, specifying that the target variable quality is a continuous variable. DataSource<Regressor> holds the parsed data.
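Conceptually, loading a row means splitting it on the separator, stripping the quotes from the header names, and pulling the response column out of the features. The following plain-Java sketch illustrates that idea; it isn’t how CSVLoader is implemented, and the CsvRowSketch class and the shortened sample row are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration of parsing one semicolon-delimited row; not Tribuo's CSVLoader
class CsvRowSketch {

    // Removes a surrounding pair of double quotes, as found in the header names
    static String unquote(String field) {
        String trimmed = field.trim();
        if (trimmed.length() >= 2 && trimmed.startsWith("\"") && trimmed.endsWith("\"")) {
            return trimmed.substring(1, trimmed.length() - 1);
        }
        return trimmed;
    }

    // Returns the feature values keyed by column name, leaving out the response column
    static Map<String, Double> features(String headerLine, String dataLine, String responseName) {
        String[] header = headerLine.split(";");
        String[] values = dataLine.split(";");
        Map<String, Double> result = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            String name = unquote(header[i]);
            if (!name.equals(responseName)) {
                result.put(name, Double.parseDouble(unquote(values[i])));
            }
        }
        return result;
    }

    // Returns the value of the response column (the regression target)
    static double response(String headerLine, String dataLine, String responseName) {
        String[] header = headerLine.split(";");
        String[] values = dataLine.split(";");
        for (int i = 0; i < header.length; i++) {
            if (unquote(header[i]).equals(responseName)) {
                return Double.parseDouble(unquote(values[i]));
            }
        }
        throw new IllegalArgumentException("No column named " + responseName);
    }

    public static void main(String[] args) {
        // Shortened, illustrative header and row
        String header = "\"pH\";\"alcohol\";\"quality\"";
        String row = "3.51;9.4;5";
        System.out.println(features(header, row, "quality") + " target=" + response(header, row, "quality"));
    }
}
```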

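Then, to evaluate how well the model generalizes to data it hasn’t seen, we use TrainTestSplitter to split the data into a 70% training set and a 30% test set. The fixed seed (1L) makes the split reproducible across runs, and we wrap each part in a MutableDataset so we can pass it to a trainer or an evaluator.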
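The idea behind a reproducible split can be sketched in plain Java: shuffle with a seeded random generator, then cut the list at the training proportion. This is an illustration under our own assumptions (the SplitSketch class and its rounding rule are hypothetical), and TrainTestSplitter’s internal shuffling may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Plain-Java illustration of a seeded train/test split; not TrainTestSplitter's implementation
class SplitSketch {

    // Returns [trainRows, testRows]; the same seed always yields the same split
    static <T> List<List<T>> split(List<T> rows, double trainProportion, long seed) {
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));
        int trainSize = (int) Math.round(shuffled.size() * trainProportion);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, trainSize)));
        parts.add(new ArrayList<>(shuffled.subList(trainSize, shuffled.size())));
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        List<List<Integer>> parts = split(rows, 0.7, 1L);
        System.out.println("train=" + parts.get(0) + " test=" + parts.get(1));
    }
}
```

Running it twice prints the same two lists, which is exactly the property the fixed seed gives us when comparing models across runs.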

7. Training a Regression Model

Since the wine quality score is numeric, let’s train a random forest regressor that uses Classification and Regression Trees (CART) as its base learners:

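void createTrainer() {
    CARTRegressionTrainer subsamplingTree = new CARTRegressionTrainer(
      Integer.MAX_VALUE,                // maximum tree depth (effectively unlimited)
      AbstractCARTTrainer.MIN_EXAMPLES, // minimum example weight in a child node
      0.001f,                           // minimum impurity decrease required to split
      0.7f,                             // fraction of features considered at each split
      new MeanSquaredError(),           // impurity measure used to choose splits
      Trainer.DEFAULT_SEED              // seed for the feature subsampling
    );

    // Ensemble of 10 trees whose predictions are averaged
    trainer = new RandomForestTrainer<>(subsamplingTree, new AveragingCombiner(), 10);
    model = trainer.train(trainSet);
}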

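In the method above, CARTRegressionTrainer configures a regression tree with no practical depth limit, the library’s default minimum child weight (AbstractCARTTrainer.MIN_EXAMPLES), a minimum impurity decrease of 0.001, 70% of the features sampled at each split, and mean squared error as the splitting criterion. Then, RandomForestTrainer builds an ensemble of 10 such trees and combines their predictions with AveragingCombiner, which averages them.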
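The averaging step itself is simple. Here’s a plain-Java sketch in which each ensemble member is a hypothetical stand-in function rather than a trained Tribuo tree; the combined prediction is the unweighted mean of the members’ outputs.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Plain-Java illustration of an unweighted averaging combiner; the members are hypothetical stand-ins
class AveragingSketch {

    // Averages the members' predictions for one example
    static double combine(List<ToDoubleFunction<double[]>> members, double[] features) {
        double sum = 0.0;
        for (ToDoubleFunction<double[]> member : members) {
            sum += member.applyAsDouble(features);
        }
        return sum / members.size();
    }

    public static void main(String[] args) {
        List<ToDoubleFunction<double[]>> members = List.of(
            f -> 5.0,
            f -> 6.0,
            f -> f[0] > 10.0 ? 7.0 : 5.5); // a one-split "tree" on the first feature
        // With 9.4 as the first feature: (5.0 + 6.0 + 5.5) / 3
        System.out.println(combine(members, new double[] {9.4})); // prints 5.5
    }
}
```

Because each tree sees a different sample of the data and features, averaging their outputs smooths out the errors of individual trees.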

The train() method trains the model on the trainSet dataset, producing a Model<Regressor> for predicting wine quality scores.

8. Evaluation

Next, let’s assess the regression model’s performance using RegressionEvaluator, which compares the model’s predictions with the true quality scores in a dataset and computes error metrics:

void evaluate(Model<Regressor> model, String datasetName, Dataset<Regressor> dataset) {
    log.info("Results for " + datasetName + "---------------------");
    RegressionEvaluator evaluator = new RegressionEvaluator();
    RegressionEvaluation evaluation = evaluator.evaluate(model, dataset);
    // The target was loaded under Tribuo's default dimension name, DIM-0
    Regressor dimension0 = new Regressor("DIM-0", Double.NaN);

    log.info("MAE: " + evaluation.mae(dimension0));
    log.info("RMSE: " + evaluation.rmse(dimension0));
    log.info("R^2: " + evaluation.r2(dimension0));
}

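RegressionEvaluator returns a RegressionEvaluation that holds the metrics for each output dimension. Because we loaded the quality column without naming its dimension, Tribuo stores it under the default name DIM-0, so we pass a Regressor with that name to select it. Then, we log the MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), and R^2 (Coefficient of Determination) to the console.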

Next, let’s use the evaluate() method to evaluate our model on both the training and test sets:

void evaluateModels() throws Exception {
    log.info("Training model");
    evaluate(model, "trainSet", trainSet);

    log.info("Testing model");
    evaluate(model, "testSet", testSet);
}

Here’s the output for both datasets when we run the program:

07:10:14.405 [main] INFO  tribuo.WineQualityRegression - Training model
07:10:14.406 [main] INFO  tribuo.WineQualityRegression - Results for trainSet---------------------
07:10:14.537 [main] INFO  tribuo.WineQualityRegression - MAE: 0.25025410332970005
07:10:14.537 [main] INFO  tribuo.WineQualityRegression - RMSE: 0.3422557198486092
07:10:14.538 [main] INFO  tribuo.WineQualityRegression - R^2: 0.8190947891297661
07:10:14.538 [main] INFO  tribuo.WineQualityRegression - Testing model
07:10:14.540 [main] INFO  tribuo.WineQualityRegression - Results for testSet---------------------
07:10:14.565 [main] INFO  tribuo.WineQualityRegression - MAE: 0.48711029366796743
07:10:14.565 [main] INFO  tribuo.WineQualityRegression - RMSE: 0.6584973595553575
07:10:14.565 [main] INFO  tribuo.WineQualityRegression - R^2: 0.3444460580874339

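MAE is the average absolute difference between the predicted and actual values. RMSE is the square root of the average squared difference, so it penalizes large errors more heavily than MAE. R^2 is the proportion of the variance in the actual values that the model explains, where 1 means a perfect fit.

Lower MAE and RMSE values and a higher R^2 indicate better predictive performance. Here, the test metrics are clearly worse than the training ones (R^2 drops from about 0.82 to about 0.34), which suggests the forest overfits the training data. The test results are the better estimate of how the model will perform on wines it hasn’t seen.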
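To make the three metrics concrete, here’s a plain-Java sketch that computes them from paired arrays of actual and predicted values. RegressionEvaluation already does this for us; the RegressionMetricsSketch class is hypothetical and only shows the arithmetic.

```java
// Plain-Java illustration of MAE, RMSE, and R^2; RegressionEvaluation computes these for us
class RegressionMetricsSketch {

    // Mean absolute error: average of |actual - predicted|
    static double mae(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    // Root mean squared error: square root of the average squared difference
    static double rmse(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    // Coefficient of determination: 1 - (residual sum of squares / total sum of squares)
    static double r2(double[] actual, double[] predicted) {
        double mean = 0.0;
        for (double value : actual) {
            mean += value;
        }
        mean /= actual.length;
        double ssRes = 0.0;
        double ssTot = 0.0;
        for (int i = 0; i < actual.length; i++) {
            ssRes += (actual[i] - predicted[i]) * (actual[i] - predicted[i]);
            ssTot += (actual[i] - mean) * (actual[i] - mean);
        }
        return 1.0 - ssRes / ssTot;
    }

    public static void main(String[] args) {
        double[] actual = {5, 6, 7};
        double[] predicted = {5, 6, 8};
        System.out.println("MAE: " + mae(actual, predicted));
        System.out.println("RMSE: " + rmse(actual, predicted));
        System.out.println("R^2: " + r2(actual, predicted));
    }
}
```

In this example, only one prediction is off, by 1, so MAE = 1/3 and RMSE = sqrt(1/3) ≈ 0.577. The residual sum of squares (1) is half the total sum of squares around the mean of 6 (2), so R^2 = 1 - 1/2 = 0.5.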

9. Saving the Model

Finally, let’s save the model as a file for reuse:

void saveModel() throws Exception {
    File modelFile = new File(MODEL_PATH);
    // Create src/main/resources/model if it doesn't exist yet
    modelFile.getParentFile().mkdirs();
    try (ObjectOutputStream objectOutputStream = new ObjectOutputStream(new FileOutputStream(modelFile))) {
        objectOutputStream.writeObject(model);
    }
}

In the code above, we serialize the trained model to a file using the ObjectOutputStream class. By saving the model to a file, we can reuse the model for later predictions without retraining.
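The same save-and-load pattern works for any Serializable object. Here’s a plain-Java sketch of a round trip through an in-memory byte array instead of a file; TinyModel is a hypothetical stand-in for a trained model, and the cast after readObject() mirrors the one we need when loading the saved model.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Plain-Java illustration of a Java serialization round trip, using memory instead of a file
class SerializationSketch {

    // Hypothetical stand-in for a trained model
    static class TinyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final double bias;

        TinyModel(String name, double bias) {
            this.name = name;
            this.bias = bias;
        }
    }

    // Writes the object graph to bytes, as saveModel() does with a FileOutputStream
    static byte[] save(Object model) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        }
        return bytes.toByteArray();
    }

    // Reads the object back; the caller must cast it to the expected type
    static Object load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        TinyModel restored = (TinyModel) load(save(new TinyModel("wine", 5.6)));
        System.out.println(restored.name + " " + restored.bias);
    }
}
```

The restored object is a new instance with the same state, which is why a deserialized model can make predictions without retraining.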

10. Calling the Methods

Now, let’s call the methods created earlier in our main() method:

public static void main(String[] args) throws Exception {
    WineQualityRegression wineQualityRegression = new WineQualityRegression();

    wineQualityRegression.createDatasets();
    wineQualityRegression.createTrainer();
    wineQualityRegression.evaluateModels();
    wineQualityRegression.saveModel();
}

Running main() loads and splits the dataset, trains and evaluates the model, and saves it to MODEL_PATH.

11. Using the Model

Let’s create a new class named WineQualityPredictor and load the saved model inside its main() method:

class WineQualityPredictor {
    private static final Logger log = LoggerFactory.getLogger(WineQualityPredictor.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        File modelFile = new File("src/main/resources/model/winequality-red-regressor.ser");
        Model<Regressor> loadedModel;

        try (ObjectInputStream objectInputStream = new ObjectInputStream(new FileInputStream(modelFile))) {
            // readObject() returns Object, so we cast it back to the typed model
            loadedModel = (Model<Regressor>) objectInputStream.readObject();
        }

        // The prediction code from the following snippets goes here
    }
}

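Since ObjectInputStream.readObject() returns a plain Object, we cast the result back to Model<Regressor>, the typed model Tribuo expects. Because of type erasure, this cast is unchecked, so it’s up to us to load a file that actually contains a regression model.

To read the file, we wrap a FileInputStream for the model file in an ObjectInputStream inside a try-with-resources block, which closes the stream once the model is loaded.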

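Then, let’s create an ArrayExample object to represent a single wine sample. The feature names must match the column names in the CSV header, since that’s how the model identifies its features: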

ArrayExample<Regressor> wineAttribute = new ArrayExample<Regressor>(new Regressor("quality", Double.NaN));
wineAttribute.add("fixed acidity", 7.4f);
wineAttribute.add("volatile acidity", 0.7f);
wineAttribute.add("citric acid", 0.47f);
wineAttribute.add("residual sugar", 1.9f);
wineAttribute.add("chlorides", 0.076f);
wineAttribute.add("free sulfur dioxide", 11.0f);
wineAttribute.add("total sulfur dioxide", 34.0f);
wineAttribute.add("density", 0.9978f);
wineAttribute.add("pH", 3.51f);
wineAttribute.add("sulphates", 0.56f);
wineAttribute.add("alcohol", 9.4f);
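To show why exact names matter, here’s a plain-Java sketch of name-based feature lookup. It assumes a hypothetical list of feature names recorded at training time and isn’t Tribuo’s implementation: in this sketch, a feature the model never saw is ignored, and a missing or misspelled one silently becomes 0.0.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of name-based feature lookup; not Tribuo's implementation
class FeatureLookupSketch {

    // Builds a vector in the (hypothetical) training-time feature order.
    // Unknown names in the example are ignored; missing ones default to 0.0.
    static double[] toVector(List<String> trainingFeatureNames, Map<String, Double> example) {
        double[] vector = new double[trainingFeatureNames.size()];
        for (int i = 0; i < trainingFeatureNames.size(); i++) {
            vector[i] = example.getOrDefault(trainingFeatureNames.get(i), 0.0);
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> trainingFeatureNames = List.of("alcohol", "pH");
        // "Alcohol" (capital A) doesn't match "alcohol", so that value is lost
        Map<String, Double> example = Map.of("Alcohol", 9.4, "pH", 3.51);
        System.out.println(Arrays.toString(toVector(trainingFeatureNames, example)));
    }
}
```

A typo like this doesn’t raise an error in the sketch; it only shifts the prediction, which makes it easy to miss.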

Finally, let’s pass the example to the model’s predict() method, which returns a Prediction holding the predicted Regressor:

Prediction<Regressor> prediction = loadedModel.predict(wineAttribute);
double predictQuality = prediction.getOutput().getValues()[0];
log.info("Predicted wine quality: " + predictQuality);

Here’s the predicted wine quality:

07:31:05.772 [main] INFO  tribuo.WineQualityPredictor - Predicted wine quality: 5.028163673540464

12. Conclusion

In this article, we learned about Tribuo and its features. Then we saw a high-level overview of the main ML tasks Tribuo supports. Finally, we trained a random forest regression model to predict wine quality.

The code backing this article is available on GitHub.