Course – LS – All

Get started with Spring and Spring Boot, through the Learn Spring course:

>> CHECK OUT THE COURSE

1. Overview

In this tutorial, we’ll learn how to approach the problem of generating mock data for different purposes. We’ll learn how to use Datafaker and review several examples.

2. History

Datafaker is a modern fork of Javafaker. It was migrated to Java 8 and received improvements that increased the library’s performance. However, the API stayed more or less the same, so projects that previously used Javafaker shouldn’t have any problems migrating to Datafaker. All the examples provided in the Java Faker article will work with version 1.6.0 of Datafaker.

The current Datafaker API is compatible with Javafaker. Therefore, this article will concentrate only on the differences and the improvements.

First, let’s add the datafaker Maven dependency to the project:

<dependency>
    <groupId>net.datafaker</groupId>
    <artifactId>datafaker</artifactId>
    <version>1.6.0</version>
</dependency>

3. Providers

One of the most important parts of Datafaker is its providers. These are special classes that make data generation more convenient. It’s important to note that these classes are backed by yml files containing the underlying data. Faker methods and expressions use these files directly or indirectly to generate data. In the next sections, we’ll become more familiar with how these methods and directives work.

4. Additional Patterns for Data Generation

Datafaker, like Javafaker, supports generating values based on a provided pattern. Datafaker introduced additional functionality with the templatify, examplify, options, date, csv, and json directives.

4.1. Templatify

The templatify directive takes several arguments. The first one is the base String. The second is the character that will be replaced in the given string. The rest are the options for replacement which will be picked randomly:

public class Templatify {
    private static Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println("Expression: " + getExpression());
        System.out.println("Expression with a placeholder: " + getExpressionWithPlaceholder());
    }

    static String getExpression() {
        return faker.expression("#{templatify 'test','t','j','r'}");
    }

    static String getExpressionWithPlaceholder() {
        return faker.expression("#{templatify '#ight', '#', 'f', 'l', 'm', 'n'}");
    }
}

Although we can use the base string without placeholders, it might produce undesirable results as it will replace all the occurrences in the given string. We can introduce a placeholder, a character that appears only in the specific places of the base string. In the case above, the result was:

Expression: resj
Expression with a placeholder: night

If the placeholder occurs in several places, each occurrence is randomized independently. Replacing with multi-character Strings also works, but the documentation doesn’t mention it explicitly, so it’s better to use this with caution.
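To make the replacement rule concrete, here is a minimal plain-Java sketch of the behavior described above. It isn't Datafaker's implementation: the class TemplatifySketch and its method are hypothetical names used only for illustration, the options are limited to single characters, and a java.util.Random stands in for Faker's random source:

```java
import java.util.Random;

// Hypothetical illustration of the templatify rule; not Datafaker's actual code.
class TemplatifySketch {

    // Replaces every occurrence of 'placeholder' in 'base' with a character
    // picked at random from 'options'. Each occurrence is picked independently.
    static String templatify(Random random, String base, char placeholder, char... options) {
        StringBuilder result = new StringBuilder(base.length());
        for (char c : base.toCharArray()) {
            if (c == placeholder) {
                result.append(options[random.nextInt(options.length)]);
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        Random random = new Random();
        // 't' occurs twice in "test", so both positions are replaced
        System.out.println(templatify(random, "test", 't', 'j', 'r'));
        // '#' occurs once, so only the first character varies
        System.out.println(templatify(random, "#ight", '#', 'f', 'l', 'm', 'n'));
    }
}
```

The first call can only yield values of the form "?es?" with j or r in both positions, which is why a dedicated placeholder character is usually the safer choice.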

4.2. Examplify

This directive generates a random value based on the provided example. It replaces each lowercase or uppercase letter with a random letter of the same case, and each digit with a random digit. Special characters are left untouched, which helps to create formatted strings:

public class Examplify {
    private static Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println("Expression: " + getExpression());
        System.out.println("Number expression: " + getNumberExpression());
    }

    static String getExpression() {
        return faker.expression("#{examplify 'Cat in the Hat'}");
    }

    static String getNumberExpression() {
        return faker.expression("#{examplify '123-123-123'}");
    }
}

An example of the output:

Expression: Lvo lw ero Qkd
Number expression: 707-657-434
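The character-by-character rule can be sketched in plain Java as follows. This is only an illustration under the assumption that the input is ASCII; ExamplifySketch is a hypothetical name, and the code is not taken from Datafaker:

```java
import java.util.Random;

// Hypothetical illustration of the examplify rule; not Datafaker's actual code.
class ExamplifySketch {

    // Letters become random letters of the same case, digits become random
    // digits, and every other character is copied unchanged.
    static String examplify(Random random, String example) {
        StringBuilder result = new StringBuilder(example.length());
        for (char c : example.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                result.append((char) ('a' + random.nextInt(26)));
            } else if (c >= 'A' && c <= 'Z') {
                result.append((char) ('A' + random.nextInt(26)));
            } else if (c >= '0' && c <= '9') {
                result.append((char) ('0' + random.nextInt(10)));
            } else {
                result.append(c); // spaces, dashes, and other symbols stay in place
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        Random random = new Random();
        System.out.println(examplify(random, "Cat in the Hat"));
        System.out.println(examplify(random, "123-123-123"));
    }
}
```

Because separators are preserved, the result always has the same length and layout as the example.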

4.3. Regexify

This is a more flexible way of creating formatted String values. We can use the regexify directive as an expression or call the regexify method directly on the Faker object:

public class Regexify {
    private static Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println("Expression: " + getExpression());
        System.out.println("Regexify with a method: " + getMethodExpression());
    }

    static String getExpression() {
        return faker.expression("#{regexify '(hello|bye|hey)'}");
    }

    static String getMethodExpression() {
        return faker.regexify("[A-D]{4,10}");
    }
}

Possible output:

Expression: bye
Regexify with a method: DCCC
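A value produced by regexify should match the pattern it was generated from, which gives us a simple assertion for tests. The sketch below applies java.util.regex to the sample outputs shown above; RegexifyCheck and conformsTo are hypothetical names, and in a real test the second argument would come from faker.regexify(...):

```java
import java.util.regex.Pattern;

// Hypothetical helper for checking generated values against their pattern.
class RegexifyCheck {

    // Returns true when the whole generated value matches the pattern.
    static boolean conformsTo(String regex, String generated) {
        return Pattern.matches(regex, generated);
    }

    public static void main(String[] args) {
        System.out.println(conformsTo("(hello|bye|hey)", "bye")); // true
        System.out.println(conformsTo("[A-D]{4,10}", "DCCC"));    // true
        System.out.println(conformsTo("[A-D]{4,10}", "DCE"));     // false: too short and 'E' is out of range
    }
}
```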

4.4. Options

The options.option directive picks an option randomly from a provided list. We can achieve the same with regexify, but since this is a common case, a separate directive makes sense:

public class Option {
    private static Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println("First expression: " + getFirstExpression());
        System.out.println("Second expression: " + getSecondExpression());
        System.out.println("Third expression: " + getThirdExpression());
    }

    static String getFirstExpression() {
        return faker.expression("#{options.option 'Hi','Hello','Hey'}");
    }

    static String getSecondExpression() {
        return faker.expression("#{options.option '1','2','3','4','*'}");
    }

    static String getThirdExpression() {
        return faker.expression("#{regexify '(Hi|Hello|Hey)'}");
    }
}

The output of the code above:

First expression: Hey
Second expression: 4
Third expression: Hello

If the number of options is too big, creating a custom provider for randomized values makes sense.

4.5. CSV

As its name suggests, this directive creates CSV-formatted data. However, using it can be confusing because, under the hood, two overloaded methods with quite different signatures handle this directive:

public class Csv {
    private static Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println("First expression:\n" + getFirstExpression());
        System.out.println("Second expression:\n" + getSecondExpression());
    }

    static String getFirstExpression() {
        String firstExpressionString
          = "#{csv '4','name_column','#{Name.first_name}','last_name_column','#{Name.last_name}'}";
        return faker.expression(firstExpressionString);
    }

    static String getSecondExpression() {
        String secondExpressionString
          = "#{csv ',','\"','true','4','name_column','#{Name.first_name}','last_name_column','#{Name.last_name}'}";
        return faker.expression(secondExpressionString);
    }
}

The directives above are using expressions #{Name.first_name} and #{Name.last_name}. The next sections will explain the usage of these expressions.

The values after the csv directive in the expression are mapped to the parameters of the mentioned methods. The documentation for these methods provides additional information. However, these directives sometimes fail to parse, and in that case, it’s better to call the methods directly. The code above will produce the following output:

First expression:
"name_column","last_name_column"
"Riley","Spinka"
"Lindsay","O'Conner"
"Sid","Rogahn"
"Prince","Wiegand"

Second expression:
"name_column","last_name_column"
"Jen","Schinner"
"Valeria","Walter"
"Mikki","Effertz"
"Deon","Bergnaum"

This is a convenient way to programmatically generate mock data for use outside the application.
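To see how the arguments of the longer expression line up (separator, quote character, header flag, row limit, then pairs of column name and value), here is a plain-Java sketch that assembles the same kind of output. CsvSketch is a hypothetical class, the suppliers return fixed values instead of Faker data, and quote characters inside values aren't escaped:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of the csv argument order; not Datafaker's actual code.
class CsvSketch {

    // 'limit' is the number of data rows; 'columns' maps a header to a value supplier.
    static String csv(String separator, char quote, boolean withHeader, int limit,
                      Map<String, Supplier<String>> columns) {
        StringBuilder out = new StringBuilder();
        if (withHeader) {
            out.append(row(separator, quote, new ArrayList<>(columns.keySet())));
        }
        for (int i = 0; i < limit; i++) {
            List<String> values = new ArrayList<>();
            for (Supplier<String> supplier : columns.values()) {
                values.add(supplier.get());
            }
            out.append(row(separator, quote, values));
        }
        return out.toString();
    }

    // Joins the cells of one line, wrapping each cell in the quote character.
    private static String row(String separator, char quote, List<String> cells) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < cells.size(); i++) {
            if (i > 0) {
                line.append(separator);
            }
            line.append(quote).append(cells.get(i)).append(quote);
        }
        return line.append('\n').toString();
    }

    public static void main(String[] args) {
        Map<String, Supplier<String>> columns = new LinkedHashMap<>();
        columns.put("name_column", () -> "Riley");       // stands in for #{Name.first_name}
        columns.put("last_name_column", () -> "Spinka"); // stands in for #{Name.last_name}
        System.out.print(csv(",", '"', true, 4, columns));
    }
}
```

The shorter form of the directive passes only the row limit and the column pairs, so the separator, quote character, and header flag fall back to defaults.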

4.6. JSON

Another popular and often-used format is JSON. Datafaker allows generating data in JSON format using expressions:

public class Json {
    private static final Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println(getExpression());
    }

    static String getExpression() {
        return faker.expression(
          "#{json 'person'," + "'#{json ''first_name'',''#{Name.first_name}'',''last_name'',''#{Name.last_name}''}'," +
          "'address'," + "'#{json ''country'',''#{Address.country}'',''city'',''#{Address.city}''}'}");
    }
}

The code above produces the following output:

{"person": {"first_name": "Dorian", "last_name": "Simonis"}, "address": {"country": "Cameroon", "city": "South Ernestine"}}

4.7. Method Invocations

In fact, all the expressions are just method invocations with the method name and parameters passed as a String. Thus, all the directives above mirror the methods with the same names. However, sometimes it’s more convenient to use plain text to create mock data:

public class MethodInvocation {
    private static Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println("Name from a method: " + getNameFromMethod());
        System.out.println("Name from an expression: " + getNameFromExpression());
    }

    static String getNameFromMethod() {
        return faker.name().firstName();
    }

    static String getNameFromExpression() {
        return faker.expression("#{Name.first_Name}");
    }
}

Now it’s obvious that the expressions with csv and json directives used method invocations inside. This way, we can call any method for data generation on the Faker object. Although the method names are case insensitive and allow variations in the format, it’s better to refer to the documentation of the used version to verify it.
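The idea that an expression is a method invocation in disguise can be illustrated with a small reflection-based sketch. Everything here is hypothetical: ExpressionSketch, its nested Name class, and the matching rule (ignore case and underscores) are simplifications for illustration, and Datafaker's real resolution logic is more involved:

```java
import java.lang.reflect.Method;
import java.util.Locale;

// Hypothetical illustration of resolving an expression to a method call.
class ExpressionSketch {

    // Toy stand-in for a provider such as Name; it returns fixed values.
    public static class Name {
        public String firstName() { return "Riley"; }
        public String lastName() { return "Spinka"; }
    }

    // Drops underscores and case so "first_Name" and "firstName" compare equal.
    static String normalize(String name) {
        return name.replace("_", "").toLowerCase(Locale.ROOT);
    }

    // Resolves an expression of the form "#{Provider.method_name}" against 'provider'.
    static String resolve(Object provider, String expression) {
        String body = expression.substring(2, expression.length() - 1); // strip "#{" and "}"
        String wanted = normalize(body.substring(body.indexOf('.') + 1));
        for (Method method : provider.getClass().getMethods()) {
            if (method.getParameterCount() == 0 && normalize(method.getName()).equals(wanted)) {
                try {
                    return String.valueOf(method.invoke(provider));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Cannot invoke " + method.getName(), e);
                }
            }
        }
        throw new IllegalArgumentException("No matching method for " + expression);
    }

    public static void main(String[] args) {
        Name name = new Name();
        System.out.println(resolve(name, "#{Name.first_name}")); // Riley
        System.out.println(resolve(name, "#{Name.first_Name}")); // Riley
        System.out.println(resolve(name, "#{Name.lastName}"));   // Spinka
    }
}
```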

Additionally, it’s possible to pass parameters to a method with an expression. We partially saw this in the formats of the regexify and templatify directives. Even though it might be a bit cumbersome and error-prone in some cases, sometimes this is the most convenient way to interact with Faker:

public class MethodInvocationWithParams {
    public static int MIN = 1;
    public static int MAX = 10;
    public static String UNIT = "SECONDS";
    private static Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println("Duration from the method: " + getDurationFromMethod());
        System.out.println("Duration from the expression: " + getDurationFromExpression());
    }

    static Duration getDurationFromMethod() {
        return faker.date().duration(MIN, MAX, UNIT);
    }

    static String getDurationFromExpression() {
        return faker.expression("#{date.duration '1', '10', 'SECONDS'}");
    }
}

One of the shortcomings of expressions is that they always return a String. As a result, this limits the operations we can perform on the returned value. The code above produces this output:

Duration from the method: PT6S
Duration from the expression: PT4S
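When we need a typed value again, we have to parse the String ourselves. For the duration example, the output is in ISO-8601 format, so the standard Duration.parse method can read it. The sketch below uses the sample value PT4S from the output above; DurationFromExpression is a hypothetical helper class:

```java
import java.time.Duration;

// Hypothetical helper that turns an expression result back into a Duration.
class DurationFromExpression {

    // Parses an ISO-8601 duration string such as "PT4S".
    static Duration toDuration(String expressionResult) {
        return Duration.parse(expressionResult);
    }

    public static void main(String[] args) {
        Duration duration = toDuration("PT4S");
        System.out.println(duration.getSeconds());  // 4
        System.out.println(duration.plusSeconds(6)); // PT10S
    }
}
```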

5. Collections

Collections allow the creation of lists with mocked data. The elements can be of different types, in which case the collection is parametrized by the most specific common type: a parent of all the classes in the collection. Let’s geek out a bit and generate a list of characters from “Star Wars” and “Star Trek”:

public class Collection {
    public static int MIN = 1;
    public static int MAX = 100;
    private static Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println(getFictionalCharacters());
    }

    static List<String> getFictionalCharacters() {
        return faker.collection(
          () -> faker.starWars().character(),
          () -> faker.starTrek().character())
            .len(MIN, MAX)
            .generate();
    }
}

As a result, we got the following list:

[Luke Skywalker, Wesley Crusher, Jean-Luc Picard, Greedo, Hikaru Sulu, William T. Riker]

Because both suppliers in our collection return the String type values, the resulting list will be parametrized by String. Let’s check the situation where we mix different types of data:

public class MixedCollection {
    public static int MIN = 1;
    public static int MAX = 20;
    private static Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println(getMixedCollection());
    }

    static List<? extends Serializable> getMixedCollection() {
        return faker.collection(
          () -> faker.date().birthday(),
          () -> faker.name().fullName())
            .len(MIN, MAX)
            .generate();
    }
}

In this case, the most specific common type for String and Timestamp is Serializable. The output will be the following:

[1964-11-09 15:16:43.0, Devora Stamm DVM, 1980-01-11 15:18:00.0, 1989-04-28 05:13:54.0,
  2004-09-06 17:11:49.0, Irving Turcotte, Sherita Durgan I, 2004-03-08 00:45:57.0, 1979-08-25 22:48:50.0,
  Manda Hane, Latanya Hegmann, 1991-05-29 12:07:23.0, 1989-06-26 12:40:44.0, Kevin Quigley]
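The element type here comes from ordinary Java generic inference rather than anything specific to Datafaker. The following plain-Java sketch shows the same effect with a hypothetical generate method that picks one of the given suppliers for each element. It uses a fixed size instead of a random length between MIN and MAX, and fixed values instead of Faker data:

```java
import java.io.Serializable;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical illustration of how mixed suppliers share one element type.
class MixedCollectionSketch {

    // Fills a list of 'size' elements, choosing a random supplier for each one.
    @SafeVarargs
    static <T> List<T> generate(Random random, int size, Supplier<T>... suppliers) {
        List<T> result = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            result.add(suppliers[random.nextInt(suppliers.length)].get());
        }
        return result;
    }

    // The compiler infers a common supertype of Timestamp and String,
    // which can be declared as Serializable.
    static List<? extends Serializable> mixed(Random random, int size) {
        return generate(random, size,
          () -> new Timestamp(0L),
          () -> "Devora Stamm");
    }

    public static void main(String[] args) {
        System.out.println(mixed(new Random(), 5));
    }
}
```

Since the declared element type is only Serializable, code that consumes such a list has to check each element's runtime type before using it as a String or a Timestamp.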

6. Conclusion

Datafaker is a new, improved version of Javafaker. This article covered new functionality introduced in Datafaker 1.6.0, which provided new ways of generating data. However, there is more to learn about this library, and it’s better to refer to the official documentation and GitHub repository to get more information about the functionality and features of Datafaker.

As always, the code presented in the article is available over on GitHub.
