1. Introduction
We use Spring AI with OpenAI’s Moderation model to detect harmful or sensitive content in text. The moderation model analyzes input and flags categories like self-harm, violence, hate, or sexual content.
In this tutorial, we’ll learn how to build a moderation service and integrate it with the moderation model.
2. Dependencies
Let’s add the spring-ai-starter-model-openai dependency:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
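Since we left the version out, it has to be managed elsewhere. One common option is importing the Spring AI BOM (the ${spring-ai.version} property here is a placeholder for the release we're using):

```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
```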
This starter auto-configures the OpenAI clients, including the moderation model we'll use for our moderation requests.
3. Configuration
Next, we configure our Spring AI client:
spring:
  ai:
    openai:
      api-key: ${OPEN_AI_API_KEY}
      moderation:
        options:
          model: omni-moderation-latest
We’ve specified the API key and moderation model name. Now, we can start using the moderation API.
4. Moderation Categories
Let’s review the moderation categories we can use:
- Hate: content that expresses or promotes hate based on protected traits.
- Hate/Threatening: hateful content that also includes threats of violence or serious harm.
- Harassment: language that harasses, bullies, or targets an individual or group.
- Harassment/Threatening: harassment that includes explicit threats or intent to cause harm.
- Self-harm: content that promotes or depicts self-harm behaviors.
- Self-harm/Intent: content in which someone expresses an intent to self-harm.
- Self-harm/Instructions: content that gives instructions, methods, or encouragement to self-harm.
- Sexual: explicit sexual content or promotion of sexual services.
- Sexual/Minors: any sexual content involving minors, which is strictly disallowed.
- Violence: content that depicts or describes death, violence, or physical injury.
- Violence/Graphic: vivid or graphic depictions of injury, death, or severe harm.
- Illicit: advice, instructions, or promotion of illegal activities.
- Illicit/Violent: illicit content that also includes elements of violence.
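Under the hood, these categories come back from the moderation endpoint as boolean flags accompanied by per-category confidence scores. A trimmed, illustrative sketch of the raw response shape (the values below are made up, and most categories are omitted for brevity):

```json
{
  "model": "omni-moderation-latest",
  "results": [
    {
      "flagged": true,
      "categories": {
        "harassment": true,
        "harassment/threatening": false,
        "violence": false
      },
      "category_scores": {
        "harassment": 0.91,
        "harassment/threatening": 0.02,
        "violence": 0.003
      }
    }
  ]
}
```

Spring AI maps this structure onto typed objects, so we'll read the flags through getters rather than parsing JSON ourselves.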
5. Building the Moderation Service
Now, let's build the moderation service. It will consume user input messages and validate them against the categories above using the moderation model.
5.1. TextModerationService
Let’s start by building the TextModerationService:
@Service
public class TextModerationService {

    private final OpenAiModerationModel openAiModerationModel;

    @Autowired
    public TextModerationService(OpenAiModerationModel openAiModerationModel) {
        this.openAiModerationModel = openAiModerationModel;
    }

    public String moderate(String text) {
        ModerationPrompt moderationRequest = new ModerationPrompt(text);
        ModerationResponse response = openAiModerationModel.call(moderationRequest);
        Moderation output = response.getResult().getOutput();
        return output.getResults().stream()
          .map(this::buildModerationResult)
          .collect(Collectors.joining("\n"));
    }
}
Here, we’ve used the OpenAiModerationModel. We send the ModerationPrompt with the text we want to moderate and build the result from the model’s response. Now, let’s create the buildModerationResult() method:
private String buildModerationResult(ModerationResult moderationResult) {
    Categories categories = moderationResult.getCategories();
    String violations = Stream.of(
        Map.entry("Sexual", categories.isSexual()),
        Map.entry("Hate", categories.isHate()),
        Map.entry("Harassment", categories.isHarassment()),
        Map.entry("Self-Harm", categories.isSelfHarm()),
        Map.entry("Sexual/Minors", categories.isSexualMinors()),
        Map.entry("Hate/Threatening", categories.isHateThreatening()),
        Map.entry("Violence/Graphic", categories.isViolenceGraphic()),
        Map.entry("Self-Harm/Intent", categories.isSelfHarmIntent()),
        Map.entry("Self-Harm/Instructions", categories.isSelfHarmInstructions()),
        Map.entry("Harassment/Threatening", categories.isHarassmentThreatening()),
        Map.entry("Violence", categories.isViolence()))
      .filter(entry -> Boolean.TRUE.equals(entry.getValue()))
      .map(Map.Entry::getKey)
      .collect(Collectors.joining(", "));
    return violations.isEmpty()
      ? "No category violations detected."
      : "Violated categories: " + violations;
}
Here, we read the category flags from the moderation result, pair each category name with its flag, keep only the violated categories, and join them into a single message. If no categories are violated, we return a default text response instead.
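The entry-stream pattern above doesn't depend on Spring AI at all, so we can sketch it standalone. In this illustrative snippet, the ViolationJoiner class and its hardcoded three categories are our own inventions, standing in for the full category list:

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ViolationJoiner {

    // Pair each category name with its boolean flag, keep the flagged ones, and join them.
    public static String join(boolean harassment, boolean violence, boolean hate) {
        String violations = Stream.of(
            Map.entry("Harassment", harassment),
            Map.entry("Violence", violence),
            Map.entry("Hate", hate))
          .filter(entry -> Boolean.TRUE.equals(entry.getValue()))
          .map(Map.Entry::getKey)
          .collect(Collectors.joining(", "));
        return violations.isEmpty()
          ? "No category violations detected."
          : "Violated categories: " + violations;
    }

    public static void main(String[] args) {
        // Two flagged categories are joined in declaration order.
        System.out.println(join(true, true, false));
        // No flags set falls back to the default message.
        System.out.println(join(false, false, false));
    }
}
```

Because Map.entry() preserves the order we declare the entries in, the joined output always lists violations in a stable, predictable order.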
5.2. TextModerationController
Before building the controller, let’s create the ModerateRequest, which we’ll use to send the text for moderation:
public class ModerateRequest {

    private String text;

    // getters and setters
}
Now, let’s create the TextModerationController:
@RestController
public class TextModerationController {

    private final TextModerationService service;

    @Autowired
    public TextModerationController(TextModerationService service) {
        this.service = service;
    }

    @PostMapping("/moderate")
    public ResponseEntity<String> moderate(@RequestBody ModerateRequest request) {
        return ResponseEntity.ok(service.moderate(request.getText()));
    }
}
Here, we extract the text from the ModerateRequest and pass it to our TextModerationService.
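Assuming the application is running locally, a request and a possible response look like this (the response body depends on the model's verdict for the submitted text):

```http
POST /moderate HTTP/1.1
Content-Type: application/json

{"text": "Please review me"}

HTTP/1.1 200 OK

No category violations detected.
```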
5.3. Test the Behavior
Finally, let’s test our moderation service:
@AutoConfigureMockMvc
@SpringBootTest
@ActiveProfiles("moderation")
class ModerationApplicationLiveTest {

    @Autowired
    private MockMvc mockMvc;

    @Test
    void givenTextWithoutViolation_whenModerating_thenNoCategoryViolationsDetected() throws Exception {
        String moderationResponse = mockMvc.perform(post("/moderate")
            .contentType(MediaType.APPLICATION_JSON)
            .content("{\"text\": \"Please review me\"}"))
          .andExpect(status().isOk())
          .andReturn()
          .getResponse()
          .getContentAsString();

        assertThat(moderationResponse).contains("No category violations detected");
    }
}
We sent a text that doesn’t violate any categories and verified that the service confirms it. Now, let’s test the behavior when some categories are violated:
@Test
void givenHarassingText_whenModerating_thenHarassmentCategoryShouldBeFlagged() throws Exception {
    String moderationResponse = mockMvc.perform(post("/moderate")
        .contentType(MediaType.APPLICATION_JSON)
        .content("{\"text\": \"You're really Bad Person! I don't like you!\"}"))
      .andExpect(status().isOk())
      .andReturn()
      .getResponse()
      .getContentAsString();

    assertThat(moderationResponse).contains("Violated categories: Harassment");
}
As we can see, the Harassment category was violated as expected. Now, let’s check if our service can moderate multiple violations:
@Test
void givenTextViolatingMultipleCategories_whenModerating_thenAllCategoriesShouldBeFlagged() throws Exception {
    String moderationResponse = mockMvc.perform(post("/moderate")
        .contentType(MediaType.APPLICATION_JSON)
        .content("{\"text\": \"I hate you and I will hurt you!\"}"))
      .andExpect(status().isOk())
      .andReturn()
      .getResponse()
      .getContentAsString();

    assertThat(moderationResponse).contains("Violated categories: Harassment, Harassment/Threatening, Violence");
}
We sent a text that contains multiple violations. As we can see, the service response confirms that three categories were violated.
6. Conclusion
In this article, we reviewed the OpenAI Moderation Model integration with Spring AI. We explored the moderation categories and built a service to moderate incoming text. This service can be part of a larger system that works with user content. For example, we can attach it to a chat moderation bot that helps us control the quality of conversations under our articles.
As always, the code is available over on GitHub.