With over 25 million downloads, Rasa Open Source is the most popular open source framework for building chat and voice-based AI assistants.

NLU Training Data

The goal of NLU (Natural Language Understanding) is to extract structured information from user messages. This usually includes the user’s intent and any entities their message contains. You can add extra information such as regular expressions and lookup tables to your training data to help the model identify intents and entities correctly.

NLU training data consists of example user utterances categorized by intent. To make it easier to use your intents, give them names that relate to what the user wants to accomplish with that intent, keep them in lowercase, and avoid spaces and special characters.

version: "3.1"
 
nlu:
    - intent: greet
      examples: |
          - Hey
          - Hi
          - hey there [Sara](name)
 
    - intent: faq/language
      examples: |
          - What language do you speak?
          - Do you only handle english?
 
stories:
    - story: greet and faq
      steps:
          - intent: greet
          - action: utter_greet
          - intent: faq
          - action: utter_faq
 
rules:
    - rule: Greet user
      steps:
          - intent: greet
          - action: utter_greet

Gather Real Data

When it comes to building out NLU training data, developers are sometimes tempted to use text generation tools or templates to quickly increase the number of training examples. This is a bad idea.

:::note Remember that if you use a script to generate training data, the only thing your model can learn is how to reverse-engineer the script. :::

Avoiding Intent Confusion#

Intents are classified using character and word-level features extracted from your training examples, depending on what featurizers you’ve added to your NLU pipeline. When different intents contain the same words ordered similarly, this can create confusion for the intent classifier.

Entity

Keywords that can be extracted from a user message. For example: a telephone number, a person’s name, a location, the name of a product

stories:
    - story: migrate from IBM Watson
      steps:
          - intent: migration
            entities:
                - product
          - slot_was_set:
                - product: Watson
          - action: utter_watson_migration
 
    - story: migrate from Dialogflow
      steps:
          - intent: migration
            entities:
                - product
          - slot_was_set:
                - product: Dialogflow
          - action: utter_dialogflow_migration
 
    - story: migrate from unspecified
      steps:
          - intent: migration
          - action: utter_ask_migration_product

To avoid intent confusion, group these training examples into single migration intent and make the response depend on the value of a categorical product slot that comes from an entity.