Interpreting

API Details for the Interpreter, Interpretations, and related components.

Interpreter

The Interpreter is a pipeline step that takes the data from the previous step and interprets it into nodes and relationships.

Key	Description	Type	Default	Required
`interpretations`	A list of interpretations to be used to interpret the data.	`List[Interpretation]`	N/A	Yes
`iterate_on`	Values to iterate over to interpret the data.	`ValueProvider`	N/A	No
`before_iteration`	A list of interpretations that are applied before iteration.	`List[Interpretation]`	`[]`	No

Interpretations

Interpretations are the building blocks of the Interpreter. They are used to interpret the data into nodes and relationships.

Source Node Interpretation

The SourceNodeInterpretation is used to interpret the data into a source node.

Parameter Name	Required?	Type	Description
node_type	Yes	String or ValueProvider	Specifies the type of the source node. It is a required field. When a ValueProvider is used, dynamic index creation and schema introspection are not supported.
key	Yes	Dictionary	Contains key-value pairs that define the key of the source node. The keys represent field names, and the values can be either static values or value providers. It is a required field.
allow_create	No	Boolean	When `true`, allows creating a new node when the source node is not in the graph. When `false`, new source nodes will not be added to the graph. This field is optional and the default value is `true`.
properties	No	Dictionary	Stores additional properties of the source node. It is a dictionary where the keys represent property names, and the values can be either static values or value providers. This field is optional.
additional_indexes	No	List[String]	Specifies additional indexes desired on the source node. It is a list of field names. This field is optional.
additional_types	No	List[String]	Defines additional types for the source node. It is a list of strings representing the additional types. These types are not considered by the ingestion system as part of the identity of the node; they are treated as extra labels applied after the ingestion of the node is completed. This field is optional.
normalization	No	Dictionary	Contains normalization flags that should be adopted by value providers when getting values. This field is optional. See the normalization reference. By default `do_lowercase_strings` is enabled.

Example

- type: source_node
  node_type: Person
  key:
    name: !jmespath patient_name
  properties:
    birthday: !jmespath patient_birthday
  additional_indexes:
    - birthday
  additional_types:
    - Patient

Relationship Interpretation

Parameter Name	Required?	Type	Description
node_type	Yes	String or ValueProvider	Specifies the type of the node a relationship connects to. It is a required field. When a ValueProvider is used, dynamic index creation and schema introspection are not supported.
relationship_type	Yes	String or ValueProvider	Specifies the type of the relationship. It is a required field. When a ValueProvider is used, dynamic index creation and schema introspection are not supported.
node_key	Yes	Dictionary	Contains key-value pairs that define the key of the related node. The keys represent field names, and the values can be either static values or value providers. It is a required field.
node_properties	No	Dictionary	Stores additional properties of the related node. It is a dictionary where the keys represent property names, and the values can be either static values or value providers. This field is optional.
relationship_key	No	Dictionary	Contains key-value pairs that define the key of the relationship itself. The keys represent field names, and the values can be either static values or value providers. This field is optional. The uniqueness of the relationship is defined in terms of the nodes it is relating and the key of the relationship.
relationship_properties	No	Dictionary	Stores additional properties of the relationship. It is a dictionary where the keys represent property names, and the values can be either static values or value providers. This field is optional.
outbound	No	Boolean	Represents whether or not the relationship direction is outbound from the source node. By default, this is true.
find_many	No	Boolean	Represents whether or not the searches provided to node_key can return multiple values, and thus should create multiple relationships to multiple related nodes.
iterate_on	No	ValueProvider	Iterates over the values provided by the supplied value provider, and creates a relationship for each one.
node_creation_rule	No	`EAGER` \| `MATCH_ONLY` \| `FUZZY`	Defaults to `EAGER`. When `EAGER`, related nodes will be created when not present based on the supplied node type and key. `MATCH_ONLY` will not create a relationship when the related node does not already exist. `FUZZY` behaves like `MATCH_ONLY`, but treats the key values as regular expressions to match on.
key_normalization	No	Dictionary	Contains normalization flags that should be adopted by value providers when getting values for node and relationship keys. This field is optional. See the normalization reference. By default `do_lowercase_strings` is enabled.
property_normalization	No	Dictionary	Contains normalization flags that should be adopted by value providers when getting values for node and relationship properties. This field is optional. See the normalization reference. By default no flags are enabled.
node_additional_types	No	List[String]	Defines additional types for the related node. It is a list of strings representing the additional types. These types are not considered by the ingestion system as part of the identity of the node and rather considered as extra labels applied after the ingestion of the node is completed. This field is optional.

Example

- type: relationship
  node_type: Person
  relationship_type: HAS_CHILD
  iterate_on: !jmespath children[*]
  node_key:
    name: !jmespath name

Properties Interpretation

Parameter Name	Required?	Type	Description
properties	Yes	Dictionary	Stores additional properties of the source node. It is a dictionary where the keys represent property names, and the values can be either static values or value providers. It is a required field.
normalization	No	Dictionary	Contains normalization flags that should be adopted by value providers when getting values. This field is optional. See the normalization reference. By default no flags are enabled.

Example

- type: properties
  properties:
    meaning_of_life: 42

Variables Interpretation

The variables interpretation is used to define variables that can be referenced later with the !variable value provider. For example, if we wanted to define a variable called meaning_of_life with a value of 42, we would use the following interpretation:

- type: variables
  variables:
    meaning_of_life: 42

And it could be referenced later like this:

- type: properties
  properties:
    meaning_of_life: !variable meaning_of_life

NOTE: Variables can be defined either statically or using a value provider.

Parameter Name	Required?	Type	Description
variables	Yes	Dictionary	Stores values as variables that can be referenced later with the `!variable` value provider. It is a dictionary where the keys represent variable names, and the values can be either static values or value providers. It is a required field.
normalization	No	Dictionary	Contains normalization flags that should be adopted by value providers when getting values. This field is optional. See the normalization reference. By default no flags are enabled.

Switch Interpretation

The SwitchInterpretation is used to switch between different interpretations based on the value of a field.

Parameter Name	Required?	Type	Description
switch_on	Yes	ValueProvider	The value provider that will be evaluated for each source node. The value of the value provider will be used to determine which interpretation to apply.
interpretations	Yes	Dictionary	Contains the interpretations that will be applied. The keys represent the values of the `switch_on` parameter. The values represent the interpretations that will be applied. Each value may also be a list of interpretations.
default	No	Dictionary	Contains the default interpretation that will be applied if no interpretation has the same value as the value of the `switch_on` parameter.
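
The selection rule described in the table can be sketched in plain Python. This is only an illustration of the lookup semantics, not nodestream's implementation; the function name `select_interpretations` and the abbreviated sample interpretation dictionaries are hypothetical.

```python
from typing import Any, Dict, List, Optional, Union

Interpretation = Dict[str, Any]


def select_interpretations(
    switch_value: Any,
    interpretations: Dict[Any, Union[Interpretation, List[Interpretation]]],
    default: Optional[Interpretation] = None,
) -> List[Interpretation]:
    """Return the interpretations to apply for an evaluated `switch_on` value."""
    if switch_value in interpretations:
        chosen = interpretations[switch_value]
        # A key may map to a single interpretation or to a list of them.
        return chosen if isinstance(chosen, list) else [chosen]
    # No key matched: fall back to the default, if one was supplied.
    return [default] if default is not None else []


# Hypothetical, abbreviated interpretation blocks keyed by the switch value.
cases = {
    "person": {"type": "source_node", "node_type": "Person"},
    "team": [
        {"type": "source_node", "node_type": "Team"},
        {"type": "properties", "properties": {"kind": "team"}},
    ],
}
fallback = {"type": "source_node", "node_type": "Unknown"}

print(select_interpretations("person", cases, fallback))
print(select_interpretations("other", cases, fallback))
```

Returning a list in every case keeps the caller simple: it can loop over the result whether the matched entry was a single interpretation, a list, or the default.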

Source Node and Relationship Interpretation Combined

A common use case is to combine the source node and relationship interpretations.

- implementation: nodestream.interpreting:Interpreter
  arguments:
    interpretations:
    - type: source_node
      node_type: Person
      key:
        name: !jmespath name
      allow_create: true
    - type: relationship
      node_type: Person
      relationship_type: HAS_CHILD
      node_creation_rule: MATCH_ONLY
      iterate_on: !jmespath children[*]
      node_key:
        name: !jmespath name

In this example, source_node has the flag allow_create=true while the relationship has node_creation_rule=MATCH_ONLY. This means Person nodes will be created if they don't already exist using the source_node interpretation, but the children Person nodes will not be created as part of the relationship interpretation. The relationship interpretation will only create the relationships between existing persons.

Below is a comprehensive list of possible behaviors depending on pipeline configuration and graph state:

Given: Source node does not exist in the graph

Source node setting	Relationship setting	Related node already exists	Related node does not exist
source_node.allow_create=true	relationship.node_creation_rule=EAGER	Source Node and Relationship are created	Everything is created
source_node.allow_create=true	relationship.node_creation_rule=MATCH_ONLY	Source Node and Relationship are created	Source Node is created
source_node.allow_create=false	relationship.node_creation_rule=EAGER	Nothing is created	Related node is created
source_node.allow_create=false	relationship.node_creation_rule=MATCH_ONLY	Nothing is created	Nothing is created

Given: Source node already exists in the graph

Source node setting	Relationship setting	Related node already exists	Related node does not exist
source_node.allow_create=true	relationship.node_creation_rule=EAGER	Relationship is created	Related Node and Relationship are created
source_node.allow_create=true	relationship.node_creation_rule=MATCH_ONLY	Relationship is created	Nothing is created
source_node.allow_create=false	relationship.node_creation_rule=EAGER	Relationship is created	Related Node and Relationship are created
source_node.allow_create=false	relationship.node_creation_rule=MATCH_ONLY	Relationship is created	Nothing is created
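
The two tables can be restated as a small decision function. The sketch below is a reading aid that reproduces the rows above; it is not taken from nodestream's source, the name `created_elements` is hypothetical, and `FUZZY` is left out because the tables do not cover it.

```python
from typing import Set


def created_elements(
    source_exists: bool,
    related_exists: bool,
    allow_create: bool,
    node_creation_rule: str,
) -> Set[str]:
    """Return which graph elements get created for one source/related pair."""
    created: Set[str] = set()

    # The source node is usable if it already exists or may be created.
    source_available = source_exists or allow_create
    if not source_exists and allow_create:
        created.add("source_node")

    # EAGER creates a missing related node; MATCH_ONLY never does.
    related_available = related_exists
    if not related_exists and node_creation_rule == "EAGER":
        created.add("related_node")
        related_available = True

    # A relationship needs both endpoints to be available.
    if source_available and related_available:
        created.add("relationship")
    return created


# Row: source missing, allow_create=true, MATCH_ONLY, related node missing.
print(created_elements(False, False, True, "MATCH_ONLY"))
```

An empty set corresponds to the "Nothing is created" cells.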

Normalizers

A Normalizer allows you to clean data extracted by a ValueProvider. Normalizers are intended to provide stateless, simple transformations of data. Many different interpretations allow you to enable Normalizers to apply these transformations. See the Interpretation reference for where they can be applied.

Normalizer Flag Name	Example Input	Example Output
`do_lowercase_strings`	"dO_LoWER_cASe_strings"	"do_lowercase_strings"
`do_remove_trailing_dots`	"my.website.com."	"my.website.com"
`do_trim_whitespace`	" some value "	"some value"

ValueProviders

`!jmespath`

Represents a jmespath query language expression that should be executed against the input record.

For example, suppose you want to extract all of the name fields from the list of people provided in a document like this:

{
    "people": [{"name": "Joe", "age": 25}, {"name": "john", "age": 45}]
}

A valid !jmespath value provider would look like this:

!jmespath people[*].name

Essentially, any JMESPath expression provided after the !jmespath tag will be parsed and loaded as one. The JMESPath documentation provides a fuller guide to the query language.
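
To make the result of that expression concrete, here is the equivalent selection written as a plain Python comprehension over the same document. It does not use a JMESPath library; it only mirrors what the projection `people[*].name` selects, including skipping entries without a name.

```python
import json

document = json.loads(
    '{"people": [{"name": "Joe", "age": 25}, {"name": "john", "age": 45}]}'
)

# Equivalent of `people[*].name`: collect the name of every person,
# leaving out entries whose name is missing or null.
names = [
    person["name"]
    for person in document["people"]
    if person.get("name") is not None
]

print(names)  # ['Joe', 'john']
```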

`!variable`

Provides the value of an extracted variable from the Variables Interpretation. For instance, if you provided a variables interpretation block like so:

interpretations:
    - type: variables
      variables:
         name: !jmespath person.name

You are then able to use the !variable provider in a later interpretation. For example:

interpretations:
    # other interpretations are omitted.
    - type: source_node
      node_type: Person
      key:
        name: !variable name

This is particularly helpful when using the before_iteration and iterate_on clauses in an Interpreter. For example, assume that you have a record that looks like this:

{
    "team_name": "My Awesome Team",
    "people": [
        {"name": "Joe", "age": 25},
        {"name": "John", "age": 34}
    ]
}

One way to ingest this data would be to do the following:

- implementation: nodestream.interpreting:Interpreter
  arguments:
    before_iteration:
      - type: variables
        variables:
           team: !jmespath team_name
    iterate_on: !jmespath people[]
    interpretations:
      - type: source_node
        node_type: Person
        key:
          name: !jmespath name
        properties:
          age: !jmespath age
      - type: relationship
        node_type: Team
        relationship_type: PART_OF
        node_key:
          name: !variable team
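
The flow of this pipeline step can be pictured with a short Python sketch. It is a simplified model and not how nodestream executes the step: the variable is read once from the whole record's `team_name` field, and each element of `people` is then interpreted on its own while still seeing that variable. The tuple and dictionary shapes used for the results are made up for illustration.

```python
record = {
    "team_name": "My Awesome Team",
    "people": [
        {"name": "Joe", "age": 25},
        {"name": "John", "age": 34},
    ],
}

# before_iteration: evaluated once against the whole record.
variables = {"team": record["team_name"]}

results = []
# iterate_on: each person becomes the record seen by `interpretations`.
for person in record["people"]:
    results.append(
        {
            "source_node": ("Person", person["name"]),
            "properties": {"age": person["age"]},
            # The relationship key comes from the variable, not the person.
            "relationship": ("PART_OF", ("Team", variables["team"])),
        }
    )

for row in results:
    print(row)
```

Without the variable, the team name would be out of reach inside the iteration, because each person record contains only `name` and `age`.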

`!format`

The !format value provider allows you to format a string using the format method. For example, suppose you wanted to create a hello world node based on the name field of a record like this:

{
    "name": "Joe",
    "age": 25
}

The following interpretation would create a node with the key Hello, Joe!:

- implementation: nodestream.interpreting:Interpreter
  arguments:
    interpretations:
      - type: source_node
        node_type: HelloNode
        key:
          name: !format
            fmt: "Hello, {name}!"
            name: !jmespath name
        properties:
          age: !jmespath age
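
Assuming the "format method" mentioned above is Python's `str.format`, with `fmt` as the template and the remaining arguments supplied as named fields, the key in this example would be computed roughly as follows. This is a sketch of the string operation only, not of the provider's implementation.

```python
record = {"name": "Joe", "age": 25}

fmt = "Hello, {name}!"
# The `name` argument is resolved from the record, then substituted into `fmt`.
key = fmt.format(name=record["name"])

print(key)  # Hello, Joe!
```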

`!regex`

The !regex value provider allows you to extract a value from a string using a regular expression. For example, suppose you wanted to extract the first name from a record like this:

{
    "name": "Joe Smith",
    "age": 25
}

The following interpretation would create a node with the key Joe:

- implementation: nodestream.interpreting:Interpreter
  arguments:
    interpretations:
      - type: source_node
        node_type: HelloNode
        key:
          first_name: !regex
            regex: '^(?P<first_name>[a-zA-Z]+)\s(?P<last_name>[a-zA-Z]+)$'
            data: !jmespath name
            group: first_name
        properties:
          age: !jmespath age

You can use either named groups or numbered groups: pass the group name or the group number in the group argument. If you do not specify a group, group 0 is used, which is the entire match.
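
The difference between the group choices can be checked with Python's standard `re` module, using the same pattern and input as the example. This only demonstrates regular expression group semantics; how the provider calls into the regex engine is not shown here.

```python
import re

pattern = re.compile(r"^(?P<first_name>[a-zA-Z]+)\s(?P<last_name>[a-zA-Z]+)$")
match = pattern.match("Joe Smith")

print(match.group("first_name"))  # named group     -> Joe
print(match.group(1))             # numbered group  -> Joe
print(match.group(2))             # numbered group  -> Smith
print(match.group(0))             # entire match    -> Joe Smith
```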

`!split`

The !split value provider allows you to split a string into a list of strings using a delimiter. For example, given a record like this:

{
    "name": "Joe Smith",
    "talents": "jumping,running,swimming"
}

The following interpretations would create a Joe Smith node with relationships to jumping, running, and swimming:

- implementation: nodestream.interpreting:Interpreter
  arguments:
    interpretations:
      - type: source_node
        node_type: Person
        key:
          name: !jmespath name
      - type: relationship
        node_type: Talent
        relationship_type: HAS_TALENT
        find_many: true
        node_key:
          name: !split
            data: !jmespath talents
            delimiter: ","
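
A quick way to see why `find_many: true` is needed here is to look at what the split produces. The sketch below uses Python's `str.split` to stand in for the provider and builds one illustrative relationship tuple per resulting value; the tuple layout is hypothetical.

```python
record = {"name": "Joe Smith", "talents": "jumping,running,swimming"}

# Splitting on the delimiter yields several key values, not one.
talent_names = record["talents"].split(",")

# With find_many enabled, each value leads to its own related node
# and its own relationship from the source node.
relationships = [
    (record["name"], "HAS_TALENT", talent) for talent in talent_names
]

print(talent_names)
for relationship in relationships:
    print(relationship)
```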

`!normalize`

The !normalize value provider allows you to utilize the normalization functionality to normalize an incoming value. For example, suppose you wanted to normalize the name field of a record like this:

{
    "name": "Joe Smith   "
}

The following interpretation would create a node with the key Joe Smith:

- implementation: nodestream.interpreting:Interpreter
  arguments:
    interpretations:
      - type: source_node
        node_type: Person
        key:
          name: !normalize
            using: trim_whitespace
            data: !jmespath name

While most interpretations support a normalization block (see the Interpretations reference above for more information), the !normalize value provider allows you to normalize a value before it is returned to the interpretation. This is useful when you want to normalize a single value in a key or property block without affecting the others. For example, if you wanted to normalize the city field of the record but not the state field:

{
    "city": "New York   ",
    "state": "NY"
}

The following interpretation would create a Locality node with the keys of New York and NY:

- implementation: nodestream.interpreting:Interpreter
  arguments:
    interpretations:
      - type: source_node
        node_type: Locality
        key:
          city: !normalize
            using: trim_whitespace
            data: !jmespath city
          state: !jmespath state
