The NLP Pipeline is the foundation for all semantic processing on the K-Platform.

It is available in two configurations: Information Extraction and Base NLP.

Given any text input, Information Extraction identifies named entities such as people (Steve Jobs), organizations (city council), locations (Cupertino City Hall), and times (Tuesday), as well as events (gave, meeting), as seen in the figure to the right for the example sentence below.

“Steve Jobs gave his proposal for a new Apple campus on Tuesday at Cupertino City Hall during the monthly city council meeting.”
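One way to picture the result for the example sentence above is as a list of typed text spans. The sketch below is illustrative only: the `Annotation` class, its field names, and the label strings are assumptions made for this example, not the K-Platform's actual output format. The entity and event strings themselves are taken from the example.

```python
from dataclasses import dataclass

SENTENCE = (
    "Steve Jobs gave his proposal for a new Apple campus on Tuesday "
    "at Cupertino City Hall during the monthly city council meeting."
)

@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one extracted entity or event."""
    label: str  # type of the span, e.g. PERSON or EVENT
    text: str   # surface string as it appears in the sentence
    start: int  # character offset where the span begins
    end: int    # character offset just past the span

def annotate(sentence, labelled):
    """Look up each (label, text) pair in the sentence and record its offsets."""
    found = []
    for label, text in labelled:
        start = sentence.find(text)
        if start == -1:
            raise ValueError(f"{text!r} does not occur in the sentence")
        found.append(Annotation(label, text, start, start + len(text)))
    return sorted(found, key=lambda a: a.start)

# Entities and events named in the example, with assumed label names.
LABELLED = [
    ("PERSON", "Steve Jobs"),
    ("EVENT", "gave"),
    ("TIME", "Tuesday"),
    ("LOCATION", "Cupertino City Hall"),
    ("ORGANIZATION", "city council"),
    ("EVENT", "meeting"),
]

annotations = annotate(SENTENCE, LABELLED)
for a in annotations:
    print(f"{a.label:<12} {a.start:>3}-{a.end:<3} {a.text}")
```

Storing character offsets alongside the text keeps each annotation tied to its exact position, which matters when the same string occurs more than once in a document.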

Diagram of NLP pipeline

Information Extraction corresponds to the blue pipeline in the diagram and runs on every new document submitted to the K-Platform. It consists of (1) text segmentation into words, sentences, paragraphs, and documents, (2) assignment of parts of speech to each word, (3) named entity recognition, (4) event identification, and (5) phrase chunking. Customization interfaces are available for steps (1) through (4), making it easy to tailor Information Extraction to any document set or domain.
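The ordering of the five steps above can be sketched as a list of stage functions applied in sequence. This is a minimal illustration, not the K-Platform implementation: every function name here is hypothetical, the segmentation rules are deliberately naive regular expressions, and steps (2) through (5) are placeholders that only mark where real components would run.

```python
import re

def segment(doc):
    """Step 1 (naive): split text into paragraphs, sentences, and words."""
    doc["paragraphs"] = [
        p.strip() for p in re.split(r"\n\s*\n", doc["text"]) if p.strip()
    ]
    doc["sentences"] = [
        s
        for p in doc["paragraphs"]
        for s in re.split(r"(?<=[.!?])\s+", p)
        if s
    ]
    doc["words"] = [re.findall(r"\w+", s) for s in doc["sentences"]]
    return doc

def placeholder(name):
    """Build a do-nothing stage that stands in for a real component."""
    def stage(doc):
        return doc
    stage.__name__ = name
    return stage

# Stage order follows steps (1) to (5) in the text; names are assumptions.
INFORMATION_EXTRACTION = [
    segment,
    placeholder("tag_parts_of_speech"),
    placeholder("recognize_named_entities"),
    placeholder("identify_events"),
    placeholder("chunk_phrases"),
]

def run_pipeline(text, stages):
    """Pass one document through every stage in order."""
    doc = {"text": text, "stages_run": []}
    for stage in stages:
        doc = stage(doc)
        doc["stages_run"].append(stage.__name__)
    return doc

doc = run_pipeline(
    "Steve Jobs gave his proposal. The council met.\n\nA second paragraph.",
    INFORMATION_EXTRACTION,
)
print(doc["stages_run"])
print(doc["sentences"])
```

Keeping the stages in a plain list makes the order explicit and lets a longer pipeline reuse the same stages by extending the list.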

The Base NLP pipeline includes every processing step from Information Extraction plus the steps shown in green. Its purpose is to provide semantic relations between entities and events, including identity links. These links translate directly into semantic triples, which can be stored in a semantic database such as Oracle Database 11g Release 2 or used by Jaguar™ to build a knowledge base and ontology.
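To make the idea of a semantic triple concrete, the sketch below writes a few subject, relation, object links in the N-Triples text format. It is an assumption-laden illustration: the relation names (`AGENT`, `TIME`, `LOCATION`) and the `http://example.org/` namespace are placeholders chosen for this example and are not taken from the K-Platform's relation set or storage schema.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # placeholder namespace (assumption)

def iri(local_name):
    """Turn a plain name into an IRI reference in the placeholder namespace."""
    return "<" + BASE + quote(local_name.replace(" ", "_")) + ">"

def to_ntriples(links):
    """Serialize (subject, relation, object) links, one N-Triples line each."""
    return "\n".join(f"{iri(s)} {iri(p)} {iri(o)} ." for s, p, o in links)

# Hypothetical links for the example sentence; relation names are invented.
links = [
    ("gave", "AGENT", "Steve Jobs"),
    ("gave", "TIME", "Tuesday"),
    ("gave", "LOCATION", "Cupertino City Hall"),
]

ntriples = to_ntriples(links)
print(ntriples)
```

A line-oriented format such as N-Triples is convenient for bulk loading because each statement stands alone and can be streamed into a triple store.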

In addition to the Information Extraction steps, the Base NLP pipeline (1) disambiguates each word to its proper sense (e.g., river bank vs. financial bank), (2) builds a syntactic tree for each sentence, (3) detects semantic relations between entities and events, (4) forms co-reference links between entities and events that refer to the same thing, and (5) forms event relations.
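Co-reference links are pairwise, but they imply groups: if A links to B and B links to C, all three refer to the same thing. The sketch below shows one common way to turn pairwise links into groups, using a union-find structure. It is not a description of how the K-Platform resolves co-reference; the function name is hypothetical, and plain strings stand in for mentions, which a real system would identify by position.

```python
def cluster_mentions(mentions, links):
    """Group mentions into clusters given pairwise co-reference links."""
    parent = {m: m for m in mentions}

    def find(m):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())

# "his" in the example sentence refers back to "Steve Jobs".
mentions = ["Steve Jobs", "his", "Apple", "city council"]
links = [("his", "Steve Jobs")]

clusters = cluster_mentions(mentions, links)
print(clusters)
```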

The pipeline uses 26 relations to connect concepts in text, chosen specifically to give a complete and robust semantic representation. The table below lists each relation with its definition and short code.

Diagram of NLP pipeline

Each of these relations can be used for domain customization. For example, if your application operates in the law enforcement domain, you can extract relations like "Arrested-At" using the K-Platform's relation customization tools.
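As a rough picture of what a domain-specific relation such as "Arrested-At" produces, the sketch below extracts it with a single regular expression. This is only a stand-in: the K-Platform's relation customization tools are not shown here, the pattern and function name are hypothetical, and the test sentence is invented for the example.

```python
import re

# Matches "<Capitalized Name> was arrested at <Capitalized Place>".
ARRESTED_AT = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) was arrested at "
    r"(?P<place>[A-Z]\w*(?: [A-Z]\w*)*)"
)

def extract_arrested_at(text):
    """Return (person, 'Arrested-At', place) triples found in the text."""
    return [
        (m["person"], "Arrested-At", m["place"])
        for m in ARRESTED_AT.finditer(text)
    ]

# Invented sentence, used only to exercise the pattern.
triples = extract_arrested_at(
    "John Smith was arrested at Union Station on Friday."
)
print(triples)
```

A surface pattern like this misses passives, paraphrases, and pronouns, which is why a relation built on the pipeline's entities and semantic relations generalizes better than matching strings directly.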


Download the NLP Pipeline Brochure here.