The Natural Language Processing Pipeline

Lymba's NLP Pipeline Service is the heart of all of Lymba's Knowledge Management solutions. Given text as input, the NLP Pipeline identifies a variety of key concepts and the semantic relations between them. The set of concept and relation types is customizable and expandable, with a powerful set available out of the box:

26+ relation types:

—Is-A, Part-Whole

—Agent, Instrument, Cause

—At-Location, At-Time

—Value, Manner

—Domain-specific relations like Diagnosis, Arrested-At, Worked-With, etc.

86+ concept types:

—People

—Organizations

—Locations

—Time expressions

—Domain-specific terms like bacteria, symptoms, tattoo shapes, brands, etc.

 

The Importance of Deep Semantics

Semantic relations are fundamentally important for understanding text. Compare, for example, these two sentences:

1. The committee awarded a girl with curly hair.

2. The committee awarded a girl with a book.

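While the surface structure is very similar, the phrases "with curly hair" and "with a book" have different meanings and attach to different parts of the sentence: the first describes the girl, while the second names what she was awarded. This kind of deep understanding is not achievable by matching keywords with shallow NLP methods.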

 

NLP Pipeline Modules

The NLP Pipeline is a chain of individual modules. Each module extracts and labels specific types of information that later modules and applications can build on. Processing starts with shallow NLP tasks, from text segmentation to syntactic parsing, and then moves on to deep semantic processing. The semantic parser uses 26 basic semantic relation types to express the semantic connectivity of all concepts in the text. These types were specifically chosen to guarantee a complete and robust semantic representation of text.

[Figure: NLP Pipeline modules (2021_NLPPipeline-02-01.png)]
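The chain-of-modules idea can be illustrated with a minimal Python sketch. This is not Lymba's implementation or API: the Document class, the segment and tokenize functions, and run_pipeline are hypothetical names, and the splitting logic is deliberately naive. The point is only that each module reads what earlier modules produced and adds its own annotations.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical container: raw text plus annotations added by modules."""
    text: str
    annotations: Dict[str, list] = field(default_factory=dict)


Module = Callable[[Document], Document]


def segment(doc: Document) -> Document:
    # Naive stand-in for text segmentation: split on periods.
    doc.annotations["sentences"] = [
        s.strip() for s in doc.text.split(".") if s.strip()
    ]
    return doc


def tokenize(doc: Document) -> Document:
    # Builds on the output of the segmentation module.
    doc.annotations["tokens"] = [
        s.split() for s in doc.annotations["sentences"]
    ]
    return doc


def run_pipeline(doc: Document, modules: List[Module]) -> Document:
    # Apply each module in order; later modules see earlier annotations.
    for module in modules:
        doc = module(doc)
    return doc


doc = run_pipeline(
    Document("The committee awarded a girl. She smiled."),
    [segment, tokenize],
)
```

In a real pipeline, further modules (syntactic parsing, semantic parsing, and so on) would be appended to the same list in the order they depend on each other.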

Lymba's NLP Pipeline Service includes additional tools that enhance the information extracted by the semantic parser:

  • a co-reference resolution module that links entities and events that refer to each other;

  • a semantic calculus tool that enables a user to define and extract custom relations between concepts;

  • an event identification module that builds rich temporal timelines and causal chains of the events mentioned in text.

The output of the NLP Pipeline can be directly translated into semantic triples that can be stored in a semantic database or used by Jaguar to build a Knowledge Base and ontology.
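As a rough illustration of what translating relations into semantic triples can look like, here is a short Python sketch. The Relation type, its field names, and the sample data are assumptions made for this example; the actual output format of the NLP Pipeline is not described here.

```python
from typing import List, NamedTuple, Tuple


class Relation(NamedTuple):
    """Hypothetical shape of one extracted semantic relation."""
    type: str    # e.g. "At-Location"
    source: str  # concept the relation starts from
    target: str  # concept the relation points to


def to_triples(relations: List[Relation]) -> List[Tuple[str, str, str]]:
    # A triple is (subject, predicate, object); the relation type
    # plays the role of the predicate.
    return [(r.source, r.type, r.target) for r in relations]


# Made-up sample relations, for illustration only.
relations = [
    Relation("At-Location", "meeting", "Dallas"),
    Relation("At-Time", "meeting", "Monday"),
]
triples = to_triples(relations)
```

Triples in this (subject, predicate, object) form are the usual input shape for a semantic database.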

 

Domain Customization

Different domains have different knowledge needs. Our NLP Pipeline allows extraction of custom high-level relations by combining the underlying basic relations. For example, if your application operates in the law enforcement domain, you can extract relations like "Arrested-At" using inference as shown in the figure below.

[Figure: inferring the "Arrested-At" relation from basic relations (Screen Shot 2020-06-25 at 11.52.39 AM.png)]
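To make the idea of combining basic relations concrete, here is a minimal Python sketch of one possible inference rule. It is not the rule shown in the figure and not Lymba's semantic calculus syntax. It assumes that relations are (type, event, argument) tuples, that the arrested person is linked to the arrest event by a relation called "Theme" (an assumed name), and that the place is linked by At-Location.

```python
from typing import List, Set, Tuple

# Assumed representation: (relation_type, event, argument)
Rel = Tuple[str, str, str]


def infer_arrested_at(relations: List[Rel], arrest_events: Set[str]) -> List[Rel]:
    """Combine Theme(event, person) and At-Location(event, place)
    on the same arrest event into Arrested-At(person, place)."""
    inferred = []
    for rel1, event1, person in relations:
        if rel1 != "Theme" or event1 not in arrest_events:
            continue
        for rel2, event2, place in relations:
            if rel2 == "At-Location" and event2 == event1:
                inferred.append(("Arrested-At", person, place))
    return inferred


# Made-up basic relations, for illustration only.
basic = [
    ("Theme", "arrested#1", "suspect"),
    ("At-Location", "arrested#1", "airport"),
    ("At-Location", "met#2", "hotel"),
]
custom = infer_arrested_at(basic, {"arrested#1"})
```

Only the location attached to the arrest event is used; the At-Location of the unrelated event "met#2" does not produce an Arrested-At relation.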