K-Extractor: Solution Overview

The K-Extractor™ semantically parses unstructured text to extract knowledge automatically. That knowledge can take different forms: concepts, semantic relations, events, sentiment, opinions, and text snippets containing important information. Extracted knowledge is saved in a traditional RDBMS or an RDF store and can be queried in plain English, for example: “List all people from the USA and their affiliations who worked with Zhang San on cancer treatment between 2010 and 2015”. The knowledge can drive custom applications such as intelligent search, question answering, document summarization, populating predefined templates, generating scientific profiles, student answer assessment, and dialogue systems. To see the basic functionality, check out our demo.

Core Functionality

Lymba's K-Extractor automates and greatly simplifies loading and accessing knowledge from text, a process that otherwise requires substantial manual effort. The core features include:

Extraction of structured knowledge from unstructured, semi-structured, or structured content using our NLP Pipeline. Input can arrive in multiple formats, from plain text to image-only scanned documents, including popular office formats, e-books, HTML, and Wikipedia. We can also process foreign languages and the non-grammatical language of social media.

Indexing the knowledge and storing the results in a standard RDBMS or RDF store.

Querying the knowledge stored in a data store easily and efficiently through our built-in natural language query interface.

Advanced Capabilities

In addition to core functionality, Lymba's K-Extractor is adaptable to domain-specific knowledge needs and provides advanced inference:

Allow customers to define the knowledge items (entities, events, relationships) of interest that need to be extracted from various data resources.

Allow customers to define their own domain- or application-specific relationship types (e.g. for the medical domain: CAUSES_DISEASE, CURE_FOR, etc.) to be identified in the input text.

Infer relationships and concepts from ontologies created using Jaguar, our automatic Ontology Building Tool.

Align the structured and unstructured content with the ontology stored in the semantic model.
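
As an illustration of what customer-defined relationship types could look like on the customer side, here is a minimal sketch in Python. It is not K-Extractor's actual configuration format: the `RelationType` and `Relation` classes, the `is_valid` helper, and the argument types (AGENT, TREATMENT, DISEASE) are assumptions made for this example. Only the relation names CAUSES_DISEASE and CURE_FOR come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationType:
    """A hypothetical customer-defined relationship type."""
    name: str
    subject_type: str  # assumed entity type of the first argument
    object_type: str   # assumed entity type of the second argument


@dataclass(frozen=True)
class Relation:
    """One extracted relation instance (illustrative shape only)."""
    type_name: str
    subject: str
    subject_type: str
    obj: str
    object_type: str


# Medical-domain schema; the argument types are assumptions for this sketch.
SCHEMA = {
    rt.name: rt
    for rt in (
        RelationType("CAUSES_DISEASE", "AGENT", "DISEASE"),
        RelationType("CURE_FOR", "TREATMENT", "DISEASE"),
    )
}


def is_valid(rel: Relation, schema=SCHEMA) -> bool:
    """Return True if the relation uses a declared type with matching argument types."""
    rt = schema.get(rel.type_name)
    return (
        rt is not None
        and rt.subject_type == rel.subject_type
        and rt.object_type == rel.object_type
    )


if __name__ == "__main__":
    rel = Relation("CURE_FOR", "treatment A", "TREATMENT", "disease B", "DISEASE")
    print(is_valid(rel))
```

Declaring the expected argument types alongside each relation name lets a downstream application reject relations that do not fit the domain model before they reach the data store.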

Integration

K-Extractor is not a black box; it allows tight integration with customer infrastructure at the knowledge level. All extracted knowledge is available to the customer in the form of RDF triples, OWL, or custom XML files and can be used by other applications. Additionally, K-Extractor can take existing lexicons, ontologies, or any other format of structured data as input and align newly extracted knowledge with it.
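
To make the RDF triple form concrete, the following sketch serializes (subject, predicate, object) tuples as N-Triples lines using only the Python standard library. The base IRI `http://example.org/kb/`, the predicate name `workedOn`, and the `to_ntriples` helper are assumptions for illustration; the vocabulary K-Extractor actually emits is not described here.

```python
from urllib.parse import quote

# Placeholder namespace; a real deployment would use its own IRIs.
BASE = "http://example.org/kb/"


def to_ntriples(triples, base=BASE):
    """Serialize (subject, predicate, object) name tuples as N-Triples lines.

    Every term is treated as an IRI under `base`; names are percent-encoded
    so that spaces and other reserved characters stay valid inside the IRI.
    """
    lines = []
    for s, p, o in triples:
        terms = (f"<{base}{quote(t, safe='')}>" for t in (s, p, o))
        lines.append(" ".join(terms) + " .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples([("Zhang San", "workedOn", "cancer treatment")]))
```

Output in this shape can be loaded by any RDF store that accepts N-Triples, which is one way other applications could consume the extracted knowledge.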

K-Extractor populates the relational tables and RDF store of Oracle Database (versions 11g and 12c) with important knowledge automatically extracted from unstructured data. The structured knowledge can be modeled as RDF, pushed into an Oracle Spatial semantic index, and queried using SPARQL.

Alternatively, the structured knowledge can be transformed to populate customers' existing relational tables, so that existing SQL-based querying tools can access this additional knowledge transparently.
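
The relational route can be sketched with Python's built-in `sqlite3` module standing in for the customer's database. The table name `extracted_relation`, its columns, the relation labels, and the sample rows (Person A, Person B, Institute X) are all invented for this example and do not reflect K-Extractor's real schema or output.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table holding a few made-up extracted relations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE extracted_relation ("
        "subject TEXT, relation TEXT, object TEXT, year INTEGER)"
    )
    rows = [
        ("Person A", "WORKED_WITH", "Zhang San", 2012),
        ("Person B", "WORKED_WITH", "Zhang San", 2018),
        ("Person A", "AFFILIATED_WITH", "Institute X", 2012),
    ]
    conn.executemany("INSERT INTO extracted_relation VALUES (?, ?, ?, ?)", rows)
    return conn


def collaborators(conn, person, start_year, end_year):
    """Return the subjects who worked with `person` within the given year range."""
    cur = conn.execute(
        "SELECT DISTINCT subject FROM extracted_relation "
        "WHERE relation = 'WORKED_WITH' AND object = ? "
        "AND year BETWEEN ? AND ? ORDER BY subject",
        (person, start_year, end_year),
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    print(collaborators(build_demo_db(), "Zhang San", 2010, 2015))
```

Once extracted relations sit in ordinary tables like this, any existing SQL tool can filter and join them with the rest of the customer's data without knowing how they were produced.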

K-Extractor is also available as SaaS and can be hosted on any popular cloud service or on a dedicated cloud, with a secure RESTful API.