K-Extractor: Platform Overview

The K-Extractor™ semantically parses unstructured text to automatically extract knowledge. Knowledge can take different forms: concepts, semantic relations, events, sentiment, opinions, and text snippets that carry important information. Extracted knowledge is saved into a traditional RDBMS or an RDF store and can be queried in plain English, for example: “List all people from USA and their affiliations who worked with Zhang San on cancer treatment between 2010 and 2015”. The knowledge can drive custom applications such as intelligent search, question answering, document summarization, populating predefined templates, generating scientific profiles, student answer assessment, and dialogue systems. To see the basic functionality, check out our demo.

Core Functionality

[Figure: NLP Pipeline]

Lymba's K-Extractor™ automates and greatly simplifies loading and accessing knowledge from text, a process that otherwise requires a substantial amount of manual effort. The core features include:

  • Extraction of structured knowledge from unstructured, semi-structured, or structured content using our NLP Pipeline. Input can come in multiple formats, from plain text to image-only scanned documents, including popular office formats, ebooks, HTML, and Wikipedia pages. We can also process foreign-language text and the non-grammatical language of social media.

  • Indexing the knowledge and storing the results in a standard RDBMS or RDF store.

  • Querying the stored knowledge easily and efficiently through our built-in natural language query interface.

Advanced Capabilities

In addition to core functionality, Lymba's K-Extractor™ is adaptable to domain-specific knowledge needs and provides advanced inference:

  • Allow customers to define the knowledge items (entities, events, relationships) of interest that need to be extracted from various data resources.

  • Allow customers to define their own domain- or application-specific relationship types (e.g., for the medical domain: CAUSES_DISEASE, CURE_FOR) to be identified in the input text.

  • Infer relationships and concepts from ontologies created with Jaguar™, our automatic Ontology Building Tool.

  • Align the structured and unstructured content with the ontology stored in the semantic model.
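To make the idea of a customer-defined relationship type concrete, here is a minimal sketch that assumes a type is described by its name plus the entity types expected for its two arguments, and uses that description to filter candidate extractions. The `RelationType`, `Entity`, and `is_valid` names, the argument types, and the example entities are all hypothetical illustrations, not K-Extractor's actual configuration format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RelationType:
    """Hypothetical schema for a customer-defined relationship type: its name
    plus the entity types expected for its first and second arguments."""
    name: str
    arg1_type: str
    arg2_type: str

@dataclass(frozen=True)
class Entity:
    text: str
    entity_type: str

# Illustrative medical-domain relation types; the argument types are assumptions.
MEDICAL_RELATIONS = {
    rel.name: rel
    for rel in (
        RelationType("CAUSES_DISEASE", arg1_type="Pathogen", arg2_type="Disease"),
        RelationType("CURE_FOR", arg1_type="Drug", arg2_type="Disease"),
    )
}

def is_valid(relation: str, arg1: Entity, arg2: Entity) -> bool:
    """Accept a candidate relation only if its type is defined and both
    arguments carry the entity types that the type expects."""
    rel = MEDICAL_RELATIONS.get(relation)
    return (
        rel is not None
        and arg1.entity_type == rel.arg1_type
        and arg2.entity_type == rel.arg2_type
    )

# Hypothetical placeholder entities.
virus = Entity("ExampleVirus", "Pathogen")
drug = Entity("ExampleDrug", "Drug")
disease = Entity("ExampleDisease", "Disease")

print(is_valid("CAUSES_DISEASE", virus, disease))  # True
print(is_valid("CURE_FOR", drug, disease))         # True
print(is_valid("CAUSES_DISEASE", drug, disease))   # False: a Drug is not a Pathogen
```

Checking argument types against the declared schema is one simple way a domain-specific type definition can constrain what gets extracted; a real system would combine it with the linguistic evidence found in the text.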

Integration

K-Extractor™ is not a black box; it allows tight integration with customer infrastructure at the knowledge level. All extracted knowledge is available to customers in the form of RDF triples, OWL, or custom XML files and can be consumed by other applications. These triples can be exported to an enterprise knowledge graph (or graph database), including Stardog, AnzoGraph, MarkLogic, AllegroGraph, Oracle, Neo4j, and others.
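As a rough illustration of what exported triples can look like, the sketch below serializes a few subject/predicate/object tuples in the W3C N-Triples line format, which most RDF stores can import. The `http://example.org/kb/` namespace, the `worksOn` and `name` predicates, and the helper functions are hypothetical; K-Extractor's actual vocabulary and export code are not described here.

```python
# Hypothetical namespace; a real deployment would use the customer's own IRIs.
BASE = "http://example.org/kb/"

def iri(local_name: str) -> str:
    """Wrap a local name in the base namespace as an N-Triples IRI."""
    return f"<{BASE}{local_name}>"

def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes, and line breaks."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'

def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

# Hypothetical extracted facts about the person in the example query.
extracted = [
    (iri("Zhang_San"), iri("worksOn"), iri("Cancer_Treatment")),
    (iri("Zhang_San"), iri("name"), literal("Zhang San")),
]
print(to_ntriples(extracted))
```

Because each N-Triples line is a complete statement, a file produced this way can be bulk-loaded into a triple store or graph database that accepts RDF.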

K-Extractor™ populates graph databases and RDF triple stores with important knowledge automatically extracted from unstructured data. The structured knowledge can be modeled as RDF, pushed into a semantic index, and queried using SPARQL.
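The core of a SPARQL `SELECT` is a basic graph pattern: a set of triple patterns with variables that must all match the same bindings. The toy matcher below illustrates only that idea and is not a SPARQL engine or part of K-Extractor; the `ex:` terms and the sample data are hypothetical, and the comment shows roughly the SPARQL query the pattern corresponds to.

```python
def match(patterns, triples, binding=None):
    """Yield each variable binding under which every triple pattern matches.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)  # copy so sibling branches do not interfere
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind an unbound variable, or check a bound one for consistency.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No break: this triple satisfies the pattern; match the remaining ones.
            yield from match(rest, triples, candidate)

# Hypothetical triples in compact prefixed form.
triples = [
    ("ex:Alice", "ex:collaboratedWith", "ex:Zhang_San"),
    ("ex:Alice", "ex:country", "ex:USA"),
    ("ex:Bob", "ex:collaboratedWith", "ex:Zhang_San"),
    ("ex:Bob", "ex:country", "ex:Canada"),
]

# Roughly: SELECT ?person WHERE { ?person ex:collaboratedWith ex:Zhang_San .
#                                 ?person ex:country ex:USA . }
query = [
    ("?person", "ex:collaboratedWith", "ex:Zhang_San"),
    ("?person", "ex:country", "ex:USA"),
]
results = [b["?person"] for b in match(query, triples)]
print(results)  # ['ex:Alice']
```

A production triple store adds indexing, joins ordered by selectivity, and the rest of SPARQL (filters, optional patterns, aggregates), but the variable-binding semantics are the same.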

Alternatively, the structured knowledge can be transformed to populate customers' existing relational tables, so that existing SQL-based query tools can access this additional knowledge transparently.
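The sketch below shows, under assumptions, how extracted facts might land in relational tables and how the plain-English example question from the overview could be expressed in SQL. The `person` and `collaboration` tables, their columns, and the rows are hypothetical, not K-Extractor's actual schema, and the natural-language-to-SQL translation itself is not shown. It uses Python's built-in `sqlite3` module with an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables that extracted knowledge could be mapped into.
conn.executescript("""
    CREATE TABLE person (name TEXT PRIMARY KEY, country TEXT, affiliation TEXT);
    CREATE TABLE collaboration (person TEXT, collaborator TEXT, topic TEXT, year INTEGER);
""")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [
    ("Alice", "USA", "Example University"),
    ("Bob", "Canada", "Example Institute"),
    ("Carol", "USA", "Example Hospital"),
])
conn.executemany("INSERT INTO collaboration VALUES (?, ?, ?, ?)", [
    ("Alice", "Zhang San", "cancer treatment", 2012),
    ("Bob", "Zhang San", "cancer treatment", 2013),
    ("Carol", "Zhang San", "cancer treatment", 2018),
])

# "List all people from USA and their affiliations who worked with Zhang San
#  on cancer treatment between 2010 and 2015", written as parameterized SQL.
rows = conn.execute("""
    SELECT DISTINCT p.name, p.affiliation
    FROM person AS p
    JOIN collaboration AS c ON c.person = p.name
    WHERE p.country = ?
      AND c.collaborator = ?
      AND c.topic = ?
      AND c.year BETWEEN ? AND ?
    ORDER BY p.name
""", ("USA", "Zhang San", "cancer treatment", 2010, 2015)).fetchall()

print(rows)  # [('Alice', 'Example University')]
conn.close()
```

Bob is excluded by the country filter and Carol by the year range. Once the knowledge sits in ordinary tables like these, any SQL-based reporting tool can query it without knowing how it was extracted.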

K-Extractor™ is available both as an on-prem solution and as SaaS, and can be hosted on any popular cloud service or on a dedicated cloud, with a secure RESTful API.