K-Extractor: Technical Details

K-Extractor includes powerful pre-processing modules that support all popular office formats, web and wiki pages, PDFs, and ebooks, and that recognize the structure of the documents. The semantic functionality is based on our high-quality NLP pipeline. K-Extractor can ingest the extracted knowledge into existing storage on the customer side.

All steps involved in knowledge extraction, including queries and calls, preprocessing, NLP pipeline execution, and loading of extracted RDF triples into storage, are performed in parallel. The expected throughput is 1 KB/sec per CPU. Currently, one system installation provides real-time search access to 7 million research publications.
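As a rough planning aid, the quoted throughput figure can be turned into a processing-time estimate. The sketch below is illustrative only: the function name and the sample corpus size are hypothetical, and it assumes throughput scales linearly with the number of CPUs, which is an idealization and not a measured property of K-Extractor.

```python
# Back-of-the-envelope estimate based on the quoted 1 KB/sec per CPU figure.
# Assumption (not from the product documentation): throughput scales
# linearly with CPU count. Real deployments may scale less than linearly.

KB_PER_SEC_PER_CPU = 1.0  # throughput figure quoted in the text above


def estimated_hours(corpus_kb: float, cpus: int,
                    kb_per_sec_per_cpu: float = KB_PER_SEC_PER_CPU) -> float:
    """Return the estimated processing time in hours for a corpus of corpus_kb kilobytes."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    seconds = corpus_kb / (kb_per_sec_per_cpu * cpus)
    return seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical corpus: 1 GB of input (1024 * 1024 KB) on 16 CPUs.
    print(f"{estimated_hours(1024 * 1024, 16):.1f} hours")
```

Under these assumptions, doubling the CPU count halves the estimate; actual timings depend on document formats, storage speed, and database load.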

System Requirements

Architecture: Intel/AMD 32-bit and 64-bit

Operating System: CentOS release 5.x, 6.x; Red Hat Enterprise Linux release 5.x, 6.x; Oracle Linux release 5.x, 6.x

Memory: minimum 2GB; preferred 4GB+

Free disk space: 8GB minimum; more depending on data

Oracle Database: version 11g or 12c

Oracle Database options: Spatial Option for using RDF store and semantic index

Cloud-Based Service

K-Extractor is available as SaaS, allowing smooth scalability and hassle-free integration with customer infrastructure. We provide a dedicated instance with a RESTful API and secure access.

Integration with Other Semantic Tools

K-Extractor can be used together with Lymba's other NLP solutions to provide more functionality:

Incorporate customer-provided ontologies or knowledge bases to improve inference and querying.

Generate ontologies automatically from customer-provided documents using Jaguar. Jaguar also supports ontology merging, so newly generated ontologies can be merged with customer-provided ontologies into a uniform representation.

Use Collaborative High Precision Search (CHiPS) for intelligent search: browse key topics and work with a document collection in a familiar, search-like way.