K-Extractor: Technical Details
K-Extractor includes powerful pre-processing modules that support all popular office formats, web and wiki pages, PDFs, and ebooks, and that recognize the structure of each document. Its semantic functionality is built on our high-quality NLP pipeline. K-Extractor can also load the extracted knowledge into the customer's existing storage.
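K-Extractor's extracted knowledge takes the form of RDF triples. A common way to move RDF triples into existing storage is a line-based serialization that the customer's triple store can bulk-load. The sketch below is a minimal illustration that assumes W3C N-Triples as the exchange format. The `IRI` class and the `to_ntriples` helper are hypothetical names, not part of K-Extractor. Literals are treated as plain strings, with no language tags or datatypes.

```python
# Hypothetical sketch, not K-Extractor's export code: serialize
# (subject, predicate, object) triples as W3C N-Triples text.
# Assumption: subjects and predicates are valid absolute IRIs; objects are
# either IRIs (wrapped in IRI) or plain string literals.
from dataclasses import dataclass
from typing import Iterable, Tuple, Union


@dataclass(frozen=True)
class IRI:
    value: str


Term = Union[IRI, str]


def escape_literal(text: str) -> str:
    """Escape the four characters N-Triples forbids raw inside a literal."""
    return (
        text.replace("\\", "\\\\")  # backslash first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def format_term(term: Term) -> str:
    """Render an IRI as <...> and a plain string as a quoted literal."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    return f'"{escape_literal(term)}"'


def to_ntriples(triples: Iterable[Tuple[IRI, IRI, Term]]) -> str:
    """Return one 'subject predicate object .' line per triple."""
    lines = [
        f"{format_term(s)} {format_term(p)} {format_term(o)} ."
        for s, p, o in triples
    ]
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    doc = IRI("http://example.org/doc/42")
    title = IRI("http://purl.org/dc/terms/title")
    print(to_ntriples([(doc, title, 'Protein "folding" study')]), end="")
```

Each triple becomes one self-contained line, so a large export can be split, streamed, or loaded in parallel batches without any shared parser state.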
All steps of knowledge extraction, including queries and calls, preprocessing, NLP pipeline execution, and loading the extracted RDF triples into storage, are performed in parallel. The expected throughput is 1 KB/sec per CPU. One current installation provides real-time search access to 7 million research publications.
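The stated throughput of 1 KB/sec per CPU supports quick capacity estimates. The sketch below rests on two assumptions that are not published figures: throughput scales linearly with the number of CPUs (ideal parallelism, with no storage or I/O bottleneck), and 1 KB means 1,000 bytes. `processing_hours` and `cpus_needed` are hypothetical helper names.

```python
# Back-of-the-envelope sizing from the stated 1 KB/sec per CPU.
# Assumptions: linear scaling with CPU count, and 1 KB = 1,000 bytes.
import math

PER_CPU_BYTES_PER_SEC = 1_000  # 1 KB/sec per CPU


def processing_hours(corpus_bytes: float, cpus: int) -> float:
    """Estimated wall-clock hours to process corpus_bytes on `cpus` CPUs."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    seconds = corpus_bytes / (PER_CPU_BYTES_PER_SEC * cpus)
    return seconds / 3600


def cpus_needed(corpus_bytes: float, deadline_hours: float) -> int:
    """Smallest CPU count that finishes corpus_bytes within deadline_hours."""
    if deadline_hours <= 0:
        raise ValueError("deadline_hours must be positive")
    bytes_per_cpu = PER_CPU_BYTES_PER_SEC * deadline_hours * 3600
    return math.ceil(corpus_bytes / bytes_per_cpu)


if __name__ == "__main__":
    corpus = 3_600_000_000  # a 3.6 GB corpus, chosen for round numbers
    print(processing_hours(corpus, 100))  # 10.0
    print(cpus_needed(corpus, 9))         # 112
```

For example, a 3.6 GB corpus on 100 CPUs takes 3.6 × 10^9 / (1,000 × 100) = 36,000 seconds, or 10 hours. Finishing the same corpus within 9 hours would need 112 CPUs (111.1, rounded up).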
Integration with Other Semantic Tools
K-Extractor can be used together with other Lymba NLP solutions to provide additional functionality:
Incorporate customer-provided ontologies or knowledge bases to improve inference and querying.
Automatically generate ontologies from customer-provided documents using Jaguar. Jaguar also supports ontology merging, combining newly generated ontologies with customer-provided ontologies into a uniform representation.
Search intelligently with Collaborative High Precision Search (CHiPS) to browse key topics and explore the document collection through a familiar search interface.
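Merging a generated ontology with a customer-provided one can be pictured as two steps: align equivalent concepts, then take the union of the two triple sets. The sketch below illustrates only that idea. It is not Jaguar's algorithm. It assumes both ontologies are sets of (subject, predicate, object) string triples, and that an alignment from generated concept identifiers to equivalent customer identifiers is already available. `merge_ontologies` and the `ex:`/`gen:` identifiers are hypothetical.

```python
# Hypothetical "align, then union" merge; not Jaguar's actual method.
# Assumption: an alignment from generated identifiers to equivalent
# customer identifiers has already been computed.
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def merge_ontologies(
    customer: Set[Triple],
    generated: Set[Triple],
    alignment: Dict[str, str],
) -> Set[Triple]:
    """Rewrite generated terms onto customer terms, then take the set union."""

    def canon(term: str) -> str:
        # Use the customer's identifier when the alignment lists an equivalent.
        return alignment.get(term, term)

    rewritten = {(canon(s), canon(p), canon(o)) for s, p, o in generated}
    # Set union drops triples that both ontologies already state.
    return customer | rewritten


if __name__ == "__main__":
    customer = {("ex:Protein", "rdfs:subClassOf", "ex:Molecule")}
    generated = {
        ("gen:protein", "rdfs:subClassOf", "ex:Molecule"),
        ("gen:Enzyme", "rdfs:subClassOf", "gen:protein"),
    }
    alignment = {"gen:protein": "ex:Protein"}
    for triple in sorted(merge_ontologies(customer, generated, alignment)):
        print(triple)
```

Here the generated `gen:protein` concept folds into the customer's `ex:Protein`, so the merged ontology has two triples rather than three, and the new `gen:Enzyme` class attaches under the customer's concept. The alignment is applied once rather than chained, so it should map directly to the final identifiers.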