Natural Language Processing Tools for East Asian and Western Languages

Our suite of open-source and proprietary natural language processing products powers the businesses of leading companies across the globe. All of our proprietary products come with generous licenses that allow unlimited use and modification. We are always happy to demonstrate our products and to offer trial license keys.


Kuromoji is our open-source Japanese morphological analyzer, written in Java. Core features include word segmentation, part-of-speech tagging, lemmatization, readings for kanji, and support for multiple dictionary backends. Kuromoji powers Japanese language support in Apache Lucene and Apache Solr and is also used in Elasticsearch. Contact us for a demo and further information.

In addition to the Java version of Kuromoji, which is released under the Apache License v2, we offer a C version with a proprietary license and commercial support.



Akahai is our query suggestion engine for Japanese, Chinese, Korean, and Western languages. Akahai gives instant, high-quality suggestions during search or other text input and can match input across any combination of hiragana, katakana, kanji, and romaji. It also offers multi-term matching, match highlighting, and spell-checking, all with extremely low latency.

Akahai is simple to integrate with search engines, databases, and web applications and serves millions of users daily.
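
To make the idea of matching across scripts concrete, here is a minimal Java sketch. It is not Akahai's implementation or API: the class KanaPrefixMatcher and its methods are hypothetical names invented for this illustration, and it covers only the simplest case, folding katakana onto hiragana before a prefix comparison. Matching kanji or romaji input would also need reading data from a dictionary, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of any Akahai API.
class KanaPrefixMatcher {

    // Fold katakana (U+30A1 to U+30F6) onto the corresponding hiragana,
    // which sit exactly 0x60 code points lower. Other characters are
    // lower-cased and otherwise left unchanged.
    static String normalize(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= '\u30A1' && c <= '\u30F6') {
                sb.append((char) (c - 0x60));
            } else {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }

    // Return every candidate whose normalized form starts with the
    // normalized input, preserving the original candidate order.
    static List<String> suggest(String input, List<String> candidates) {
        String key = normalize(input);
        List<String> hits = new ArrayList<>();
        for (String candidate : candidates) {
            if (normalize(candidate).startsWith(key)) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> candidates = List.of("ラーメン", "らーめん屋", "うどん");
        // Hiragana input matches both the katakana and hiragana entries.
        System.out.println(suggest("らー", candidates)); // prints [ラーメン, らーめん屋]
    }
}
```

A production suggestion engine would typically use an index such as a trie or finite-state structure rather than a linear scan to reach low latency; the scan here only keeps the example short.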


We offer a powerful keyword extractor for Japanese. Keyword and concept extraction can be used for a variety of applications, such as searching by example, content personalization, recommendation systems, and identifying major themes across large data sets.


Our named entity extractor recognizes named entities such as person names, companies, locations, buildings, and religions. It currently supports texts in Japanese, English, and the Scandinavian languages. By using language context, our technology can tag the same word with different types depending on usage, and can even tag unknown terms accurately. Our machine-learning-based models can be supplemented with dictionaries of custom entities specific to your domain. We also offer a user-friendly annotation tool for building custom data to include in your model.


Panda is our morphological analyzer for Simplified Chinese. It provides high-accuracy word segmentation, part-of-speech tagging, and pinyin for Chinese text. Information about the corresponding traditional characters is also available for each token.

All of our products are available in Java. Our products are also packaged with REST servers to allow for easy integration with any platform or language.

Our customers

NHST Media Group
NTT Docomo

Let’s get to work

Contact us about your project today for a free consultation.