MySQL N-gram Parser for CJK Languages

N-gram Parser

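MySQL's built-in n-gram parser splits text into overlapping sequences of n contiguous characters instead of relying on word boundaries. This enables full-text search for Chinese, Japanese, and Korean (CJK) text, where words are not separated by spaces.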

Example
-- Create table with n-gram parser
CREATE TABLE cjk_articles (
  id INT AUTO_INCREMENT PRIMARY KEY,
  content TEXT,
  FULLTEXT(content) WITH PARSER ngram
) DEFAULT CHARSET=utf8mb4;  -- a Unicode character set is needed to store CJK text

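-- Set the n-gram token size (default 2, allowed range 1 to 10).
-- ngram_token_size is read-only at runtime, so set it at server startup,
-- for example under [mysqld] in my.cnf:
-- ngram_token_size=2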
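
Once the index exists, it can be queried with MATCH ... AGAINST. The following is a minimal sketch, assuming the cjk_articles table above, the default ngram_token_size of 2, and a character set that can store CJK text (such as utf8mb4). The sample rows are illustrative and not part of the original example.

```sql
-- Illustrative sample rows (assumed data)
INSERT INTO cjk_articles (content) VALUES
  ('数据库管理系统'),
  ('全文检索示例');

-- Natural language mode: the search term is split into n-grams,
-- and rows containing any of those n-grams can match
SELECT id, content
FROM cjk_articles
WHERE MATCH(content) AGAINST('数据库' IN NATURAL LANGUAGE MODE);

-- Boolean mode: the search term is treated as a phrase of consecutive n-grams,
-- so the match is stricter than in natural language mode
SELECT id, content
FROM cjk_articles
WHERE MATCH(content) AGAINST('数据库' IN BOOLEAN MODE);
```

Natural language mode tends to return more rows because any shared n-gram counts, while boolean mode is the closer fit when the whole term should appear as written.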
Pro Tip

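With ngram_token_size = 2, every contiguous 2-character sequence (bigram) is indexed. For example, the string abcd produces the tokens ab, bc, and cd. As a rule of thumb, set the token size to the length of the longest token you expect to search for; a smaller value gives a smaller index.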
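
To check which tokens the parser actually produced, the InnoDB full-text metadata tables can be inspected. This is a rough sketch, assuming cjk_articles is an InnoDB table in a schema named test (a hypothetical name, replace it with your own) and that the account has the privileges needed for SET GLOBAL and for reading INFORMATION_SCHEMA InnoDB tables.

```sql
-- 'test' is an assumed schema name; use the schema that holds cjk_articles
SET GLOBAL innodb_ft_aux_table = 'test/cjk_articles';

-- Shows tokens for recently inserted rows that are still in the
-- full-text index cache, in document order
SELECT word, doc_id, position
FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE
ORDER BY doc_id, position;
```

With a token size of 2, each word value listed should be a 2-character sequence taken from the inserted text.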