Manticore Search 23.0.0 introduced the bigram_delimiter setting together with digit-aware bigram_index modes to solve a common tokenization mismatch in product search: a user types a glued model name such as 'xt850', but the indexed text stores it as 'xt 850', so the query fails to match. The article explains two modes with working SQL examples: second_numeric, which matches when the second token is purely numeric (e.g. 'galaxy24'), and second_has_digit, which matches when the second token contains any digit, covering mixed identifiers such as 'iphone5se', 'eos80d', and 'thinkpadx1'. The bigram_delimiter setting controls whether the glued form, the delimited form, or both are stored. The article recommends 'both', because it covers user-facing glued queries without breaking normal phrase behavior.
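
The difference between the two modes can be illustrated with a small Python sketch. This is a toy model of the matching rule as summarized above, not Manticore's tokenizer or API: the function names are hypothetical, whitespace splitting stands in for real tokenization, and the split forms used in the usage note ('iphone 5se', 'galaxy 24 ultra') are assumed for illustration.

```python
def second_numeric(second: str) -> bool:
    """Accept the pair only when the second token is purely numeric."""
    return second.isascii() and second.isdigit()


def second_has_digit(second: str) -> bool:
    """Accept the pair when the second token contains at least one digit."""
    return any(ch.isascii() and ch.isdigit() for ch in second)


# Hypothetical lookup table keyed by the mode names used in the summary.
MODES = {
    "second_numeric": second_numeric,
    "second_has_digit": second_has_digit,
}


def glued_bigrams(text: str, mode: str) -> list[str]:
    """Return the glued forms a digit-aware bigram mode would add.

    Adjacent tokens are joined without a separator whenever the
    second token of the pair satisfies the selected mode.
    """
    tokens = text.lower().split()
    accept = MODES[mode]
    return [
        first + second
        for first, second in zip(tokens, tokens[1:])
        if accept(second)
    ]
```

Under these assumptions, `glued_bigrams("xt 850", "second_numeric")` yields `["xt850"]`, while `glued_bigrams("iphone 5se", "second_numeric")` yields nothing because '5se' is not purely numeric; switching to `"second_has_digit"` produces `["iphone5se"]`. In a longer string such as 'galaxy 24 ultra', only the pair whose second token carries a digit is glued, which is why ordinary word pairs are left alone.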

7m read time. From manticoresearch.com
Table of contents
TL;DR
Assumptions and verification
The broader search problem
Baseline: why xt850 fails by default
Why bigrams help here
A note about bigram_delimiter
Mode 1: second_numeric
Mode 2: second_has_digit
How to choose between the two
Final takeaway
