manticore search available via ClickHouse

i’m a big fan of ClickHouse; over time it became central hub for any sort of business intelligence analytics. at work we replicate data to it, use dictionaries loaded over http and mysql, recently – we’re also using MySQL table or database functions to be able to run queries with joins across data from various systems. those ‘foreign database wrappers’ exposing data from MySQL to ClickHouse two are not very sophisticated, usually ClickHouse loads all table rows for relevant columns to its memory, but that’s acceptable for our use cases with few hundreds thousands of rows.

recently we’ve exposed to ClickHouse new mysql dataset – content of RT ticketing system. it’s fast enough for ad-hoc analytics, but way too slow to run text search. i already have manticore index built on top of RT’s attachments… so why not using it?

i’ve set up my manticore 6.3.x to expose HTTP read-only interface to ClickHouse [ listen = 9313 in section searchd of /etc/manticoresearch/manticore.conf proxied via apache2 ]. in ClickHouse i’ve used parametrized view:

CREATE VIEW search AS 
 SELECT 
  JSONExtractInt(arrayJoin(JSONExtractArrayRaw(r, 'hits', 'hits')), '_id') AS transaction_id
 FROM 
 url(concat('https://address.of.proxy.to.manticore/manticore-sql?query=',encodeURLComponent(concat('SELECT id FROM transaction WHERE MATCH(\'',{text:String},'\') LIMIT 20000 OPTION max_matches=2000'))),JSONAsString,'r text')

not pretty, but it does work:

SELECT * FROM search3(text='some search phrase passed to manticore');

   ┌─transaction_id─┐
1. │        6768586 │
   └────────────────┘

yes – it is one big SQL injection; manticore claims that communication to /sql is read-only; fingers crossed it is indeed.

ClickHouse’s new JSON format is in active development right now – hopefully soon JSONExtract* functions will support it. once that happens – above query will get much simpler.

Leave a Reply Cancel reply