{"id":3721,"date":"2025-04-06T20:19:58","date_gmt":"2025-04-06T19:19:58","guid":{"rendered":"https:\/\/kudzia.eu\/b\/?p=3721"},"modified":"2025-04-07T08:37:09","modified_gmt":"2025-04-07T07:37:09","slug":"making-manticore-search-available-via-clickhouse","status":"publish","type":"post","link":"https:\/\/kudzia.eu\/b\/2025\/04\/making-manticore-search-available-via-clickhouse\/","title":{"rendered":"manticore search available via ClickHouse"},"content":{"rendered":"\n<p>i&#8217;m a big fan of <a href=\"https:\/\/clickhouse.com\/\">ClickHouse<\/a>; over time it became central hub for any sort of business intelligence analytics. at work we replicate data to it, use dictionaries loaded over http and mysql, recently &#8211; we&#8217;re also using MySQL table or database functions to be able to run queries with joins across data from various systems. those &#8216;foreign database wrappers&#8217; exposing data from MySQL to ClickHouse two are not very sophisticated, usually ClickHouse loads all table rows for relevant columns to its memory, but that&#8217;s acceptable for our use cases with few hundreds thousands of rows.<\/p>\n\n\n\n<p>recently we&#8217;ve exposed to ClickHouse new mysql dataset &#8211; content of <a href=\"https:\/\/bestpractical.com\/request-tracker\">RT ticketing system<\/a>. it&#8217;s fast enough for ad-hoc analytics, but way too slow to run text search. i already have manticore index built on top of RT&#8217;s attachments&#8230; so why not using it?<\/p>\n\n\n\n<p>i&#8217;ve set up my <a href=\"https:\/\/manticoresearch.com\/\">manticore<\/a> 6.3.x to expose HTTP read-only interface to ClickHouse [ <em>listen = 9313<\/em> in section <em>searchd<\/em> of \/etc\/manticoresearch\/manticore.conf proxied via apache2 ]. in ClickHouse i&#8217;ve used parametrized view:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE VIEW search AS \n SELECT \n  JSONExtractInt(arrayJoin(JSONExtractArrayRaw(r, 'hits', 'hits')), '_id') AS transaction_id\n FROM \n url(concat('https:\/\/address.of.proxy.to.manticore\/manticore-sql?query=',encodeURLComponent(concat('SELECT id FROM transaction WHERE MATCH(\\'',{text:String},'\\') LIMIT 20000 OPTION max_matches=2000'))),JSONAsString,'r text')\n<\/code><\/pre>\n\n\n\n<p>not pretty, but it does work:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT * FROM search3(text='some search phrase passed to manticore');\n\n   \u250c\u2500transaction_id\u2500\u2510\n1. \u2502        6768586 \u2502\n   \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n<\/code><\/pre>\n\n\n\n<p>yes &#8211; it is one big SQL injection; manticore claims that communication to \/sql is read-only; fingers crossed it is indeed.<\/p>\n\n\n\n<p>ClickHouse&#8217;s new JSON format is in active development right now &#8211; hopefully soon JSONExtract* functions will support it. once that happens &#8211; above query will get much simpler.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>i&#8217;m a big fan of ClickHouse; over time it became central hub for any sort of business intelligence analytics. at work we replicate data to it, use dictionaries loaded over http and mysql, recently &#8211; we&#8217;re also using MySQL table or database functions to be able to run queries with joins across data from various [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[112,127],"class_list":["post-3721","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-clickhouse","tag-manticoresearch"],"_links":{"self":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/3721","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/comments?post=3721"}],"version-history":[{"count":5,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/3721\/revisions"}],"predecessor-version":[{"id":3729,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/3721\/revisions\/3729"}],"wp:attachment":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/media?parent=3721"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/categories?post=3721"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/tags?post=3721"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}