ruby on rails - Which record is causing "NOTICE: word is too long to be indexed"?
I'm using Postgres in a Rails app (with the pg_search gem) and have enabled search with tsvector. In a database of 35,000 records I get several messages saying:
NOTICE: word is too long to be indexed
DETAIL: Words longer than 2047 characters are ignored.
Am I correct in assuming that a "word" does not include whitespace? How can I determine which records are causing this message?
Here's the SQL generated by the migration that introduces the indexes:
== AddIndexForFullTextSearch: migrating ======================================
-- add_column(:posts, :tsv, :tsvector)
   -> 0.0344s
-- execute(" create index index_posts_tsv on posts using gin(tsv);\n")
   -> 0.1694s
-- execute(" update posts set tsv = (to_tsvector('english', coalesce(title, '')) || \n to_tsvector('english', coalesce(intro, '')) || \n to_tsvector('english', coalesce(body, '')));\n")
NOTICE: word is too long to be indexed
DETAIL: Words longer than 2047 characters are ignored.
NOTICE: word is too long to be indexed
DETAIL: Words longer than 2047 characters are ignored.
NOTICE: word is too long to be indexed
DETAIL: Words longer than 2047 characters are ignored.
NOTICE: word is too long to be indexed
DETAIL: Words longer than 2047 characters are ignored.
   -> 343.0556s
-- execute(" create trigger tsvectorupdate before insert or update\n on posts for each row execute procedure\n tsvector_update_trigger(tsv, 'pg_catalog.english', title, intro, body);\n")
   -> 0.0266s
According to the PostgreSQL documentation, “full text search functionality includes the ability to […] parse based on more than just white space”, depending on “text search configurations”. So you'll have to examine your configuration to find out what a “word” means.
You can, however, search for long whitespace-separated words using a regular expression:
select regexp_matches(the_text_col, '\S{2047,}') from the_table
That regex searches for 2047 or more consecutive non-whitespace characters.
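To get from matches to the actual records, the same check can be applied row by row while keeping the id. Below is a minimal Ruby sketch of that idea. The `Record` struct and the `ids_with_long_words` helper are hypothetical names made up for illustration (`Record` stands in for a `posts` row), and the check only looks at whitespace-separated runs, which may not be exactly what the text search parser treats as a "word".

```ruby
# Hypothetical stand-in for a row of the posts table (not the real model).
Record = Struct.new(:id, :title, :intro, :body)

# A run of 2047 or more consecutive non-whitespace characters,
# the same pattern as the SQL regex described above.
LONG_WORD = /\S{2047,}/

# Returns the ids of all records where at least one of the given
# columns contains such a run. nil columns are treated as empty strings.
def ids_with_long_words(records, columns = %i[title intro body])
  records
    .select { |r| columns.any? { |c| r[c].to_s.match?(LONG_WORD) } }
    .map(&:id)
end

records = [
  Record.new(1, "Short title", "Intro", "A normal body of text."),
  Record.new(2, "Another", nil, "data:" + "x" * 3000), # one unbroken 3005-char run
  Record.new(3, "Third", "Intro", "word " * 1000)      # long text, but short words
]

p ids_with_long_words(records) # prints [2]
```

In the Rails app itself, the same predicate could presumably be run over the real rows (for example while iterating with `find_each`), or pushed into the database by using the regex in a `where` clause so that only the offending ids come back.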