Ruby on Rails - Which record is causing "NOTICE: word is too long to be indexed"?


I'm using Postgres in a Rails app (with the pg_search gem) and have enabled search with tsvector. In a database of 35,000 records I get several messages saying:

    NOTICE:  word is too long to be indexed
    DETAIL:  Words longer than 2047 characters are ignored.

Am I correct in assuming that a "word" does not include whitespace? How can I determine which records are causing this message?

Here's the output of the migration that introduces the indexes, including the SQL it runs:

    ==  AddIndexForFullTextSearch: migrating ======================================
    -- add_column(:posts, :tsv, :tsvector)
       -> 0.0344s
    -- execute("      create index index_posts_tsv on posts using gin(tsv);\n")
       -> 0.1694s
    -- execute("    update posts set tsv = (to_tsvector('english', coalesce(title, '')) || \n                            to_tsvector('english', coalesce(intro, '')) || \n                            to_tsvector('english', coalesce(body, '')));\n")
    NOTICE:  word is too long to be indexed
    DETAIL:  Words longer than 2047 characters are ignored.
    NOTICE:  word is too long to be indexed
    DETAIL:  Words longer than 2047 characters are ignored.
    NOTICE:  word is too long to be indexed
    DETAIL:  Words longer than 2047 characters are ignored.
    NOTICE:  word is too long to be indexed
    DETAIL:  Words longer than 2047 characters are ignored.
       -> 343.0556s
    -- execute("      create trigger tsvectorupdate before insert or update\n      on posts for each row execute procedure\n      tsvector_update_trigger(tsv, 'pg_catalog.english', title, intro, body);\n")
       -> 0.0266s

According to the PostgreSQL documentation, "full text search functionality includes the ability to […] parse based on more than just white space", depending on "text search configurations". So you'll have to examine your configuration to find out what "word" means.

You can search for long whitespace-separated words using a regular expression:

    SELECT regexp_matches(the_text_col, '\S{2047,}') FROM the_table;

That regex searches for 2047 or more consecutive non-whitespace characters. Note the uppercase `\S`: a lowercase `\s` would match whitespace instead.
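
The same check can be run from Ruby to see which record IDs are affected. The following is only a sketch under stated assumptions: `overlong_words` and `records_with_overlong_words` are hypothetical helper names, and "word" is approximated as a run of non-whitespace characters, which may not match exactly how the text search parser splits tokens.

```ruby
# Limit taken from the NOTICE text: words longer than 2047 characters are ignored.
WORD_LIMIT = 2047

# Hypothetical helper: returns every run of non-whitespace characters in
# `text` that is longer than `limit` characters. This approximates a "word";
# the real parser may split on more than whitespace.
def overlong_words(text, limit = WORD_LIMIT)
  return [] if text.nil?
  text.scan(/\S{#{limit + 1},}/)
end

# Hypothetical helper: takes rows shaped like [id, text, text, ...] and
# returns a hash mapping each offending id to the lengths of its overlong words.
def records_with_overlong_words(rows, limit = WORD_LIMIT)
  rows.each_with_object({}) do |(id, *texts), found|
    lengths = texts.flat_map { |t| overlong_words(t, limit) }.map(&:length)
    found[id] = lengths unless lengths.empty?
  end
end
```

In a Rails console, rows of this shape could presumably be obtained with something like `Post.pluck(:id, :title, :intro, :body)`, assuming a `Post` model backed by the `posts` table from the migration. For a large table, batching the query would be kinder to memory.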

