reset password
Author Message
tloi
Posts: 16
Posted 22:59 May 29, 2012 |

Hi Dr. Sun,


On the forum I saw you mention that FTS on uploaded files must be done using PostgreSQL. But to do this, I think a tsvector for each file needs to be stored in the table? Would that be too much data for a single field if the file is large?

 

Thanks.

cysun
Posts: 2935
Posted 08:06 May 30, 2012 |

In practice, some FTS implementations deal with the large-file problem by searching only the first part of a file. For example, early versions of Windows Search searched only the first 5000 words of a file. The reasoning is that if a word does not appear in the first 5000 words of a document, it's probably not an important word for that document. You can do something similar in your implementation.
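
As a rough illustration of the truncation idea, here is a minimal Python sketch that keeps only the first 5000 words of a file's extracted text before handing it to PostgreSQL's to_tsvector. The table name (files), the column names (tsv, id), the 'english' text search configuration, and the %s placeholder style (as used by drivers such as psycopg) are all assumptions for the example, not something from the course project.

```python
import re
from itertools import islice

WORD_LIMIT = 5000  # the figure mentioned above; adjust as needed


def first_words(text: str, limit: int = WORD_LIMIT) -> str:
    """Return at most `limit` whitespace-separated words of `text`."""
    # finditer is lazy, so a very large file is not split in full
    words = (m.group(0) for m in re.finditer(r"\S+", text))
    return " ".join(islice(words, limit))


# Hypothetical schema: files(id, tsv tsvector). Adjust to your own table.
INDEX_SQL = "UPDATE files SET tsv = to_tsvector('english', %s) WHERE id = %s"


def index_params(file_id: int, text: str) -> tuple:
    """Build the parameters for INDEX_SQL from the truncated text."""
    return (first_words(text), file_id)
```

With a database driver you would then run something like cursor.execute(INDEX_SQL, index_params(file_id, text)), so the stored tsvector is built from a bounded amount of text no matter how large the uploaded file is.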