Thursday, March 29, 2012

Filter HTML tags in Full-Text Search

I have a Text field in my database that contains data along with HTML tags.
I don't want these HTML tags to be searched by my full-text search.
Example data:
<Font color='red'> Blah</Font>
I want the search to ignore the font tags.
How do I do it?
Save the content of the text data type columns into columns of the image
data type, and use a document type column with a value of htm, so that the
HTML filter is applied and only the content of these documents, not the
tags, is indexed.
"Bruce" <Bruce@.discussions.microsoft.com> wrote in message
news:3BF3659B-E3EB-495F-8D17-75DDD996E323@.microsoft.com...
> I have a Text field in my database that contains data along with HTML tags.
> I don't want these HTML tags to be searched by my full-text search.
> Example data:
> <Font color='red'> Blah</Font>
> I want the search to ignore the font tags.
> How do I do it?
>
|||Can you please elaborate? I am indexing almost 9 columns of the Text data
type. How do I save all these columns into columns of the image data type?
How do I use document type columns?
"Hilary Cotter" wrote:

> save the content in the text data type columns into columns of the image
> data type, and use the document type column with a value of htm so that only
> the content of these files will be indexed.
> "Bruce" <Bruce@.discussions.microsoft.com> wrote in message
> news:3BF3659B-E3EB-495F-8D17-75DDD996E323@.microsoft.com...
>
>
|||For the document type column, you must:
1) ensure you have a column which is char(3) or char(4) and contains the
value htm or .htm
2) store your HTML content in an image data type column
3) use sp_fulltext_column to declare that the document type is given by
the document type column you created in 1)
Here is an example:
EXEC sp_fulltext_column 'MyTable', 'ImageColumn', 'add', 1033, 'DocumentTypeColumn'
where MyTable is the table you are full-text indexing, ImageColumn is a
column of the image data type, 1033 is the language ID (US English), and
DocumentTypeColumn is the char(3) or char(4) column that tells what the
native type of the document you are storing is.
Now, you might also want to convert your docs to pure text. Using
FiltDump -b myhtmldoc.htm > myhtmldoc.txt is one way of doing it.
To convert your columns from HTML to text, or from the text data type to
image, you should write them out to the file system, convert them, and then
push them back.
Let me know if you need code samples to do this.
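
As an alternative to FiltDump for the "convert your docs to pure text" step, the tags can also be stripped in ordinary code once the column contents are on the file system or in memory. The following is only a minimal sketch in Python and is not from this thread: it uses the standard library's html.parser, the names _TextExtractor and html_to_text are made up for illustration, and it does not try to reproduce what the SQL Server HTML filter does. It treats every tag as a word boundary and skips script and style blocks.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <Font> is seen as "font".
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space (each tag acts as a word boundary),
        # then collapse the runs of whitespace left by removed markup.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML fragment, without tags."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(html_to_text("<Font color='red'> Blah</Font>"))  # prints: Blah
```

The result would still have to be written back to the table separately; that step depends on your data access layer and is not shown here.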
"Bruce" <Bruce@.discussions.microsoft.com> wrote in message
news:604B02B0-2741-402F-AE62-318452B9BDF0@.microsoft.com...
> Can you please elaborate? I am indexing almost 9 columns of the Text data
> type. How do I save all these columns into columns of the image data type?
> How do I use document type columns?
>
> "Hilary Cotter" wrote:
|||Hilary,
"Now, if you also might want to convert your docs to pure text. Using
FiltDump -b myhtmldoc.htm > myhtmldoc.txt is one way of doing it."
Is the use of FiltDump in the above scenario a violation of Microsoft's
licensing agreement?
Thanks,
John
"Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
news:uIB1eqrtEHA.2596@.TK2MSFTNGP10.phx.gbl...
> For the document type column, you must:
> 1) ensure you have a column which is char(3) or char(4) and contains the
> value htm or .htm
> 2) store your HTML content in an image data type column
> 3) use sp_fulltext_column to declare that the document type is given by
> the document type column you created in 1)
> Here is an example:
> EXEC sp_fulltext_column 'MyTable', 'ImageColumn', 'add', 1033, 'DocumentTypeColumn'
>
> where MyTable is the table you are full-text indexing, ImageColumn is a
> column of the image data type, and DocumentTypeColumn is the char(3) or
> char(4) column that tells what the native type of the document you are
> storing is.
> Now, you might also want to convert your docs to pure text. Using
> FiltDump -b myhtmldoc.htm > myhtmldoc.txt is one way of doing it.
> To convert your columns from HTML to text, or from the text data type to
> image, you should write them out to the file system, convert them, and then
> push them back.
> Let me know if you need code samples to do this.
>
> "Bruce" <Bruce@.discussions.microsoft.com> wrote in message
> news:604B02B0-2741-402F-AE62-318452B9BDF0@.microsoft.com...
>
|||That's an interesting question. I'll have to check into it.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
"John Kane" <jt-kane@.comcast.net> wrote in message
news:OEpwKoFuEHA.2624@.TK2MSFTNGP11.phx.gbl...
> Hilary,
> "Now, if you also might want to convert your docs to pure text. Using
> FiltDump -b myhtmldoc.htm > myhtmldoc.txt is one way of doing it."
> Is the use of FiltDump in the above scenario a violation of Microsoft's
> licensing agreement?
> Thanks,
> John
>
> "Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
> news:uIB1eqrtEHA.2596@.TK2MSFTNGP10.phx.gbl...
>
|||Yes, since you indicated in another thread that any such use was not
allowed under a license policy for these files that is not publicly
available. If you or Microsoft made this licensing policy public, there
would be less confusion on this issue.
Best Regards,
John
"Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
news:#SNgrgWuEHA.3320@.TK2MSFTNGP15.phx.gbl...
> That's an interesting question. I'll have to check into it.
> --
> Hilary Cotter
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
>
> "John Kane" <jt-kane@.comcast.net> wrote in message
> news:OEpwKoFuEHA.2624@.TK2MSFTNGP11.phx.gbl...
>
|||Specifically, what I said in the past was that I had been advised by
Microsoft that you could not use the word breakers for your own purposes, i.e.
to roll your own hit-highlighting solution.
FiltDump may be another matter, as it is a diagnostic tool.
"John Kane" <jt-kane@.comcast.net> wrote in message
news:%2344qClXuEHA.2828@.TK2MSFTNGP12.phx.gbl...
> Yes, since you indicated in another thread that any such use was not
> allowed under a license policy for these files that is not publicly
> available. If you or Microsoft made this licensing policy public, there
> would be less confusion on this issue.
> Best Regards,
> John
>
> "Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
> news:#SNgrgWuEHA.3320@.TK2MSFTNGP15.phx.gbl...
>
|||You also "generalized" your response to include ALL .dll and .exe files in
addition to the word breaker dll files, if memory serves me correctly. I
also asked you (or Microsoft) at that time, and I ask again now, to make public
the specific licensing policy from Microsoft that you are referring to. If
it is a secret or under NDA, then how can anyone judge whether or not he or
she is violating a non-public licensing agreement?
Best Regards,
John
PS: Feel free to contact me off-line.
"Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
news:eyNhvefuEHA.3456@.TK2MSFTNGP10.phx.gbl...
> Specifically, what I said in the past was that I had been advised by
> Microsoft that you could not use the word breakers for your own purposes, i.e.
> to roll your own hit-highlighting solution.
> FiltDump may be another matter, as it is a diagnostic tool.
>
> "John Kane" <jt-kane@.comcast.net> wrote in message
> news:%2344qClXuEHA.2828@.TK2MSFTNGP12.phx.gbl...
>
|||Please review the pertinent posts:
http://groups.google.com/groups?hl=e...ver .fulltext
http://groups.google.com/groups?hl=e...ver .fulltext
I am not trying to hide anything, and AFAIK the communication was not under
NDA. I do not have the communication I had with the Microsoft developer, nor
do I have the response I received from the link I posted above.
The link I posted is about distributing dlls and exes, as you correctly
point out. When I asked another question about tapping into services exposed
by another Microsoft product for my commercial use, I was directed to this
link by a PSS engineer, who explained that this was the forum in which to ask
these questions.
I'll follow up on FiltDump and post back here with the response I get. I
will ask that it be made public.
Please stop mischaracterizing what I say, or check the original posts before
commenting on them.
"John Kane" <jt-kane@.comcast.net> wrote in message
news:#dD0fKhuEHA.2632@.TK2MSFTNGP10.phx.gbl...
> You also "generalized" your response to include ALL .dll and .exe files in
> addition to the word breaker dll files, if memory serves me correctly. I
> also asked you (or Microsoft) at that time, and I ask again now, to make public
> the specific licensing policy from Microsoft that you are referring to. If
> it is a secret or under NDA, then how can anyone judge whether or not he or
> she is violating a non-public licensing agreement?
> Best Regards,
> John
> PS: Feel free to contact me off-line.
>
> "Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
> news:eyNhvefuEHA.3456@.TK2MSFTNGP10.phx.gbl...
>
