Thursday, March 29, 2012

Filter double results from query

We are working on an application which aggregates news articles from various sources. The problem is, when querying the database, is that there are a lot of double results because some of the sources get their news from the same press agency. To clarify things a bit here's a link to an example:

http://84.244.146.27/Zoekresultaten.aspx?SearchQuery=Amsterdam

As you can see, there are a lot of the same results. What we want is to filter out the double results irrespective of the source. In other words; when the query encounters the first result, display that and leave out the other same results.

I've tried filtering them out by using DISTINCT on the 'news header' / results, but that does not seem to work. Any ideas on how to get this done?

Any help is greatly appreciated!

Providing the aggregator select is an infrequent action, you can use a CURSOR within it to select just the first record to a buffer table which the page will read.

|||

Thanks Tatworth for your answer! I'm afraid that it sounds a bit like Chinese to me though...

The aggregator is run every couple of minutes and of course it would be best to filter the results in that step, then filtering on querying results. There's a slight problem though with the aggregator script. It's in PHP :( and PHP and MSSQL don't play along that nicely with eachother as far as I have experienced.

Is there a way to filter within the select query?

|||

Yes, as an example:

14-11-2007

10:52

Marokkaanse vaders in actie in Amsterdam-West

BNdestem BNdestem

|||

If your answer is 2005 or Express, and you want the earliest record, this syntax will do it for you:

SELECT *

FROM (

SELECT {the fields from your original query},row_number() OVER (PARTITION BY {the list of key fields} ORDER BY {Your datetime field} ASC) AS TheRank

{the rest of your original query}

) t1

WHERE TheRank=1

Example:

SELECT *FROM (SELECT NewsDate,NewsTitle,NewsLink,row_number()OVER (PARTITIONBY NewsTitleORDER BY NewsDateASC)AS TheRankFROM NewsTable) t1WHERE TheRank=1
|||

Thanks very much Motley! It works like a charm!

No comments:

Post a Comment