How to Convert Oracle Text statements to SQL Server 2005

This article overviews Oracle Text, which is Oracle integrated full-text retrieval technology, and shows the ways to emulate its functionality in Microsoft SQL Server 2005. In Oracle, the CONTEXT index type is used to index large amounts of text. Similar functionality is implemented as Full-Text Search in SQL Server and full-text indexes are used to index large amounts of text. Below I will cover common techniques to perform full-text retrieval on both platforms and the differences in its syntax.

First, let us consider the preconditions of full-text indexes creation in SQL Server. With Oracle Text, you can create a CONTEXT index on columns of type VARCHAR2, CLOB, BLOB, CHAR, BFILE, XMLType and URIType. In SQL Server, a full-text index can be built on CHAR, VARCHAR, NCHAR, NVARCHAR, TEXT, NTEXT, IMAGE, XML and VARBINARY. A table, for which a SQL Server full-text index will be created, should contain at least one unique not nullable column, and only one full-text index is allowed per SQL Server table.

The following commands create full-text indexes in Oracle and SQL Server.

Oracle:

CREATE TABLE mytext(id int not null, thetext VARCHAR2(4000));

CREATE INDEX mytext_idx
ON mytext(thetext)
INDEXTYPE IS CTXSYS.CONTEXT;

SQL Server :

CREATE TABLE mytext(id int not null, thetext varchar(4000))
GO

CREATE UNIQUE INDEX ui_id
ON mytext(id);
GO

CREATE FULLTEXT CATALOG mycatalog AS DEFAULT;
GO

CREATE FULLTEXT INDEX
ON mytext(thetext)
KEY INDEX ui_id;
GO

Both Oracle and SQL Server syntax support CONTAINS predicate, which specifies selection criteria. CONTAINS can be used inside a WHERE clause of a SELECT statement. Here is an example of a query with CONTAINS in Oracle and SQL Server - note the difference in syntax:

Oracle:

SELECT * FROM mytext WHERE CONTAINS(thetext, ‘Cats & dogs’) > 0;

SQL Server:

SELECT * FROM mytext WHERE CONTAINS(thetext, ‘Cats & dogs’);

CONTAINS predicate can combine several terms linked by boolean operators. The usage of NEAR, AND, &, OR and | operators is identical in Oracle and SQL Server. MINUS operator in Oracle can be replaced by &! operator in SQL Server. For example, the equivalent for ‘Animals - cats’ expression will be ‘Animals &! cats’. Also, the expression ‘Animals ; cats’ can be emulated by ‘Animals ~ cats’ in SQL Server. The wildcard % in Oracle search condition can be replaced by * character, for example, ‘Oracl%’ becomes “Oracl*” in SQL Server. Note that SQL Server does not support the Oracle syntax ‘%racle’.

Another interesting operator in full-text search is stem ($), which is used to search for terms that have the same linguistic root as the query term. The SQL Server counterpart for this operator is FORMSOF:

Oracle:

SELECT * FROM mytext WHERE CONTAINS(thetext, ‘$worlds’) > 0;

SQL Server:

SELECT * FROM mytext WHERE CONTAINS(thetext, ‘ FORMSOF (INFLECTIONAL, worlds) ‘);

In expressions that contain more than one query term, you can use the weight operator to adjust the relative scoring of the query terms in Oracle. The same functionality you can receive by means of SQL Server ISABOUT operator with WEIGHT keyword. For example:

Oracle:

SELECT * FROM mytext WHERE CONTAINS(thetext, ‘Oracle*3′) > 0;

SQL Server:

SELECT * FROM mytext WHERE CONTAINS(thetext, ‘ISABOUT (Oracle weight (0.3)) ‘);

In Oracle, the SCORE operator can be used in an ORDER BY or GROUP BY clause to return the score values produced by CONTAINS query and to order or group them according to the score. To emulate a SELECT statement with SCORE operator you should use CONTAINSTABLE function. CONTAINSTABLE returns a table with rank column showing how well the data match selection criteria. Note that CONTAINSTABLE function must be explicitly joined with the rows of the original SQL Server table.

Oracle:

SELECT * FROM mytext
WHERE CONTAINS(thetext, ‘Rain’, 1) > 0
ORDER BY SCORE(1) desc;

SQL Server:

SELECT t.*
FROM mytext AS t INNER JOIN
CONTAINSTABLE(mytext, thetext, ‘Rain’ ) AS KEY_TBL
ON t.ID = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK DESC;

Last but not least: a search for a specific phrase needs double quotation marks in SQL Server.

Oracle:

SELECT * FROM mytext WHERE CONTAINS(thetext, ‘The exact phrase’) > 0;

SQL Server:

SELECT * FROM mytext WHERE CONTAINS(thetext, ‘ “The exact phrase” ‘)

If you need more information on this topic, please contact newsletter@dbbest.com

Written by Galina Snurnikova


Tagged as , , , , , + Categorized as Newsletters, Newsletter #3, December 18, 2008

1 Comments

  1. What an awesome site

Leave a Reply

You must be logged in to post a comment.