CrankBerry Blog Title
2011


(9) Comments

Intalling Sphinx on Wamp localhost (Windows)

I recall the first time trying to setup Sphinx search on my Windows machine with Wamp it wasn't so obvious of how it was to be setup. Here's a guide in how to get it setup on you local machine. If you want to install Sphinx on Linux / Plesk there's an article for that too.

Here is my environment:

  • Windows Vista
  • Wamp 2.0
  • Apache 2.2.11
  • MySQL 5.1.36
  • PHP 5.3.0
  • Sphinx 1.10

First of all download Sphinx for Windows which can be found (http://sphinxsearch.com/downloads/beta/). I downloaded Win32 binaries w/MySQL support version because I only work with MySQL. Once you got this you're ready to begin.

How Does Sphinx Works

In brief this is how Sphinx works. Sphinx creates a separate index of the table you want to search. Your PHP scripts calls the Sphinx API to search connecting to a specific port where Sphinx is listening to. Sphinx then searches its own index and does the search returning the result. Yes, the index has to be updated every time your update your original table. On my live environment I have a cron job setup to reindex with Sphinx every so often.

Step 1 Extract and Prepare Conf File

Extract the files into any folder you want, I've selected c:\sphinx\. The next thing you need to do is create a document in c:\sphinx\bin called sphinx.conf. This is the most important part as Sphinx actually acts based on the settings and parameters defined in this single document. Once you have created sphinx.conf in the bin folder copy the content from c:\sphinx\sphinx-min.conf.in (this is their provided sample file). It looks like:

#
# Minimal Sphinx configuration sample (clean, simple, functional)
#

source src1
{
	type			= mysql

	sql_host		= localhost
	sql_user		= test
	sql_pass		=
	sql_db			= test
	sql_port		= 3306	# optional, default is 3306

	sql_query		= \
		SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS
		date_added, title, content \
		FROM table

	sql_attr_uint		= group_id
	sql_attr_timestamp	= date_added

	sql_query_info		= SELECT * FROM table WHERE id=$id
}

index test1
{
	source			= src1
	path			= @CONFDIR@/data/test1
	docinfo			= extern
	charset_type		= sbcs
}

index testrt
{
	type			= rt
	rt_mem_limit		= 32M

	path			= @CONFDIR@/data/testrt
	charset_type		= utf-8

	rt_field		= title
	rt_field		= content
	rt_attr_uint		= gid
}

indexer
{
	mem_limit		= 32M
}

searchd
{
	listen			= 9312
	listen			= 9306:mysql41
	log			= @CONFDIR@/log/searchd.log
	query_log		= @CONFDIR@/log/query.log
	read_timeout		= 5
	max_children		= 30
	pid_file		= @CONFDIR@/log/searchd.pid
	max_matches		= 1000
	seamless_rotate		= 1
	preopen_indexes		= 0
	unlink_old		= 1
	workers			= threads # for RT to work
}

These settings should cover most of what you're looking for in terms of setting up Sphinx. I'll try to run through everything as much as possible. There are more parameters and customization you can do by reading their documentation here.

source src1

This section defines your SQL settings and what is needed to be index. So the first few lines sql_host, sql_user, sql_pass, sql_db is exactly your local host MySQL settings. The sql_port is defaulted to 3306. I haven't changed this setting for Wamp so that's where my port sits.

The sql_query is actually crucial. This is the query that you want Sphinx to do and build an index on. Its just like an SQL query you would do. Say my table people had columns: id, name, address, city, age, and joined (stored as time stamp) as follow:

id name address city age joined
50 john doe 546 random st. Vancouver 42 124568794562
51 amanda smith

3425 great ave

Vancouver 18 124568923423
52 justin heiber 352 awesome dr

Burnaby

56 124569000231
53 macy store 355 loser pl Richmond 26 124569000352

I want to be able to search for the people's name and address only so thats all I care about, but I have to include the id and the timestamp. You can also defined unsigned integer attributes. So my sql_query will be:

sql_query		= \
		SELECT id, id AS id_attr, name, address, joined  \
		FROM people

sql_attr_uint		= id_attr
sql_attr_timestamp	= joined

sql_query_info		= SELECT * FROM people WHERE id=$id

I've defined the sql_attr_uint, sql_attr_timestamp, and sql_query_info. The sql_attr_uint is the unsigned integer attribute and it can't be the first column which is to define your id. The sql_attr_timestamp is the UNIX timestamp in your column. You can convert if you want but I usually store my dates in timestamp anyways. The sql_query_info is for testing purpose only. It is the query that retrieves the actual row after searching. You have to do this yourself in your PHP Script.

index test1

You can change the name of test1 to anything you want but for the sake of it I'll just leave it. The source is just the name of the section we did above. If you changed the name then you have to change it here. The path is where you want to store the indexed file on your site. So I'll store it in the site directory in a folder called sphinx: C:\Wamp\www\site\sphinx\test1. The remainder I left the same. I've taken out the real time indexing because I'll be doing my own thing and I heard there were lots of limitations to it at this point, so its not my thing. If it floats your boat you can do some more research and look into it. Here's my index:

index test1
{
	source			= src1
	path			= C:\Wamp\www\site\sphinx\test1
	docinfo			= extern
	charset_type		= sbcs
}

indexer

This section defines the indexer. My database is big so I increased the mem_limit to 64M. Don't just go for a high or low number. Too low can hurt your searching and too high can hurt your server. Find the balance based on the size of your database.

searchd

This part defined the searchd which is the search application from Sphinx. Besides the log paths I didn't change anything else. The listen port is where you want searchd to listen in on for search queries. I left it at 9312 which is where I'll point my PHP script to later on.

Here is my sphinx.conf file:

#
# Minimal Sphinx configuration sample (clean, simple, functional)
#

source src1
{
	type			= mysql

	sql_host		= localhost
	sql_user		= test
	sql_pass		=
	sql_db			= test
	sql_port		= 3306	# optional, default is 3306

	sql_query		= \
		SELECT id, id AS id_attr, name, address, joined  \
		FROM people

	sql_attr_uint		= id_attr
	sql_attr_timestamp	= joined

	sql_query_info		= SELECT * FROM people WHERE id=$id
}

index test1
{
	source			= src1
	path			= C:\Wamp\www\s\sphinx\test1
	docinfo			= extern
	charset_type		= sbcs
}

indexer
{
	mem_limit		= 64M
}

searchd
{
	listen			= 9312
	listen			= 9306:mysql41
	log			= C:\Wamp\www\s\sphinx\searchd.log
	query_log		= C:\Wamp\www\s\sphinx\query.log
	read_timeout		= 5
	max_children		= 30
	pid_file		= C:\Wamp\www\s\sphinx\searchd.pid
	max_matches		= 1000
	seamless_rotate		= 1
	preopen_indexes		= 0
	unlink_old		= 1
}

Getting Sphinx Ready

Now that you have prepared the Configuration file which is the hardest part of this then you're ready to begin. The first part you want to do is prepare the Sphinx index. Open up your command promt (Start > Search [cmd] > CMD.

In command promt navigate to your Sphinx bin folder by typing in cd c:\sphinx\bin.

Once there launched the indexer to build the index by typing in indexer.exe –config c:\sphinx\bin\sphinx.conf test1. Replace test1 if you changed the name of your index. If you don't get any errors and see some numbers run then you're fine. If you get a "failed to write" message it's because you don't have admin access to your machine.

Once the index is build you want to install Sphinx on your Windows machine as a service by typing searchd.exe –install –config c:\sphinx\bin\sphinx.conf –servicename Sphinx. It'll prompt you that the installation was successful.

Now you have to start the service you just installed. Go to control panel > administrative tools > services. Find Sphinx in the list. Click on it and on the left side you should see Start (on my machine that's where it is). Click on it and it should start.

Testing Sphinx Out

Go back to command prompt and test it out using search. So say I want to search my table for john doe I'll type search sphinx.conf "john doe". That should return something for me. I can define the matching modes and which column to search as well such as search –ext2 –config sphinx.conf "@name john @address awesome". If you are getting the results you want then you are ready to move forward into your PHP script.

Sphinx and Your PHP Script

You need the Sphinx API in your script for it to work so go to c:\sphinx\api and copy sphinxapi.php to your site directory. Include this file in your script and call on the class:

require_once('sphinxapi.php');

//Sphinx
$s = new SphinxClient;
$s->setServer("localhost", 9312);
$s->setMatchMode(SPH_MATCH_EXTENDED2);

Noticed I have setServer pointing to the port I told Sphinx to listen to. The setMatchMode is just the matching mode you'll like to use. In my case I'm using extended 2.

Now you can query search your Sphinx index with:

//Search Query
$result = $s->query("@name $searchName @address $searchAddress");

The $result returned will not include the rows and columns of your actual table. It'll return the follow parameters (taken from php.net):

Key Value description
"matches" An array with found document IDs as keys and their weight and attributes values as values
"total" Total number of matches found and retrieved (depends on your settings)
"total_found" Total number of found documents matching the query
"words" An array with words (case-folded and stemmed) as keys and per-word statistics as values
"error" Query error message reported by searchd
"warning" Query warning reported by searchd

This means that $result['total'] will tell me how many results were found and $result['matches'] is where the id of the search results are stored. So here's a simple line to return all of the results:

if ($result['total'] > 0) {
        foreach ($result['matches'] as $id => $otherStuff) {
                //Get Column
                $searchColumn = mysql_fetch_array( mysql_query("SELECT * FROM people WHERE id=$id") );
               
                //Dump
                var_dump($searchColumn);       
        }
} else {
        echo 'No results found';       
}

Notice that the mysql_query is the same as sql_query_info in sphinx.conf. What a coincidence eh? LOL. That is all and now you have a working Sphinx search system setup on your local machine. If you need to reindex your Sphinx you can do so without restarting the service. In command prompt just run indexer.exe –config c:\sphinx\bin\sphinx.conf –rotate test1.

TL
This entry was posted on Sunday, April 3rd, 2011 at 8:42 pm and is filed under MySQL, PHP, Web Hosting. You can follow any responses to this entry through the RSS 2.0 feed. You can skip to the end and leave a response. Pinging is currently not allowed.
BL

9 Responses to “Intalling Sphinx on Wamp localhost (Windows)”

  1. tresloukadu tresloukadu says:

    Wow, thanks for your information, in a way seems more logical, i will keep your approach. I also saw other friend doing the way you do, I asked him why that, and he didnt know how to explain to me.. but you did, thanks for this explanation, now I can note the advantages.

  2. Jerry Jerry Low says:

    Tresloukadu,

    It may make more sense that way. Here's a good way to look at it, Sphinx does the hardest part efficiently and accurately for you which is the matching part with out you worrying about the Like, Match or what-so-ever.

    By returning just the ID there are a few advantages.

    1. Querying by ID is quick and painless because IDs are indexed and unique.
    2. By just having the ID you can customize the columns you want to query.

    If you had a table with 20 columns (which is not the best practice) sphinx would return all 20 columns, but maybe in your search you only need 5 columns, until the drill down page. You can completely customize it.

    3. Making additional joins or unions easier.

    I had the same though you did, but after using the system a bit it made sense.

    Jerry

  3. tresloukadu tresloukadu says:

    hello, I got the results as you mentioned, it returned jus the id's. But I didnt understand this part:

    "The $result returned will not include the rows and columns of your actual table. "

    So you get the id's and after you do a usual query to mysql… Why that? What is the advantages of this approach? Wouldnt be faster return the wole data from sphinx reather than just the id? Im just wondering why…

  4. tresloukadu tresloukadu says:

    Just for people who are gettin the php api running just the first time and after that gettin the error below:

    "failed to send client protocol version".

    I found the solution for that in sphinx forum here:

    http://sphinxsearch.com/forum/view.html?id=4919

    I just changed the line in php script and sphinx started to return my data:

    The file sphinxapi.php
    replaced by line
    if ( !$this->_Send ( $fp, pack ( "N", 1 ), 4 ) )
    to line
    if (fwrite ( $fp, pack ( "N", 1 ), 4 ) !== 4)

    Some people said it was a problem in the version 5.2 of php but it is not. I just don't know why this issue, because I downloaded the last version from sphinx website. But but is working now.

  5. tresloukadu tresloukadu says:

    @candy man where is your require to api file?

    I got another problem, everything seems working, but the query is returnin false.

  6. Jerry Jerry Low says:

    Candy, did you solve the problem yet? It seems that PHP hasn't defined $searchName yet, did you set $searchName = 'John'm prior to $result = [...]?

  7. Jerry Jerry Low says:

    No problem tresloukadu, I've had the same problem getting started too and am glad I can help.

  8. tresloukadu tresloukadu says:

    Man, thanks for this great tutorial, i was completely lost, because the tutorial on sphinx page is very superficial. I Followed your tutorial and got just a minor error because the old libmysql.dll in the bin directory but i just substituted it with my MySQL5 dll inside system32 and it worked great now i will test sphinx with PHP lets see.

  9. candy candy says:

    Hi, I am stuck at the section "Sphinx and Your PHP Script". After copying sphinxapi.php to my site directory. I save the below php scripts in a file test.php
    setServer("localhost", 9312);
    $s->setMatchMode(SPH_MATCH_EXTENDED2);

    //Search Query
    $result = $s->query("@name $searchName @address $searchAddress");
    //$result = $s->query("@name joe @address awesome");

    if ($result['total'] > 0) {
    foreach ($result['matches'] as $id => $otherStuff) {
    //Get Column
    $searchColumn = mysql_fetch_array( mysql_query("SELECT * FROM people WHERE id=$id") );

    //Dump
    var_dump($searchColumn);
    }
    } else {
    echo 'No results found';
    }
    ?>

    however, it return an error "Notice: Undefined variable: searchName in C:\wamp\www\sphinx_test.php on line 10" am i on the right track?

    using the cmd line, i am able to yield
    words:
    1. 'john': 1 documents, 1 hits
    2. 'awesome': 1 documents, 1 hits

Leave a Reply

Spam protection by WP Captcha-Free