xinabse_ prefix and should not interfere with your existing tables
Have a look at the spider/xinabse.conf file and change the values to your
needs. All options are documented in the file.
Change the values at the top of frontend/xinabse.php to match your database.
Upload/copy all files in frontend/ to somewhere below your webserver root.
When using the spider, be sure to be inside the xinabse/spider/ directory or it will not find its config files.
For a short overview of the available spider options, run
./xinabse-spider.pl --help
All options may be abbreviated (--he is the same as --help).
To add a site, just run xinabse-spider.pl with the --add option and give the
site's URL as parameter. When giving a top-level address as start URL, be sure
to add a trailing slash.
The spider will immediately start to spider the given URL using the default
recursion depth given in the config file. Alternatively, you can give the level
for that URL with the --level option.
Examples:
add splitbrain.org with the default depth
./xinabse-spider.pl --add 'http://www.splitbrain.org/'
add splitbrain.org with a spider depth of 2
./xinabse-spider.pl --level 2 --add 'http://www.splitbrain.org/'
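The trailing-slash advice for top-level addresses can be applied automatically before the URL is handed to the spider. The following is only a minimal sketch and not part of xinabse: the helper name normalize_start_url is hypothetical, and it assumes URLs of the form scheme://host or scheme://host/path.

```shell
#!/bin/sh
# Hypothetical helper (not part of xinabse): append a trailing slash to a
# top-level address such as http://www.splitbrain.org so that the spider
# receives http://www.splitbrain.org/ as its start URL.
normalize_start_url() {
    url=$1
    case "$url" in
        *://*/*) ;;                 # already has a path component, keep as is
        *://*)   url="$url/" ;;     # bare host: add the trailing slash
    esac
    printf '%s\n' "$url"
}

# Usage (assumes the current directory is xinabse/spider/):
# ./xinabse-spider.pl --add "$(normalize_start_url 'http://www.splitbrain.org')"
```

URLs that already contain a path, and strings without a scheme, are passed through unchanged.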
To revisit all sites that are older than the time given in the
configuration file, just run xinabse-spider.pl without any arguments. Optionally,
you can specify the timespan to use on the command line with the --reindex-after
option (in hours).
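Since revisiting is typically done on a schedule, it can be wrapped in a small script. This is a sketch under stated assumptions, not something shipped with xinabse: the function name reindex_sites and the XINABSE_SPIDER_DIR variable are hypothetical. It changes into the spider directory first, because the spider only finds its config files from there.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of xinabse) for scheduled reindexing.
# XINABSE_SPIDER_DIR is an assumed variable naming the xinabse/spider/ directory.
reindex_sites() {
    hours=${1:-}    # optional: overrides the configured timespan (in hours)
    # Run in a subshell so the caller's working directory stays untouched.
    (
        cd "${XINABSE_SPIDER_DIR:-xinabse/spider}" || exit 1
        if [ -n "$hours" ]; then
            ./xinabse-spider.pl --reindex-after "$hours"
        else
            ./xinabse-spider.pl
        fi
    )
}
```

Calling reindex_sites 48 would revisit everything older than 48 hours, while reindex_sites without an argument falls back to the timespan from the configuration file.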
You can delete a site and all its subpages by using the --delete option:
Example:
delete splitbrain.org from the database
./xinabse-spider.pl --delete 'http://www.splitbrain.org/'
You may even clear the complete database by using the --delete-all option.
You will be asked to confirm this by entering yes.
Example:
./xinabse-spider.pl --delete-all
You can use the included xinabse.php frontend to query the search index.
It uses a template to customize the page design. Please note that you have
much more freedom when using the API as described in the next chapter.
When designing templates for xinabse you have to deal with some placeholders, which will be replaced by the values your search returns.
There are two kinds of placeholders in xinabse templates: blocks and tags. All placeholders are delimited by curly brackets.
You should have a look at the default template xinabse.tpl for example usage.
Let's have a look at the blocks first. Blocks are used to define areas in your template. They start with a BEGIN and end with an END tag; everything between these tags is the content of the block. Each block may only occur once in your template! The following blocks are available:
{BEGIN RESULT}
{END RESULT}
{BEGIN RESULTROW}
{END RESULTROW}
RESULT
block!
{BEGIN NORESULT}
{END NORESULT}
Please note that all these blocks are removed from the output when no query was given.
Tags are simple placeholders which are replaced by their real values when the template is parsed. Except for the result tags below, they may be placed anywhere inside your template. You may use each tag as often as you want.
{QUERY}
{QQUERY}
INPUT
field.
{HITCOUNT}
{START}
{LIMIT}
{PREV <text>}
text surrounded by a link which links to the previous search results. When
there are no previous results it is replaced by a . You may even give some
HTML tags as text, like an IMG tag.
{NEXT <text>}
The following tags may only be used inside the RESULTROW block. They will
simply be replaced by the values of the current result.
{NUMBER}
{TITLE}
{URL}
{EXTRACT}
{LASTMODIFIED}
{WEIGHT}
{RELEVANCE}
{DOMAIN}
{PATH}
{PAGE}
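To make the interplay of blocks and tags more concrete, here is a small sketch that writes an example template file. The file name example.tpl and the HTML around the placeholders are illustrative assumptions and not taken from the shipped xinabse.tpl. Where a tag's description is not spelled out in this document, its meaning is inferred from its name, so check the default template before relying on it.

```shell
#!/bin/sh
# Write a small example template. The file name and the surrounding HTML
# are assumptions for illustration; only the placeholder names come from
# the lists above.
TEMPLATE_FILE=${TEMPLATE_FILE:-example.tpl}

cat > "$TEMPLATE_FILE" <<'EOF'
<html>
<body>
{BEGIN RESULT}
<p>{HITCOUNT} hits for {QUERY}</p>
{BEGIN RESULTROW}
<div>{NUMBER}. <a href="{URL}">{TITLE}</a><br>{EXTRACT}</div>
{END RESULTROW}
<p>{PREV previous} {NEXT next}</p>
{END RESULT}
{BEGIN NORESULT}
<p>Nothing found for {QUERY}.</p>
{END NORESULT}
</body>
</html>
EOF
```

Each block appears exactly once, RESULTROW sits inside RESULT, and the result tags ({NUMBER}, {TITLE}, {URL}, {EXTRACT}) are only used inside RESULTROW.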
When writing a template for xinabse you can use the following parameter names
for your FORM tag.
tpl
$tpl variable at top of the xinabse.php file. This parameter defaults to
xinabse.tpl.
start
limit
q
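The same parameter names can be used to link directly to a result page. The sketch below is hypothetical and not part of xinabse: it assumes the frontend accepts these parameters in the query string of a GET request, that BASE_URL points at your installed xinabse.php, and that start and limit mean what their names suggest. The default values are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: build a query URL for the xinabse.php frontend.
# BASE_URL is an assumed install location; q, start and limit are the
# parameter names listed above.
BASE_URL=${BASE_URL:-http://localhost/xinabse.php}

search_url() {
    query=$1
    start=${2:-0}     # assumed default, not taken from xinabse
    limit=${3:-20}    # assumed default, not taken from xinabse
    # Minimal encoding: spaces become '+'. Other special characters would
    # need full URL encoding, which this sketch does not attempt.
    encoded=$(printf '%s' "$query" | tr ' ' '+')
    printf '%s?q=%s&start=%s&limit=%s\n' "$BASE_URL" "$encoded" "$start" "$limit"
}
```

For example, search_url 'free software' 10 5 yields a URL ending in ?q=free+software&start=10&limit=5.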
This software is GPL - see License!
If you want to adjust the spider to your needs, have a look at the perldoc comments in the common.pl file. All functionality is in this file.
If you don't want to use xinabse's template engine, you can include the
xinabse-api.php file into your PHP code and use the xinabse() function
to run search queries.
Here's a short version of it:
function xinabse($query, &$resultarray, $all=false, $offset=0, $limit=20, $domain="") {
    global $XINABSE_SERVER;
    global $XINABSE_DATABASE;
    global $XINABSE_USER;
    global $XINABSE_PASSWD;
    // query database
    // place mysql-result into $resultarray
    // return count of all results
}
As you can see, it uses some globals to connect to the database and stores the
results in a given array (which has to be passed by reference). Only the
results from line offset to offset+limit are placed in the array, but the
function returns the number of all results.
The resultarray contains objects with the following properties: title,
lastmodified, url, extract, weight, domain, path, page, number and relevance.
Having a look at the placeholder descriptions in the template section of this
document should give you a hint what they contain.
Here is a list of the accepted parameters and their meaning.
$query
$resultarray
$all
$offset
offset
are returned. This is used for pagination of search results.
$limit
$domain
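The offset and limit parameters, together with the total count returned by the function, are all that is needed for pagination. The arithmetic is independent of the language; it is sketched here in shell to match the other command examples, and the same calculation applies when calling xinabse() from PHP. The function names page_offset and page_count are hypothetical.

```shell
#!/bin/sh
# Offset of the first result on a 1-based page, suitable as $offset.
page_offset() {
    page=$1
    limit=$2
    echo $(( (page - 1) * limit ))
}

# Number of pages needed to show all results, given the total count
# returned by the search and the page size used as $limit.
page_count() {
    total=$1
    limit=$2
    echo $(( (total + limit - 1) / limit ))
}
```

With a limit of 20, page 3 starts at offset 40, and a search returning 45 results in total needs 3 pages.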
Due to some limitations of the MySQL database engine, there is a slim chance of sites being inserted more than once when indexing with more than one process. However, these duplicates should be detected and removed by xinabse's cleanup function, which is run after each run.
Furthermore, some "lost database connection" errors may occur. I'm not sure what causes them. Indexing with fewer processes may help.
Only the latin1 (ISO-8859-1) charset is supported by xinabse. While Perl 5.8 supports Unicode natively, there is currently no such support in MySQL. To make xinabse compatible with older Perl versions and the current MySQL releases, all characters other than latin1 alphanumeric characters are replaced with spaces. However, a skilled programmer should find it easy to adjust xinabse to work with any other ISO-8859 charset.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. See COPYING for details.