PhpDig is a lightweight indexing robot and search engine written in PHP.
It provides full text indexing.
This program is provided under the GNU/GPL license.
See the LICENSE file for more information.

CHANGELOG
---------

Note on version numbering:
M.m.n[p]
M : Major version number. Indicates major changes in code,
    logic and features.
m : Minor version number. Indicates important new features, enhancements
    to existing ones, and bugfixes.
n : Sub-minor number. Indicates some new minor features and/or bugfixes.
p : Patch letter (b,c,d,...). Indicates fixes for serious bugs without
    any other changes.
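
As an illustration only (this is not part of PhpDig), the numbering scheme
above can be parsed and ordered with a few lines of Python. The names
parse_version and Version are hypothetical. The sketch assumes the sub-minor
number may be omitted, as in "1.6", and that a missing patch letter sorts
before "b". The early 0.9x releases also end in "b", which is parsed the
same way here.

```python
import re
from typing import NamedTuple

# M.m[.n][p]: the sub-minor part is optional (assumption based on
# release names such as "1.6"), the patch letter is b, c, d, ...
_VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?([b-z])?")


class Version(NamedTuple):
    major: int
    minor: int
    sub_minor: int  # 0 when the release name omits it
    patch: str      # "" when there is no patch letter


def parse_version(text: str) -> Version:
    """Split a release name like '1.4.5c' into comparable parts."""
    match = _VERSION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an M.m.n[p] version: {text!r}")
    major, minor, sub_minor, patch = match.groups()
    return Version(int(major), int(minor), int(sub_minor or 0), patch or "")


# Tuples compare field by field, so sorting by the parsed value
# orders releases from oldest to newest.
releases = ["1.4.5c", "1.4.5", "1.6", "1.4.5b"]
ordered = sorted(releases, key=parse_version)
```

Because the empty string sorts before any letter, an unpatched release
always comes before its patched variants.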


Version 1.6.2
--------------
2003-04-06

Support for charsets other than ISO-8859-1 added; ISO-8859-2 encoding added (Jan Kincl).
PhpDig now handles cookies set through meta http-equiv tags.
Function phpdigTestUrl fixed.
CSS classes for classic mode fixed.
Bug on noindex and nofollow fixed (Michael Chapman).
Small API doc added.
Error in the database creation script on some versions of MySQL fixed.

Version 1.6.1
--------------
2003-03-15

Experimental cookie handling added
Experimental removal of session IDs
Better handling of JavaScript window.open
Handling of default indexes is now an option
'+' is now accepted as a possible character in URLs
Average search time added to logs
All MySQL connection parameters are now constants
Update in install script fixed


Version 1.6
--------------
2003-03-09

PhpDig can now index PDF, MS-Word and MS-Excel files using external binaries.
Locking system: a host is locked against concurrent indexing.
Localization of all remaining hard-coded messages complete (Eric Chauvin).
Optimized queries and template parsing.
Admin interface and "PhpDig" template made XHTML compliant (Eric Chauvin).
The web install interface can now update existing databases.
Parts of HTML pages can be excluded from indexing with specially formatted comments.
Handling of MySQL connections improved.
Statistics on searches are collected to show what visitors look for first on the website.
New ranking system added, lowering the rank of pages that repeat the same words many times.
More explanations of how PhpDig works added to the documentation.


Version 1.4.8
--------------
2003-03-01

Text snippets now match the search mode (start/any/exact).
Result extracts are more customizable.
The spider can read a file containing a list of URLs to explore.
Deleting more than one host at once from the index is now possible.
New design for admin interface.
Resume and force indexing fixed.
Template parsing fixed.
Cleanup scripts fixed.

Version 1.4.7
--------------
2003-02-26

MySQL tables can be prefixed with a user-defined string.
Spidering an entire domain is now possible.
Better handling of redirections.
Doc spelling corrections (John Zastrow)
Updated German locale file (Matthias Strohmaier)
New Norwegian locale file (Martin Kristiansen)
New Czech locale file (Dan Barta)
Remaining E_ALL errors fixed (I tried to hunt all of them...)


Version 1.4.6
--------------
2003-02-22

PhpDig works with register_globals = off and/or error_reporting = E_ALL
Starting indexing from a path other than / restored
Using only <?php ?> tags now
An option makes the search function return an array
All functions renamed and prefixed by "phpdig"
Using two specific CSS classes for result links and highlighting
Some code improvements were made
If an error message occurs while indexing, please download the


Version 1.4.5c
--------------
2003-02-18

Patch to correct content retrieval broken by a PHP bug.
See Bug #22008 for more details.


Version 1.4.5b
--------------
2003-02-17

Broken indexing of hosts bound to a port other than 80 repaired.


Version 1.4.5
--------------
2003-02-16

Note: a database upgrade is needed; use the update_db_to_1_4_5.sql file.
Search is now a function, making integration easier (the template can be just a part of a page).
Highlight fixed.
Using a CSS file instead of the "style.php" file.
Configuration directives are now constants, except for arrays.
Excluding a path on the robot side is now possible.


Version 1.4.4c
--------------
2003-02-09

PhpDig works with PHP 4.3.0 (still register_globals=on).
Spidering with a shell command (php-cli) fixed.
Templates fixed.


Version 1.4.4b
--------------
2001-12-03

Fixed duplicates inserted into the sites table.


Version 1.4.4
--------------
2001-12-02

PhpDig can now spider a site bound to a port other than 80.
PhpDig can also spider a password-protected site (please read the documentation warning).
Enhanced directory view in admin mode.
Icelandic (!) special characters are now supported.
Operation at the E_ALL error_reporting level fixed.
Bad Last-Modified HTTP header parsing fixed.


Version 1.4.3
--------------
2001-11-27

Improved template system
Field added to the keywords table to optimize search queries
Some queries causing errors fixed
Code causing a PHP core dump fixed
Textual content not being updated fixed
Update of branch/files fixed


Version 1.4.2
--------------
2001-11-24

Complete English documentation added.
Better robots.txt file parsing: the wildcard * is now supported, and files can be specified (with complete path).
The special character "" is now included in indexing; some German words were not recognized. Thanks to Christof Fritz for the bug report.


Version 1.4.1
--------------
2001-11-11

Complete French documentation added (help needed with the English translation).
Simple HTTP authentication added.
A bug in relative links parsing fixed.
A bug in the test_url() function fixed.
Thanks to Florian Perrichot for the bug report.


Version 1.4
--------------
2001-11-06

Spidering and indexing now proceed at the same time.
Much less load on indexed servers thanks to a cache system.
The results page now shows extracts of the documents with the occurrences of the search keys.
The admin, libs and configuration scripts are now in
separate directories, allowing them to be protected with .htaccess files.
The results page is highly customizable through a simple template system (samples provided).
Enhanced CGI mode for fully automatic updates with a cron task.
Many thanks to Florian Perrichot for the cache and template systems.
Portuguese locale file provided by Carlos Serro.


Version 1.0.4
--------------
2001-06-04

Fixed a bug that caused PhpDig to send an HTTP request for each link it found in pages,
even when it had already requested that link.


Version 1.0.3
--------------
2001-05-28

Italian locale file provided by Mirko Maischberger.


Version 1.0.2
--------------
2001-05-27

HTTP and CGI versions of indexing merged.
Many more comments in source code.


Version 1.0.1
--------------
2001-05-22

Missing field fixed in init_db.sql.
Excluding words in search queries fixed.
Quotes and double quotes in search form fixed.


Version 1.0
--------------
2001-05-19

Spanish locale file provided by Geffrey Velsquez.
Bug fixed in parsing of "alt" attributes in img tags.
"description" metatag is included in search results page.


Version 0.99
--------------
2001-05-14

Fixed bug which inserted duplicates into the database.
Fixed buggy queries in update_cgi script.
Fixed bug which caused PhpDig to fail to detect description and keywords metatags.
Fixed bug in HTML entities parsing.
Fixed bug in recognizing some words in html_to_plain_text() function.
Last-Modified header is now supported. Don't forget to update your database with the update_db_0_99.sql script!
Metatag 'Revisit after' is now supported.
Sub-directories in robots.txt file are recognized.
Deleting an entire site from the database is now supported.


Version 0.98b
--------------
2001-05-10

German locale file provided by Gregor Mucha.
German stop-words added by the same person.
External domain names in hrefs are indexed (e.g. www.gnu.org) and can be retrieved by search queries.
Some classic files added: COPYING, README and LICENSE.


Version 0.97b
--------------
2001-05-08

robots.txt file and META ROBOTS are recognized. See The Web Robots Page for more information.
Increased speed when indexing text files.
Files without an extension are now indexed.
Indexes and primary key in the database are a bit different. Check the init_db.sql file to see changes.


Version 0.96b
--------------
2001-05-06

Some files corrected by Brien Louque: documentation_en.html, search.php, en-language.php
Greek locale file provided by Sofoklis Magoulas.
An auto-update script was added. You must have access to the crontab and to an executable CGI build of PHP in order to use it.
Expire times for pages are used by indexing scripts.


Version 0.95b
--------------
2001-05-05

PhpDig is now available in both English and French.
Localized search forms are provided with the archive.


Version 0.93b
--------------
2001-05-03

English doc was added to the archive.
I changed the search algorithm. Less SQL, more PHP.
Localization in some languages in progress.
You can now exclude search keys.
The occurrence is now based on a product, no longer on a sum.
Search form and results page are provided in English.


Version 0.92b
--------------
2001-05-02

Results page now keeps filters.
news: links are no longer followed.
Some SQL queries are optimized.
SQL_BIG_SELECT is set to 1 for search queries.
No more IE user_agent string sent ;-).


Version 0.91b
--------------
2001-05-01

Bug which froze PhpDig on long texts fixed.


Version 0.9b
--------------
2001-04-30

Initial release