JSSINDEX: The JavaScript Search Engine

Yann LeCun, 03/2004.



JSS is a simple search engine designed for CDROM or Web-based document collections. The documents to be indexed can be in HTML, PostScript (.ps and .ps.gz), PDF, and DjVu. The main feature of JSS is that the query engine and the index are entirely in JavaScript, and therefore require no software other than a JavaScript-enabled Web browser.

What is the advantage? If you are distributing a collection of documents on CD-ROM, you can provide platform-independent full-text search without asking your users to install any software on their machines. If you publish a collection of documents on the web, you don't need to install any server-side scripts: search queries run entirely in the user's web browser.
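
To make the idea concrete, here is a minimal sketch of how a search can run with nothing but JavaScript: a word-to-document table shipped as plain data, and a small query function that intersects the entries for each query word. This is only an illustration. The names docs, index, and search, the file paths, and the data layout are all assumptions made for this example, and they are not the format or the code that jssindex actually generates.

```javascript
// Hypothetical data, NOT the index format produced by jssindex.
// In a browser, tables like these could be loaded from a generated .js file.
const docs = ["mydocs/intro.html", "mydocs/paper.pdf", "mydocs/scan.djvu"];

// Inverted index: each word maps to the positions in `docs` that contain it.
const index = new Map([
  ["search", [0, 1]],
  ["engine", [0]],
  ["javascript", [0, 1, 2]],
  ["djvu", [2]],
]);

// Return the documents that contain every word of the query (AND semantics).
function search(query) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length === 0) return [];
  // Start from the first word's document list and intersect with the rest.
  let hits = index.get(words[0]) || [];
  for (const word of words.slice(1)) {
    const postings = new Set(index.get(word) || []);
    hits = hits.filter((id) => postings.has(id));
  }
  return hits.map((id) => docs[id]);
}

console.log(search("javascript search"));
```

Because both the table and the query function are ordinary JavaScript, a page that loads them can answer queries from a CD-ROM or a static web server without any server-side component.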

INSTALLATION

jssindex was tested on GNU/Linux. The indexer should run wherever Lush and DjVuLibre run (this includes Solaris, Irix, and Windows under Cygwin). The query engine produced by jssindex runs on any JavaScript-enabled Web browser.

To install:

jssindex uses two other programs: ps2ascii and zcat. Make sure you have those in your shell path if you want jssindex to index documents in PostScript (.ps), PDF (.pdf), and gzipped PostScript (.ps.gz). ps2ascii is part of the Ghostscript package (also known as gs), and zcat is part of gzip. Both packages are installed by default in most Linux distros.

USAGE

Call jssindex with no arguments to get the full documentation. Here is a simple example of usage: let's assume that the content of your Web site is in the directory web-root, and that the collection of documents you want to index is under web-root/mydocs. Type the following at the shell prompt:
  % cd web-root
  % jssindex mydocs
Now point your Web browser to web-root/jss-index.html. If you want the JSS files neatly put in their own directory, do:
  % mkdir jss
  % cd jss
  % jssindex ../mydocs
Now point your Web browser to web-root/jss/jss-index.html. That's it.

CREDITS

jssindex was written by Yann LeCun and Florin Nicsa, with contributions from Leon Bottou. Lush was created by Leon Bottou and Yann LeCun. DjVu was created by Leon Bottou, Yann LeCun, Patrick Haffner, and a large cast of characters.

CONTACTS

For Yann's contact info, visit http://yann.lecun.com

WHAT ARE LUSH AND DJVU?

Lush is an interpreter/compiler for a dialect of Lisp. More info can be obtained at http://lush.sf.net. jssindex is written in Lush. Lush is required to run jssindex, but end users do not need it to perform search queries (that part requires only a Web browser).

DjVu is a file format and compression technology for documents (particularly scanned documents) and images (particularly high-resolution ones). DjVuLibre is an open source implementation of DjVu that includes viewers, decoders, utilities, and simple encoders. The simplest way to produce DjVu documents from originals (scanned or digitally produced) in TIFF, PS, PDF, or other formats is to use one of the free on-line conversion servers, for example http://any2djvu.djvuzone.org (single document conversion with OCR while-U-wait).

LICENSE AND LEGAL STUFF

jssindex is Copyright (C) 2003 Yann LeCun and is distributed under the GNU General Public License.