% Forindex documentation
% Version 0.1
% January 2005
% Guido Milanese - guido.milanese@unicatt.it

\documentclass[a4paper,english]{article}
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}
\IfFileExists{url.sty}{\usepackage{url}}
                      {\newcommand{\url}{\texttt}}
\usepackage{babel}

\begin{document}

\title{The forindex utilities\\
Version 0.1}


\author{Guido Milanese\\
\url{guido.milanese@unicatt.it}}


\thispagestyle{empty}
\date{January 2005}
\maketitle
\begin{abstract}
\noindent{}Making a good index is an important part of writing a document,
particularly a book or a manual. Entering index entries by hand can be a very
long process; although a certain amount of data must always be entered
manually, some trivial tasks, e.g. building an index of geographical names, can
be performed automatically. This can be done with the program
\textsf{doindex}, which prepares a file to be processed by
\textsf{makeindex}. Another useful feature is the removal of all the
\texttt{\textbackslash{}index} entries from a \LaTeX{} file, which yields a
clean file with no indexing (program \textsf{cleanindex}). The programs are
written in Snobol4; the only requirement is to install the interpreter. For
Windows, standalone executables compiled with Spitbol are also provided, so
that the programs can be run without an external interpreter.
\end{abstract}

\tableofcontents

\section{The programs}
\subsection{The program \textsf{doindex}}
The program \textsf{doindex} reads a \LaTeX{} file together with a list file, and
inserts index entries into the text according to this list. Previously entered
index entries are left unchanged, making it possible to add further indexing to
an already indexed file.

The input \LaTeX{} file may have the extension \texttt{tex} or \texttt{latex},
either all lowercase or all uppercase (mixed case, as in \texttt{Tex}, is not
accepted).

The list file contains all the words to be indexed. Its extension must be
exactly \texttt{wls}. Sub-entries are identified by the separator
character \texttt{/}. See \textsf{test.wls} for an example:


\begin{quote}
		  \begin{verbatim}
		  animals
		  dogs/animals
		  cats
		  house/nouns/english/languages
		  sleeping@sleep
		  évita/Italian/foreign words
		  ça/French/foreign words
		  drücken/German/foreign words
		  \end{verbatim}
\end{quote}

No particular order is required in this file. Some users will prefer
alphabetical order, others a different order, so the program makes no
assumptions about how the file is sorted. Entries such as
\texttt{sleeping@sleep} use the standard \textsf{makeindex} syntax and are left
unchanged. The ``logic'' of the \texttt{/} syntax is the opposite of the
internal logic of \textsf{makeindex}, which is, I think, very clever at the
stage of typesetting an index, but not at the stage of designing one. ``A dog
is an animal'' (\texttt{dogs/animals} in my syntax) seems to me more natural
than ``Among animals there are dogs'' (\texttt{animals!dogs} in the
\textsf{makeindex} syntax). The \LaTeX{} file produced by \textsf{doindex}
follows, of course, the \textsf{makeindex} conventions.
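
As a rough illustration of this reversal, a list line \texttt{dogs/animals}
applied to a sentence containing the word \emph{dogs} could yield something
like the following. The sample sentence is invented, and the exact position of
the inserted entry (here, immediately after the matched word) is an assumption
made for the sake of the example, not a description of the actual output:

\begin{quote}
\begin{verbatim}
% list file line:   dogs/animals

% input (hypothetical sentence):
Most dogs enjoy long walks.

% output (sketch; placement of \index is assumed):
Most dogs\index{animals!dogs} enjoy long walks.
\end{verbatim}
\end{quote}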

The original \LaTeX{} file is left unchanged. A new file is written, whose
name carries the suffix \textsf{-ind}. For example, from \textsf{file.tex} you will get
\textsf{file-ind.tex}. Of course, you will still have to run \textsf{makeindex} as usual.
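
For readers less familiar with it, the usual \textsf{makeindex} setup looks
roughly as follows. This is the standard usage of the \textsf{makeidx} package
and is not specific to \textsf{forindex}; the sample sentence is invented:

\begin{quote}
\begin{verbatim}
\documentclass{article}
\usepackage{makeidx}   % provides \printindex
\makeindex             % write index entries to the .idx file
\begin{document}
Most dogs\index{animals!dogs} enjoy long walks.
\printindex            % typeset the index built by makeindex
\end{document}
\end{verbatim}
\end{quote}

A typical sequence is to run \textsf{latex} on \textsf{file-ind.tex}, then
\texttt{makeindex file-ind}, and then \textsf{latex} once more.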

The program serves a purpose similar to that of
\mbox{\textsf{ixgen}} (\url{http://www.iit.upco.es/~oscar/ixgen/}),
written by \textsc{Oscar Lopez} (\url{oscar@iit.upco.es}), but \textsf{forindex} was
designed to be a bit more flexible.

\subsection{The program \textsf{cleanindex}}

The program \textsf{cleanindex} removes \texttt{\textbackslash{}index}
sequences from a \LaTeX{} file. This is useful, for example, when a user is not happy with
the indexing of a file and wants to start over.

The input file may have the extension \texttt{tex} or \texttt{latex},
either all lowercase or all uppercase.

The original file is left unchanged. A new file is written, whose name carries
the suffix \textsf{-noind}. For example, from \textsf{file.tex} you will get
\textsf{file-noind.tex}. In this file, the lines concerning \textsf{makeindex}
are kept but commented out, in order to avoid an empty \emph{Contents} section in the
output. You can uncomment these lines as soon as you want to index the file
again.
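
As an illustration only, the relevant parts of \textsf{file-noind.tex} might
look like this. Which lines are treated as ``concerning \textsf{makeindex}'' is
an assumption here (the usual \textsf{makeidx} commands are shown), and the
sample sentence is invented:

\begin{quote}
\begin{verbatim}
%\usepackage{makeidx}
%\makeindex
\begin{document}
Most dogs enjoy long walks.   % \index{animals!dogs} removed
%\printindex
\end{document}
\end{verbatim}
\end{quote}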

\section{Installation}

\subsection{GNU/Linux and other {*}nix systems}

\begin{enumerate}
\item Install \textsf{snobol4} from \url{http://www.snobol4.org}. This is Philip Budne's
CSNOBOL implementation. You need a C compiler to build the
interpreter; this is normally a very quick and easy process.
\item Make sure \textsf{snobol4} is in your PATH or make a symbolic link.
\item Copy all the files from the source directory into a suitable directory (you
do not need the \texttt{bat} files, which are provided for Windows, and can safely
remove them).
\item Make the scripts (\textsf{doindex} and \textsf{cleanindex},
with no extension) executable, e.g. \texttt{chmod +x doindex}.
\item Run the scripts as follows:

\begin{enumerate}
\item To index a text: \texttt{./doindex file.tex}
\item To index a text excluding words with accents: \texttt{./doindex file.tex
-{}-noacc}
\item To remove \texttt{\textbackslash{}index} sequences: \texttt{./cleanindex
file.tex}
\item If the current directory is in your PATH, you do not need \texttt{./}
before the script name.

\end{enumerate}
\end{enumerate}

\subsection{Windows}

The package offers \texttt{exe} files compiled with Spitbol (see
\url{http://www.snobol4.com}). Create a directory and copy into it all the files from the
\texttt{bin/windows} directory. There should be two \texttt{{*}.exe} files and the two
\texttt{test.{*}} files.

Run the programs as follows:

\begin{enumerate}
\item To index a text: \texttt{doindex file.tex}
\item To index a text excluding words with accents: \texttt{doindex file.tex -noacc}.
Accents must be encoded using the \textsf{latin1} encoding (see the TODO
list).
\item To remove \texttt{\textbackslash{}index} sequences: \texttt{cleanindex
file.tex}
\end{enumerate}

\subsection{Windows from source}

Basically, follow the directions given for GNU/Linux, but make sure to
use the \texttt{bat} files and to install the Windows version of the interpreter.
The sources are in Unix format, so before using them convert them to
DOS/Windows format with a conversion script. If you do not have such a script, open the
files with a text editor and save them in DOS/Windows format. This can be
done by reading and saving each file with the DOS \textsf{edit} program, with
\textsf{vim}, or with any other editor able to deal with different file formats. Do
not alter the files if you are not sure of what you are doing. Please (1) do not
use a word processor (such as Word) but a simple text editor, and (2) make
sure to keep the encoding of the file \textsf{acc.inc} as \textsf{ISO-8859-1} or
\textsf{ISO-8859-15}; do not convert it to a plain DOS encoding or to Unicode.

\subsection{Cygwin}

I suggest following the directions given for GNU/Linux, but the
\texttt{exe} files provided for native Windows can also be used if preferred.

\subsection{Macintosh}
Not yet tested (I do not have a Mac right now). It's in the TODO list.

\section{Test files}
Please test the program on \textsf{test.tex} and \textsf{test.wls}.
The produced file will be called \textsf{test-ind.tex} if you use
\textsf{doindex}, \textsf{test-noind.tex} if you use \textsf{cleanindex}.

\section{Bugs and TODO}
The program does not support Unicode files. At this moment, most
\LaTeX{} users are still using latin1, but the situation is rapidly changing.

List of features that I would like to add:

\begin{enumerate}
\item Index included files as well.
\item Add typographical styles, such as italics for the most important
locations of a word.
\item Add support for several indexes (particularly with the class \textsf{memoir}).
\item Add an option to generate a rough index of all the words.
\item Add support for indexing words specified with regular expressions.
E.g. \texttt{read{*}} should index \emph{read}, \emph{reads},
\emph{reading}, \emph{readings},
all under the same heading \emph{read}.
\item Make it possible to use another separator in the list file, e.g. a simple
blank or another character preferred by the user.
\item Test the programs on a Mac.
\item Add Unicode support.
\end{enumerate}

\section{Acknowledgements}
The program \textsf{ixgen} gave me the idea of \textsf{forindex}. Many thanks to
\textsc{Oscar Lopez} for this very good program.

Some questions sent by \textsc{Carlo Pellegrino} (Modena University, Italy) gave
me the idea of transforming a very rudimentary script into a general-purpose
utility. \textsc{Maurizio Loreti} (Padua University, Italy) sent me very useful
remarks on the problems of the automatic generation of indexes, which I made use
of in the introduction to this text.

My warmest thanks to \textsc{Phil Budne} (\url{phil@ultimate.com}) for making
his excellent CSNOBOL available. Many thanks to the community of Snobol users,
particularly to the members of the list \url{snobol4@mercury.dsu.edu}, and,
among them, to \textsc{Gordon Peterson} (\url{http://personal.terabites.com/}),
\textsc{Michael Radow} (\url{mikeradow@yahoo.com}), \textsc{Gregory L. White}
(\url{glwhite@netconnect.com.au}) and to \textsc{Rafal M. Sulejman}
(\url{rafal@engelsinfo.de}) whose \textsf{vim} syntax files are a daily
blessing.

Thanks to Jim Hefferon (\url{ftpmaint@alan.smcvt.edu}), who pointed out that the
original name of the package, \textsf{4index}, was not acceptable due to XML syntax
rules.

\section{Author, copyright, license, disclaimer}

This program is Copyright \copyright{} 2005\\
Guido Milanese <\url{guido.milanese@unicatt.it}>\\
under the terms of the GNU General Public License.

\footnotesize{}
		  \begin{verbatim}
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

If you do not have a copy of the GNU General Public License write to the Free
Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

If the author of this software was too lazy to include the full GPL text along
with the code, you can find it at: http://www.gnu.org/copyleft/gpl.html.
\end{verbatim}
\normalsize{}
\end{document}