% Forindex documentation
% Version 0.1
% January 2005
% Guido Milanese - guido.milanese@unicatt.it
\documentclass[a4paper,english]{article}
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}
\IfFileExists{url.sty}{\usepackage{url}}
                      {\newcommand{\url}{\texttt}}
\usepackage{babel}

\begin{document}

\title{The forindex utilities\\
Version 0.1}
\author{Guido Milanese\\
\url{guido.milanese@unicatt.it}}
\thispagestyle{empty}
\date{January 2005}
\maketitle

\begin{abstract}
\noindent{}Making a good index is an important part of writing a document, particularly a book or a manual. Entering index data by hand can take a long time. Although some entries must always be entered manually, routine tasks such as building an index of geographical names can be automated. The program \textsf{doindex} does this: it prepares a file to be processed by \textsf{makeindex}. Another useful feature is removing all the \texttt{\textbackslash{}index} entries from a \LaTeX{} file, which yields a clean file with no indexing (program \textsf{cleanindex}). The programs are written in Snobol4; the only requirement is an installed interpreter. For Windows, standalone executables compiled with Spitbol are also provided, so the programs can be run without an external interpreter.
\end{abstract}
\tableofcontents

\section{The programs}

\subsection{The program \textsf{doindex}}

The program \textsf{doindex} reads a \LaTeX{} file together with a list file and inserts index entries into the \LaTeX{} file according to this list. Previously entered index entries are left unchanged, so further indexing can be added to an already indexed file. The input \LaTeX{} file may have the extension \texttt{tex} or \texttt{latex}, either all uppercase or all lowercase (not mixed, as in \texttt{Tex}).

The list file is meant to contain all the words to be indexed. Its extension must be exactly \texttt{wls}.
Sub-entries are identified with the separator character \texttt{/}. See \textsf{test.wls} for an example:

\begin{quote}
\begin{verbatim}
animals
dogs/animals
cats
house/nouns/english/languages
sleeping@sleep
évita/Italian/foreign words
ça/French/foreign words
drücken/German/foreign words
\end{verbatim}
\end{quote}
No particular order is required in this file. Some users will prefer alphabetical order, others a different order, so the programs impose no requirement on how the file is sorted. Entries such as \texttt{sleeping@sleep} use the standard \textsf{makeindex} syntax and are left unchanged.

The ``logic'' of this syntax is the opposite of the internal logic of \textsf{makeindex}, which is -- I think -- very clever at the stage of typesetting an index, but not at the stage of designing one. ``A dog is an animal'' (\texttt{dog/animal} in my syntax) seems to me more natural than ``Among animals there are dogs'' (\texttt{animals!dogs} in the \textsf{makeindex} syntax). The \LaTeX{} file produced by \textsf{doindex} follows, of course, the \textsf{makeindex} conventions.

The original \LaTeX{} file is left unchanged. A new file is written, with \textsf{-ind} appended to the base name. For example, from \textsf{file.tex} you will get \textsf{file-ind.tex}. Of course, you still have to run \textsf{makeindex} as usual.

The purpose of the program is similar to that of the program \mbox{\textsf{ixgen}} (\url{http://www.iit.upco.es/~oscar/ixgen/}) written by \textsc{Oscar Lopez} (\url{oscar@iit.upco.es}), but \textsf{forindex} was designed to be a bit more flexible.

\subsection{The program \textsf{cleanindex}}

The program \textsf{cleanindex} removes \textsf{\textbackslash{}index} sequences from a \LaTeX{} file. It can be used, for example, when a user is not happy with the indexing of a file and wants to start over. The input file may have the extension \texttt{tex} or \texttt{latex}, either all uppercase or all lowercase. The original file is left unchanged.
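
As a purely illustrative sketch of this clean-up (the sample sentence is invented for this manual, and the exact handling of the surrounding blanks is an assumption, not documented behaviour), an indexed line such as

\begin{quote}
\begin{verbatim}
Dogs\index{animals!dogs} are loyal.
\end{verbatim}
\end{quote}
would be expected to come out of \textsf{cleanindex} as

\begin{quote}
\begin{verbatim}
Dogs are loyal.
\end{verbatim}
\end{quote}
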
A new file is written, with \textsf{-noind} appended to the base name. For example, from \textsf{file.tex} you will get \textsf{file-noind.tex}. In this file, lines concerning \textsf{makeindex} are kept but commented out, in order to avoid an empty \emph{Contents} section in the output. You can uncomment these lines as soon as you want to index the file again.

\section{Installation}

\subsection{GNU/Linux and other {*}nix systems}

\begin{enumerate}
\item Install \textsf{snobol4} from \url{http://www.snobol4.org}. This is Philip Budne's CSNOBOL implementation. You need a C compiler to build the interpreter; it is normally a very quick and easy process.
\item Make sure \textsf{snobol4} is in your PATH, or make a symbolic link to it.
\item Copy all the files from the source directory into a suitable directory (you do not need the \texttt{bat} files, which are provided for Windows, and can safely remove them).
\item Make the scripts executable (\textsf{doindex} and \textsf{cleanindex}, with no extension), e.g. \texttt{chmod +x doindex}.
\item Run the scripts as follows:

\begin{enumerate}
\item To index a text: \texttt{./doindex file.tex}
\item If you want to exclude words with accents: \texttt{./doindex file.tex -{}-noacc}
\item To remove \textsf{\textbackslash{}index} sequences: \texttt{./cleanindex file.tex}
\item If the current directory is in your PATH, you do not need \texttt{./} before the script name.
\end{enumerate}
\end{enumerate}

\subsection{Windows}

The package offers exe files compiled with Spitbol (see \url{http://www.snobol4.com}). Create a directory and copy into it all the files from the \texttt{bin/windows} directory. There must be two \texttt{{*}.exe} files and the two \texttt{test.{*}} files. Run the programs as follows:

-- to index a text: \texttt{doindex file.tex}

If you want to exclude words with accents: \texttt{doindex file.tex -noacc}. Accents must be encoded using the \textsf{latin1} encoding (see the TODO list).

-- to remove \texttt{\textbackslash{}index} sequences: \texttt{cleanindex file.tex}

\subsection{Windows from source}

Basically, follow the directions given for GNU/Linux, but make sure to use the \texttt{bat} files and to install the Windows version of the interpreter. The sources are in Unix format, so before using them convert them to DOS/Windows format with a conversion script. If you do not have such a script, open the files with a text editor and save them in DOS/Windows format. This can be done by opening and saving each file with the DOS \textsf{edit} program, with \textsf{vim}, or with any other editor able to deal with different file formats. Do not alter the files if you are not sure of what you are doing. Please (1) do not use a word processor (such as Word) but a simple text editor, and (2) make sure that the encoding of the file \textsf{acc.inc} remains \textsf{ISO-8859-1} or \textsf{ISO-8859-15}, not plain DOS or Unicode.

\subsection{Cygwin}

I suggest following the directions given for GNU/Linux, but the EXE files provided for native Windows can be used instead if preferred.

\subsection{Macintosh}

Not yet tested (I do not have a Mac right now). It is on the TODO list.

\section{Test files}

Please test the programs on \textsf{test.tex} and \textsf{test.wls}. The output file will be called \textsf{test-ind.tex} if you use \textsf{doindex}, and \textsf{test-noind.tex} if you use \textsf{cleanindex}.

\section{Bugs and TODO}

The programs do not support Unicode files. At this moment, most \LaTeX{} users are still using latin1, but the situation is rapidly changing.

List of features that I would like to add:

\begin{enumerate}
\item Index included files as well.
\item Add typographical styles, such as italics for the most important locations of a word.
\item Add support for several indexes (particularly with the class \textsf{memoir}).
\item Add an option to generate a rough index of all the words.
\item Add support for indexing words specified with regular expressions.
E.g. \texttt{read{*}} should index \emph{read}, \emph{reads}, \emph{reading}, \emph{readings}, all under the same heading \emph{read}.
\item Make it possible to use another separator in the list file, e.g. a simple blank or another character preferred by the user.
\item Test the programs on a Mac.
\item Add Unicode support.
\end{enumerate}

\section{Acknowledgements}

The program \textsf{ixgen} gave me the idea of \textsf{forindex}. Many thanks to \textsc{Oscar Lopez} for this very good program.

Some questions sent by \textsc{Carlo Pellegrino} (Modena University, Italy) gave me the idea of transforming a very rudimentary script into a general purpose utility. \textsc{Maurizio Loreti} (Padua University, Italy) sent me very useful remarks on the problems of the automatic generation of indexes, which I made use of in the introduction to this text.

My warmest thanks to \textsc{Phil Budne} (\url{phil@ultimate.com}) for making his excellent CSNOBOL available. Many thanks to the community of Snobol users, particularly to the members of the list \url{snobol4@mercury.dsu.edu}, and, among them, to \textsc{Gordon Peterson} (\url{http://personal.terabites.com/}), \textsc{Michael Radow} (\url{mikeradow@yahoo.com}), \textsc{Gregory L. White} (\url{glwhite@netconnect.com.au}) and \textsc{Rafal M. Sulejman} (\url{rafal@engelsinfo.de}), whose \textsf{vim} syntax files are a daily blessing.

Thanks to Jim Hefferon <\url{ftpmaint@alan.smcvt.edu}>, who pointed out that the original name of the package, \textsf{4index}, was not acceptable due to XML syntax rules.

\section{Author, copyright, license, disclaimer}

This program is Copyright \copyright{} 2005\\
Guido Milanese <\url{guido.milanese@unicatt.it}>\\
under the terms of the GNU General Public License.\\
\footnotesize{}
\begin{verbatim}
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

If you do not have a copy of the GNU General Public License write to
the Free Software Foundation, Inc., 675 Mass Ave, Cambridge,
MA 02139, USA.

If the author of this software was too lazy to include the full GPL
text along with the code, you can find it at:

http://www.gnu.org/copyleft/gpl.html.
\end{verbatim}
\normalsize{}
\end{document}