.EF 'Strategies for Porting'- % -'March 1, 1988'
.OF 'Strategies for Porting'- % -'March 1, 1988'
.EH '''
.OH '''
.TL
Strategies for Porting the X v11 Sample Server
.AU
Susan Angebranndt
.AU
Raymond Drewry
.AU
Philip Karlton
.AU
Todd Newman
.AI
Digital Equipment Corporation
.AI
minor revisions by
.AU
Bob Scheifler
.AI
Massachusetts Institute of Technology
.AI
Revised for Release 4 and Release 5 by
.AU
Keith Packard
.AI
MIT X Consortium
.LP
X is a graphic user interface system enabling an application program to run on a computer other than the user's workstation. A "display server" (or simply "server") is the user's workstation. A "client" is an application program running on a machine with which the user interacts. Accompanying this document is the X Window System source tape, which also contains the "Definition of the Porting Layer" document. This document is a guide to porting the X server software to your workstation.
.IP 1)
Read the first section of this document (Overview of Porting Process). It describes what you will go through.
.IP 2)
Skim the "Definition of the Porting Layer" document.
.IP 3)
Scan the Details section of this document.
.IP 4)
Start planning and working, referring to the Strategies and Definition documents.

You may also want to look at the following documents:
.IP \(bu 5
"The X Window System" for an overview of X.
.IP \(bu 5
"Xlib - C Language X Interface" for a view of what the client programmer sees.
.IP \(bu 5
"X Window System Protocol" for a terse description of the bytestream protocol between the client and server.
.IP \(bu 5
"X11 Server Extensions Engineering Specification" for a description of how to add features to your X server in an agreeable way.

UNIX is a trademark of AT&T. QVSS, LK201, ULTRIX, VMS, DEC, MicroVAX and VAX are trademarks of Digital Equipment Corporation. Macintosh and Apple are trademarks of Apple Computer, Inc. PostScript is a trademark of Adobe Systems, Inc. Ethernet is a trademark of Xerox Corporation.
X and the X Window System are trademarks of Massachusetts Institute of Technology. Cray is a trademark of Cray Research, Inc.
.NH 1
Overview of the Porting Process
.XS
Overview of the Porting Process
.XE
.NH 2
Directories
.XS
Directories
.XE
.LP
This section describes some of the directories accompanying the tape.
.LP
lib/X
.RS
.RE
The X client library is used for developing client programs. Since it is simple and straightforward, it should be very easy to port.
.LP
doc/
.RS
.RE
This directory contains documentation, such as an online copy of this document and the others mentioned.
.LP
Makefile
.RS
.RE
This is a master make file for everything on the tape.
.LP
fonts/
.RS
.RE
This directory has font files in a text format and the compiler to compile them into a binary format that your server uses.
.LP
fonts/lib
.RS
.RE
This directory contains a library of routines for handling fonts. It contains all of the interfaces used by both the X and Font servers. It also contains routines for manipulating font files which are used by bdftopcf and mkfontdir.
.LP
fonts/bdf
.RS
.RE
This directory contains directories of font files. The font files are in a text format called "bdf." (See the "Definition of the Porting Layer" document, section 5.2.8.) There are three sections: misc, 75dpi and 100dpi. The misc directory contains fonts which do not fit a particular resolution: the cursor font, various fixed-pixel-size fonts, and fonts which are needed by various toolkits for icon images. 75dpi and 100dpi contain identical font sets at 75 dots-per-inch and 100 dots-per-inch respectively. All of these fonts conform to the XLFD standard; most of them are in ISO-8859-1 (Latin-1) encoding.
.LP
fonts/tools/bdftopcf
.RS
.RE
This directory has the font compiler source. The old bdftosnf compiler had private definitions for the glyph format; bdftopcf inherits the definitions from the server so special configuration is not needed here.
Besides, if you get it wrong, the server will still run, but will spend quite a bit of time reformatting the fonts.
.LP
fonts/tools/mkfontdir
.RS
.RE
mkfontdir is run over a directory containing font files. It produces the database which the server uses to map font-names to file-names.
.LP
X11/
.RS
.RE
This is an important directory because it has some include files that are actually shared between client (Xlib) and server, such as X.h and Xproto.h. The files in this directory are actually installed from include/ during the initial build process.
.LP
server/
.RS
.RE
This directory contains all the source for the server code. There are some other files that are needed from other top-level directories, such as .h files from the X11/ directory.
.LP
server/ddx
.RS
.RE
This directory has all the Device Dependent X server code. Operating System dependent code is under server/os.
.LP
server/ddx/cfb
.RS
.RE
This directory has "Pixblit" and other code for Color Frame Buffer displays. (In this document, "color" means multiple bits per pixel.) You may be able to simply plug a few special numbers into defines, then compile it and have it work for your display. Otherwise, it will serve as no more than an example of what you should implement as Pixblit routines for your own frame buffer.
.LP
server/ddx/mfb
.RS
.RE
This directory has "Pixblit" and other code for Monochrome Frame Buffer displays. (In this document, "monochrome" means a black and white display with one bit per pixel. Even though the usual meaning of monochrome is more general, this special case is so common that we decided to reserve the word for this purpose.) If you have a monochrome screen, you may be able to simply plug a few special numbers into defines, then compile it and have it work for your display. Otherwise, it will serve as no more than an example of what you should implement as Pixblit routines for your own frame buffer.
.LP
server/ddx/mi
.RS
.RE
This directory contains Machine Independent DDX code.
It supplies the "Drawing Primitive" procedures that DIX calls to perform graphics; these in turn call the "Pixblit routines." It also contains other routines. It is portable to any system that can draw graphics by manipulating rows of pixels by hand. See the "Definition" document for an explanation of these terms.
.LP
server/ddx/apollo,dec,hp,ibm,macII,mips,sun,tek
.RS
.RE
These directories contain device-specific code which runs on a wide variety of machines. All of these directories were, at least originally, contributed by the various hardware vendors. None of them consists entirely of actual production code.
.LP
server/dix
.RS
.RE
This is the Device Independent source. You should not have to change any of these routines.
.LP
server/include
.RS
.RE
This directory has dozens of include files that are specific to the server. Many of them are pairs in the form of XXX.h and XXXstr.h, the former having #defines for the XXX topic, and the latter having struct definitions. This naming convention could not be carried out systematically because SysV allows only 14-character filenames, which truncates some of the XXXstr.h file names. The include files in this directory are only for DIX and DIX/DDX interfaces; individual modules which export functionality (such as mi) include the interface-definition header files in their own directories.
.LP
server/os
.RS
.RE
This directory has Operating System specific source, mostly in subdirectories.
.LP
server/os/4.2bsd
.RS
.RE
This is the source for UNIX 4.2 BSD (Berkeley UNIX). It will also run on 4.3 BSD and ULTRIX, and on several vendors' mixed SysV/4BSD systems. It provides many routines which are not very OS specific, but which haven't been moved elsewhere yet for lack of need (i.e. it runs on every device which is supported by the sample DDX directories).
.LP
This software is contributed to the public as a service. We welcome contributions from other development groups for inclusion on future distributions.
.NH 2
Areas of Work to be Done
.XS
Areas of Work to be Done
.XE
.LP
Most of the code for the X server is on an industry-standard 9-track magnetic tape in UNIX "tar" backup format. The missing parts that you must supply for your particular workstation fall into the following three categories:
.LP
operating system:
.RS
.RE
Your operating system is the first and most obvious source of differences. If you have a UNIX 4.2 or 4.3 BSD system, this part will be trivial to port. The further you move away from that, the harder it will be. For systems that are not UNIX-based, the hardest part of the porting process may be the interface to clients.
.LP
input:
.RS
.RE
You need to code specific interfaces for your particular pointing device (mouse or tablet) and keyboard. These have to be non-blocking; a scheduler must be supplied that can wait for input events and client requests to arrive.
.LP
output:
.RS
.RE
This is potentially the largest section of code you will need to write. If you have a memory-mapped frame buffer display, most of the code has already been written for you (but optimization may be desired). If you have color and/or a special-purpose graphics processor that insists upon doing all of the work itself, you have a substantial task.
.NH 2
About DDX, mfb, cfb, and mi
.XS
About DDX, mfb, cfb, and mi
.XE
.LP
The DDX (device dependent X) layer provides a software interface to a conceptual hardware device. The imagined device provides primitives for drawing lines, arcs, text, filling areas, etc. These primitives may be actually provided in your hardware, or you may have to build them out of simpler primitives your hardware does provide. The mi (machine independent) routines provide software simulation of the conceptual machine built out of very simple primitives such as GetSpans, SetSpans, FillSpans, PushPixels, etc., which we call the Pixblit routines.
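.LP
The layering described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the frame buffer, the Span type, and the names fill_spans(), fill_rect() and count_pixels() are hypothetical and much simpler than the real Pixblit interfaces given in the "Definition of the Porting Layer" document. The point is that only the span routine touches the hardware, while the rectangle fill is expressed purely in terms of spans.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 8-bit-per-pixel frame buffer; not the sample server's types. */
enum { FB_WIDTH = 16, FB_HEIGHT = 8 };
static unsigned char fb[FB_HEIGHT][FB_WIDTH];

/* A span is a horizontal run of pixels on one scanline. */
typedef struct { int x, y, width; } Span;

/* Device-dependent part: the only routine that touches the frame buffer.
 * A simplified stand-in for a FillSpans-style Pixblit routine. */
static void fill_spans(const Span *spans, int nspans, unsigned char pixel)
{
    for (int i = 0; i < nspans; i++) {
        const Span *s = &spans[i];
        assert(s->y >= 0 && s->y < FB_HEIGHT);
        assert(s->x >= 0 && s->width >= 0 && s->x + s->width <= FB_WIDTH);
        memset(&fb[s->y][s->x], pixel, (size_t)s->width);
    }
}

/* Device-independent part: a filled rectangle built only from spans,
 * in the spirit of mi building primitives on top of Pixblit routines. */
static void fill_rect(int x, int y, int w, int h, unsigned char pixel)
{
    for (int row = 0; row < h; row++) {
        Span s = { x, y + row, w };
        fill_spans(&s, 1, pixel);
    }
}

/* Helper for checking results: count pixels holding a given value. */
static int count_pixels(unsigned char pixel)
{
    int n = 0;
    for (int yy = 0; yy < FB_HEIGHT; yy++)
        for (int xx = 0; xx < FB_WIDTH; xx++)
            if (fb[yy][xx] == pixel)
                n++;
    return n;
}
```

Porting this toy to different hardware would mean rewriting fill_spans() only; fill_rect() would be untouched.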
The mfb layer is one implementation of the software interface that connects to monochrome (one bit deep) framebuffers. The cfb layer is one implementation that connects to multi-bit framebuffers. In both cases, some functionality is provided by writing directly to the framebuffer. Some more esoteric functionality is achieved by calling the mi routines. To be able to use the mi routines, mfb and cfb must also implement the Pixblit routines.

Both cfb and mfb have been extensively tuned for Release 4 to run as quickly as portably possible on a wide variety of machines. Mfb, in particular, has some GNU C Compiler "asm" statements which substantially increase performance of some operations on both VAX and 68020 CPUs. Cfb, on the other hand, can be tuned for new architectures by describing some CPU characteristics in server/include/servermd.h.

The mi code should be portable to all systems. It calls the Pixblit routines to apply the pixels; all device dependencies are contained there. Some routines in mi are not used by the mfb or cfb DDX implementations. They are provided to make it easier for you to get a simple port running quickly. Unfortunately, it is not possible to provide a complete DDX implementation in mi; you need the Pixblit routines, which actually know how the hardware looks.

The mi, mfb, and cfb routines were designed for portability over performance. Therefore, you may want to spend time optimizing them if you choose to use them.
.NH 2
What do I do?
.XS
What do I do?
.XE
.LP
To start, you should get the simplest server running by modifying as little as possible, probably using mi and maybe using mfb or cfb. Later, you can carefully optimize it.

The first step is to copy the source code off the tar tape onto your machine. If yours is a UNIX system, this will be easy. If not, it may be more difficult. Use the UNIX "tar" command to load the tape onto your machine, if appropriate.
If you have a network running, you might be able to get it from some other machine on the net by using the UNIX "ftp" command (some non-UNIX systems also support ftp). One way to load the source onto a non-UNIX system is to load it onto a UNIX system and move it to your system. If you are porting to a non-UNIX system, we strongly recommend that you have a UNIX system available in house for purposes such as this and for testing.

The next step is to create a subdirectory under the ddx and os directories as appropriate for your code. (See the "Definition of the Porting Layer" document for details on directories.) Copy files into these directories from sibling directories that seem closest to what you will need. For instance, if you were porting to an IBM 3279 display on an IBM 4361 mini, you would create the directories ddx/3279 and os/4361 (or os/370 if you thought this would be portable to other 370 architecture machines). If you were porting to a 3279 display on a UNIX 4.2 system, you would make a directory ddx/3279 and use the os/4.2bsd directory the way it is, if you thought it would work. (If later in the process you found it did not, you would make your own subdirectory.)

Start modifying the code. Begin with the OS code. There are file i/o routines to work on, and the byte stream to the client is important. Get the byte stream working between your own test programs.

The second logical step is to get some form of the X server code running. Make dummy versions of the input routines and graphical output routines so you can concentrate on getting initialization right without having the system crash. Edit Xmd.h according to the instructions in the section "Machine Dependencies" later in this document. Then compile everything.

Next, work on the graphical output. Fill in whatever you need so that a simple client program that just draws some graphics on the screen works. For monochrome screens, setting a few defines and recompiling the mfb files may be all you need.
(See "Porting MFB" in the Details, below.) For color screens, setting a few defines and recompiling the cfb files may be all you need. (See "Porting CFB" in the Details, below.) The xclock program is a good candidate for testing graphical output. Depending on your networking software, it might be easiest to have this test client on the same machine as your server.

Finally, work on the input. Fill in code to handle the keyboard and mouse (or other pointing device). The cursor that echoes the position of the pointing device is better implemented in hardware, but mi does provide support for a primitive software cursor which is very easy to use. See the section on cursors below.

Next, optimize.

You are done! For more explanation, see the Details section, below.
.NH 2
Cost
.XS
Cost
.XE
.LP
We estimate that a basic monochrome or color server will take one to two months to develop if done on a UNIX 4.2 BSD system by an experienced C programmer who knows the hardware quite well.

The more software you have to write, the longer it will take. If it is a non-4.2 UNIX system, add one to four weeks. If it is a non-UNIX system, add one to two months. If your operating system does not have a network, that must be taken into consideration. If you buy someone else's network implementation, add one to four months. If you decide to write it yourself, add six months to two years. If special graphics hardware (a graphics processor, not just unusual bitplanes) is involved, it will take much longer. If you want the code optimized for maximum performance, it will take much, much longer.

The more experienced you are, the less time it will take. If you are new to C, add some time. If your programmer is not familiar with your operating system, it will take longer. If you are not familiar with windowing systems, it will take longer; if you're not even familiar with 2-d raster graphics, it will take longer still. If you've done ports to X before, it will take less time.
If you are really hot, it will take less time. Of course, all of these are just guesstimates.

The above figures are for one programmer. Some gains may be achieved through the parallelism of adding programmers. But, as Fred Brooks puts it, the bearing of a child takes nine months, no matter how many women are assigned. If you do distribute the work, it would be best to devise a good partition. For instance, a reasonable partition might be to have one programmer work on the operating system, network and input code, have two more working on graphics output, with one of them concentrating on text graphics. We recommend no more than a few programmers at one time.

At any rate, if you have a product that is robust enough to be useful, you are probably about halfway to making that product a solid, finished release.
.NH 1
Details
.XS
Details
.XE
.LP
.NH 2
Tools
.XS
Tools
.XE
.LP
.NH 3
The C Compiler
.XS
The C Compiler
.XE
.LP
Your C compiler can have a significant effect upon the time it takes you to finish the project. Since the original source was developed on a UNIX system, the closer your compiler approximates the UNIX C (pcc) compiler, the better. Depending upon your situation, it may be worthwhile to try more than one C compiler and use the one that works best. (Programmer time is quite expensive; software is frequently much less expensive, even if overpriced.) If, for instance, the DIX code does not compile without modifications, you may want to look elsewhere.

Sometimes we intentionally call a routine with the wrong number of arguments. For instance, there is a routine NoopDDA() in dixutils.c that is used widely as a procedure that does nothing. It has zero arguments but is used for situations where routines get passed different numbers of arguments. If this causes problems on your machine, you might need to change the code or get another compiler.
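.LP
If the mismatched-argument calls do cause trouble, one way to "change the code" is to supply a separate do-nothing routine for each calling signature. The sketch below assumes two made-up hook signatures; BlockHook, WakeupHook, NoopBlock(), NoopWakeup(), ScreenHooks and default_hooks() are hypothetical names for illustration and are not taken from the sample server.

```c
#include <assert.h>

/* Hypothetical hook signatures; the sample server's real hooks differ. */
typedef int (*BlockHook)(int screen);
typedef int (*WakeupHook)(int screen, long result);

/* One stub per signature, so every call matches its prototype exactly. */
static int NoopBlock(int screen)
{
    (void)screen;               /* deliberately unused */
    return 0;
}

static int NoopWakeup(int screen, long result)
{
    (void)screen;               /* deliberately unused */
    (void)result;
    return 0;
}

/* A table of hooks that a port fills in; unimplemented entries get stubs. */
typedef struct {
    BlockHook  block;
    WakeupHook wakeup;
} ScreenHooks;

static ScreenHooks default_hooks(void)
{
    ScreenHooks h = { NoopBlock, NoopWakeup };
    return h;
}
```

The cost is a little duplication; the benefit is that a strict compiler sees only calls whose argument counts agree with the routine being called.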
If you are using an 8086 architecture, we recommend you use "large" model to get the server running, then switch to mixed model for speed and space efficiency. This code does not support "near" and "far" pointers. This may not be necessary or desirable on 386 systems.
.NH 3
Make and Makefiles
.XS
Make and Makefiles
.XE
.LP
"Make" is a UNIX program that manages the compilation process. It reads in a text file named Makefile describing the source files that need to be compiled and how. (This file is frequently called the dependencies file because it describes the chain of dependencies leading to the final product.) Make then checks the dates of source, intermediate, and object files, determines the minimum set of compiles needed to bring a given result file up to date, and runs each compilation step as a child process.

This idea has been imported to a wide variety of operating systems (frequently still called "make"). On non-multitasking operating systems, the program frequently generates just a batch file with the needed compile commands in it and then executes this batch file as its final operation. (Beware: few of these non-UNIX versions contain all the features of the original.)

We recommend using Make or whatever useful substitute you have available. The makefiles for the UNIX system are included with the tar tape, and they should work on any UNIX system. They might not work on your system.

To aid you in generating your own makefiles for your own system, we briefly describe the syntax of makefiles. The dependency relationships look like this:
.nf
fig.o : fig.c fig.h xyz.h
	cc -abc fig.c
.fi
This states that the file fig.o (an object) depends upon fig.c and the two .h files listed. If fig.o is found to be older than any of the dependencies, execute the command(s) listed below it to bring it up to date.

Most makefiles look much more complicated. This is primarily due to the use of macros.
When you have a statement of the form:
.nf
COPTS = -abc -x fig -FPa
.fi
this means that you can subsequently use "$(COPTS)" as a text substitution macro elsewhere in the makefile.
.nf
fig.o : fig.c fig.h xyz.h
	cc $(COPTS) fig.c
.fi
This is frequently used as shown to hold C compiler options. It is also used to hold lists of filenames.
.nf
HFILES = fig.h xyz.h
fig.o : fig.c $(HFILES)
	cc $(COPTS) fig.c
.fi
Another common cause for confusion in makefiles is that there are special $ symbols that signify "the dependencies" or "the product" in a command line. These can be used in powerful constructs that will indicate, in just a few lines, "compile all .c files that you need to compile and do it this way." Consult UNIX documentation for more details.

The makefiles supplied with the sample server are not guaranteed to be nearly as portable as the code. In particular, there are situations where special techniques were used to get everything to compile. There are some routines that need to be compiled with #defines entered on the command line with the -D flag of the UNIX cc command instead of with a normal #define directive. If you don't have such a facility with your compiler, you should put such #defines in a .h file and do some file copying in the makefile to achieve the same result.
.NH 3
Debuggers
.XS
Debuggers
.XE
.LP
Because you are drawing graphics on the display, you will probably want to use a debugger that does not use the display. On some systems, a terminal connected to a serial port is the best way to communicate with the debugger. On network systems, you may be able to log into your test machine remotely and run the debugger and server from there.
.NH 3
Profiling Tools
.XS
Profiling Tools
.XE
.LP
After you have an initial implementation running, you may want to improve its performance. A profiler is invaluable for this purpose because it tells you where you are actually consuming CPU cycles. You can then change code based upon hard evidence.
On UNIX systems, you might use the prof and gprof programs. To really analyze the code, it is very useful to use a basic-block profiler (like the MIPS pixie system); most of the frame-buffer graphics primitives are large functions wrapped around multiple small inner loops which perform the actual rendering.

To gain more insight into performance deficits, the X client "x11perf" (contrib/demos/x11perf), contributed by DEC, offers a wide array of measurement tools and an easy base to add more to. It was used extensively in the development of the current cfb and mfb drivers, and in the Release 4 changes to various data structures and mi algorithms, with frequently astounding revelations. It is hard to recommend this program too much. Each time you sit down to optimize some area of the server, first develop a test case and integrate it into x11perf, then measure before and after to discover performance changes. X11perf is also useful in profiling the server as it provides a repeatable sequence of graphics requests; set it up to use a fixed number of iterations.
.NH 2
Operating System Details
.XS
Operating System Details
.XE
.NH 3
Machine Dependencies
.XS
Machine Dependencies
.XE
.LP
The sample server is written to be portable to a wide variety of architectures, including CPU chips with different word sizes and different bit and byte ordering. Before compiling the code, you should set some defines to indicate what kind of CPU you have.

First, edit Xmd.h. Change the following:

INT32, INT16, INT8 should be signed integers of 32, 16 and 8 bits. CARD32, CARD16 and CARD8 should be equivalent unsigned integers. BITS32, BITS16 and BYTE should be types that are most convenient for bit-oriented data. BOOL is the most convenient boolean value type that fits in 8 bits. Change them according to your compiler. Unfortunately, most of the mfb and cfb code "knows" that both int and long are 32 bits and will not work on systems where this is not the case.
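.LP
A cheap safeguard when choosing these types is to make the compiler verify their widths. The fragment below is a sketch only: the typedefs are guesses suitable for a machine with 8-bit char, 16-bit short and 32-bit int, not the contents of Xmd.h, and WIDTH_CHECK and bits_in() are hypothetical helpers introduced for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Candidate definitions for one kind of machine; adjust for your compiler.
 * They reuse the names edited in Xmd.h but are not copied from it. */
typedef int            INT32;
typedef short          INT16;
typedef signed char    INT8;
typedef unsigned int   CARD32;
typedef unsigned short CARD16;
typedef unsigned char  CARD8;

/* Compile-time check: the array size becomes -1, which is an error,
 * whenever the condition is false. */
#define WIDTH_CHECK(name, cond) typedef char name[(cond) ? 1 : -1]

WIDTH_CHECK(int32_is_32_bits,  sizeof(INT32)  * CHAR_BIT == 32);
WIDTH_CHECK(int16_is_16_bits,  sizeof(INT16)  * CHAR_BIT == 16);
WIDTH_CHECK(int8_is_8_bits,    sizeof(INT8)   * CHAR_BIT == 8);
WIDTH_CHECK(card32_is_32_bits, sizeof(CARD32) * CHAR_BIT == 32);
WIDTH_CHECK(card16_is_16_bits, sizeof(CARD16) * CHAR_BIT == 16);
WIDTH_CHECK(card8_is_8_bits,   sizeof(CARD8)  * CHAR_BIT == 8);

/* Run-time helper: width in bits of an object occupying nbytes bytes. */
static int bits_in(size_t nbytes)
{
    return (int)(nbytes * CHAR_BIT);
}
```

If a typedef is wrong for the target, the build stops at the offending WIDTH_CHECK line instead of failing later in some obscure way.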
The rest of the server is less encumbered, but as the sample code has never been run on such machines, it is unknown whether it will work. It will certainly not work if long is not 32 bits or short is not 16 bits.

IMAGE_BUFSIZE is the size of a buffer of bytes that GetImage will return. Smaller systems may want to keep this at 1k or less; larger systems may put it at a few dozen k.

IMAGE_BYTE_ORDER indicates the order of bytes in the image. On VAXen, this is LSBFirst because the least significant byte is on the left, and is sent down the pipe first. On 68000s it is MSBFirst.

BITMAP_BIT_ORDER is the equivalent order of bits within a byte. On VAXen, this is LSBFirst because the least significant bit is most toward the left on the screen. On 68000s it is MSBFirst.

BITMAP_SCANLINE_UNIT is the biggest piece of memory in which IMAGE_BYTE_ORDER applies (in bits). For most hardware, 32 is a good value. Note that mfb and cfb both assume that addresses ascend across the screen from left to right and then proceed down the screen.

BITMAP_SCANLINE_PAD is the chunk size to which bitmaps sent over the bytestream should be padded. In other words, if you had a bitmap that only had one bit in it, would you want to send 8 bits, 16 bits or 32 bits?

LOG2_BITMAP_PAD must be the log base 2 of BITMAP_SCANLINE_PAD. If BITMAP_SCANLINE_PAD is 32, this must be 5.

LOG2_BYTES_PER_SCANLINE_PAD is the log base 2 of (BITMAP_SCANLINE_PAD divided by 8, the number of bits in a byte). If BITMAP_SCANLINE_PAD is 32, this must be 2.
.NH 3
Client Access
.XS
Client Access
.XE
.LP
On many systems, one large section of code to be written may be the client access. X requires a reliable byte stream that can handle binary data. The sample server has code in it to communicate over three different byte streams: TCP/IP Ethernet, DECNET, and UNIX domain sockets. If you do not have one of these already, you may find the byte stream somewhat time consuming to develop.
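.LP
When bringing up a byte stream, it helps to have a small test that confirms it delivers every byte value intact and copes with short reads and writes. The sketch below uses a UNIX domain socket pair purely as a stand-in for whatever transport you are developing, and assumes a POSIX system; read_fully(), write_fully() and stream_is_8bit_clean() are hypothetical helper names, not routines from the sample server.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Read exactly len bytes, looping over short reads.
 * Returns 0 on success, -1 on end of file or error. */
static int read_fully(int fd, unsigned char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n > 0)
            got += (size_t)n;
        else if (n == 0 || errno != EINTR)
            return -1;
    }
    return 0;
}

/* Write exactly len bytes, looping over short writes. */
static int write_fully(int fd, const unsigned char *buf, size_t len)
{
    size_t put = 0;
    while (put < len) {
        ssize_t n = write(fd, buf + put, len - put);
        if (n > 0)
            put += (size_t)n;
        else if (n < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}

/* Send all 256 byte values (including XON, XOFF and NUL) through the
 * stream and check that they arrive unchanged. Returns 1 if they do. */
static int stream_is_8bit_clean(void)
{
    int fds[2];
    unsigned char out[256], in[256];
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return 0;
    for (int i = 0; i < 256; i++)
        out[i] = (unsigned char)i;
    ok = write_fully(fds[0], out, sizeof out) == 0 &&
         read_fully(fds[1], in, sizeof in) == 0 &&
         memcmp(out, in, sizeof out) == 0;
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

Replacing the socketpair() call with the open routine of your own transport gives a first smoke test for it; the 256-byte message is small enough to be written and then read back from a single process.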
If you have an operating system other than a UNIX 4.2 BSD system, there is more work involved in client access. If it is another UNIX system, it is somewhat easier. The less it resembles 4.2 BSD, the more difficult it will be.

If you can't use TCP/IP Ethernet, DECNET or UNIX domain sockets, the alternative is to use some other byte stream mechanism. This will also have to be dealt with on the X client side (there is an implementation-specific routine in the X library to communicate with the server). You might start out by implementing both sides in the same machine as long as the client and server are separate processes and there is a convenient interprocess bytestream mechanism. In particular, this may be a first step toward implementation of your alternate inter-machine client-communication scheme.

In theory, any reliable byte stream will work. Its throughput should be approximately 5k bytes per second or more; otherwise performance will deteriorate. For instance, an RS-232 or RS-422 link would work, although its performance would leave much to be desired unless you could achieve a baud rate of 56kbaud or greater. Since 8-bit binary data is regularly transmitted, your bytestream cannot use command characters for handshaking and protocol (such as XON/XOFF). Many modems or other telecommunication equipment will not work if designed for just normal ASCII communications because they may intercept certain control characters. Also, an RS-422 link would only offer one client-server bytestream, whereas you may want more than one such connection.
.NH 3
Multi-Processor OS's and Graphic Processors
.XS
Multi-Processor OS's and Graphic Processors
.XE
.LP
The X server runs as a single process that imitates multitasking using an event-dispatching loop that checks for things to do from all sources and processes them one at a time.
Many operations do not consume much time, so the multitasking appearance is upheld; but certain graphics operations may consume substantial amounts of CPU time. If another CPU or a graphics processor were available for these tasks, a significant gain in performance could be realized.

Graphics processors, in particular, can offer a unique opportunity to create a very high-performance X server. See the section "Implementing On Top of Another Graphics System" for more details if you have a graphics processor.

The X sample server was written as a single-threaded program for a single processor. A multi-processor system with a core processor (running the main server code) might dispatch tasks to a set of slave processors that effect low-level graphics operations. Or it may even have a completely different scheduling system, with multiple processors participating in the dispatch loop. In such cases, large parts of the server code will probably need to be rewritten. In particular, there are shared resources among clients, so you need to ensure that requests received by the server are executed in apparent synchrony, and that global data structures such as the window tree and the resource table are maintained correctly.

X is merely a bytestream protocol and anyone can write any software to implement it in any language on any computer system. The sample server is merely one implementation.
.NH 3
Server Reset
.XS
Server Reset
.XE
.LP
The X server will reset itself immediately after all clients terminate. It is helpful to provide a way for the user to cause the server to terminate all client connections and reset itself.

At an appropriate time, your server can cause all clients to be terminated by calling DoomClients(). On the following cycle through the dispatch loop, all clients will be terminated in a somewhat reasonable way. This will cause a reset. Upon reset, you should instruct your network to close all open client connections.
For instance, when the server process receives a SIGHUP signal on UNIX systems, the signal routine calls DoomClients(). On a non-UNIX system, you may prefer a special sequence of modifiers and keys at the keyboard. Whatever the user does, all windows and applications will be closed and the user will have only an empty screen.
.NH 3
Shutdown
.XS
Shutdown
.XE
.LP
Depending upon your workstation environment, you may want your X11 server to run forever, or you may want to provide a way for the user to cause the server to quit gracefully without turning off the machine.

Your server can quit by calling KillServerResources(), closing all network connections and then calling exit(). For instance, on UNIX systems, when the server process gets a SIGINT or SIGTERM signal, it calls KillServerResources() and then exit(). On a non-UNIX system, you may prefer to have the user press a special sequence of modifiers and keys at the keyboard. Whatever the user does to accomplish this, it will cause the X11 server to return to your operating system and/or shell. You may want to clear the graphics screen(s) before exiting.
.NH 2
Input Details
.XS
Input Details
.XE
.LP
.NH 3
The WaitForSomething Scheduler
.XS
The WaitForSomething Scheduler
.XE
.LP
WaitForSomething() must wait for any of three occurrences: a hardware input event is received, a request from a client is received, or a request from a new client to open a connection is received. In the interim, you can do anything you want. On a multitasking system, you probably want to block yourself. This can be accomplished using mechanisms such as select(2) on 4.2BSD, or poll(2) on V.3. On systems on which the entire machine is dedicated to the X server, you can loop endlessly, checking for input and client requests.

It would be unwise to depend exclusively upon idle times for polling the keyboard and pointing device. You should also poll these input devices at other times.
In fact, these tasks should be monitored by an interrupt service routine checking at regular intervals. Otherwise, the users will be constantly annoyed when their keystrokes and mouse events are lost. Also, many paint-style programs depend upon regular pointing-device event-reporting to enable the user to draw smooth curves with the pointing device without leaps from one cursor location to another. (Even if the hardware can queue one or two such events, some graphic operations such as copying a large image can consume more time than a few keystrokes in rapid succession by a touch typist.)

DIX will process requests from each client until the variable isItTimeToYield is set. If you do not set it, you will enable one client to lock out all others by constantly drawing graphics. Therefore, you should devise a strategy for setting isItTimeToYield and ending the "timeslice" of a time-consuming client. The sample server will set this after ten requests have been read from the same client.

The DIX code will service each client in the order received from WaitForSomething(). You might tune the server so that if you write an event to a client, the priority of that client increases, either by returning it earlier in the list or by allowing more time before setting isItTimeToYield. You might set isItTimeToYield if the current request changes the window tree (causing exposures).
.NH 3
Keyboards
.XS
Keyboards
.XE
.LP
The keyboard consists of two kinds of keys, regular keys and modifier keys. Modifier keys, like Shift and Control, are keys the user presses while typing regular keys.

Your keyboard must be able to indicate when the user presses or releases keys. More specifically, your keyboard-interface software must be able to generate a KeyPress when a modifier or a regular key is pressed and a KeyRelease when a modifier key is released. You must also generate a KeyRelease for a normal key, but you can generate it immediately after the KeyPress is queued.
If you cannot at least do this, you may have problems. If your keyboard currently generates queue events upon each key motion or calls an interrupt routine that can do this, your situation is improved. If you have a system in which a keymap has one bit for each key that is being pressed, you simply need to check this keymap at regular intervals in an interrupt service routine and queue events on an internal queue you maintain.

If you have a keyboard at the other end of a serial line, things become more difficult because you must reverse-map your ASCII characters into keycodes. In addition, you need to simulate modifier keys being used. For instance, when you get a lowercase "a", you must send a KeyPress for the "A" key, then a KeyRelease for "A". If you get an uppercase "A", you must send a KeyPress for the Shift key, send a KeyPress for the "A" key, then a KeyRelease for "A", then a KeyRelease for Shift. If you get a space character, you do not know if the shift key has been pressed, so you assume it has not. Between keystrokes, there is no way to know if the shift key has been pressed. Since with this scheme the client cannot ascertain when the user is pressing the shift key without typing any keys, some client applications that try to detect this will not operate properly.

If you want autorepeat, you must simulate this in your code or hardware by generating KeyPress and KeyRelease events when appropriate. The X protocol specification describes in detail how these options are set by a client.
.NH 3
Pointing Devices
.XS
Pointing Devices
.XE
.LP
The pointing device may be a mouse, a graphics tablet, a light pen, a touch screen, a trackball, a joystick, a pair of thumbwheels, or any other device that allows the user to indicate a location on a two-dimensional surface. The surface should bear some resemblance to the screen, because a visible cursor is displayed on the screen at a location that corresponds to the pointing-device location.
The pointing device must report a location as a graphics coordinate on the screen. The pointing device must have one or more "buttons" or other momentary control that the user can touch or press, such that the software driver can report a "press" and a "release" event. For instance, a touch screen can report press and release events when the user touches the screen. A trackball will probably require one or more separate buttons. Some of these pointing devices are absolute, some are relative. For instance, with a touch screen, the user directly indicates the desired location on the screen. Mice and trackballs, on the other hand, only provide relative motion information; some other hardware or software must integrate these moves into a location. A graphics tablet is on the absolute side, but requires a mapping between the absolute coordinates on the tablet surface and the screen coordinates. Some relative devices, such as mice, have a scheme in software or firmware to "accelerate" the motion of the mouse. For instance, on the Apple Macintosh, the interrupt service routine for mouse motions checks each increment to be added to the cursor location. If the jump is past a certain threshold, it doubles the jump distance. In this way, the user can move the mouse quickly across the screen, while still retaining fine control over the location for detail work. Unfortunately, this technique is frequently used because the hardware simply cannot generate fine enough position increments. If you implement or have available such a scheme, you should allow standard control calls from a client to turn this effect off and on. Buttons are numbered starting with one. Probably, the left button on a mouse should be number one and they should be numbered towards the right from there. Client applications that use fewer buttons than you have will start with one and use only as many as needed. 
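.LP
The threshold-doubling acceleration described earlier in this section can be sketched as follows. This is only an illustration under stated assumptions: the AccelSettings structure and every function name are hypothetical, not part of the sample server, and the enabled flag stands in for whatever control calls your port honors to turn the effect on and off.

```c
#include <stdlib.h>

/* Hypothetical acceleration settings; these names do not come from the sample server. */
typedef struct {
    int enabled;     /* nonzero while a client has acceleration switched on */
    int threshold;   /* jumps larger than this many device units are scaled */
    int multiplier;  /* factor applied past the threshold; 2 doubles the jump */
} AccelSettings;

/* Scale one axis of a relative motion report. */
int accelerate_delta(const AccelSettings *s, int delta)
{
    if (!s->enabled || abs(delta) <= s->threshold)
        return delta;            /* small moves pass through for fine control */
    return delta * s->multiplier;
}

/* Keep a coordinate inside [lo, hi]. */
int clamp_coord(int v, int lo, int hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Integrate one relative move into the cursor location, confined to the screen. */
void move_cursor(const AccelSettings *s, int *x, int *y,
                 int dx, int dy, int width, int height)
{
    *x = clamp_coord(*x + accelerate_delta(s, dx), 0, width - 1);
    *y = clamp_coord(*y + accelerate_delta(s, dy), 0, height - 1);
}
```
.LP
Treating each axis independently keeps the sketch short; a port might prefer to compare the combined jump distance against the threshold instead.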
Since the X protocol specifies mechanism, and not policy, programs that depend upon more mouse buttons than you have may end up waiting for a long time before you hand them a button click which you cannot generate. On the other hand, light pens, graphics tablets with pens, and touch screens all implicitly have one "button", so it is reasonable to assume that client developers will be encouraged to consider one-button pointing devices. Keep in mind that the mainstream pointing devices will be mice with one or more buttons and graphics tablets.

Client programs written with one pointing device in mind may prove hard to use with another pointing device. That is, programs written for a mouse usually assume that the mouse location can be chosen very accurately. If your touch screen is coarse, it may be very frustrating to use. Also, a touch screen usually cannot generate mouse move events while the mouse "button" is not "pressed".

Make a mouse in a multiple-screen environment move from one screen to the next by creating the impression that the screens are adjacent to one another; when the user moves the pointing device off the edge of one screen, the cursor moves onto another. X provides no policy for this, and you are free to make any geometric models you please.
.NH 2
Graphics Output Details
.XS
Graphics Output Details
.XE
.LP
.NH 3
Porting MFB
.XS
Porting MFB
.XE
.LP
If your screen is a simple monochrome frame buffer, you probably want to start by porting the mi and mfb routines. These will get you up quickly so you have something that works on which to build. Mfb has been extensively tuned for a few environments; in particular mfb runs very well on 68020 and VAX CPUs where GNU CC is available. It also runs quite well on many RISC processors, where C compiler technology is more able to optimize some of the common operations. Although you could easily expend considerable time optimizing it, it is not unreasonable to leave most mfb routines the way they are.
The mfb routines are extremely portable. Most monochrome screens need only a half-dozen defines changed before the code works. System bit and byte order and other machine dependencies are given by #defines. (It assumes that byte ordering on the screen is the same as byte ordering in main memory.)

First, make sure you have edited Xmd.h for your CPU. See the section "Machine Dependencies" for instructions on how to do this. Then edit server/include/servermd.h to set up bit/byte orders and font padding information (mfb will work with any font padding, but MSBFirst machines work best with GLYPHPADBYTES == 4, GETLEFTBITS_ALIGNMENT == 1).

Next, write a screen initialization routine which sets the whitePixel and blackPixel values in the screen structure and calls mfbScreenInit with appropriate parameters; in particular you'll need to pass the address of the frame buffer, the screen size in pixels, both horizontal and vertical resolution in dots/inch (truncated to an int, which limits the accuracy a bit) and the frame buffer width in pixels. This last parameter may seem redundant, but many displays have extra framebuffer memory per scanline which is not visible on the display. Set this final parameter to the total width of a frame buffer scanline in pixels, visible or not, and mfb will ignore the invisible space. You could just type in a literal address in hexadecimal for the frame buffer address, but you may want to be a bit more sophisticated.

In this screen initialization routine, you'll want to initialize the various screen functions which apply to your hardware; hardware cursor routines (or mi software cursors) should be set up here. Mfb requires that the CloseScreen function which it stores in the screen be called at server reset time; make sure you wrap it if you need your own hooks here. If mfbScreenInit returns successfully (TRUE), call mfbCreateDefColormap(pScreen) to initialize the default colormap with appropriate values. That's it!
All other machine dependencies should be taken care of, for most screens.

If you have an interlaced screen, where rows of neighboring pixels are not neighboring in memory, there is a way to make mfb work on it. The changes needed are few; carry them out carefully. They involve changing the mapping from the row number to address. Look for places where we multiply by devKind or width.
.NH 3
Porting CFB
.XS
Porting CFB
.XE
.LP
If your screen is a simple packed-pixel frame buffer (either gray scale or color), you will want to start by porting the mi, cfb and mfb routines. You'll need to use mfb, even though your screen is color, as each server is required to support 1-bit pixmaps. The cfb routines have been extensively tuned for 1-byte-per-pixel displays and will work quite well with little change. On other displays (2 bit up to 32 bit), the existing code will still work, but in many areas performance will be disappointing.

The cfb routines are also extremely portable. Most color screens need only a few changes to Xmd.h and server/include/servermd.h. The 8-bit specific cfb text code works best with GLYPHPADBYTES == 4, GETLEFTBITS_ALIGNMENT == 1, but will function with any padding. Also in servermd.h are several CPU-specific tuning parameters. Read the comments carefully at the top of the file and set the ones appropriate for your CPU. They do not affect the correctness of the code, but can offer substantial performance gains if set correctly. If you are unsure, guess and use a profiling tool to discover which settings work best. As with the mfb code, you could spend almost unbounded effort tuning various portions of the cfb layer for your particular system; but most of the code should run well enough unchanged to not warrant the effort.

Finally, set up an initialization routine which calls cfbScreenInit. It takes arguments similar to those used by mfbScreenInit; the sizes are all still in pixels.
CfbScreenInit does take an additional parameter before the framebuffer width, the visual class of the default visual. Set this appropriately (probably PseudoColor). You needn't set up whitePixel/blackPixel on pseudo color machines as cfbCreateDefColormap() will pick appropriate values and store them in the colormap when called after cfbScreenInit returns success. If you want to force the values for whitePixel/blackPixel, set them in the screen structure after cfbScreenInit and before cfbCreateDefColormap.
.NH 3
Implementing On Top of Another Graphics System
.XS
Implementing On Top of Another Graphics System
.XE
.LP
Many workstations already have their own graphics library or even their own windowing system. In order to coexist with the rest of the world as peacefully as possible, you may want to implement your X server on top of such a library. In fact, your machine may come with its own graphics processor that can greatly speed up graphics if used judiciously.

Beware, however, that many X clients draw small objects, or only a few at a time. The overhead for translating X requests into graphics-system primitives may dominate the drawing time and cause the resultant server to be slower than a simple dumb frame-buffer system. Do not casually assume that the graphics processor is the fastest way to do things. Profile, profile, profile.

Since such graphic systems usually perform high level operations such as line drawing, text drawing, and area fill, you would start accommodating them at the "Drawing Primitives" level. In other words, you would rewrite one or more of the drawing primitive routines provided (such as miPutImage(), miPolyArc(), miPolyFillRectangle(), or miImageText8()). Instead of using the equivalent mi routine, you would write your own routine to use the graphics system.
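.LP
As a rough illustration of replacing one drawing primitive while keeping a software path for requests too small to be worth the setup cost, consider the sketch below. Everything in it is an assumption made for the example: DrawOps is a one-entry stand-in for the real vector of primitives, sw_fill_rect plays the role of an mi routine, hw_fill_rect only simulates a call into a graphics system, and HW_MIN_AREA is a made-up break-even value that profiling would have to supply. Rectangles are assumed to lie entirely inside the frame buffer.

```c
#include <string.h>

#define FB_WIDTH    64
#define FB_HEIGHT   32
#define HW_MIN_AREA 64   /* hypothetical break-even size; find yours by profiling */

unsigned char framebuffer[FB_HEIGHT][FB_WIDTH];  /* one byte per pixel */
int hw_calls = 0;   /* counters so the routing can be observed */
int sw_calls = 0;

typedef void (*FillRectProc)(int x, int y, int w, int h, unsigned char pixel);

/* Stand-in for the vector of drawing primitives kept with a GC. */
typedef struct {
    FillRectProc FillRect;
} DrawOps;

/* Plays the role of the portable mi routine: the CPU writes every pixel. */
void sw_fill_rect(int x, int y, int w, int h, unsigned char pixel)
{
    int row;
    sw_calls++;
    for (row = 0; row < h; row++)
        memset(&framebuffer[y + row][x], pixel, (size_t)w);
}

/* Stand-in for a routine that hands the request to the graphics system. */
void hw_fill_rect(int x, int y, int w, int h, unsigned char pixel)
{
    int row;
    if (w * h < HW_MIN_AREA) {          /* setup cost would dominate */
        sw_fill_rect(x, y, w, h, pixel);
        return;
    }
    hw_calls++;
    /* A real port would issue a graphics-processor command here;
       the memset only simulates its effect. */
    for (row = 0; row < h; row++)
        memset(&framebuffer[y + row][x], pixel, (size_t)w);
}
```
.LP
A caller would store hw_fill_rect in the FillRect slot and invoke it through the pointer, so the choice between the two paths stays invisible to the code that issues the request.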
One problem with a graphics processor, which also occurs when trying to implement a server atop an outside graphics library, is that the definition of certain functions can change in subtle ways. For instance, a graphics processor may support text drawing only by ORing the glyphs into place; the X routines require more sophisticated text-drawing capabilities. A more difficult case is that in which a graphics processor can draw only fixed-width characters or can draw only 8-pixel-wide characters, or can draw characters only in its own hardwired font.

There are several approaches to this problem. First, you can recognize the 80 percent of the situations that can be executed by your graphics system, using the graphics system for those cases, and then executing the remaining 20 percent with mi (and possibly even cfb or mfb) code. Your GC validate routine can route different requests to various routines to do things differently. (See the Definition document for more information on the GC validate routine.) Second, you can supplement the graphics processor's work. You can implement each X primitive call with more than one call to your graphics system, possibly with some auxiliary touch-up. Third, you can request changes in your graphics processor or library. By using as many of these approaches as appropriate, you can maximize the overall performance and compatibility of your workstation while correctly interpreting the X protocol.

Example: Your graphics processor applies glyphs only by "ORing" them into the image. Make the ImageGlyph routine call the graphics processor to draw the character's rectangle in the background color, then call the graphics processor to draw the character. If using just a solid-fill style in OR mode, you make the PolyGlyph routine call the graphics processor to draw the character. You use the slower mi routines for PolyGlyph calls that must apply tiling, stippling, etc.

Example: A graphics processor can draw only fixed-width characters.
In this case, you use the Validate routine to change the primitive procedure pointers in the GC depending upon whether your font is fixed width or variable width. The fixed-width fonts go directly to the graphics processor. The variable-width fonts would be drawn in software, probably using routines borrowed from the sample server. (Depending upon the application, much text on the screen may be fixed width in the default font.)

Example: The graphics processor cannot clip to an irregular region as the entire Drawing Primitive set must do. Each routine checks the clipping region and ascertains whether the entity to be drawn falls entirely within the region. If so, the drawing is executed by the graphics processor. If any part of the entity is clipped, it is handled by the mi, cfb or mfb code.

Example: A graphics processor can draw text only with its own hardwired font. You create the font data that would correspond to your hardwired font, including the character glyph images. You make up a name for this font and make that your default font. Once again, you use the Validate routine to change the primitive procedure pointers in the GC depending upon whether your font is the hardwired font or not. The hardwired font goes directly to the graphics processor, as long as you can handle the fill style and clipping. Other fill styles or clipping may be handled by using hardware to draw into a pixmap and then applying it to the screen. Anything else would be drawn in software, probably using routines borrowed from the sample server.

Example: In X, lines are drawn with a model borrowed from PostScript in which the width of a line is a scalar number and ends of lines can be butt (squarely cut off perpendicular to the line), round (semicircular end), or projecting (like butt but extending past the end of the line by 1/2 line width). Imagine your graphics processor draws lines by smearing a rectangle from the source to destination.
You get to set the height and width of the rectangle, but nothing else. You will not be able to use this operation for X wide lines in any but the simplest (i.e. horizontal/vertical or zero-width) cases.

In X there are few requirements placed on zero-width lines. (If you get a line width of zero, the intent is that it be "the fastest, easiest line," not an invisible line that has no width.) Fill-style rules still apply, and the width should be approximately 1 pixel. The line style (dash style) should still be processed. The join style can be ignored because all join styles look the same at this resolution (except that miter joins for acute angles can get very long; you can ignore this effect). Your algorithm can be anything reasonable, but it is desirable that you obey the cap style "NotLast", which indicates whether the ending pixel should be drawn. There is also a requirement that the lines be identical in the face of clipping, and a suggestion that the lines be identical when drawn in the reverse direction. Client programs that are picky about the lines they draw can draw width 1 lines. Your GC Validate routine can change the line-drawing routine pointer in the GC so that zero-width lines get drawn by the graphics processor and the others are drawn by mi.

Of course, the facilities of each graphics processor are unique and each has special considerations. This is an area that will require meticulous attention to detail on your part.
.NH 3
Hardware Tiling and Stipples
.XS
Hardware Tiling and Stipples
.XE
.LP
Some hardware has the ability to apply patterns to the graphic surface. X makes a distinction between a tile and a stipple. A tile is a "full color" pattern, the depth of which matches the target drawable. A stipple is a binary pattern that writes the foreground color on 1-bit areas and (if opaque) the background color on 0-bit areas. In addition, X allows a tile or stipple cell to have any size.
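.LP
The stipple rule above can be made concrete with a small sketch that paints one horizontal span. It is a simplified model, not sample server code: the Stipple structure and stipple_span are hypothetical, the pattern is held one byte per cell pixel rather than packed, the pattern origin is assumed to be (0,0), and coordinates are assumed to be non-negative.

```c
/* Hypothetical stipple cell; any width and height are allowed. */
typedef struct {
    int width, height;
    const unsigned char *bits;  /* row-major, one byte per cell pixel, 0 or 1 */
} Stipple;

/*
 * Paint n pixels of scanline y starting at column x.
 * Pixels under 1-bits receive fg. Pixels under 0-bits receive bg when
 * opaque is nonzero and are left untouched otherwise.
 */
void stipple_span(const Stipple *st, unsigned char *row, int x, int y, int n,
                  unsigned char fg, unsigned char bg, int opaque)
{
    int i;
    int sy = y % st->height;            /* the cell repeats vertically */

    for (i = 0; i < n; i++) {
        int sx = (x + i) % st->width;   /* and horizontally */
        if (st->bits[sy * st->width + sx])
            row[x + i] = fg;
        else if (opaque)
            row[x + i] = bg;
    }
}
```
.LP
The two modulo operations are what let the cell have an arbitrary size; they are also what fixed-size pattern hardware cannot do for you.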
Some graphics processors can apply patterns that are only certain cell sizes, such as 8x8 or 16x16. Most CPU chips will apply patterns more efficiently to some frame buffers when the pattern width is 8, 16 or 32. In these cases, you use the GC validate routine to switch between fast pattern writing and slow pattern writing via the mi routines. If your pattern size is a factor of your hardware pattern size (such as 2x4), you can simply replicate it to fill the hardware rectangle. (Many patterns will, in fact, be such sizes, so this will not be wasted effort. There is a request, QueryBestSize, that a client can execute to ascertain what sizes are optimal.)
.NH 3
Graphic Contexts in Hardware
.XS
Graphic Contexts in Hardware
.XE
.LP
Many hardware and firmware graphics systems have internal state analogous to X's Graphic Contexts. Such settings as current line width, current font, and current foreground color can be set in hardware for subsequent drawing operations. The sample server provides a mechanism for conveniently and efficiently specifying these settings: the GC validate procedure, which is called when necessary just before drawing.

Each drawable (window or pixmap) has a fixed serial number, which is unique for that drawable. Each GC has a serial number field that reflects the last drawable for which it was validated. Before a drawing operation with a drawable and a GC, the two serial numbers are compared; if they differ, the validate routine(s) are called to validate the GC. When a GC is validated for a drawable, its serial number is set to the serial number of the drawable so that the next time these two are used together, the validate routines are not called. But the GC serial number is changed when some of its fields are changed, forcing a validate the next time around (the high bit is changed; it is unused for anything else).
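.LP
The serial-number comparison just described can be sketched in a few lines. The types and names below (SketchDrawable, SketchGC, GC_CHANGED_BIT, and the three functions) are hypothetical stand-ins chosen for this example, and the sketch assumes that drawable serial numbers never use the high bit.

```c
#define GC_CHANGED_BIT 0x80000000UL  /* illustrative: high bit of the GC serial number */

typedef struct {
    unsigned long serialNumber;   /* fixed, unique per drawable */
} SketchDrawable;

typedef struct {
    unsigned long serialNumber;   /* serial of the drawable last validated against */
    int lineWidth;
    int validations;              /* counts validate calls, for illustration only */
} SketchGC;

/* Any change to a GC field sets the high bit so the next comparison fails. */
void set_line_width(SketchGC *gc, int width)
{
    gc->lineWidth = width;
    gc->serialNumber |= GC_CHANGED_BIT;
}

/* Stand-in for the device validate routine. */
void validate_gc(SketchGC *gc, const SketchDrawable *d)
{
    gc->validations++;
    gc->serialNumber = d->serialNumber;   /* also clears the changed bit */
}

/* Called before every drawing operation. */
void prepare_to_draw(SketchGC *gc, const SketchDrawable *d)
{
    if (gc->serialNumber != d->serialNumber)
        validate_gc(gc, d);
}
```
.LP
Drawing repeatedly with the same pair costs one comparison per operation; a validate happens only after a field change or a switch to a different drawable.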
In other words, by default this validate procedure you write is called only when the graphic context about to be used in a drawing operation has been changed since the last validate for this GC and drawable or if the last validate for this GC was for another drawable. If you have only one hardware GC state, however, the validate routine must be called more often, because it must also be called whenever you switch between different GC's. For instance, under normal conditions, if you drew with drawable a and GC A and then drew with drawable b and GC B and kept switching between aA and bB without changing the GC's, each would no longer need to be validated because their serial numbers would match. You could ensure that the validate routines are called for each change of the GC in use by keeping a static GC pointer variable that points to the last GC used. When a new GC is validated, the serial number of the last GC would be changed (change the high bit -- do not change the rest which is clipping information). Once this has been done, set your static GC pointer to point to the new GC. The validate routine will then be called whenever the hardware GC information needs to be changed. Unfortunately, the validate routine is probably much more involved than is necessary for this process. Instead, keep a global variable which points at the current GC in use and check in each graphics operation that the global GC pointer matches the GC passed in. If not, call a function to reload the hardware state from the new GC and change the global pointer to point at the new GC. DestroyGC would then check to ensure the cached GC pointer was invalidated when the GC was deleted. If you have a sophisticated graphics processor that has, for instance, eight "contexts" of graphic parameters among which it can switch, you can retain eight static GC pointers (in an array). Before each graphic operation, set the hardware to use the hardware GC it needs. 
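.LP
A minimal sketch of the cached-pointer approach for a single hardware context follows. The PortGC type, the simulated hwState registers, and all function names are assumptions made for illustration. The sketch reacts only to a switch between GCs; changes to the fields of the GC already loaded are assumed to be handled elsewhere, for example by the validate routine.

```c
#include <stddef.h>

typedef struct {
    int foreground;
    int lineWidth;
} PortGC;   /* hypothetical stand-in for the fields a port mirrors in hardware */

PortGC hwState;                 /* simulated hardware registers */
const PortGC *currentGC = NULL; /* GC whose settings the hardware now holds */
int hwReloads = 0;              /* counts reloads, for illustration only */

/* Copy the GC settings into the (simulated) hardware state. */
void load_hw_state(const PortGC *gc)
{
    hwState = *gc;
    hwReloads++;
}

/* Every drawing routine calls this first. */
void use_gc(const PortGC *gc)
{
    if (gc != currentGC) {
        load_hw_state(gc);
        currentGC = gc;
    }
}

/* Example drawing primitive; a real one would issue a hardware command. */
void draw_point(const PortGC *gc, int x, int y)
{
    use_gc(gc);
    (void)x;
    (void)y;
}

/* DestroyGC hook: never leave the cache pointing at freed memory. */
void destroy_gc_hook(const PortGC *gc)
{
    if (currentGC == gc)
        currentGC = NULL;
}
```
.LP
With several hardware contexts, currentGC would become a small array and use_gc would search it before choosing a slot to reload.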
(You might want to run benchmarks to ensure you are not spending more time switching hardware GC's than necessary.) See the Definition document for more details.
.NH 3
Implementing X on top of Another Window System
.XS
Implementing X on top of Another Window System
.XE
.LP
If you have another windowing system on top of which you want X to run there are several procedures in the ScreenRec and WindowRec you can use to execute almost all window operations. (Remember, DIX does not interact with your screen directly, so there is considerable leeway in this area.) For instance, the window borders are always drawn with PaintWindowBorder() and the background with PaintWindowBackground(), which you supply. The contents of windows are drawn with the Drawing Primitives, which you supply. In addition, DIX calls your routines CreateWindow() and DestroyWindow() when it makes and destroys windows. Other hooks are provided for mapping and unmapping windows, moving them, and changing their attributes. See the Definition document section on windows for more details.
.NH 3
Color
.XS
Color
.XE
.LP
Color requires special considerations. You need to decide what class of display you have (see the Definition document, the section on Visuals and Depths). Next, set up all of the visuals you will support. Each depth can have one or more visuals with which it is associated; if your screen has several modes, you can list them all. As with depths, it may be best to begin with the simplest and then add visuals one at a time.

Cfb has quite a range of support for colormaps. It has routines which emulate any visual type on a pseudo color system, most of which are also appropriate for other hardware types. You'll need to implement StoreColors, InstallColormap, UninstallColormap and ListInstalledColormaps; all of which are typically quite short. Place pointers to these routines in the screen structure before calling cfbScreenInit.
If you are not using cfb, you may want to extract the colormap code anyway; it is not dependent on the rest of that directory.

You might want to construct your server so that it appears to support multiple lookup tables simultaneously, so you can have multiple Colormaps installed at the same time. For instance, if you had a display that had ten bits per pixel and a lookup table of 1024 entries, instead of declaring the obvious, you could declare that you had a display with depth 8 and four lookup tables. The extra two bits in each pixel would determine the lookup table to use for that pixel. Each time you wrote into windows on this screen, you would need to write those extra two bits surreptitiously to indicate the lookup table to use for this pixel. When copying pixel data off the screen onto pixmaps, the window would be considered eight deep; the extra two bits would be ignored. CopyWindow() would have to attend to these extra bits as it changed the colormap allegiance of affected pixels.
.NH 3
Multiple Screens
.XS
Multiple Screens
.XE
.LP
If you have multiple screens, the implementation is more complicated. Each screen may have its own method of managing windows or drawing graphics. Each screen may have a different scheme for its frame buffer. Each screen manages pixmaps whose format is specific to that screen. There are no commands available to the client to transfer pixels directly from one screen to another or between pixmaps of different screens.

Each server must decide what depths and formats of image pixmaps it is willing to transfer between the client and server. This usually involves some consensus among the screens. A given server must support depth 1, and probably supports all of the depths of its screens.

Fortunately, you need not implement routines to copy pixels between different depths. The only way for the client to copy pixels between drawables of different depths is with CopyPlane, which copies one plane from one drawable to another.
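.LP
To make the effect of CopyPlane concrete, here is a simplified sketch of the per-pixel rule for one row of pixels. The function and its arguments are hypothetical; it assumes 8-bit source pixels held one per byte and a bit_plane argument with exactly one bit set, and it ignores clipping and the raster operation.

```c
/*
 * Copy one bit plane of src into dst.
 * Where the selected bit is set in a source pixel the destination
 * receives fg; where it is clear the destination receives bg.
 */
void copy_plane(const unsigned char *src, unsigned char *dst, int n,
                unsigned long bit_plane, unsigned char fg, unsigned char bg)
{
    int i;

    for (i = 0; i < n; i++)
        dst[i] = (src[i] & bit_plane) ? fg : bg;
}
```
.LP
With fg set to 1 and bg set to 0 the destination holds the selected plane as a 1-deep image.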
The client can copy whatever planes it needs into 1-deep pixmaps and can then logically combine these to achieve any desired result.

Every drawable has a fixed depth. Every GC has a fixed depth. The GC's depth must match the depth of the drawable for drawing, or an error results. Any tile pixmap used with a GC must be the same depth as the GC.

All screens should have the same byte and bit ordering. If they don't, you need to declare the "real" bit and byte ordering to follow one of your screens and set the variables in the screenInfo struct to it. Conversion would happen in GetImage() and PutImage() for each screen.
.NH 3
Backing Store and Save-Unders
.XS
Backing Store and Save-Unders
.XE
.LP
Backing Store and Save-Unders are schemes in which the server saves parts of windows concealed by other windows so that when they become exposed again, the server can replace the pixel values quickly instead of asking the client to repaint the window. Backing Store is a scheme where a window stores away obscured areas of itself when covered by other windows. Save-Unders is a scheme where a window saves away parts of the windows beneath it when it is placed in front. The basic idea is the same, but the subtle differences have important implications.

With Backing Store, a window tracks its own contents. When the client draws into a window that is partially obscured, the window must intercept these drawing operations and either cause the drawing to happen to the saved backing or forget the saved backing so that an expose event is generated the next time that part is exposed. With Save-Unders, this is difficult because the window would need to know which pixels are associated with which windows; it would need to intercept all drawing commands to all windows. For this reason, Save-Unders is practical only for situations in which either there will be no drawing underneath, or if there is, it can be easily intercepted in one location in the code.
(See the section on software cursors for an example of this.)

Backing store, on the other hand, is more complicated in another way: the pieces of backing that need to be stored are often irregular shapes. In the case of X, windows are always rectangular, so the backing store can always be saved as a set of rectangular pixmaps. If this is done, though, drawing into the backing becomes extremely complicated and probably slows the system to the extent that your initial performance savings are severely diminished. If backing is saved as one large pixmap, you waste pixmap memory; you essentially retain a duplicate copy of each window in memory in which the only parts that are not used are those exposed on the screen. Thus, it is usually most practical simply to discard parts of backing store that are drawn onto while hidden; an expose event will always execute properly.

The sample implementation of backing store is very device-independent. All that is needed to use it is a small vector of device-specific functions, only two of which are typically used, SaveAreas and RestoreAreas:
.nf
    (*miBSFuncs->SaveAreas) (pixmap, region, x, y);
        PixmapPtr   pixmap;
        RegionPtr   region;
        int         x, y;
    (*miBSFuncs->RestoreAreas) (pixmap, region, x, y);
        PixmapPtr   pixmap;
        RegionPtr   region;
        int         x, y;
.fi
(*SaveAreas) copies the specified region (which is pixmap relative) from the screen starting at (x,y) to the pixmap; (*RestoreAreas) copies the specified region (which is screen relative) from the pixmap to the screen, starting at (x,y). If you can provide these two functions, call miInitializeBackingStore(pScreen, funcs) and the rest will be taken care of. Cfb and mfb already call miInitializeBackingStore.

DIX provides SaveUnders when DDX provides BackingStore.
This implementation is not as efficient as a real save-unders implementation would be, but is better than nothing in most cases; the mi backing store implementation changes its behavior when bits are saved because of save unders instead of backing store.
.NH 3
Software Cursors
.XS
Software Cursors
.XE
.LP
The sample server is designed for a hardware cursor that maintains a separate cursor bit map in hardware so that the video electronics mixes the image of the normal display and the cursor before being displayed. Nevertheless, a software cursor module is provided with hooks which require various levels of support.

The easiest level to use is the DispCur module (server/ddx/mi/midispcur.c). This provides software cursors with almost no device-specific code. All that is required is that you read events from the pointer device and send position update events to the mi routines. Four routines are required, one of which is implemented in mi for the truly meek. The relevant header file is "server/ddx/mi/mipointer.h"; this file contains the function vector definition and some useful function defines.
.nf
    typedef struct {
        long    (*EventTime)();         /* pScreen */
        Bool    (*CursorOffScreen)();   /* ppScreen, px, py */
        void    (*CrossScreen)();       /* pScreen, entering */
        void    (*QueueEvent)();        /* pxE, pPointer, pScreen */
    } miPointerCursorFuncRec, *miPointerCursorFuncPtr;
.fi
An initialized structure of this type is passed, along with the screen which needs cursors, to miDCInitialize(pScreen, &pointerCursorFuncs).

(*EventTime) is required to return the time of the last event processed (as 32 bits of milliseconds). This allows the mi cursor support to build events and fill in the appropriate data.

(*CursorOffScreen) is called whenever the cursor would be off of the current screen if the user motion were tracked exactly. This routine returns FALSE if the cursor should be confined to the screen, TRUE if the cursor should wander to some other screen.
ppScreen should be smashed to indicate the new screen, px and py should indicate the position on that screen; they are initialized to be the position of the cursor on the old screen if the cursor were not confined or warped (i.e. (x,y) is not on the screen).

(*CrossScreen) is called whenever the cursor is moved on/off of the screen. The entering argument is TRUE when pScreen is the screen now containing the cursor and FALSE when pScreen used to contain the cursor.

(*QueueEvent) is called in response to WarpPointer protocol requests. It should place the event at the tail of the input queue to be processed in series with the other events. This is frequently quite difficult to implement, however, so the mi routine miPointerQueueEvent, which simply processes the motion event immediately, can be used instead (this may cause occasional small protocol violations).

When your pointer device moves, call
.nf
    miPointerDeltaCursor (pScreen, dx, dy, generateEvent)
        ScreenPtr   pScreen;
        int         dx, dy;
        Bool        generateEvent;
.fi
with the distance the device has moved and TRUE for generateEvent. If your device reports absolute coordinates, use miPointerMoveCursor instead (which replaces the delta coordinates with absolute ones). To fill in the current pointer position for other event types, use miPointerPosition (pScreen, &rootX, &rootY), passing the address of the event rootX, rootY fields which will be filled in as appropriate.

The other two cursor layers can be investigated by looking through the miDispCur layer which uses them. In particular, you may want to use miSprite which allows you to provide device-specific cursor drawing primitives to speed up cursor rendering.
.NH 3
Limited Hardware Cursors
.XS
Limited Hardware Cursors
.XE
.LP
Many hardware cursor systems limit the maximum size of the cursor (for instance, to 16 pixels square). The X specification, however, specifies that a cursor can be any size. It is allowable for the server simply to truncate the cursor to an appropriate n-by-m rectangle.
This may be the top-left corner, or it may be any n-by-m pixel rectangle
that is entirely within the cursor and contains the hotspot;
the exact choice is implementation-dependent.
.NH 3
Fonts in Off-Screen Memory
.XS
Fonts in Off-Screen Memory
.XE
.LP
Fonts are probably stored on disk on the server when not in use,
most likely in a binary bitmap format that is ready to use.
Character drawing consumes much of the CPU, so you should try to ease the burden.

Of course, you need to read fonts into memory when they are needed.
Unless you have an extra megabyte of main memory, it is probably best not to
retain them in memory forever; users have a tendency to build up large
font libraries.
You should have some scheme for loading fonts into memory on demand and for
purging old fonts when they are no longer needed.
Rarely will people use more than a dozen fonts simultaneously.
(The main exceptions are programs specifically designed to show a sample of
each font and novice What You See Is What You Get word processor users.)
You will probably want to record which font was least recently used and purge
it when required.
Appropriate algorithms can be found in many places, or you can devise your own.

The binary format in which the fonts are stored (probably snf) has glyphs
aligned and padded to byte, 16-bit, or 32-bit boundaries.
You can select which by means of #defines.
.NH 3
Graphic Memory Usage
.XS
Graphic Memory Usage
.XE
.LP
Some servers have extremely complex hardware, possibly consisting of multiple
frame buffers among which the screen can switch, possibly having a
graphics processor.
Sometimes, the graphics processor has its own address space that may include
memory in addition to the frame buffer that is displayed on the screen.
Sometimes, the graphics processor can also access main memory in your server.
Sometimes, your main processor can access graphics-processor memory.
Sometimes, your main processor cannot access the frame buffer.
For these situations, you should carefully consider what to put in graphics
memory and what to put in main memory for your particular hardware configuration.
You should consider putting the following in graphics memory:
.IP \(bu 5
Anything you must put in graphics memory because of the requirements of
your graphics processor
.IP \(bu 5
Hardware color lookup tables
.IP \(bu 5
Hardware GC information
.IP \(bu 5
Cursors
.IP \(bu 5
Font Glyphs
.IP \(bu 5
Pixmaps
.IP \(bu 5
Regions
.IP \(bu 5
Save-Unders
.IP \(bu 5
Backing Store
.LP
Use the GC validate routine to move things in and out of graphics memory.

If your graphics hardware has limited resources, you might want to consider
drawing into pixmaps that live in main memory, rather than special
graphics memory.
To do this, you should provide an in-memory version of the Spans functions.
When drawing to an in-memory pixmap, swap these Spans functions and the mi
output code into the GC at ValidateGC time.
Then the mi code will draw the appropriate things into the bits in memory.
This will probably be slower than using the graphics hardware, but may be
easier than dealing with memory allocation on the graphics hardware.
Furthermore, you might consider drawing into pixmaps in main memory if your
hardware does not draw according to the X11 spec;
mixing the two styles of drawing may produce odd results.

After you have implemented the above, and you use your X server,
reconsider your decisions.
(It is difficult to know how you will use an X server before you
actually do so.)
You may find that you want to change the use of graphics memory.
.NH 3
Graphic Output Tuning
.XS
Graphic Output Tuning
.XE
.LP
The mi code is designed to be portable by sacrificing a certain amount
of performance.
Once you have got it running and have a large user base, it might be
appropriate to make it run faster.
Mfb and cfb have already been extensively tuned for many platforms;
it is unlikely that you could increase performance by substantial amounts
without resorting to assembly or gratuitous code expansion.

The overall rule in optimizing software is to collect experimental data.
Do not subjectively judge whether something "feels" faster;
subjective impressions are easily led astray.
Do not merely assume where the performance bottlenecks are:
use a profiler; run benchmarks; use a stopwatch.

If you do not have a profiler, try running a series of benchmarks.
For instance, if you think that a major bottleneck is a certain loop in
ImageGlyph, try commenting out the loop to see what performance gains
are effected.
Run benchmarks before and after, while running a program that will
exercise that function.
This gives you an indication of whether your hunches are right concerning
the location of the bottlenecks before you devote a great deal of time
to implementing and debugging a complex algorithm.

Before you install an optimization, run benchmarks.
After you install the optimization, run the benchmarks again to check
performance gains.
Complicated software that yields no substantial performance gains will simply
be a liability later when the software needs to be modified.

Much optimization effort should be directed toward the operations that are
executed most frequently.
Sometimes, you can make a quick routine to handle a special case that occurs
frequently and leave the more unusual cases for more general software that
takes the time to handle all cases.
For instance, most items that are drawn will be entirely within the clip region.
Most of those that are not will be entirely outside of the clip region.
Most drawing is executed with a plane mask of all 1s, and with an alu mode
of Copy.
Most drawing is done with a solid fill style.
If drawing is done with another fill style, the tile or stipple frequently has
a size that is a byte or word multiple.
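.LP
The special-case strategy described above can be sketched as follows.
This is only an illustration under stated assumptions, not code from mi, mfb
or cfb: the types SketchGC and SketchBox, the constants, and the routine
sketch_fill_rect are all hypothetical, the frame buffer is assumed to be
8 bits per pixel, and only solid fills are modelled.
The routine rejects rectangles that lie entirely outside the clip box, takes
a fast path when the rectangle is entirely inside it and the plane mask and
alu are the common ones, and otherwise falls back to a slower general loop.
.nf
```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for server data structures (assumptions). */
enum { ALU_COPY, ALU_XOR };
typedef struct { int alu; uint8_t planemask; uint8_t fg; } SketchGC;
typedef struct { int x1, y1, x2, y2; } SketchBox;   /* x2, y2 exclusive */
typedef enum { PATH_NONE, PATH_FAST, PATH_GENERAL } SketchPath;

/* Fill rectangle r with gc->fg on an 8-bit frame buffer, clipped to one
 * box.  Returns which path was taken so the dispatch can be observed. */
SketchPath sketch_fill_rect(uint8_t *fb, int stride, const SketchGC *gc,
                            const SketchBox *clip, SketchBox r)
{
    /* Common case 2: entirely outside the clip box (or empty). */
    if (r.x1 >= r.x2 || r.y1 >= r.y2 ||
        r.x2 <= clip->x1 || r.x1 >= clip->x2 ||
        r.y2 <= clip->y1 || r.y1 >= clip->y2)
        return PATH_NONE;

    /* Common case 1: entirely inside, all planes, alu Copy. */
    if (r.x1 >= clip->x1 && r.x2 <= clip->x2 &&
        r.y1 >= clip->y1 && r.y2 <= clip->y2 &&
        gc->alu == ALU_COPY && gc->planemask == 0xFF) {
        for (int y = r.y1; y < r.y2; y++)
            memset(fb + y * stride + r.x1, gc->fg, (size_t)(r.x2 - r.x1));
        return PATH_FAST;
    }

    /* Everything else: intersect with the clip box, then go pixel by
     * pixel, honouring the alu and the plane mask. */
    int x1 = r.x1 > clip->x1 ? r.x1 : clip->x1;
    int y1 = r.y1 > clip->y1 ? r.y1 : clip->y1;
    int x2 = r.x2 < clip->x2 ? r.x2 : clip->x2;
    int y2 = r.y2 < clip->y2 ? r.y2 : clip->y2;
    for (int y = y1; y < y2; y++) {
        for (int x = x1; x < x2; x++) {
            uint8_t *p = fb + y * stride + x;
            uint8_t v = (gc->alu == ALU_XOR) ? (uint8_t)(gc->fg ^ *p) : gc->fg;
            *p = (uint8_t)((*p & ~gc->planemask) | (v & gc->planemask));
        }
    }
    return PATH_GENERAL;
}
```
.fi
.LP
Returning the path taken is purely a convenience for this sketch; it makes
it easy to count, while running a benchmark, how often each path is hit
before deciding whether a further special case is worth writing.
.LP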
The cfb and mfb routines have already been optimized for some of these
special cases.

In general, start optimizing where you have a better algorithm or know more
about the hardware than the portable routines.
.NH 4
First-Round Optimization
.XS
First-Round Optimization
.XE
.LP
The most important things to optimize first are probably text drawing,
zero-width lines, and large area pixel copying and filling.

Text drawing is best optimized by working on the Glyph routines.
You may want to rewrite them in assembly language or implement them in hardware.
Since most glyphs are written with solid fill styles and the glyph images
usually do not lie on a clip-region boundary, you may want to make your
speedy routine handle just this special case, and handle everything else
with mi, cfb and mfb routines.

You can even optimize the mfb and cfb glyph routines to your machine
without changing much.
Font glyphs are padded to byte boundaries for each scanline.
You can have them padded to 32-bit boundaries, if desired.
The macro getleftbits() in maskbits.h gets glyph bits from glyphs;
optimize it for your machine.
(For instance, take into account byte, word and longword boundaries,
whether your machine can address 16-bit or 32-bit words, and whether
this is efficient.)

Zero-width lines are a good candidate because the rules for drawing them
are relaxed.
You need not worry about many of the details.
Frequently, hardware or firmware can generate these.
The most common lines are vertical and horizontal;
special routines to draw these may be worthwhile.

CopyArea and CopyWindow optimization will improve window-movement performance.
Frequently, a machine will have special hardware to perform such
graphic operations.
.NH 4
Second-Round Optimization
.XS
Second-Round Optimization
.XE
.LP
The next phase of optimization will probably concentrate on painting window
backgrounds, wide lines, some of the easy-to-perform rectangle operations,
and PushPixels().
Wide lines no longer offer much return on a large optimization effort.
The mi code now uses integer arithmetic for nearly all of the wide line
computations and produces nearly perfect protocol-conforming lines.
Experimentation with moving the rendering into cfb or mfb has shown that
not much performance is to be gained; if you want to try, the polygon edge
walking code is written using macros which could easily be used directly
inside the graphics layer.
The code you want to look at is in miwideline.c and miwideline.h.

Although wide arcs have seen a substantial speedup since the original
protocol-conformant code was shipped with R3, it would be nice if they ran
much faster.
Unfortunately, the protocol defines an object whose description is quartic,
and the straight-line code that solves it involves several square roots
and a couple of cube roots.
If you examine the implementation in miarc.c, you'll see a nightmare of
complicated floating point arithmetic.
These routines attempt to come as close as possible to the protocol
definition for a wide arc, and suffer tremendously in performance because of it.

Wide circles, however, do provide a reasonable opportunity for optimization.
As both the inner and outer edges of a wide circle are circular
(unlike elliptical arcs), a fully-integer circle edge walker is used to
scan-convert them.

Zero-width arcs (like lines) are not completely specified by the protocol,
and so the traditional integer walker is included in mfb, cfb and mi.
If you can't use one of those versions directly, look at the relevant code in
ddx/mi/mizerarc.c, ddx/mi/mizerarc.h, ddx/mfb/mfbzerarc.c and
ddx/cfb/cfbzerarc.c.

Also included in cfb for R5 are some interesting optimizations for clipping
points to rectangles and rendering polygons.
Look in ddx/cfb/cfb8line.c and ddx/cfb/cfbply1rct.c for these algorithms.
The polygon code could easily be ported to mfb; however, the zero-width line
code depends on having addressable pixels.

PushPixels may also be an important routine to optimize, because it is used
by the software cursor code.
Both mfb and cfb have heavily tuned PushPixels routines which work in
solid fill/copy mode and provide adequate performance for software cursors.
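.LP
To make the solid fill/copy case concrete, the following is a minimal sketch
of what a PushPixels-style operation does; it is not the mfb or cfb code and
does not use the real PushPixels argument list.
The routine name sketch_push_pixels_solid and all of its parameters are
hypothetical, the destination is assumed to be an 8-bit-per-pixel frame
buffer, and the source bitmap is assumed to be stored most significant bit
first with each scanline padded to a whole number of bytes.
Every 1 bit in the bitmap paints the foreground pixel; 0 bits leave the
destination untouched.
.nf
```c
#include <stdint.h>

/* Hypothetical helper (an assumption for illustration only).
 * Paints fg through the 1 bits of a w-by-h bitmap placed with its
 * top-left corner at (xorg, yorg), clipping to the frame buffer. */
void sketch_push_pixels_solid(uint8_t *fb, int fb_stride, int fb_w, int fb_h,
                              const uint8_t *bits, int bits_stride,
                              int w, int h, int xorg, int yorg, uint8_t fg)
{
    for (int y = 0; y < h; y++) {
        int dy = yorg + y;
        if (dy < 0 || dy >= fb_h)
            continue;                       /* scanline is off screen */
        const uint8_t *row = bits + y * bits_stride;
        for (int x = 0; x < w; x++) {
            int dx = xorg + x;
            if (dx < 0 || dx >= fb_w)
                continue;                   /* pixel is off screen */
            /* Bit x of the scanline, most significant bit first. */
            if (row[x >> 3] & (0x80 >> (x & 7)))
                fb[dy * fb_stride + dx] = fg;
        }
    }
}
```
.fi
.LP
A tuned version would process a word of bitmap at a time rather than testing
single bits, which is the kind of work the mfb and cfb routines already do.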