gui.highlighting
Class Scanner

java.lang.Object
  extended by gui.highlighting.Scanner
All Implemented Interfaces:
TokenTypes
Direct Known Subclasses:
JavaScanner

public class Scanner
extends java.lang.Object
implements TokenTypes

A Scanner object provides a lexical analyser and a resulting token array. Incremental rescanning is supported, e.g. for use in a token colouring editor. This is a base class dealing with plain text, which can be extended to support other languages.

The actual text is assumed to be held elsewhere, e.g. in a document. The change() method is called to report the position and length of a change in the text, and the scan() method is called to perform scanning or rescanning. For example, to scan an entire document held in a character array text in one go:

 scanner.change(0, 0, text.length);
 scanner.scan(text, 0, text.length);
 

For incremental scanning, the position() method is used to find the text position at which rescanning should start. For example, a syntax highlighter might contain this code:

 // Where to start rehighlighting, and a segment object
 int firstRehighlightToken;
 Segment text = new Segment();

 ...

 // Whenever the text changes, e.g. on an insert or remove or read.
 firstRehighlightToken = scanner.change(offset, oldLength, newLength);
 repaint();

 ...

 // in repaintComponent
 int offset = scanner.position();
 if (offset < 0) return;
 int tokensToRedo = 0;
 int amount = 100;
 while (tokensToRedo == 0 && offset >= 0)
 {
    int length = doc.getLength() - offset;
    if (length > amount) length = amount;
    try { doc.getText(offset, length, text); }
    catch (BadLocationException e) { return; }
    tokensToRedo = scanner.scan(text.array, text.offset, text.count);
    offset = scanner.position();
    amount = 2*amount;
 }
 for (int i = 0; i < tokensToRedo; i++)
 {
    Token t = scanner.getToken(firstRehighlightToken + i);
    int length = t.symbol.name.length();
    int type = t.symbol.type;
    doc.setCharacterAttributes (t.position, length, styles[type], false);
 }
 firstRehighlightToken += tokensToRedo;
 if (offset >= 0) repaint(2);
 

Note that change can be called at any time, even between calls to scan. Only a small number of characters is passed to scan at a time, so that scanning is done in short bursts and the program's user interface does not freeze.


Field Summary
protected  char[] buffer
          The current buffer of text being scanned.
protected  int end
          The end offset in the buffer.
protected  int start
          The current offset within the buffer, at which to scan the next token.
protected  int state
          The current scanner state, as a representative token type.
protected  java.util.HashMap symbolTable
          The symbol table can be accessed by initSymbolTable or lookup, if they are overridden.
 
Fields inherited from interface gui.highlighting.TokenTypes
BINARY, BRACKET, CHARACTER, COMMENT, END_COMMENT, END_TAG, IDENTIFIER, KEYWORD, KEYWORD2, LITERAL, MID_COMMENT, NUMBER, OPERATOR, PUNCTUATION, QUOTE, SEPARATOR, START_COMMENT, STRING, TAG, typeNames, UNRECOGNIZED, URL, WHITESPACE, WORD
 
Method Summary
 int change(int start, int len, int newLen)
          Report the position of an edit, the length of the text being replaced, and the length of the replacement text, to prepare for rescanning.
 int find(int p)
          Find the index of the valid token starting before, but nearest to, text position p.
 Token getToken(int n)
          Find the n'th token, or null if it is not currently valid.
protected  void initSymbolTable()
          Create the initial symbol table.
protected  Symbol lookup(int type, java.lang.String name)
          Look up a symbol in the symbol table.
 int position()
          Find out at what text position any remaining scanning work should start, or -1 if scanning is complete.
protected  int read()
          Read one token from the start of the current text buffer, given the start offset, end offset, and current scanner state.
 int scan(char[] array, int offset, int length)
          Scan or rescan a given read-only segment of text.
 int size()
          Find the number of available valid tokens, not counting tokens in or after any area yet to be rescanned.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

buffer

protected char[] buffer
The current buffer of text being scanned.


start

protected int start
The current offset within the buffer, at which to scan the next token.


end

protected int end
The end offset in the buffer.


state

protected int state
The current scanner state, as a representative token type.


symbolTable

protected java.util.HashMap symbolTable
The symbol table can be accessed by initSymbolTable or lookup, if they are overridden. Symbols are inserted with symbolTable.put(sym,sym) and extracted with symbolTable.get(sym).

Method Detail

read

protected int read()

Read one token from the start of the current text buffer, given the start offset, end offset, and current scanner state. The method moves the start offset past the token, updates the scanner state, and returns the type of the token just scanned.

The scanner state is a representative token type. It is either the state left after the last call to read, or the type of the old token at the same position if rescanning, or WHITESPACE if at the start of a document. The method succeeds in all cases, returning whitespace, comment or error tokens where necessary. Each line of a multi-line comment is treated as a separate token, to improve incremental rescanning. If the buffer does not extend to the end of the document, the last token returned for the buffer may be incomplete and the caller must rescan it.

The read method can be overridden to implement different languages. The default version splits plain text into words, numbers and punctuation.
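
As an illustration of the contract described above, here is a minimal sketch of a plain-text read. It is not the actual source of this class: the class name PlainTextReadSketch, the kind helper, and the numeric constant values are assumptions made for the example (the real constants come from TokenTypes), and the classification rules are only one plausible reading of "words, numbers and punctuation".

```java
// Hypothetical sketch; not the real gui.highlighting.Scanner implementation.
class PlainTextReadSketch {
    // Stand-ins for TokenTypes constants; the values are made up.
    static final int WHITESPACE = 0, WORD = 1, NUMBER = 2, PUNCTUATION = 3;

    char[] buffer;          // text being scanned
    int start;              // offset at which the next token begins
    int end;                // end offset in the buffer
    int state = WHITESPACE; // plain text needs no real state; kept for symmetry

    PlainTextReadSketch(String text) {
        buffer = text.toCharArray();
        start = 0;
        end = buffer.length;
    }

    // Classify a single character.
    static int kind(char c) {
        if (Character.isWhitespace(c)) return WHITESPACE;
        if (Character.isLetter(c)) return WORD;
        if (Character.isDigit(c)) return NUMBER;
        return PUNCTUATION;
    }

    // Read one token. Requires start < end. Moves start past the token,
    // updates the state, and returns the token type.
    int read() {
        int type = kind(buffer[start]);
        start++;
        // Punctuation is one character per token; the other kinds extend
        // while the following characters are of the same kind.
        if (type != PUNCTUATION) {
            while (start < end && kind(buffer[start]) == type) start++;
        }
        state = type;
        return type;
    }
}
```

A subclass for another language would presumably replace the classification step and use state to carry context such as "inside a comment" from one line to the next.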


size

public int size()
Find the number of available valid tokens, not counting tokens in or after any area yet to be rescanned.


getToken

public Token getToken(int n)
Find the n'th token, or null if it is not currently valid.


find

public int find(int p)
Find the index of the valid token starting before, but nearest to, text position p. This uses an O(log(n)) binary chop search.
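
The binary chop can be sketched over a sorted array of token start positions. This is an illustrative sketch only: the FindSketch class and the positions array are hypothetical, "starting before" is read here as strictly before p, and returning -1 when no token qualifies is an assumption, since the text above does not say what the real method does in that case.

```java
// Hypothetical sketch of a binary chop over token start positions.
class FindSketch {
    // Return the largest index i such that positions[i] < p,
    // or -1 if no token starts before p (an assumed convention).
    // positions must be sorted in ascending order.
    static int find(int[] positions, int p) {
        int lo = 0;
        int hi = positions.length;
        // Invariant: every index below lo starts before p,
        // and every index at or above hi does not.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (positions[mid] < p) lo = mid + 1;
            else hi = mid;
        }
        return lo - 1;
    }
}
```

Each iteration halves the remaining range, which gives the O(log(n)) behaviour mentioned above.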


change

public int change(int start,
                  int len,
                  int newLen)
Report the position of an edit, the length of the text being replaced, and the length of the replacement text, to prepare for rescanning. The call returns the index of the token at which rescanning will start.
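
One way to feed edits to change from a Swing document is through a DocumentListener. The sketch below assumes that an insertion is reported with len 0 and a removal with newLen 0, which follows from the parameter descriptions above. ChangeReporter and ChangeSink are hypothetical names; ChangeSink merely stands in for a Scanner so that the example compiles on its own.

```java
import javax.swing.event.DocumentEvent;
import javax.swing.event.DocumentListener;

// Hypothetical adapter that translates document events into change() calls.
class ChangeReporter implements DocumentListener {
    // Stand-in for the change method of a Scanner.
    interface ChangeSink {
        int change(int start, int len, int newLen);
    }

    private final ChangeSink sink;
    int firstRehighlightToken; // token index returned by the latest change()

    ChangeReporter(ChangeSink sink) {
        this.sink = sink;
    }

    @Override
    public void insertUpdate(DocumentEvent e) {
        // Nothing was replaced; e.getLength() characters were inserted.
        firstRehighlightToken = sink.change(e.getOffset(), 0, e.getLength());
    }

    @Override
    public void removeUpdate(DocumentEvent e) {
        // e.getLength() characters were removed and nothing took their place.
        firstRehighlightToken = sink.change(e.getOffset(), e.getLength(), 0);
    }

    @Override
    public void changedUpdate(DocumentEvent e) {
        // Attribute changes do not alter the text, so nothing is reported.
    }
}
```

With a real scanner, the sink could be supplied as the method reference scanner::change, and the listener would typically also request a repaint after each call.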


position

public int position()
Find out at what text position any remaining scanning work should start, or -1 if scanning is complete.


initSymbolTable

protected void initSymbolTable()
Create the initial symbol table. This can be overridden to enter keywords, for example. The default implementation does nothing.


lookup

protected Symbol lookup(int type,
                        java.lang.String name)
Look up a symbol in the symbol table. This can be overridden to implement keyword detection, for example. The default implementation just uses the table to ensure that there is only one shared occurrence of each symbol.
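
The sharing behaviour and a possible keyword-detecting override can be sketched as follows. Everything here is hypothetical: the nested Symbol class is a stand-in with assumed equals and hashCode definitions, the constant values are made up, and the WithKeywords subclass shows only one possible way to detect keywords, not necessarily the approach taken by JavaScanner.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of symbol sharing and keyword detection.
class LookupSketch {
    static final int WORD = 1, KEYWORD = 2; // made-up stand-ins for TokenTypes

    // Stand-in for the library's Symbol class; equality on type and name
    // is an assumption.
    static final class Symbol {
        final int type;
        final String name;
        Symbol(int type, String name) { this.type = type; this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof Symbol
                && ((Symbol) o).type == type
                && ((Symbol) o).name.equals(name);
        }
        @Override public int hashCode() { return 31 * type + name.hashCode(); }
    }

    final HashMap<Symbol, Symbol> symbolTable = new HashMap<>();

    // Default-style lookup: return the single shared Symbol for (type, name),
    // inserting it with put(sym, sym) the first time it is seen.
    Symbol lookup(int type, String name) {
        Symbol probe = new Symbol(type, name);
        Symbol shared = symbolTable.get(probe);
        if (shared == null) {
            symbolTable.put(probe, probe);
            shared = probe;
        }
        return shared;
    }

    // One possible override: reclassify known words as keywords.
    static class WithKeywords extends LookupSketch {
        private final Set<String> keywords;
        WithKeywords(String... words) {
            keywords = new HashSet<>(Arrays.asList(words));
        }
        @Override Symbol lookup(int type, String name) {
            if (type == WORD && keywords.contains(name)) type = KEYWORD;
            return super.lookup(type, name);
        }
    }
}
```

Because every token with the same type and text refers to the same Symbol object, a token array for a large document stays compact.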


scan

public int scan(char[] array,
                int offset,
                int length)
Scan or rescan a given read-only segment of text. The segment is assumed to represent a portion of the document starting at position(). Returns the number of tokens successfully scanned. A partial token at the end of the segment is not counted, unless the segment also ends at the end of the document. If the result is 0, the call should be retried with a longer segment.
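
The rule for the return value can be illustrated with a simplified sketch in which a token is a maximal run of whitespace or of non-whitespace characters. ScanCountSketch and its endsAtDocumentEnd parameter are hypothetical; the parameter is an assumption added for the example, since the real method takes no such argument and must determine this some other way.

```java
// Hypothetical sketch of which tokens count as successfully scanned.
class ScanCountSketch {
    static int countComplete(char[] array, int offset, int length,
                             boolean endsAtDocumentEnd) {
        int end = offset + length;
        int pos = offset;
        int count = 0;
        while (pos < end) {
            boolean space = Character.isWhitespace(array[pos]);
            int next = pos + 1;
            while (next < end && Character.isWhitespace(array[next]) == space) {
                next++;
            }
            // A token that reaches the segment end may continue in text not
            // yet seen, so it counts only if the document ends here as well.
            if (next == end && !endsAtDocumentEnd) break;
            count++;
            pos = next;
        }
        return count;
    }
}
```

For the segment "foo b" taken from the middle of a document, this counts "foo" and the space but not the possibly incomplete "b". A segment holding only part of one token yields 0, which is the case in which the caller retries with a longer segment.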