Introduction
Parsing, also called syntax analysis or syntactic analysis, is the process of analysing a string of symbols, whether in natural language, computer languages or data structures, according to the rules of a formal grammar. The term parsing comes from the Latin pars (orationis), meaning part (of speech). This example uses Java code from a Java source file as input. We will recognize parts of the code and store them in variables.
We use the StreamTokenizer class to parse the text into “tokens”, allowing the tokens to be read one at a time. The parsing process is controlled by a table and a number of flags that can be set to various states. The stream tokenizer can recognize identifiers, numbers, quoted strings, and various comment styles.
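As a small illustration of how the table settings change what counts as a token, the sketch below tokenizes the same text twice, once with the default settings and once with '_' registered as a word character. The class name SyntaxTableDemo and the helper method words are made up for this illustration, and the input is an in-memory string rather than a file.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the original example.
final class SyntaxTableDemo {

    // Returns the word tokens found in the text. When underscoreIsWordChar
    // is true, '_' is added to the set of word characters.
    static List<String> words(String text, boolean underscoreIsWordChar) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        if (underscoreIsWordChar) {
            st.wordChars('_', '_');
        }
        List<String> result = new ArrayList<>();
        try {
            // Read one token at a time until the end of the input
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    result.add(st.sval);
                }
            }
        } catch (IOException e) {
            // A StringReader should not fail, but nextToken() declares IOException
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("max_size = 10", false)); // [max, size]
        System.out.println(words("max_size = 10", true));  // [max_size]
    }
}
```

With the default table, '_' is an ordinary character, so the identifier is split into two words. Registering it as a word character keeps Java-style identifiers in one piece.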
The example below shows how this can be done. The input is a Java file:
import java.io.FileReader;
import java.io.IOException;
import java.io.StreamTokenizer;

public class JavaSourceTokenizer {
    public static void main(String[] args) {
        // Create the tokenizer to read from a file; try-with-resources
        // closes the reader even if parsing fails
        try (FileReader rd = new FileReader("filename.java")) {
            StreamTokenizer st = new StreamTokenizer(rd);

            // Prepare the tokenizer for Java-style tokenizing rules
            st.parseNumbers();
            st.wordChars('_', '_');
            st.eolIsSignificant(true);

            // If whitespace is not to be discarded, make this call
            st.ordinaryChars(0, ' ');

            // By default '/' starts a single-line comment; make it an
            // ordinary character so a lone '/' is returned as a token
            st.ordinaryChar('/');

            // These calls cause // and /* */ comments to be discarded
            st.slashSlashComments(true);
            st.slashStarComments(true);

            // Parse the file, reading exactly one token per iteration
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_NUMBER:
                        // A number was found; the value is in nval
                        double num = st.nval;
                        break;
                    case StreamTokenizer.TT_WORD:
                        // A word was found; the value is in sval
                        String word = st.sval;
                        break;
                    case '"':
                        // A double-quoted string was found;
                        // sval contains the contents
                        String dquoteVal = st.sval;
                        break;
                    case '\'':
                        // A single-quoted string was found;
                        // sval contains the contents
                        String squoteVal = st.sval;
                        break;
                    case StreamTokenizer.TT_EOL:
                        // End of line character found
                        break;
                    default:
                        // A regular character was found; the
                        // value is the token itself
                        char ch = (char) st.ttype;
                        break;
                }
            }
        } catch (IOException e) {
            // Report the failure instead of silently ignoring it
            System.err.println("Could not read file: " + e.getMessage());
        }
    }
}
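To make the "store them in variables" step concrete, here is a minimal sketch that sorts the tokens into lists by type. It reads from an in-memory string instead of a file so that it can be run as is. The class name TokenCollector, its fields and the collect method are assumptions made for this illustration, not part of the example above.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that groups tokens by type.
final class TokenCollector {
    final List<String> words = new ArrayList<>();
    final List<Double> numbers = new ArrayList<>();
    final List<String> strings = new ArrayList<>();

    static TokenCollector collect(String source) {
        TokenCollector out = new TokenCollector();
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        st.wordChars('_', '_');       // keep identifiers such as max_size whole
        st.ordinaryChar('/');         // a lone '/' is a token, not a comment
        st.slashSlashComments(true);  // discard // comments
        st.slashStarComments(true);   // discard /* */ comments
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.words.add(st.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.numbers.add(st.nval);
                        break;
                    case '"':
                        out.strings.add(st.sval);
                        break;
                    default:
                        // Punctuation such as '=' or ';' is ignored here
                        break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        TokenCollector t = collect("int max_size = 10; // limit\nString s = \"hi\";");
        System.out.println(t.words);   // [int, max_size, String, s]
        System.out.println(t.numbers); // [10.0]
        System.out.println(t.strings); // [hi]
    }
}
```

Note that StreamTokenizer does not distinguish Java keywords from identifiers: both int and max_size arrive as TT_WORD, so any further classification has to be done on the collected words.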