Parsing Java Code


Parsing, also called syntax analysis or syntactic analysis, is the process of analysing a string of symbols, whether in natural language, computer languages or data structures, according to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech). In this example the input is Java code from a Java source file. We will recognize individual parts of the code and store them in variables.

We use the java.io.StreamTokenizer class to split the text into “tokens”, which can then be read one at a time. The parsing process is controlled by a table, which assigns each character a role (whitespace, alphabetic, numeric, string quote or comment character), and by a number of flags that can be set to various states. The stream tokenizer can recognize identifiers, numbers, quoted strings, and various comment styles.
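To make the token kinds concrete, here is a minimal sketch that tokenizes a short string instead of a file. The class name TokenKinds, the helper describe and the labels it emits are made up for this illustration; only the StreamTokenizer calls come from the JDK.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class TokenKinds {
    // Hypothetical helper: returns one label per token found in the input.
    static List<String> describe(String source) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        st.wordChars('_', '_');       // allow underscores inside identifiers
        st.slashSlashComments(true);  // skip // comments
        st.slashStarComments(true);   // skip /* ... */ comments

        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_WORD:
                    out.add("WORD " + st.sval);
                    break;
                case StreamTokenizer.TT_NUMBER:
                    out.add("NUMBER " + st.nval);
                    break;
                case '"':
                    // For quoted strings, ttype is the quote character
                    // and sval holds the text between the quotes
                    out.add("STRING " + st.sval);
                    break;
                default:
                    // Any other character is its own token
                    out.add("CHAR " + (char) st.ttype);
                    break;
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String source = "int max_len = 42; // limit\nString s = \"hi\";";
        for (String token : describe(source)) {
            System.out.println(token);
        }
    }
}
```

Numbers are always reported as a double in nval, so the literal 42 in the sample input is printed as NUMBER 42.0. The comment after the first statement produces no token at all.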

The example below applies the tokenizer to a Java source file:

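import java.io.FileReader;
import java.io.IOException;
import java.io.StreamTokenizer;

// Create the tokenizer to read from a file; try-with-resources
// closes the reader when parsing finishes
try (FileReader rd = new FileReader("filename.java")) {
    StreamTokenizer st = new StreamTokenizer(rd);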

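    // Prepare the tokenizer for Java-style
    // tokenizing rules: underscores belong to
    // words, and line ends are reported as TT_EOL
    st.wordChars('_', '_');
    st.eolIsSignificant(true);

    // If whitespace is not to be discarded, make
    // this call (line ends are then returned as
    // ordinary characters too)
    st.ordinaryChars(0, ' ');

    // These calls cause comments to be
    // discarded
    st.slashSlashComments(true);
    st.slashStarComments(true);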

    // Parse the file; the loop ends when the end
    // of the file has been reached
    while (st.nextToken() != StreamTokenizer.TT_EOF) {
        switch (st.ttype) {
        case StreamTokenizer.TT_NUMBER:
            // A number was found; the value is
            // in nval
            double num = st.nval;
            break;
        case StreamTokenizer.TT_WORD:
            // A word was found; the value is in
            // sval
            String word = st.sval;
            break;
        case '"':
            // A double-quoted string was found;
            // sval contains the contents
            String dquoteVal = st.sval;
            break;
        case '\'':
            // A single-quoted string was found;
            // sval contains the contents
            String squoteVal = st.sval;
            break;
        case StreamTokenizer.TT_EOL:
            // End of line character found
            break;
        default:
            // A regular character was found; the
            // value is the token itself
            char ch = (char)st.ttype;
            break;
        }
    }
} catch (IOException e) {