Parsing Data

Date: 2/11/2002

CSV Files

CSV stands for Comma-Separated Values. Each field in a record is separated from the next by a comma. Usually, string values are enclosed in quotes, which allows a comma to be part of a field value. However, our application will not be sophisticated enough to make this distinction. Our string values will NOT be enclosed in quotes, and the ONLY commas will be those that separate the fields.

StringTokenizer

StringTokenizer (in the java.util package) takes a string and breaks it into parts, or "tokens," based on one or more separator characters. The simplest constructor takes the String to be split as its only argument:

StringTokenizer st = new StringTokenizer(s);

Think of it as creating a list of strings.

By default, the separators are the whitespace characters: space, tab, newline, carriage return, and form feed. So, the string "Advanced Java" would be separated into two tokens: "Advanced" and "Java".

Since we are using CSV files, we want to specify the separator to be a comma:

StringTokenizer st = new StringTokenizer(s, ",");

With a comma as the only separator, the previous string ("Advanced Java") would now be one token, because the space no longer splits it.

The String "Advanced Java,Advanced Web Programming" would be separated into two tokens.
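The difference between the two constructors can be checked with a small sketch. The class and method names here (DelimiterDemo, countDefault, countComma) are made up for illustration, and the sketch uses countTokens(), which is described under "Some useful methods" below.

```java
import java.util.StringTokenizer;

// Hypothetical demo class; the names are illustrative only.
class DelimiterDemo {
    // Counts the tokens produced with the default (whitespace) separators.
    static int countDefault(String s) {
        return new StringTokenizer(s).countTokens();
    }

    // Counts the tokens produced when a comma is the only separator.
    static int countComma(String s) {
        return new StringTokenizer(s, ",").countTokens();
    }

    public static void main(String[] args) {
        System.out.println(countDefault("Advanced Java")); // prints 2
        System.out.println(countComma("Advanced Java"));   // prints 1
        System.out.println(countComma("Advanced Java,Advanced Web Programming")); // prints 2
    }
}
```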

Some useful methods:

nextToken()

Returns a String, which is the next token in the list. The list is one-way only, from beginning to end (like a sequential-access file). Calling it when no tokens remain throws a NoSuchElementException.

hasMoreTokens()

Returns a boolean: true if there are more tokens in the list, false otherwise.

countTokens()

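Returns the number of tokens remaining in the list, that is, how many more times nextToken() can be called. The count goes down each time a token is read.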
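Putting these methods together, one way to read the fields of a single CSV record might look like the sketch below. The class name CsvLineParser, the method parse, and the sample record are assumptions made for illustration; the sketch relies on the simplification stated earlier, that no field is quoted or contains a comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class; not part of the Java library.
class CsvLineParser {
    // Splits one CSV record into its fields, assuming no quoted values.
    static List<String> parse(String line) {
        StringTokenizer st = new StringTokenizer(line, ",");
        // countTokens() tells us up front how many fields to expect.
        List<String> fields = new ArrayList<String>(st.countTokens());
        // Walk the token list once, from beginning to end.
        while (st.hasMoreTokens()) {
            fields.add(st.nextToken());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample record: title, instructor, room.
        List<String> fields = parse("Advanced Java,Smith,Room 101");
        for (String field : fields) {
            System.out.println(field);
        }
    }
}
```

One thing to watch for: StringTokenizer skips empty tokens, so a record with two commas in a row, such as "a,,b", yields two tokens rather than three. That is acceptable only if every field is guaranteed to have a value.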

StringBuffer

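Strings are immutable: once created, a String holds exactly the characters it was given and cannot be changed. If you add to a String, a new String object is created to hold all the characters, and the old and new characters are copied into it. This causes a bit of overhead, which adds up when you append repeatedly (in a loop, for example).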

StringBuffer sb = new StringBuffer();      // default initial capacity of 16 characters
StringBuffer sb = new StringBuffer(100);   // alternatively, reserve room for 100 characters

With a StringBuffer, you can specify how much space to reserve for storing characters. This way, the buffer does not have to allocate more storage unless you exceed its capacity; when that happens, it grows automatically.
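The reserved space can be inspected with the capacity() method, while length() reports how many characters are actually stored. A minimal sketch follows; the class name CapacityDemo and its helper methods are made up for illustration.

```java
// Hypothetical demo class; the names are illustrative only.
class CapacityDemo {
    // Capacity of a buffer created with the no-argument constructor.
    static int defaultCapacity() {
        return new StringBuffer().capacity();
    }

    // Capacity of a buffer created with an explicit size.
    static int reservedCapacity(int size) {
        return new StringBuffer(size).capacity();
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());     // prints 16
        System.out.println(reservedCapacity(100)); // prints 100

        StringBuffer sb = new StringBuffer(100);
        sb.append("Some Text");
        System.out.println(sb.length());   // prints 9 (characters stored)
        System.out.println(sb.capacity()); // prints 100 (space reserved)
    }
}
```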

Use the append() method to add Strings to the buffer:

sb.append("Some Text");

To convert the StringBuffer to a String, use the toString() method:

String s = sb.toString();
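As an illustration of where a StringBuffer helps with CSV files, the sketch below builds one comma-separated record from an array of field values. The class name CsvLineBuilder, the method join, and the sample data are assumptions for this example, and the initial size of 100 is an arbitrary guess at a typical record length.

```java
// Hypothetical helper class; not part of the Java library.
class CsvLineBuilder {
    // Joins the given fields into one comma-separated record.
    static String join(String[] fields) {
        StringBuffer sb = new StringBuffer(100); // reserve room up front
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(","); // separator goes between fields only
            }
            sb.append(fields[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample data.
        String[] fields = { "Advanced Java", "Smith", "Room 101" };
        System.out.println(join(fields)); // prints Advanced Java,Smith,Room 101
    }
}
```

Every append() call works on the same buffer, so no intermediate String objects are created while the record is being assembled.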