com.mindprod.csv
Class CSVReader

java.lang.Object
  extended by com.mindprod.csv.CSVReader

public final class CSVReader
extends java.lang.Object

Read CSV (Comma Separated Value) files.

This format is used by Microsoft Word and Excel. Fields are separated by commas, and enclosed in quotes if they contain commas or quotes. Embedded quotes are doubled. Embedded spaces do not normally require surrounding quotes. The last field on the line is not followed by a comma. Null fields are represented by two commas in a row. We optionally trim leading and trailing spaces on fields, even inside quotes. The file must normally end with a single CrLf; otherwise you will get a null when trying to read a field on older JVMs.
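
The format rules above can be illustrated with a minimal sketch. This is not the CSVReader implementation: CsvLineSketch and split are hypothetical names, the sketch handles a single line with no multi-line fields, always trims, and treats an empty line as one empty field, all of which are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of com.mindprod.csv. */
final class CsvLineSketch {

    /** Split one line into fields: comma separated, quotes optional, embedded quotes doubled. */
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote stands for one literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    field.append(c); // commas inside quotes are ordinary data
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(field.toString().trim());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString().trim()); // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Four fields: a plain one, a quoted one containing a comma,
        // a null field (two commas in a row), and one with doubled quotes.
        System.out.println(split("abc, \"Smith, John\" ,,\"say \"\"hi\"\"\""));
    }
}
```

The sketch trims after unquoting, so spaces inside quotes are trimmed too, which roughly corresponds to having both trimming options switched on.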

Must be combined with your own code.

There is another CSVReader at http://ostermiller.org/utils/ExcelCSV.html. If this CSVReader is not suitable for you, try that one.
There is one written in C# at http://www.csvreader.com/

Future ideas:

1. allow \ to be used for quoting characters.

Since:
2002-03-27
Version:
4.6 2011-02-25 new CSVReader constructor parameter trimUnquoted. Use IntelliJ to fill it in with true every place the full constructor is used.
Author:
Roedy Green, Canadian Mind Products
See Also:
TestCSVReader

Constructor Summary
CSVReader(java.io.Reader r)
          Simplified convenience constructor to read a CSV file. Defaults to comma separator, " for quote, no multiline fields, with trimming.
CSVReader(java.io.Reader r, char separatorChar, char quoteChar, java.lang.String commentChars, boolean hideComments, boolean trimQuoted, boolean trimUnquoted, boolean allowMultiLineFields)
          Detailed constructor to read a CSV file
 
Method Summary
(package private)  void buildLookup()
          Build table for quick lookup of char category.
(package private)  CSVCharCategory categorise(char c)
          categorise a character for the finite state machine.
 void close()
          Close the Reader.
 java.lang.String get()
          Read one field from the CSV file.
 java.lang.String[] getAllFieldsInLine()
          Get all fields in the line.
 boolean getBoolean()
          Read one boolean field from the CSV file, e.g. (true, yes, 1, +) or (false, no, 0, -).
 double getDouble()
          Read one double field from the CSV file.
 float getFloat()
          Read one float field from the CSV file.
 int getHexInt()
          Read one hex-encoded integer field from the CSV file
 long getHexLong()
          Read one hex-encoded long field from the CSV file
 int getInt()
          Read one integer field from the CSV file
 int getLineCount()
          How many lines have been processed so far.
 long getLong()
          Read one long field from the CSV file
 java.lang.String getYYYYMMDD()
          Read one Date field from the CSV file, in ISO format yyyy-mm-dd
 void skip(int fields)
          Skip over fields you don't want to process.
 void skipToNextLine()
          Skip over remaining fields on this line you don't want to process.
 boolean wasComment()
          Was the last field returned via get a comment (including a label comment)? Also works after getAllFieldsInLine to tell if there was a comment at the end of that line.
 boolean wasLabelComment()
          Was the last field returned via get a label ## comment? Also works after getAllFieldsInLine to tell if there was a comment at the end of that line.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CSVReader

public CSVReader(java.io.Reader r)
Simplified convenience constructor to read a CSV file. Defaults to comma separator, " for quote, no multiline fields, with trimming.

Parameters:
r - input Reader source of CSV Fields to read. Should be buffered.

CSVReader

public CSVReader(java.io.Reader r,
                 char separatorChar,
                 char quoteChar,
                 java.lang.String commentChars,
                 boolean hideComments,
                 boolean trimQuoted,
                 boolean trimUnquoted,
                 boolean allowMultiLineFields)
Detailed constructor to read a CSV file

Parameters:
r - input Reader source of CSV Fields to read. Should be a BufferedReader.
separatorChar - field separator character, usually ',' in North America, ';' in Europe and sometimes '\t' for tab. Note this is a 'char' not a "string".
quoteChar - char to use to enclose fields containing a separator, usually '\"' . Use (char)0 if you don't want a quote character, or any other char that will not appear in the file. Note this is a 'char' not a "string".
commentChars - characters that mark the start of a comment, usually "#", but can be multiple chars. Note this is a "string" not a 'char'.
hideComments - true if the client sees none of the comments; false if the client processes the comments.
trimQuoted - true if quoted fields are trimmed of lead/trail blanks. Usually true.
trimUnquoted - true if unquoted fields are trimmed of lead/trail blanks. Usually true.
allowMultiLineFields - true if reader should allow quoted fields to span more than one line. Microsoft Excel
Method Detail

close

public void close()
           throws java.io.IOException
Close the Reader.

Throws:
java.io.IOException - if problems closing

get

public java.lang.String get()
                     throws java.io.IOException
Read one field from the CSV file. You can also use methods like getInt and getDouble to parse the String for you. You can use getAllFieldsInLine to read the entire line including the EOL.

Returns:
String value, even if the field is numeric. Surrounding and embedded double quotes are stripped. Possibly "". null means end of line. Normally you use skipToNextLine to start the next line rather than using get to read the EOL. If the field was a comment, it is returned with the lead # stripped. Check wasComment to see if it was a comment or a data field.
Throws:
java.io.EOFException - at end of file after all the fields have been read.
java.io.IOException - Some problem reading the file, possibly malformed data.
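
The contract described for get (null at end of line, EOFException at end of file) suggests a reading loop like the sketch below. To keep the example self-contained it runs against an in-memory stub rather than the real class: FieldLoopSketch, FieldSource, stub and readAll are hypothetical names, and the stub only mimics the behaviour as documented here.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class FieldLoopSketch {

    /** Hypothetical stand-in for the single CSVReader method used here. */
    interface FieldSource {
        String get() throws IOException;
    }

    /** In-memory stub: returns null at end of line, throws EOFException at end of file. */
    static FieldSource stub(String[][] lines) {
        return new FieldSource() {
            private int line = 0;
            private int col = 0;

            @Override
            public String get() throws IOException {
                if (line >= lines.length) {
                    throw new EOFException();
                }
                if (col < lines[line].length) {
                    return lines[line][col++];
                }
                line++; // consume the end of line and move on
                col = 0;
                return null;
            }
        };
    }

    /** Collect the fields of every line until the source signals end of file. */
    static List<List<String>> readAll(FieldSource in) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        try {
            while (true) {
                String field = in.get();
                if (field == null) { // end of line: close the current row
                    rows.add(row);
                    row = new ArrayList<>();
                } else {
                    row.add(field);
                }
            }
        } catch (EOFException e) {
            // expected: all fields have been read
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(stub(new String[][] {{"a", "b"}, {"c"}})));
    }
}
```

With the real class you would call get on a CSVReader instead of the stub, and would more often end each line with skipToNextLine after reading the fields you need.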

getAllFieldsInLine

public java.lang.String[] getAllFieldsInLine()
                                      throws java.io.IOException
Get all fields in the line. This reads only one line, not the whole file. Skips to the next line as a side effect, so you don't need to call skipToNextLine. You can find out if the last field was a comment with wasComment.

Returns:
Array of strings, one for each field. Possibly empty, but never null.
Throws:
java.io.EOFException - if run off the end of the file.
java.io.IOException - if some problem reading the file.

getBoolean

public boolean getBoolean()
                   throws java.io.IOException
Read one boolean field from the CSV file, e.g. (true, yes, 1, +) or (false, no, 0, -).

Returns:
boolean, empty field returns false, as does end of line.
Throws:
java.io.EOFException - at end of file after all the fields have been read.
java.io.IOException - Some problem reading the file, possibly malformed data.
java.lang.NumberFormatException - if field does not contain a well-formed int.
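
The mapping described for getBoolean could look roughly like the sketch below. BooleanFieldSketch and parse are hypothetical names. Case-insensitive matching and throwing NumberFormatException for unrecognised text are assumptions, and the real method may accept spellings beyond the examples listed above.

```java
import java.util.Locale;

/** Hypothetical helper, not part of com.mindprod.csv. */
final class BooleanFieldSketch {

    /** Map a raw field to a boolean; null (end of line) and empty both give false. */
    static boolean parse(String field) {
        if (field == null) {
            return false; // end of line
        }
        String s = field.trim().toLowerCase(Locale.ROOT);
        switch (s) {
            case "":
            case "false":
            case "no":
            case "0":
            case "-":
                return false;
            case "true":
            case "yes":
            case "1":
            case "+":
                return true;
            default:
                // assumption: anything else is treated as malformed
                throw new NumberFormatException("not a boolean field: " + field);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("yes") + " " + parse("-") + " " + parse(""));
    }
}
```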

getDouble

public double getDouble()
                 throws java.io.IOException,
                        java.lang.NumberFormatException
Read one double field from the CSV file.

Returns:
double value, empty field returns 0, as does end of line.
Throws:
java.io.EOFException - at end of file after all the fields have been read.
java.io.IOException - Some problem reading the file, possibly malformed data.
java.lang.NumberFormatException - if field does not contain a well-formed double.

getFloat

public float getFloat()
               throws java.io.IOException,
                      java.lang.NumberFormatException
Read one float field from the CSV file.

Returns:
float value, empty field returns 0, as does end of line.
Throws:
java.io.EOFException - at end of file after all the fields have been read.
java.io.IOException - Some problem reading the file, possibly malformed data.
java.lang.NumberFormatException - if field does not contain a well-formed float.

getHexInt

public int getHexInt()
              throws java.io.IOException,
                     java.lang.NumberFormatException
Read one hex-encoded integer field from the CSV file

Returns:
int value, empty field returns 0, as does end of line.
Throws:
java.io.EOFException - at end of file after all the fields have been read.
java.io.IOException - Some problem reading the file, possibly malformed data.
java.lang.NumberFormatException - if field does not contain a well-formed int.
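
As an illustration of the documented return values (0 for an empty field or end of line, NumberFormatException for malformed input), here is a minimal sketch built on the JDK's Integer.parseInt with radix 16. HexFieldSketch and parseHexInt are hypothetical names. Whether the real method accepts a 0x prefix or values above 7FFFFFFF is not stated in this page, and this sketch accepts neither.

```java
/** Hypothetical helper, not part of com.mindprod.csv. */
final class HexFieldSketch {

    /** Parse a hex-encoded field; null (end of line) and empty both give 0. */
    static int parseHexInt(String field) {
        if (field == null) {
            return 0; // end of line
        }
        String s = field.trim();
        if (s.isEmpty()) {
            return 0; // empty field
        }
        // Integer.parseInt throws NumberFormatException on malformed or out-of-range input.
        return Integer.parseInt(s, 16);
    }

    public static void main(String[] args) {
        System.out.println(parseHexInt("ff")); // prints 255
    }
}
```

The same shape works for long values by swapping in Long.parseLong(s, 16).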

getHexLong

public long getHexLong()
                throws java.io.IOException,
                       java.lang.NumberFormatException
Read one hex-encoded long field from the CSV file

Returns:
long value, empty field returns 0, as does end of line.
Throws:
java.io.EOFException - at end of file after all the fields have been read.
java.io.IOException - Some problem reading the file, possibly malformed data.
java.lang.NumberFormatException - if field does not contain a well-formed long.

getInt

public int getInt()
           throws java.io.IOException,
                  java.lang.NumberFormatException
Read one integer field from the CSV file

Returns:
int value, empty field returns 0, as does end of line.
Throws:
java.io.EOFException - at end of file after all the fields have been read.
java.io.IOException - Some problem reading the file, possibly malformed data.
java.lang.NumberFormatException - if field does not contain a well-formed int.

getLineCount

public int getLineCount()
How many lines have been processed so far.

Returns:
count of how many lines have been read.

getLong

public long getLong()
             throws java.io.IOException,
                    java.lang.NumberFormatException
Read one long field from the CSV file

Returns:
long value, empty field returns 0, as does end of line.
Throws:
java.io.EOFException - at end of file after all the fields have been read.
java.io.IOException - Some problem reading the file, possibly malformed data.
java.lang.NumberFormatException - if field does not contain a well-formed long.

getYYYYMMDD

public java.lang.String getYYYYMMDD()
                             throws java.io.IOException
Read one Date field from the CSV file, in ISO format yyyy-mm-dd

Returns:
yyyy-mm-dd date string. Empty field returns "", end of line returns null.
Throws:
java.io.EOFException - at end of file after all the fields have been read.
java.io.IOException - Some problem reading the file, possibly malformed data.
java.lang.NumberFormatException - if field does not contain a well-formed date.
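
A date check with the same documented outcomes ("" for an empty field, null at end of line, NumberFormatException for a malformed date) can be sketched with java.time. This is a swapped-in technique, not how the library does it: the class predates java.time, DateFieldSketch and check are hypothetical names, and the strictness of the real validation is not documented here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

/** Hypothetical helper, not part of com.mindprod.csv. */
final class DateFieldSketch {

    /** Return the field unchanged if it is a valid ISO yyyy-mm-dd date. */
    static String check(String field) {
        if (field == null) {
            return null; // end of line
        }
        String s = field.trim();
        if (s.isEmpty()) {
            return ""; // empty field
        }
        try {
            LocalDate.parse(s); // strict ISO-8601 local date, e.g. 2011-02-25
        } catch (DateTimeParseException e) {
            throw new NumberFormatException("not a yyyy-mm-dd date: " + field);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(check("2011-02-25"));
    }
}
```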

skip

public void skip(int fields)
          throws java.io.IOException
Skip over fields you don't want to process.

Parameters:
fields - How many fields you want to bypass reading. The newline counts as one field.
Throws:
java.io.EOFException - at end of file after all the fields have been read.
java.io.IOException - Some problem reading the file, possibly malformed data.

skipToNextLine

public void skipToNextLine()
                    throws java.io.IOException
Skip over remaining fields on this line you don't want to process.

Throws:
java.io.EOFException - at end of file after all the fields have been read.
java.io.IOException - Some problem reading the file, possibly malformed data.

wasComment

public boolean wasComment()
Was the last field returned via get a comment (including a label comment)? Also works after getAllFieldsInLine to tell if there was a comment at the end of that line.

Returns:
true if last field returned via get was a comment.

wasLabelComment

public boolean wasLabelComment()
Was the last field returned via get a label ## comment? Also works after getAllFieldsInLine to tell if there was a comment at the end of that line.

Returns:
true if last field returned via get was a ## label comment.

buildLookup

void buildLookup()
Build table for quick lookup of char category.


categorise

CSVCharCategory categorise(char c)
categorise a character for the finite state machine.

Parameters:
c - the character to categorise
Returns:
CSVCharCategory representing the character's category.
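
The idea behind buildLookup and categorise, precomputing a table so that each character is classified with one array access, can be sketched as follows. The category names, the table size of 256 and the fallback for larger characters are all assumptions. The real CSVCharCategory values are not listed in this page.

```java
import java.util.Arrays;

/** Hypothetical illustration of a table-driven character classifier. */
final class CharCategorySketch {

    /** Hypothetical categories; the real CSVCharCategory may differ. */
    enum Category { ORDINARY, WHITESPACE, QUOTE, SEPARATOR, EOL }

    private final Category[] table = new Category[256];
    private final char separator;
    private final char quote;

    CharCategorySketch(char separator, char quote) {
        this.separator = separator;
        this.quote = quote;
        // Build the lookup table once, up front.
        Arrays.fill(table, Category.ORDINARY);
        table[' '] = Category.WHITESPACE;
        table['\t'] = Category.WHITESPACE;
        table['\r'] = Category.WHITESPACE;
        table['\n'] = Category.EOL;
        // Set these last so that a tab separator overrides the whitespace entry.
        if (quote < table.length) {
            table[quote] = Category.QUOTE;
        }
        if (separator < table.length) {
            table[separator] = Category.SEPARATOR;
        }
    }

    /** Classify one character for a finite state machine. */
    Category categorise(char c) {
        if (c < table.length) {
            return table[c]; // fast path: one array access
        }
        if (c == separator) {
            return Category.SEPARATOR;
        }
        if (c == quote) {
            return Category.QUOTE;
        }
        return Category.ORDINARY;
    }

    public static void main(String[] args) {
        CharCategorySketch sketch = new CharCategorySketch(',', '"');
        System.out.println(sketch.categorise(','));
    }
}
```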