Read Csv File In Java

Production-quality code, not crap

That's right, another article on how to read/parse a csv in Java!

But there are already millions on the web... That means there is demand 😁

Without further ado, let's jump into it.

Should you use a library

Because we are going to write production code, the answer is YES!! I would also use this code in a senior-level interview, where they want to test your real-life coding skills.

On the other hand, in coding interviews where the interviewer wants to test your problem-solving skills, I would advise against it and implement a parser yourself instead.

I recommend using a library because it handles:

  • quotes in fields

  • commas in fields

  • values distributed on multiple lines

  • irregular number of columns per line

I don't think even RFC 4180 covers every kind of csv you can encounter in your work.
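To see why the first two points matter, here is a minimal sketch of the hand-rolled alternative: splitting a line with String.split. The row is made up for illustration and the class name NaiveSplitDemo is hypothetical:

```java
import java.util.Arrays;

public class NaiveSplitDemo {

    // The hand-rolled approach: split on every comma, ignoring csv quoting rules.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        // Made-up row with 4 logical fields; the last one is quoted and contains a comma.
        String line = "Supersaurus,138,40,\"a comment, with a comma\"";
        String[] fields = naiveSplit(line);

        // Prints 5, not 4: the quoted comment has been cut in two,
        // and the quote characters are still part of the values.
        System.out.println(fields.length);
        System.out.println(Arrays.toString(fields));
    }
}
```

A csv parser reads the quoted comment as a single field without the quotes, and it also copes with quoted values that span several lines, which a line-by-line split cannot handle at all.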

Which library to choose

There are many Java-specific ones out there, such as OpenCSV and Apache Commons CSV, but my favourite is Jackson.

Most of you will know Jackson for its JSON and XML support, but guess what: csv is another very nice feature.

Java Code

First, add the Jackson csv dependency:

<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.dataformat/jackson-dataformat-csv -->
<dependency>
    <groupId>com.fasterxml.jackson.dataformat</groupId>
    <artifactId>jackson-dataformat-csv</artifactId>
    <version>2.16.0</version>
</dependency>

Then declare the CsvMapper object:

 import com.fasterxml.jackson.dataformat.csv.CsvMapper;
 private final CsvMapper csvMapper = new CsvMapper();

It's thread-safe once configured, so you can declare it once and reuse it as you like.

Now, you could just read the csv file line by line as a List<String>, but that is not production code.

What if, in the future, the csv gets more or fewer columns? What if the column names change? You want to spot these situations as early as possible and fail fast or react, and an untyped list doesn't help.

Instead we create a model to represent the lines we are reading.

The Model

I can't emphasize enough how important it is to use proper classes instead of Map<String, String> or Set<String>. This is Java, not Python or JavaScript!! (I have nothing against those, btw)

In this example the csv we are reading is about dinosaurs:

name,length,top weight,comment
Supersaurus,138,40,"lived about 153 million years ago"
Maraapunisaurus,99,132,"may be one of the world’s largest dinosaurs"
Argentinosaurus,98,110,"people believe them to be 98 feet in length"
Patagotitan,121,85,"lived during the Late Cretaceous period"

You wish you could work on that 😉

But we are not interested in all the fields; in particular, we don't care about comment (just for the sake of showing how to ignore fields).

Also, some column names contain spaces.

How do we represent it in Java? Here is our POJO:

import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.Data;

@Data
public class Dinosaur {

    private String name;

    private int length;

    @JsonProperty("top weight")
    private int topWeight;
}

We are using Lombok to avoid typing getters and setters, but it's up to you.

Did you notice topWeight? In the csv the header is made of two words, but a Java identifier can't contain a space, so we use @JsonProperty to map the field to the top weight header.
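If you'd rather not use Lombok, here is a sketch of the same model with hand-written accessors. It is an illustration, not the author's class: it only covers getters and setters, while @Data also generates equals, hashCode and toString.

```java
import com.fasterxml.jackson.annotation.JsonProperty;

// Same model as above, written by hand instead of with Lombok's @Data
// (getters and setters only; @Data would also add equals, hashCode and toString).
public class Dinosaur {

    private String name;

    private int length;

    // Jackson applies the rename to the whole logical property,
    // so the getTopWeight/setTopWeight pair is mapped to "top weight" too.
    @JsonProperty("top weight")
    private int topWeight;

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }

    public int getLength() { return length; }

    public void setLength(int length) { this.length = length; }

    public int getTopWeight() { return topWeight; }

    public void setTopWeight(int topWeight) { this.topWeight = topWeight; }
}
```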

Parsing the Csv

Now we want to open the file, read the headers from the first line, and the values from all the other lines.

And of course we want to write as little code as possible. Because we are lazy? No... because less code = fewer bugs 😜

Here is how to parse the csv:

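import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvParser;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.io.File;

    // readValues() and hasNextValue() throw IOException: declare or handle it in the enclosing method
    CsvSchema headerSchema = CsvSchema.emptySchema().withHeader(); // 1
    try (MappingIterator<Dinosaur> it = csvMapper
            .readerFor(Dinosaur.class)
            .with(headerSchema)
            .with(CsvParser.Feature.WRAP_AS_ARRAY)
            .without(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES) // 2
            .readValues(new File("/path/to/the/file.csv"))) { // 3
        while (it.hasNextValue()) {
            Dinosaur dino = it.nextValue();
            // do what you need with your dino
        }
    } // try-with-resources closes the iterator and the underlying file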

The code is self-explanatory; just a few comments:

  1. The CsvSchema is needed to map the values in the csv to our POJO. We could build a CsvSchema programmatically, but it's quicker to read the column names from the header line of the file.

  2. Remember we don't need all the columns in the csv? That is very common in real life. We need to tell Jackson not to fail when it encounters a column that is not present in the model.

  3. Jackson can accept a variety of inputs, including InputStream, File, Reader and String (the csv content itself).
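To try the steps above without a file on disk, here is a minimal self-contained sketch that passes the csv content as a String (point 3). Assumptions: the class name DinosaurCsvDemo and its parse method are hypothetical, the model uses public fields instead of Lombok so everything fits in one file, and the sample rows are two rows of the dinosaur table. It also leaves out WRAP_AS_ARRAY, since reading row by row with readValues works without it.

```java
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.io.IOException;
import java.util.List;

public class DinosaurCsvDemo {

    // Public fields instead of Lombok, to keep the sketch in a single file.
    public static class Dinosaur {
        public String name;
        public int length;
        @JsonProperty("top weight")
        public int topWeight;
    }

    // Created once and reused, as suggested for CsvMapper above.
    private static final CsvMapper CSV_MAPPER = new CsvMapper();

    static List<Dinosaur> parse(String csvContent) throws IOException {
        // Column names come from the first line of the content (point 1).
        CsvSchema headerSchema = CsvSchema.emptySchema().withHeader();
        try (MappingIterator<Dinosaur> it = CSV_MAPPER
                .readerFor(Dinosaur.class)
                .with(headerSchema)
                // The "comment" column has no field in the model (point 2).
                .without(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
                .readValues(csvContent)) { // String input (point 3)
            return it.readAll();
        }
    }

    public static void main(String[] args) throws IOException {
        String csv = "name,length,top weight,comment\n"
                + "Supersaurus,138,40,\"lived about 153 million years ago\"\n"
                + "Patagotitan,121,85,\"lived during the Late Cretaceous period\"\n";
        for (Dinosaur d : parse(csv)) {
            // prints "Supersaurus 138 40", then "Patagotitan 121 85"
            System.out.println(d.name + " " + d.length + " " + d.topWeight);
        }
    }
}
```

The typed model is also what gives you the fail-fast behaviour: a row with, say, abc in the length column makes parse throw an InvalidFormatException (an IOException) instead of silently passing a bad value along.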

And that's it!

Let me know if you have a favourite csv library and why!

PS: Is your use case different? Have a look at the official Jackson csv documentation.
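One common variation is a file without a header line. There are no column names to read, so the schema has to be built programmatically, the option mentioned in point 1 above. A minimal sketch, assuming the file has the same four columns in the same order as the dinosaur csv; the class name NoHeaderCsvDemo is hypothetical, and this time the input is passed as a Reader:

```java
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;

public class NoHeaderCsvDemo {

    public static class Dinosaur {
        public String name;
        public int length;
        @JsonProperty("top weight")
        public int topWeight;
    }

    private static final CsvMapper CSV_MAPPER = new CsvMapper();

    // Columns listed in file order; their names must match the model's property names.
    // No setUseHeader(true), so every line is treated as data.
    private static final CsvSchema SCHEMA = CsvSchema.builder()
            .addColumn("name")
            .addColumn("length")
            .addColumn("top weight")
            .addColumn("comment") // still declared, so each line has a column for every value
            .build();

    static List<Dinosaur> parse(String csvContent) throws IOException {
        try (MappingIterator<Dinosaur> it = CSV_MAPPER
                .readerFor(Dinosaur.class)
                .with(SCHEMA)
                // "comment" has no field in the model, as in the header-based version
                .without(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
                .readValues(new StringReader(csvContent))) { // Reader input
            return it.readAll();
        }
    }

    public static void main(String[] args) throws IOException {
        String csv = "Argentinosaurus,98,110,\"people believe them to be 98 feet in length\"\n";
        for (Dinosaur d : parse(csv)) {
            System.out.println(d.name + " " + d.length + " " + d.topWeight);
        }
    }
}
```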
