Friday, August 8, 2014

Fixed Length File Without Delimiter and BeanIO



What happened is that I have a file from an IBM mainframe, laid out according to a COBOL copybook, and I need to read a number of fields from it.

For your information, a file like this is fixed-length and has NO delimiter between records.

My problem is that in my environment I have to stick to the BeanIO framework for parsing. That means I cannot use the usual BufferedReader + substring approach, which is considered a not-so-elegant way to solve this problem anyway.

This is also the first time I am using BeanIO. I found that the existing fixed-length utilities in BeanIO, such as FixedLengthReader, expect each line to end with a Carriage Return (CR or \r) or Line Feed (LF or \n). The reader also lets the user specify a "delimiter" that marks where a line ends.

Unfortunately, as I mentioned earlier, the file in my use case does not have any delimiter. Basically it is one long string of characters. Worse still, one of the fields always has a CR or LF character in it! BeanIO's reader will treat whatever comes after the CR or LF character as a brand new line, when in fact it is NOT!

To summarize, I have 2 problems here.
1) No delimiter to determine a new line
2) One of the fields contains Carriage Return and Line Feed characters
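
To see both problems in action without BeanIO, here is a minimal sketch using plain java.io. The class name LineSplitDemo, the 10-character record length and the sample data are all made up for illustration; the point is only that a line-oriented reader cuts the stream at the embedded line feed and never sees the real record boundary.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of BeanIO.
class LineSplitDemo {

    // Reads "records" the way a line-oriented reader does: one per CR/LF-terminated line.
    static List<String> readByLine(String data) throws IOException {
        List<String> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            String line;
            while ((line = reader.readLine()) != null) {
                records.add(line);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Two made-up 10-character records with no delimiter between them;
        // the first record carries a line feed inside one of its fields.
        String data = "AB\nCDEFGHI" + "0123456789";
        System.out.println(readByLine(data)); // prints [AB, CDEFGHI0123456789]
    }
}
```

Neither of the two pieces is a valid 10-character record: the first is cut short at the line feed, and the second runs straight through the real record boundary.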

I was cracking my head over these two problems when my colleagues came up with some brilliant ideas. One suggested using an OuterBean that contains the target bean as a Segment. Segment is a BeanIO annotation that handles grouping and nested beans. This approach solves the "no delimiter to break the line" issue, since the whole file can then be treated as one record made up of repeating segments.

Another colleague suggested creating a custom reader that extends java.io.Reader and implements its own read() method, because in the end BeanIO invokes read() on the Reader object that is passed into its internal classes. Inside this read() method I check whether the character is a Carriage Return or a Line Feed and, if so, return a space instead; in other words, I replace \r and \n with a space. This solves my second problem, where a Carriage Return or Line Feed causes the BeanReader to treat the characters after it as a new line.

So I managed to fix these two problems with the OuterBean approach and the customized java.io.Reader approach. However, this is not a complete solution, because I don't like replacing the carriage return and line feed characters with spaces.

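import java.util.ArrayList;
import java.util.List;
import org.beanio.annotation.Segment;

public class CustomerOuterUserBean {

    // Repeating segment: each occurrence is parsed into one CustomerUserBean
    @Segment(name="customerUserBeans", collection=ArrayList.class, minOccurs=0, maxOccurs=-1, type=CustomerUserBean.class)
    private List<CustomerUserBean> customerUserBeans = new ArrayList<CustomerUserBean>();
}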

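import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

public class NpiFilterReader extends FilterReader {

    public NpiFilterReader(Reader in) {
        super(in);
    }

    // Replace CR and LF with a space so they are not seen as line breaks
    @Override
    public int read() throws IOException {
        int read = super.read();
        if (read == '\r' || read == '\n')
            return ' ';

        return read;
    }
}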
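
One thing to watch with this kind of reader: overriding only the single-character read() covers callers that read one character at a time, but anything that reads in bulk (for example a BufferedReader wrapped around the filter) goes through read(char[], int, int) instead, and FilterReader passes that call straight to the underlying reader. Below is a sketch that masks both paths. The class name LineBreakMaskingReader is hypothetical, and whether BeanIO ever takes the bulk path in a given setup is something I have not verified.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical variant of the filter reader that covers both read paths.
class LineBreakMaskingReader extends FilterReader {

    LineBreakMaskingReader(Reader in) {
        super(in);
    }

    // Single-character path: swap CR/LF for a space.
    @Override
    public int read() throws IOException {
        int c = super.read();
        return (c == '\r' || c == '\n') ? ' ' : c;
    }

    // Bulk path: apply the same replacement to every character just read.
    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int n = super.read(cbuf, off, len);
        // When n is -1 (end of stream) the loop body never runs.
        for (int i = off; i < off + n; i++) {
            if (cbuf[i] == '\r' || cbuf[i] == '\n') {
                cbuf[i] = ' ';
            }
        }
        return n;
    }
}
```

With this version, wrapping the filter in a BufferedReader and calling readLine() on "A\nB" yields "A B" as one line instead of two.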


So I continued the journey, and in the end I found a very, very simple solution to both issues.

What I did was create a custom RecordParserFactory that creates a custom RecordReader, and I override the read() method of the custom RecordReader. The read() method in the custom RecordReader looks as follows:

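public class MyFixedLengthRecordReader extends FixedLengthReader {

    // "in" is the Reader this record reader was created with (declaration not shown)

    public String read() throws IOException, RecordIOException {
        char[] buffer = new char[300];
        int filled = 0;
        // one read() call may deliver fewer characters than requested, so keep reading
        while (filled < 300) {
            int n = in.read(buffer, filled, 300 - filled);
            if (n == -1)
                break;
            filled += n;
        }
        // null signals end of stream; CR/LF inside the 300 characters are kept as-is
        return filled == 0 ? null : new String(buffer, 0, filled);
    }
}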


Note that in my use case each line is fixed at 300 characters.
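
The chunking idea itself can be tried out without BeanIO. The sketch below is a standalone illustration, not the BeanIO class: FixedWidthChunkReader and its recordLength parameter are names I made up, and throwing an exception on a truncated last record is my own assumption about what is sensible. It shows that cutting the stream purely by length keeps an embedded CR or LF inside the record.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical standalone reader that cuts a stream into fixed-length records.
class FixedWidthChunkReader {

    private final Reader in;
    private final int recordLength;

    FixedWidthChunkReader(Reader in, int recordLength) {
        this.in = in;
        this.recordLength = recordLength;
    }

    /** Returns the next record, or null once the input is exhausted. */
    String read() throws IOException {
        char[] buffer = new char[recordLength];
        int filled = 0;
        // A single read call may return fewer characters than requested,
        // so keep reading until the record is full or the stream ends.
        while (filled < recordLength) {
            int n = in.read(buffer, filled, recordLength - filled);
            if (n == -1) {
                break;
            }
            filled += n;
        }
        if (filled == 0) {
            return null; // clean end of input
        }
        if (filled < recordLength) {
            throw new IOException("Truncated record: expected " + recordLength
                    + " characters, got " + filled);
        }
        return new String(buffer);
    }
}
```

For example, with a record length of 5 and the input "AB\nCD12345", the first call to read() returns "AB\nCD" with the line feed intact, the second returns "12345", and the third returns null.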

This is the most elegant yet simple solution! Thank Divine!





