Tuesday 14 July 2015

Swapping Columns of CSV Row

Swapping column values of CSV rows efficiently is not a simple piece of code, especially when it comes to processing of millions of records. Processing on String values consumes more CPU cycles when compared to char arrays. When char array representation of a String is easily available, let us try to make use of it for the swap purpose.

This sample code is written for a requirement where I had to swap two column values:
6,255,00aecd,00ebcb00,0,0,28356,0,417,1
had to be converted to
6,255,00ebcb00,00aecd,0,0,28356,0,417,1
i.e. to swap columns 2 and 3, if columns are numbered from 0.

Simple approach towards this is to split the line with delimiter, swap values in the respective array and join with delimiter again. This would be an expensive operation when you have millions of lines to process.

Other way of handling this operation is to get char array of the String, locate the starting and ending points of columns to be swapped and do array operation to move the characters.

Approach 1 (Simple and Expensive)


 private static String swapColumns(final String value,
   final String separator, final int swapColumn1, final int swapColumn2) {
  String[] splitValues = value.split(separator);
  String temp = splitValues[swapColumn1];
  splitValues[swapColumn1] = splitValues[swapColumn2];
  splitValues[swapColumn2] = temp;

  StringBuilder stringBuilder = new StringBuilder();

  boolean itemAdded = false;

  for (String splitValue : splitValues) {
   if (itemAdded) {
    stringBuilder.append(separator);
   }
   stringBuilder.append(splitValue);
   itemAdded = true;
  }

  return stringBuilder.toString();
 }
This code is available for download here.

Approach 2 (char array)


 private static String swapColumns(final String value, final char separator,
   final int swapColumn1, final int swapColumn2) {
  char[] charValues = value.toCharArray();

  int swapColumn1StartIndex = swapColumn1 == 0 ? 0 : -1;
  int swapColumn1EndIndex = -1;
  int swapColumn2StartIndex = -1;
  int swapColumn2EndIndex = -1;

  int i = 0;
  int separatorOccurance = 0;
  for (; i < charValues.length; i++) {
   if (charValues[i] == separator) {
    separatorOccurance++;
    if (swapColumn1StartIndex > -1) {
     swapColumn1EndIndex = i;
     break;
    } else if (swapColumn1 == separatorOccurance) {
     swapColumn1StartIndex = i + 1;
    }
   }
  }

  swapColumn2StartIndex = swapColumn2 == separatorOccurance
    ? i + 1
    : swapColumn2StartIndex;

  for (i++; i < charValues.length; i++) {
   if (charValues[i] == separator || i + 1 == charValues.length) {
    separatorOccurance++;
    if (swapColumn2StartIndex > -1) {
     swapColumn2EndIndex = i
       + ((i + 1 == charValues.length) ? 1 : 0);
     break;
    } else if (swapColumn2 == separatorOccurance) {
     swapColumn2StartIndex = i + 1;
    }
   }
  }

  if (swapColumn1EndIndex - swapColumn1StartIndex > swapColumn2EndIndex
    - swapColumn2StartIndex
    || swapColumn1EndIndex - swapColumn1StartIndex == swapColumn2EndIndex
      - swapColumn2StartIndex) {
   char[] tempArray = new char[swapColumn1EndIndex
     - swapColumn1StartIndex];
   System.arraycopy(charValues, swapColumn1StartIndex, tempArray, 0,
     tempArray.length);
   System.arraycopy(charValues, swapColumn2StartIndex, charValues,
     swapColumn1StartIndex, swapColumn2EndIndex
       - swapColumn2StartIndex);
   System.arraycopy(charValues, swapColumn1EndIndex, charValues,
     swapColumn1EndIndex - swapColumn1EndIndex
       + swapColumn1StartIndex + swapColumn2EndIndex
       - swapColumn2StartIndex, swapColumn2StartIndex
       - swapColumn1EndIndex);
   System.arraycopy(tempArray, 0, charValues, swapColumn2StartIndex
     - swapColumn1EndIndex + swapColumn1StartIndex
     + swapColumn2EndIndex - swapColumn2StartIndex,
     tempArray.length);
  } else {
   char[] tempArray = new char[swapColumn2EndIndex
     - swapColumn2StartIndex];
   System.arraycopy(charValues, swapColumn2StartIndex, tempArray, 0,
     tempArray.length);
   System.arraycopy(charValues, swapColumn1StartIndex, charValues,
     swapColumn2StartIndex + swapColumn2EndIndex
       - swapColumn2StartIndex - swapColumn1EndIndex
       + swapColumn1StartIndex, swapColumn1EndIndex
       - swapColumn1StartIndex);
   System.arraycopy(charValues, swapColumn1EndIndex, charValues,
     swapColumn1EndIndex + swapColumn2EndIndex
       - swapColumn2StartIndex - swapColumn1EndIndex
       + swapColumn1StartIndex, swapColumn2StartIndex
       - swapColumn1EndIndex);
   System.arraycopy(tempArray, 0, charValues, swapColumn1StartIndex,
     tempArray.length);
  }

  return new String(charValues);
 }
Approach 2 has more advantage over Approach 1. Have performed a test and found that Approach 2 has completed task in 2.776421 ms where Approach 1 has taken 19.690034 ms to complete.

You can download the Java source files here.

4 comments:

  1. In the work with the stoksabmitter, I constantly use CSV files, and I'm already accustomed to this "strange" sort of separators. Thanks for the code, this is better for me. I have several ideas on how to use massive, but I need to call to mind Java, and I have already forgotten a
    bit about the details, but here there is an excellent textbook https://explainjava.com/split-string-java/, after a repetition, I'll return to this issue.

    ReplyDelete
  2. Hey There. I found your weblog the usage of msn. That is a very smartly written article. I'll be sure to bookmark it and come back to learn more of your helpful info. Thank you for the post. I'll certainly return. facebook log in

    ReplyDelete
  3. For a brand new mortgage, the amortization period is usually 25 years. mortgage calculator Quite often, borrowers are focused on the question presented by their lender: Would you like a flexible rate or a set rate mortgage. canada mortgage calculator

    ReplyDelete
  4. Your article is very informative and has a lot of information. I really like your effort, keep posting.
    custom logo design services

    ReplyDelete