Tuesday 14 July 2015

Swapping Columns of CSV Row

Swapping column values of CSV rows efficiently is not a simple piece of code, especially when it comes to processing of millions of records. Processing on String values consumes more CPU cycles when compared to char arrays. When char array representation of a String is easily available, let us try to make use of it for the swap purpose.

This sample code is written for a requirement where I had to swap two column values:
6,255,00aecd,00ebcb00,0,0,28356,0,417,1
had to be converted to
6,255,00ebcb00,00aecd,0,0,28356,0,417,1
i.e. to swap columns 2 and 3, if columns are numbered from 0.

Simple approach towards this is to split the line with delimiter, swap values in the respective array and join with delimiter again. This would be an expensive operation when you have millions of lines to process.

Other way of handling this operation is to get char array of the String, locate the starting and ending points of columns to be swapped and do array operation to move the characters.

Approach 1 (Simple and Expensive)


 private static String swapColumns(final String value,
   final String separator, final int swapColumn1, final int swapColumn2) {
  String[] splitValues = value.split(separator);
  String temp = splitValues[swapColumn1];
  splitValues[swapColumn1] = splitValues[swapColumn2];
  splitValues[swapColumn2] = temp;

  StringBuilder stringBuilder = new StringBuilder();

  boolean itemAdded = false;

  for (String splitValue : splitValues) {
   if (itemAdded) {
    stringBuilder.append(separator);
   }
   stringBuilder.append(splitValue);
   itemAdded = true;
  }

  return stringBuilder.toString();
 }
This code is available for download here.

Approach 2 (char array)


 private static String swapColumns(final String value, final char separator,
   final int swapColumn1, final int swapColumn2) {
  char[] charValues = value.toCharArray();

  int swapColumn1StartIndex = swapColumn1 == 0 ? 0 : -1;
  int swapColumn1EndIndex = -1;
  int swapColumn2StartIndex = -1;
  int swapColumn2EndIndex = -1;

  int i = 0;
  int separatorOccurance = 0;
  for (; i < charValues.length; i++) {
   if (charValues[i] == separator) {
    separatorOccurance++;
    if (swapColumn1StartIndex > -1) {
     swapColumn1EndIndex = i;
     break;
    } else if (swapColumn1 == separatorOccurance) {
     swapColumn1StartIndex = i + 1;
    }
   }
  }

  swapColumn2StartIndex = swapColumn2 == separatorOccurance
    ? i + 1
    : swapColumn2StartIndex;

  for (i++; i < charValues.length; i++) {
   if (charValues[i] == separator || i + 1 == charValues.length) {
    separatorOccurance++;
    if (swapColumn2StartIndex > -1) {
     swapColumn2EndIndex = i
       + ((i + 1 == charValues.length) ? 1 : 0);
     break;
    } else if (swapColumn2 == separatorOccurance) {
     swapColumn2StartIndex = i + 1;
    }
   }
  }

  if (swapColumn1EndIndex - swapColumn1StartIndex > swapColumn2EndIndex
    - swapColumn2StartIndex
    || swapColumn1EndIndex - swapColumn1StartIndex == swapColumn2EndIndex
      - swapColumn2StartIndex) {
   char[] tempArray = new char[swapColumn1EndIndex
     - swapColumn1StartIndex];
   System.arraycopy(charValues, swapColumn1StartIndex, tempArray, 0,
     tempArray.length);
   System.arraycopy(charValues, swapColumn2StartIndex, charValues,
     swapColumn1StartIndex, swapColumn2EndIndex
       - swapColumn2StartIndex);
   System.arraycopy(charValues, swapColumn1EndIndex, charValues,
     swapColumn1EndIndex - swapColumn1EndIndex
       + swapColumn1StartIndex + swapColumn2EndIndex
       - swapColumn2StartIndex, swapColumn2StartIndex
       - swapColumn1EndIndex);
   System.arraycopy(tempArray, 0, charValues, swapColumn2StartIndex
     - swapColumn1EndIndex + swapColumn1StartIndex
     + swapColumn2EndIndex - swapColumn2StartIndex,
     tempArray.length);
  } else {
   char[] tempArray = new char[swapColumn2EndIndex
     - swapColumn2StartIndex];
   System.arraycopy(charValues, swapColumn2StartIndex, tempArray, 0,
     tempArray.length);
   System.arraycopy(charValues, swapColumn1StartIndex, charValues,
     swapColumn2StartIndex + swapColumn2EndIndex
       - swapColumn2StartIndex - swapColumn1EndIndex
       + swapColumn1StartIndex, swapColumn1EndIndex
       - swapColumn1StartIndex);
   System.arraycopy(charValues, swapColumn1EndIndex, charValues,
     swapColumn1EndIndex + swapColumn2EndIndex
       - swapColumn2StartIndex - swapColumn1EndIndex
       + swapColumn1StartIndex, swapColumn2StartIndex
       - swapColumn1EndIndex);
   System.arraycopy(tempArray, 0, charValues, swapColumn1StartIndex,
     tempArray.length);
  }

  return new String(charValues);
 }
Approach 2 has more advantage over Approach 1. Have performed a test and found that Approach 2 has completed task in 2.776421 ms where Approach 1 has taken 19.690034 ms to complete.

You can download the Java source files here.