Search This Blog

Friday, February 3, 2012

Credit Card Parsing Algorithm for Unix

I was asked to parse 85 million records of varchar(255) character fields to find credit card numbers in it and hide them to achieve PCI DSS complaince . This is what I did to simplify the stuff !

1) Get a dump of database table (or important columns) to your unix box.
2) Write a "ccengine" parsing script based on Lugn's algorithm.
3) Dump the rows with credit card numbers.
4) Hide the CC# with regexp .
5) Upload back to the database.

Here is a credit card engine script in SED to help developers parse through credit card numbers in Unix which is needed prior to implementing Luhn's algorithm .

Create a file "ccengine.sed" with following content


s/[./\-#$%^&@_*!\ ]4[0-9]\{3\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{3\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*/g
s/[./\-#$%^&@_*!\ ]5[1-5][0-9]\{2\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{3\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/[./\-#$%^&@_*!\ ]6011[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{3\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*/g
s/[./\-#$%^&@_*!\ ]4[0-9]\{3\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/[./\-#$%^&@_*!\ ]5[1-5][0-9]\{2\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/[./\-#$%^&@_*!\ ]6011[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/[./\-#$%^&@_*!\ ]3[47][0-9]\{13\}/\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*/g
s/[./\-#$%^&@_*!\ ]30[0-5][0-9][./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{2\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/[./\-#$%^&@_*!\ ]36[0-9]\{2\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{2\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/[./\-#$%^&@_*!\ ]38[0-9]\{2\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{2\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/[./\-#$%^&@_*!\ ]3[47][0-9]\{2\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}[./\-#$%^&@_*!\ ]*[0-9]\{4\}/\*\*\*\*\*\*\*\*\*\*\*\*\*/g
s/4[0-9]\{3\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{3\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*/g
s/5[1-5][0-9]\{2\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{3\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*/g
s/6011[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{3\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*/g
s/4[0-9]\{3\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/5[1-5][0-9]\{2\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/6011[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/3[47][0-9]\{13\}/\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*/g
s/4[0-9]\{3\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{3\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*/g
s/5[1-5][0-9]\{2\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{3\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*/g
s/6011[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{3\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*/g
s/4[0-9]\{3\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/5[1-5][0-9]\{2\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/6011[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/3[47][0-9]\{13\}/\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*/g
s/30[0-5][0-9][- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{2\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/36[0-9]\{2\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{2\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/38[0-9]\{2\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{2\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/[^0-9]4[0-9]\{3\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/[#- ]4[0-9]\{3\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*\*/g
s/3[47][0-9]{2\}[- ]*[0-9]\{4\}[- ]*[0-9]\{4\}[- ]*[0-9]\{3\}/\*\*\*\*-\*\*\*\*-\*\*\*\*-\*\*\*/g

Say CCENGINE=ccengine.sed
INFILE= is your initial load
OUTFILE = your output with CC numbers.

If you have the dump of database as a bcp or sqlload or just a file , just need to translate the file using following sed command

sed -f $CCENGINE $INFILE >$OUTFILE

bcp or sqlload back dependending on whether you are using Sybase or Oracle.Keep it Simple !