Tuesday, July 15, 2008

Importing CSV to MS SQL -or- Did I Do This Right?

I was trying to find a quick and dirty method of exporting, and subsequently importing CSV data into MS SQL; programmatically.

I'm dealing with rather small amounts of data, so this method may not work for many, but it Works For Me (tm).

My source data comes from an AIX / uSQL box running Misys Tiger. Yes, I realize I could query that database directly, if I purchased the Transoft ODBC drivers for Server 2003, but at a couple grand it's much more cost effective to do it this way. The easiest method to retrieve my data is to use Misys Query, create a scheduled job, and export the resulting data to a network location for DTS (Data Transformation Services) to pick it up.

The data is a rather exhaustive list of Insurance Companies, and their related plans. As one could imagine, this data is ever changing, a result of new "product lines" being developed by the lumbering health engine we call commercial payors. In the past, we exported the reports to PDF, and used the built-in search functionality to find the payor information needed. Unfortunately, it still leads to a "hunt-and-peck" situation, depending on how the user forms their search query. I felt I could dramatically speed up this operation by using MS SQL and ASP.

There are two reports, one for Plans, and one for Companies. The reports contain relate-able information, so I can use this to query and join results from the two tables created for each.

Misys Query contains a Cognos Scheduler component that can be set to run reports at a pre-defined time. It's pretty easy to schedule a report: open said report, run it, and when it's done, select Report, Schedule. Options for running it once or recurring, where and how to output the results, and even a command file option, when it's done running. The documentation explains about using security options, but this is not necessary, since Misys Query handles storing the authentication information needed for Cognos Scheduler. The other thing that threw me was that there was no option to run it on demand; you MUST schedule it...so testing forces you to set a time in the near future to observe it. Don't forget to re-set the time to the one you wanted originally!

The next step was to create a DTS job to take this exported CSV information, and import it into my database. DTS is really a great tool for this and makes it quite simple, IMHO.

After creating my database and tables, I set up my DTS job:

My steps were...

  1. To create a data source for your CSV file, either click and drag the circled icon to the workspace, or double-click it...
  2. A new widow will appear. Fill in the New Connection Box with a meaningful title, and in the Data Source drop-down box, select Text File (Source). Next to the File Name box, browse to the location of your CSV file, and click OK.
  3. The Text File Properties window appears. The option Delimited should be selected, with File Type being ANSI, Row Delimeter is {CR}{LF} and the Text Qualifier of Double Quote {"}. Change these as they suit your environment. There is also an option to skip rows, and to inform the processor that the first row contains column names. This comes handy in later steps. Clicking Next reveals your data, as it will appear during the transformation process. If this is not correct, click the back button and change it. Otherwise, click finish, and OK.
  4. You should now have an icon on the workspace that represents what we just created. Next, lets create the database connection object. As before, either drag and drop the MS OLE DB Provider for SQL Server object to the workspace, or double-click it to have one added. The easiest method is to right-click the workspace, and select "Add Connection". The default is the one we need.
  5. Give the new connection name something meaningful, choose your environments server and authentication method, and your database. Click OK.
  6. Right-Click the workspace, select Add Task, Transform Data Task. This changes your icon to this:
  7. Click on your Text data source to select it as your data source. The cursor will then change to a "Select Destination....", so select your DB connection next. A line appears between your sources, indicating the flow of data. Double-click that Data Transform Task.
  8. Again, give this task a meaningful title. Click the Destination tab.
  9. Select the table you'd like this data pushed to. Click the Transformations Tab.
  10. This is where most of the work is performed. You can see from the screenshot, that my fields don't map up, because I created an indexed field called ID. Probably didn't need it, but as my memory tried to recall what I'd learned in DBA classes, it was wise to have one... For each link, highlight it, and click the Edit button above. The next window that appears lets you set the source and destination fields. Make sure they match what you'd like, and click OK to save the results. In my environment I need to clear out all fields before inputting the new ones, and delete the source CSV files when finished.
  11. Right-Click the workspace, Add Task, Execute SQL Task. Give it a meaningful name, and in my case, truncate table my_tableofdata; in the SQL Statement box was enough to get the job done.
  12. Now to ensure this is performed before all other tasks. Right-Click the Transform Data Task you created earlier (the arrow pointing from data source to data destination).
  13. Select workflow properties. Click New. From the Source Step drop-down, select your Execute SQL Task. Now you can see why I suggested using meaningful names for these different objects... ;). Leave the Precedence set to Success. This ensures that the import of data only occurs when the truncate finishes successfully. A white/green striped arrow appears, pointing from the SQL command to the Text Data source.
  14. The last step is to add an Execute Process task. Right-Click the workspace, select Add Task, and then Execute Process.
  15. Give it a meaningful name, and in my case, the command is %COMSPEC% (cmd.exe, or the Windows Environment Variable that points to the Command Interpreter), with the arguments /C del C:\Path\To\Mfile.CSV.
  16. The last step in this is to create a Workflow object that fires when my Transformation of data occurs successfully.
Keep in mind, that this was all done by a NON-DBA, and a noob. I only posted this for those of you that might find it helpful, and hope that others with more experience (most, compared to me) can point out some time-saving and helpful hints that I can learn from.

Tuesday, July 01, 2008

Code highlighting in blogs

I've looked around a bit for a utility for highlighting my code samples here on BlogSpot, and found a couple of useful tools.

The first is code2html, which has been around for some time, and turns your perl, python and other types into pretty, formatted html. It works well, is CGI, but has some limits to its' implementation.

Then I found a great swiss-army knife of a utility, called highlight.js. It will automatically detect code blocks, and highlight them for you. It does have issues with too much code, or intermixed code on one page, but does a nice job. If it fails to correctly identify your code, a sample export page is included with the package that allows you to manually format your code to html.

Happy highlighting!

NPI Search Redundancy

Most healthcare professionals know at this point that all providers of health care, require NPI (National Provider Identifier) numbers. Without one, it will become increasingly difficult for claims to be paid by commercial payers, and impossible to collect medicare and medicaid payments.

Since we are a coding/billing/collection/management agency, we have frequented the NPPES (National Plan & Provider Enumeration System) to lookup NPI information.

Unfortunately, there have been brief periods of downtime of the site, causing us to implement our own solution: a backup of the registry.

I have a daily cron job that downloads the NPI database, push it into MySQL, giving us an albeit slow, but accurate access to an off-line version of this data.

The shell script below performs the retrieval:

#!/bin/sh
WORKDIR="/srv/htdocs/npi"
LOG="$WORKDIR/npi.log"
MONTH=`date +%b`
LASTMONTH=`date +%b --date='1 month ago'`
YEAR=`date +%Y`
URL="http://nppesdata.cms.hhs.gov/"
CFILE="NPPES_Data_Dissemination_${MONTH}_${YEAR}.zip"
LFILE="NPPES_Data_Dissemination_${LASTMONTH}_${YEAR}.zip"

cd $WORKDIR

# Attempt to download the file. --progress=dot:mega is used to prevent
# WGET_OUTPUT from getting too long.
FILE=$(2>&1 wget --timestamping --progress=dot:mega \
           "${URL}${CFILE}")


if [ "$?" -eq "0" ]; then
     echo "Retrieved zip file for ${MONTH} on `date`." > $LOG
     BOOL="1"
else
     FILE=$(2>&1 wget --timestamping --progress=dot:mega \
           "${URL}${LFILE}")
     if [ "$?" -eq "0" ]; then
             echo "Retrieved zip file for ${MONTH} on `date`." > $LOG
     else
             echo "There was a problem retreiving the file." > $LOG
     fi
fi

if [ "$BOOL" -eq "1" ]; then
    #Find npidata in zip file
    OFILE=`unzip -l ${CFILE} | grep csv | grep -vi 'fileheader' | awk '{ print $4 }'`

    #Now unzip it
    unzip ${CFILE} ${OFILE}
else
    #Find npidata in zip file
    OFILE=`unzip -l ${LFILE} | grep csv | grep -vi 'fileheader' | awk '{ print $4 }'`
   
    #Now unzip it
    unzip ${LFILE} ${OFILE}
fi

#Rename file
mv ${OFILE} source.csv

#Import data
echo "Importing ${OFILE} data..." >> ${LOG}
mysql -uroot -pmypass < /srv/htdocs/npi/npi.sql echo "Imported ${OFILE}." >> ${LOG}

#Remove source data
rm -f /srv/htdocs/npi/source.csv

#Remove the zip file
rm -f /srv/htdocs/npi/*.zip

Here is the npi.sql file used to import the data into the database. The data imported is trimmed down, since we didn't need all of it.

USE npi;
TRUNCATE TABLE npidata;
LOAD DATA INFILE '/srv/htdocs/npi/source.csv' INTO TABLE npidata FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES
(npi,entity_type_code,replacement_npi,ein,legal_business_name,last_name,first_name,
middle_name,prefix,suffix,credential,other_organization_name,other_organization_name_type_code,
other_last_name,other_first_name,other_middle_name,other_prefix,other_suffix,other_credential,
other_last_name_type_code,first_line_business_mailing_address,second_line_business_mailing_address,
business_mailing_address_city_name,business_mailing_address_state_name,business_mailing_address_postal_code,
business_mailing_address_country_code,business_mailing_address_telephone_number,
business_mailing_address_fax_number,first_line_business_practice_location_address,
second_line_business_practice_location_address,business_practice_location_address_city_name,
business_practice_location_address_state_name,business_practice_location_address_postal_code,
business_practice_location_address_country_code,business_practice_location_address_telephone_number,
business_practice_location_address_fax_number,enumeration_date,last_update_date,
npi_deactivation_reason_code,npi_deactivation_date,npi_reactivation_date,gender,authorized_official_last_name,
authorized_official_first_name,authorized_official_middle_name,authorized_official_title_or_position,
authorized_official_telephone_number);

And here is a look at the MySQL table used for this

mysql> describe npidata;
+-----------------------------------------------------+--------------+------+-----+---------+-------+
| Field                                               | Type         | Null | Key | Default | Extra |
+-----------------------------------------------------+--------------+------+-----+---------+-------+
| npi                                                 | varchar(20)  |      | PRI |         |       |
| entity_type_code                                    | char(2)      | YES  |     | NULL    |       |
| replacement_npi                                     | varchar(20)  | YES  |     | NULL    |       |
| ein                                                 | varchar(20)  | YES  |     | NULL    |       |
| legal_business_name                                 | varchar(255) | YES  |     | NULL    |       |
| last_name                                           | varchar(255) | YES  |     | NULL    |       |
| first_name                                          | varchar(255) | YES  |     | NULL    |       |
| middle_name                                         | varchar(255) | YES  |     | NULL    |       |
| prefix                                              | varchar(255) | YES  |     | NULL    |       |
| suffix                                              | varchar(255) | YES  |     | NULL    |       |
| credential                                          | varchar(255) | YES  |     | NULL    |       |
| other_organization_name                             | varchar(255) | YES  |     | NULL    |       |
| other_organization_name_type_code                   | varchar(10)  | YES  |     | NULL    |       |
| other_last_name                                     | varchar(255) | YES  |     | NULL    |       |
| other_first_name                                    | varchar(255) | YES  |     | NULL    |       |
| other_middle_name                                   | varchar(255) | YES  |     | NULL    |       |
| other_prefix                                        | varchar(255) | YES  |     | NULL    |       |
| other_suffix                                        | varchar(255) | YES  |     | NULL    |       |
| other_credential                                    | varchar(255) | YES  |     | NULL    |       |
| other_last_name_type_code                           | varchar(10)  | YES  |     | NULL    |       |
| first_line_business_mailing_address                 | varchar(255) | YES  |     | NULL    |       |
| second_line_business_mailing_address                | varchar(255) | YES  |     | NULL    |       |
| business_mailing_address_city_name                  | varchar(255) | YES  |     | NULL    |       |
| business_mailing_address_state_name                 | varchar(5)   |      | PRI |         |       |
| business_mailing_address_postal_code                | varchar(20)  | YES  |     | NULL    |       |
| business_mailing_address_country_code               | varchar(5)   | YES  |     | NULL    |       |
| business_mailing_address_telephone_number           | varchar(20)  | YES  |     | NULL    |       |
| business_mailing_address_fax_number                 | varchar(20)  | YES  |     | NULL    |       |
| first_line_business_practice_location_address       | varchar(255) | YES  |     | NULL    |       |
| second_line_business_practice_location_address      | varchar(255) | YES  |     | NULL    |       |
| business_practice_location_address_city_name        | varchar(255) | YES  |     | NULL    |       |
| business_practice_location_address_state_name       | varchar(5)   | YES  |     | NULL    |       |
| business_practice_location_address_postal_code      | varchar(20)  | YES  |     | NULL    |       |
| business_practice_location_address_country_code     | varchar(5)   | YES  |     | NULL    |       |
| business_practice_location_address_telephone_number | varchar(20)  | YES  |     | NULL    |       |
| business_practice_location_address_fax_number       | varchar(20)  | YES  |     | NULL    |       |
| enumeration_date                                    | varchar(15)  | YES  |     | NULL    |       |
| last_update_date                                    | varchar(15)  | YES  |     | NULL    |       |
| npi_deactivation_reason_code                        | varchar(255) | YES  |     | NULL    |       |
| npi_deactivation_date                               | varchar(15)  | YES  |     | NULL    |       |
| npi_reactivation_date                               | varchar(15)  | YES  |     | NULL    |       |
| gender                                              | char(2)      | YES  |     | NULL    |       |
| authorized_official_last_name                       | varchar(255) | YES  |     | NULL    |       |
| authorized_official_first_name                      | varchar(255) | YES  |     | NULL    |       |
| authorized_official_middle_name                     | varchar(255) | YES  |     | NULL    |       |
| authorized_official_title_or_position               | varchar(255) | YES  |     | NULL    |       |
| authorized_official_telephone_number                | varchar(20)  | YES  |     | 0       |       |
+-----------------------------------------------------+--------------+------+-----+---------+-------+

I then use a very simple php page to serve query's of the database:

<?php
function formatPhoneNumber($strPhone) {

        if (strlen($strPhone) != 10) {
                $strPrefix = substr($strPhone, 0, 3);
                $strNumber = substr($strPhone, 6, 4);
                $strPhone = $strPrefix."-".$strNumber;
                return $strPhone;
        }

      $strArea = substr($strPhone, 0, 3);
      $strPrefix = substr($strPhone, 3, 3);
      $strNumber = substr($strPhone, 6, 4);

       $strPhone = "(".$strArea.") ".$strPrefix."-".$strNumber;

       return ($strPhone);
}


function formatZipCode($strZipCode) {

        if (strlen($strZipCode) !=9) {
                return $strZipCode;
        }

        $strFirst = substr($strZipCode, 0, 5);
        $strSecond = substr($strZipCode, 5, 4);
        $strZipCode = $strFirst."-".$strSecond;
        return $strZipCode;

}

$db = "npi";
$user = "npi";
$pass = "npi";
$host = "localhost";

$con = mysql_connect($host, $user, $pass) or die("Could not connect; " . mysql_error());
$db_selected = mysql_select_db($db);

$mode=$_GET["mode"];

if ($mode > '1') {
?>
      <table class="report"><tbody>
<?php
         $sql = "SELECT * FROM npidata WHERE npi = \"$mode\"";
        $result = mysql_query($sql,$con);

        //echo $sql . "\n"; //debug

        $row = mysql_fetch_row($result);

        //echo "<pre>";
        //print_r($row); //debug
        //echo "</pre>";

        echo "<tr><td><span class=label>NPI</span></td><td>".$row[0]."</td></tr>";
        echo "<tr><td><span class=label>Full Name</span></td><td>".$row[8]." ".$row[6]." ".$row[7]." ".$row[5]."</td></tr>";
        echo "<tr><td><span class=label>Mailing Address</span></td><td>".$row[19]."<br>".$row[20]."<br>".$row[21]."<br>".$row[22].", ".$row[23]." ".formatZipCode($row[24])."</td></tr>";
        echo "<tr><td><span class=label>Phone</span></td><td>".formatPhoneNumber($row[26])."</td></tr>";
        echo "<tr><td><span class=label>Fax</span></td><td>".formatPhoneNumber($row[27])."</td></tr>";
        echo "<tr><td><span class=label>Practice Address</span></td><td>".$row[28]."<br>".$row[29]."<br>".$row[30].", ",$row[31]." ".formatZipCode($row[32])."</td></tr>";
        echo "<tr><td><span class=label>Phone</span></td><td>".formatPhoneNumber($row[34])."</td></tr>";
        echo "<tr><td><span class=label>Fax</span></td><td>".formatPhoneNumber($row[35])."</td></tr>";
        echo "</tbody></table>";

        mysql_free_result($result);
        mysql_close($con);

} else if ($_SERVER['REQUEST_METHOD'] != 'POST') {
?>

<form name="form1" action="<?php echo $_SERVER['PHP_SELF'];?>" method="post" onsubmit="return validate_form(this)">
<table border="0" cellspacing="5">
<tr><td>Last Name</td><td><input type="text" size="30" id="lname" name="lname"></td></tr>
<tr><td>First Name</td><td><input type="text" size="30" id="fname" name="fname"></td></tr>
<tr><td>State</td><td>
<select name="state" size="1" id="state">
<option value=""><select state here></option>
<option value="AL">Alabama</option>
<option value="AK">Alaska</option>
<option value="AZ">Arizona</option>
<option value="AR">Arkansas</option>
<option value="CA">California</option>
<option value="CO">Colorado</option>
<option value="CT">Connecticut</option>
<option value="DE">Delaware</option>
<option value="DC">Dist of Columbia</option>
<option value="FL">Florida</option>
<option value="GA">Georgia</option>
<option value="HI">Hawaii</option>
<option value="ID">Idaho</option>
<option value="IL">Illinois</option>
<option value="IN">Indiana</option>
<option value="IA">Iowa</option>
<option value="KS">Kansas</option>
<option value="KY">Kentucky</option>
<option value="LA">Louisiana</option>
<option value="ME">Maine</option>
<option value="MD">Maryland</option>
<option value="MA">Massachusetts</option>
<option value="MI">Michigan</option>
<option value="MN">Minnesota</option>
<option value="MS">Mississippi</option>
<option value="MO">Missouri</option>
<option value="MT">Montana</option>
<option value="NE">Nebraska</option>
<option value="NV">Nevada</option>
<option value="NH">New Hampshire</option>
<option value="NJ">New Jersey</option>
<option value="NM">New Mexico</option>
<option value="NY">New York</option>
<option value="NC">North Carolina</option>
<option value="ND">North Dakota</option>
<option value="OH">Ohio</option>
<option value="OK">Oklahoma</option>
<option value="OR">Oregon</option>
<option value="PA">Pennsylvania</option>
<option value="RI">Rhode Island</option>
<option value="SC">South Carolina</option>
<option value="SD">South Dakota</option>
<option value="TN">Tennessee</option>
<option value="TX">Texas</option>
<option value="UT">Utah</option>
<option value="VT">Vermont</option>
<option value="VA">Virginia</option>
<option value="WA">Washington</option>
<option value="WV">West Virginia</option>
<option value="WI">Wisconsin</option>
<option value="WY">Wyoming</option>
</select>

</td></tr>
<tr><td> </td><td><input value="Search" type="submit"></td></tr>
</tbody></table>


       <a href="javascript:history.go(-1)">Cancel</a>
<?php
} else if ($_SERVER['REQUEST_METHOD'] = 'POST') {

$fname = $_POST["fname"];
$fname = strtoupper($fname);

$lname = $_POST["lname"];
$lname = strtoupper($lname);

$state = $_POST["state"];
$state = strtoupper($state);

if ($fname == "") {
#RUN QUERY WITHOUT FNAME
        $sql = "SELECT * FROM npidata WHERE UPPER(last_name) = \"$lname\" AND UPPER(business_mailing_address_state_name) = \"$state\"";
} else {
        $sql = "SELECT * FROM npidata WHERE UPPER(last_name) = \"$lname\" AND UPPER(first_name) = \"$fname\" AND UPPER(business_mailing_address_state_name) = \"$state\"";
}

$result = mysql_query($sql,$con);

//echo $sql . "\n"; //debug
?>
<table class="report"><tbody>
<tr>
       <th>NPI</th>
       <th>First Name</th>
       <th>Last Name</th>
       <th>Address</th>
       <th>City</th>
       <th>State</th>
<form name="form1" method="post">
<?php

        while ($row = mysql_fetch_array($result)) {

                //echo "<pre>";
                //print_r(mysql_fetch_array($result)); //debug
                //echo "</pre>";

                echo "<tr><td><a href=\"" . $_SERVER['PHP_SELF'] . "?mode="  . $row[0] . "\">" . $row[0] . "</a></td>";
                echo "<td>" . $row[6] . "</td>";
                echo "<td>" . $row[5] . "</td>";
                echo "<td>" . $row[20] . "<br>" . $row[21] . "</td>";
                echo "<td>" . $row[22] . "</td>";
                echo "<td>" . $row[23] . "</td></tr>";

        } //end while

?>

       </form>
       </tbody>
       </table>
<?php
mysql_free_result($result);
mysql_close($con);

} // end if

?>