Database with arrows pointing into it from Xs

Ways of getting data from Excel files into a Database

18 January 2024

The lingua franca of the government/enterprise world for communicating data is the Excel file. SQL doesn’t have the same level of adoption among non-technical people. This means that if you’re an SQL person (as I'm attempting to be), you’re bound to get asked to translate between these two forms of tabular storage: from Excel to an SQL database and vice versa. A big part of my job is being sent an Excel file by a non-technical person, then needing to make bulk updates in a database based on that data.

I find that my preferred method for bulk updates is to smash the data into a table (you can use a temporary table, but I prefer using a regular table and deleting it after a few days), as opposed to writing a single script and running it as a one-off. This lets me make some assertions about what I expect the updates to look like (e.g. how many rows I’m about to update), then execute when I’m confident about what the result will be.
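
As a sketch of that workflow (the Customers table and the staging table of email updates here are hypothetical; the names are illustrative only):

    -- Staging table holding the rows from the Excel file (loaded via any method below)
    CREATE TABLE CustomerEmailUpdates (
      CustomerId bigint,
      NewEmail varchar(200)
    );

    -- Assertion: see how many rows the update would touch before running it
    SELECT COUNT(*)
    FROM Customers c
    JOIN CustomerEmailUpdates u ON u.CustomerId = c.Id;

    -- Once the count matches expectations, run the bulk update
    UPDATE c
    SET c.Email = u.NewEmail
    FROM Customers c
    JOIN CustomerEmailUpdates u ON u.CustomerId = c.Id;

    -- A few days later, when everything checks out
    DROP TABLE CustomerEmailUpdates;

Because the staging table is a regular table, you can run the SELECT as many times as you like before committing to the UPDATE.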

Before I go on: I should note that the flavour of SQL I’ll be looking at here is that of Microsoft SQL Server. If you work in government or enterprise, this is likely the flavour you’ll be dealing with too.

So without further ado, here are some ways to get data into a database from an Excel file:

SSMS flat file import #

This method is quite convenient and requires the least typing as you get to use a graphical user interface to get the data into your database.

The first step is to save the Excel sheet you want to import as a CSV file. The regular default save settings for saving as a CSV work for me.

Now open up SSMS, then navigate to your database. Right click your database, click into “Tasks” then “Import Flat File…”. Enter the path to the import file and configure the table name (it will have to be a new table; this method doesn’t seem to allow appending to existing tables). Then in the “Modify columns” tab you can configure the types for each of the columns.

Click through to “Summary”, then “Finish”, then refresh your database and you should be good to go.

Using the import/export wizard - Microsoft Access #

This is another way to insert from an Excel file into a database, and you don’t need to convert your Excel file to a CSV first. It also means you can do multiple Excel sheets at once, and with this method you can append to an existing table instead of only being able to create a new one. There do seem to be issues with this method: I’ve found that you have to download some tools to get it going, and even then, it seems to periodically not work for reasons that aren’t entirely clear to me.

The first step you’ll need to take is to download the Microsoft Access Database Engine Redistributable. Note that you need the 32-bit version, not the 64-bit version.

After you have the Access Database Engine installed, we’re going to do something similar to the flat file import: open up SSMS, right click your database, click into “Tasks” then “Import Data…”. If it’s your first time here, you should see the title “Welcome to SQL Server Import and Export Wizard”. Click “Next” (or Alt + N if you’re too cool to use a mouse).

Choosing a data source

In the dropdown, click into “Microsoft Excel”; it should have an ancient Excel logo (one which immediately transported me back to my childhood in the 90s). Then add the path to your Excel file and select the Excel version you have on your computer. Click “Next” (or Alt + N).

Choosing a destination

I choose “.Net Framework Data Provider for SqlServer” as the destination. Unfortunately, this doesn’t prefill a connection string to your database. That would make far too much sense. So if you don’t have the connection string ready, here’s a quick way to get it:

An easy(ish) way to get a SQL Server connection string
First open up Windows PowerShell. Let’s create a blank “.udl” file with this command:

     ni test.udl

Then open Windows Explorer in whatever folder you’re in:

    explorer .  

Right click your new test.udl file and open Properties. Click into “Connection”.

Keep PowerShell open, but go back to SSMS and click “Connect”; if you’ve already connected to your database, the dialog should pre-fill with the server name. Copy this and paste it into “server name” in the test.udl properties. Then enter your credentials (or choose Windows NT Integrated security) for how you’ll authenticate to the database. Now select your database from the dropdown (this should populate if the authentication and server name are correct). Finally click “Apply” and close. Back in PowerShell, you can now print out the file and you should see the connection string:

    cat test.udl
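
The file should look something like this (the server and database names here are made up; yours will come from the steps above):

    [oledb]
    ; Everything after this line is an OLE DB initstring
    Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=MyDatabase;Data Source=MYSERVER\SQLEXPRESS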

And you can use this command to copy the connection string itself (the third line of the file):

    Get-Content test.udl -TotalCount 3 | Select-Object -Last 1 | clip;

Now you can delete the .udl file if you want to keep things clean:

    rm test.udl

Finally, for our import, paste the connection string you’ve generated into the “Connection string” field on the wizard’s “Destination” page.

Specify Table Copy or Query #

On this page, the default “Copy data from one or more tables or views” will work for us if we don’t need to modify the data.

Select Source Tables and Views #

Now we get to choose sources and destinations. On the left you’ll see the list of sheets in the Excel file; on the right, the list of tables in your database that the wizard will create or append to. You can edit the names of the tables. The wizard will also try to guess the types from the Excel file, but you can configure types with the “Edit mappings” button. It will default to appending to a table if the table exists and creating the table if it doesn’t. In “Edit mappings” you can also configure it to delete the existing rows first.

Save and Run Package #

Here you can configure whether you want to run the import immediately, save it as a .dtsx Integration Services Package file, or both. We want to run immediately, so hit that and watch it run. If you do save it, there are many ways to run the .dtsx file later (I tried to use the dtexec utility, but even that wanted the 64-bit version of the Microsoft Access Database Engine 😣).

BULK INSERT Command #

The SQL BULK INSERT command is similar to the flat file import, except that this time you’ll have to define your table before you populate it. For this method, you will have to save your Excel file as a CSV first. This is nice if you don’t want to use a GUI tool and would rather just use SQL:

    -- Create the table first
    CREATE TABLE Countries(Id bigint, Country varchar(100), Population bigint, Leader varchar(100));
    -- Then you can bulk insert
    BULK INSERT Countries
    FROM 'C:\Path\To\File\Countries.csv'
    WITH (
      FIELDTERMINATOR = ',',
      ROWTERMINATOR = '\n',
      FIRSTROW = 2
    );

Pretty nice that you can do it without having to click through all the steps of the “Import Data…” method.
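
One gotcha with the FIELDTERMINATOR approach: it splits on every comma, so quoted values that contain commas will break. If you’re on SQL Server 2017 or later, you can tell BULK INSERT to parse the file as a proper CSV instead (a sketch, reusing the Countries table from above):

    BULK INSERT Countries
    FROM 'C:\Path\To\File\Countries.csv'
    WITH (
      FORMAT = 'CSV',      -- parse as a standard CSV (SQL Server 2017+)
      FIELDQUOTE = '"',    -- values wrapped in double quotes may contain commas
      FIRSTROW = 2
    );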

Using Excel Formulas to generate an SQL script #

Now we’re starting to get into more manual methods. These methods rely on generating an SQL script. This means the data upload time might be slower, as SQL server will have to parse all the commands you give it. If you have a lot of data this might take a long time. Creating a script does give you a bit more control to edit data as you go.

Let’s say we have an Excel table that looks like this:

    id    country        population    leader
    1     New Zealand    5000000       Jacinda Ardern
    2     USA            300000000     Joe Biden
    3     UK             30000000      Rishi Sunak

To get this data into the database, we could first create our table:

    -- Create the table
    CREATE TABLE Countries(Id bigint, Country varchar(100), Population bigint, Leader varchar(100));

Then we can use an Excel formula to generate a string for each row.

We would start in the cell to the right of “Jacinda Ardern”, then type this formula:

    =CONCAT("INSERT INTO Countries (Id,Country,Population,Leader) VALUES (",A2,",'",B2,"',",C2,",'",D2,"');")

The string that this formula will generate should look like:

    INSERT INTO Countries (Id,Country,Population,Leader) VALUES (1,'New Zealand',5000000,'Jacinda Ardern');

And we can use the fill handle to drag the formula down, generating a cell for each row and building a script that runs through every line.

I’ve created a few of these in my time, and there are some gotchas to watch out for. It can be quite difficult to balance all the different quotes (the double quotes for the Excel strings and the single quotes for the SQL strings). Also, any single quotes within the strings themselves will cause issues (I would complain about Irish names here, but the Irish have been through enough). You can use Excel’s SUBSTITUTE function to escape single quotes. So, if we were worried about our “leader” column having single quotes in its strings, the formula would look like:

    =CONCAT("INSERT INTO Countries (Id,Country,Population,Leader) VALUES (",A2,",'",B2,"',",C2,",'",SUBSTITUTE(D2,"'","''"),"');")

Using awk to generate an SQL script #

I’ve had a fascination with the old-school tool awk ever since listening to the Changelog interview with Brian Kernighan. Awk is a simple programming language for data processing which has been around since the 70s and is included in most Unix-like shell environments.

If you’re on Linux or macOS, you will have awk ready to go by default. If you’re on Windows, there are myriad ways to get a bash-like environment; I’d argue the simplest is Git Bash. If you’re using git, you might already have it installed on your machine.

A great way to use awk is to first save your file as a tab-delimited text file. Let’s continue with the countries file above. First save it as countries.txt.

Then open Git Bash, navigate to the directory, and type this command:

    awk -F'\t' $'{print}' countries.txt

This should print all the lines of the text file. An explanation:

- `-F'\t'` sets awk’s field separator to a tab character
- `$'…'` is bash’s ANSI-C quoting, which will let us escape single quotes inside the awk program later
- `{print}` runs for every line of input and prints it

Now if we were looking at our countries file we could use this script to generate the sql script we wanted:

    awk -F'\t' $'NR>1 {print "INSERT INTO countries (Id,Country,Population,Leader) VALUES ("$1",\'"$2"\',"$3",\'"$4"\');"}' countries.txt >> countries.sql

So, what is this new code doing?

- `NR>1` skips the first line (the header row)
- `$1` through `$4` refer to the tab-separated fields of each line
- `\'` escapes the single quotes so they can appear inside the ANSI-C quoted program
- `>> countries.sql` appends the output to a file instead of printing it to the terminal

Now, we’re going to have the same issue with this as we did with generating scripts in Excel: we don’t want single quotes to destroy our strings. We can use the gsub function (sub only replaces the first match, while gsub replaces them all) to prevent this from happening:

    awk -F'\t' $'NR>1 {gsub(/\'/,"\'\'",$4); print "INSERT INTO countries (Id,Country,Population,Leader) VALUES ("$1",\'"$2"\',"$3",\'"$4"\');"}' countries.txt

Using node.js to generate an SQL script #

If you’re a normal person, you might want to use a more modern programming language than awk to generate a script. And we should never bet against JavaScript. We can use an npm package called [csv-parser](https://www.npmjs.com/package/csv-parser) to parse a CSV file and create an SQL script.

First, navigate to a folder, save your Excel file as a .csv in it, and use these commands to initialise npm and install csv-parser:

    npm init -y 
    npm install csv-parser

Now you can write a node script to create your SQL file. First, import fs and csv-parser:

    const csv = require('csv-parser')
    const fs = require('fs')

Now you can declare an array to store the rows in, pipe your file to the CSV parser, and loop through the array to build the script:

    // script.js
    const results = [];
    fs.createReadStream('countries.csv')
      .pipe(csv())
      .on('data', (row) => { results.push(row) })
      .on('end', () => {
        let string = '';
        results.forEach(r => {
          string += `INSERT INTO countries (Id,Country,Population,Leader) VALUES (${r['id']},'${r['country']}',${r['population']},'${r['leader']}');\n`
        })
        console.log(string);
      })

Then in git bash you can run your script and print it into an SQL file:

    node.exe script.js >> script.sql

Our JS script is less ugly than the awk script, but it’s still pretty ugly and took more setup. Still, JavaScript is so ubiquitous that any developer would be able to do this.

Other methods #

Do you know quicker or better ways than these to get data from Excel files into a database?

Get in touch on Bluesky or Mastodon and I’ll update this post with your method and give you a shout out.
