
Importing CSV Data Into MySQL

Reading a CSV file and importing it into a MySQL Database With NodeJS

We’re going to be looking at how to read a CSV file and how to dump that data into a MySQL database.

Setting Everything Up

Let's go ahead and make sure we have these dependencies installed and imported:

const fs = require('fs');
const mysql = require('mysql');
const csv = require('fast-csv');

Loading The CSV File

Next, let's go ahead and create a read stream for the file:

let stream = fs.createReadStream("my_file.csv");

Now that we have our stream, let's use the CSV module to parse it. Note that we listen for two events (data, end). The data event fires when a new line of the CSV has been parsed, and the end event fires once the file has been completely read.

let myArray = [];
let csvStream = csv
	.parse()
	.on("data", (data) => {
		myArray.push(data);
	})
	.on("end", () => {
	});
stream.pipe(csvStream);

For those of you who have already looked at a CSV file in a text editor, you may already be aware that the first line of the file contains the column headers. We want to get rid of those, so let's just shift the array. This removes the first element of the array.

.on("end", function () {
    myArray.shift();
});
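To illustrate with some hypothetical parsed rows (not from the post's actual file), shift() removes the header row from the array in place:

```javascript
// Hypothetical parsed CSV rows; the first row is the header
let myArray = [
	['name', 'age'],
	['Alice', '30'],
	['Bob', '25']
];

myArray.shift(); // removes ['name', 'age'] in place

console.log(myArray); // [ [ 'Alice', '30' ], [ 'Bob', '25' ] ]
```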

Inserting Multiple Records

I have already covered how to insert multiple records easily in my post Insert Multiple Records Into MySQL. It may be worth reading that before getting started on inserting the data into a database.

On the end event, we're going to open up a new database connection and bulk insert the records using MySQL's native bulk insert functionality.

.on("end", function () {
	// remove the first element in the array
	myArray.shift();
	// create a new connection to the database
	const connection = mysql.createConnection({
		host: 'localhost',
		user: 'username',
		password: 'super_secure_password',
		database: 'database_name'
	});
	// open the connection
	connection.connect((error) => {
		if (error) {
			console.error(error);
		} else {
			let query = 'INSERT INTO table_name (field1, field2, field3, field4) VALUES ?';
			connection.query(query, [myArray], (error, response) => {
				if (error) {
					console.log(error);
				} else {
					console.log(response); // x records inserted
				}
			});
		}
	});
});

Known Bugs

Packet Too Large

There are a few issues I've had when implementing this. As discussed in the Insert Multiple Records Into MySQL post, a large CSV file can lead to the database throwing an error because the packet exceeds the 'max_allowed_packet' variable. You can see an easy fix for that below.

ER_NET_PACKET_TOO_LARGE

I was trying to bulk insert around 180k records from a CSV file, and I kept getting an error. When I traced the error back to the MySQL output in the console, I could see that my MySQL server was running with a fairly small 'max_allowed_packet'.

To check what packet size you currently have set, run the first command below; the second one updates the variable.

show variables like 'max_allowed_packet';
set global max_allowed_packet=33554432;
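Note that `set global` only lasts until the server restarts. To make the change permanent, you can set the variable in your MySQL configuration file (my.cnf, or my.ini on Windows), along these lines:

```ini
[mysqld]
# 32MB; adjust to comfortably fit your largest bulk insert
max_allowed_packet=33554432
```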

Incorrect DateTime For MySQL

Something else I had to deal with is that the date field in the CSV did not match the DateTime field in the MySQL database. To solve this, I just looped through the array and replaced the value at the relevant index of each element with a converted value. Something along the lines of this:

.on("end", function () {
	myArray.shift();
	for (let i = 0, len = myArray.length; i < len; i++) {
		myArray[i][3] = convertToMysql(myArray[i][3]);
	}
	// ... 

The 'convertToMysql' call on line 4 is a custom function I wrote. You will have to write your own, as your input may differ from mine.
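As a rough sketch, assuming the CSV stores dates in a US-style 'MM/DD/YYYY HH:mm' format (a hypothetical input format; yours will likely differ), such a function might look like:

```javascript
// Hypothetical converter: "MM/DD/YYYY HH:mm" -> MySQL DATETIME "YYYY-MM-DD HH:MM:SS"
function convertToMysql(value) {
	// split the date and (optional) time portions
	const [datePart, timePart = '00:00'] = value.split(' ');
	const [month, day, year] = datePart.split('/');
	return `${year}-${month}-${day} ${timePart}:00`;
}

console.log(convertToMysql('03/25/2018 14:30')); // 2018-03-25 14:30:00
```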

Side Note: If you don’t want to shift the array to remove the first index, you can pass an options object to the CSV Stream, see below

csv
    .fromStream(stream, {headers : true})

Errors In The CSV

This one is kind of situational, but there are two events you can listen to that may help you here. One is the 'error' event, and the other is the 'data-invalid' event. The error event is triggered when an error occurs, such as the file not being found or not being valid CSV. The data-invalid event is emitted when a row contains invalid data.

.on('error', (error) => {
	// error event handler
})
.on('data-invalid', (data) => {
	// data was invalid
});

Side Note: “Any rows consisting of nothing but empty strings and/or commas will be skipped, without emitting a ‘data’ or ‘error’ event.”

Whole Thing

If you want to double check your code after reading through; it might look something like below.

const fs = require('fs');
const mysql = require('mysql');
const csv = require('fast-csv');
let stream = fs.createReadStream("path_to_my_file.csv");
let myData = [];
let csvStream = csv
	.parse()
	.on("data", function (data) {
		myData.push(data);
	})
	.on("end", function () {
		myData.shift();
		// create a new connection to the database
		const connection = mysql.createConnection({
			host: 'localhost',
			user: 'username',
			password: 'super_secure_password',
			database: 'database_name'
		});
		// open the connection
		connection.connect((error) => {
			if (error) {
				console.error(error);
			} else {
				let query = 'INSERT INTO table_name (field1, field2, field3, fieldn) VALUES ?';
				connection.query(query, [myData], (error, response) => {
					console.log(error || response);
				});
			}
		});
	});
stream.pipe(csvStream);

And that just about does it for this. I hope you guys found it useful! If you have any issues or suggestions, please don’t hesitate to comment!
