1

I have a large amount of data stored in PDF files that I would like to convert into a SQL database. I can extract the tables from the PDF files with some online tools, and I also know how to import the result into MySQL. BUT:

The list contains users with names, birth dates and some other properties. A user may also appear in other PDF files. So when I convert the next file into Excel and import it into MySQL, I want to check whether that user already exists in my table. The check should be based on several properties: two records may share the same user name but have different dates of birth, in which case the second one is a new record. But if all the selected properties match, that user is a duplicate and shouldn't be imported.

I guess this is something I can do with a copy from a temporary table, but I'm not sure what the selection should be. Let's say the user name is stored in column A, the date of birth in column B and the city in column C. What would be the right script to check these against the existing table and skip the copy if all three match an existing record?

Thanks!

2 Answers

0

1- Create a permanent table

CREATE TABLE UploadData
(
   id int NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- MySQL requires an AUTO_INCREMENT column to be a key
   name varchar(50),
   dob datetime,
   city varchar(30)
);

2- Import your Excel data into the UploadData table. The steps below are for SQL Server; I'm not sure about MySQL, but it should be something similar. You said in your question that you already know how to do this, which is why I'm not spelling out each step for MySQL.

Right-click your DB, go to Tasks -> Import Data; From: Microsoft Excel, To: your DB name; select the UploadData table (check Edit Columns to make sure the columns match); finish the upload from Excel to your SQL DB.
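
For MySQL, one possible counterpart to the import wizard is LOAD DATA, assuming the Excel sheet is first saved as a CSV file. This is only a sketch: the file path /tmp/users.csv, the column order, the header row and the yyyy-mm-dd date format are all assumptions about the exported file, not something stated in the question.

```sql
-- Sketch only: path, delimiters and date format are assumptions about the exported CSV.
-- LOCAL requires local_infile to be enabled on both the client and the server.
LOAD DATA LOCAL INFILE '/tmp/users.csv'
INTO TABLE UploadData
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES                           -- skip the header row
(name, @dob, city)                       -- id is filled in by AUTO_INCREMENT
SET dob = STR_TO_DATE(@dob, '%Y-%m-%d'); -- parse the date text into a datetime
```

If the CSV was saved on Windows, the line terminator probably needs to be '\r\n' instead.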

3- Check whether the data already exists in your main table; if not, add it.

CREATE TEMPORARY TABLE matchingData (id int, name varchar(50), dob datetime, city varchar(30));

-- Rows in UploadData that already exist in main_table (all three columns match)
INSERT INTO matchingData
SELECT u.id, u.name, u.dob, u.city
FROM main_table m
INNER JOIN UploadData u ON u.name = m.name
                       AND u.dob = m.dob
                       AND u.city = m.city;

-- Copy only the rows that were not matched
INSERT INTO main_table (name, dob, city)
SELECT name, dob, city
FROM UploadData
WHERE id NOT IN (SELECT id FROM matchingData);

4- The UploadData table is no longer needed, so: DROP TABLE UploadData
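
The check-and-copy in step 3 can also be sketched as a single statement without the temporary table. This is a variation, not the method described in the steps; it assumes main_table has the columns name, dob and city, and that UploadData is the staging table from step 1.

```sql
-- Sketch only: assumes main_table(name, dob, city) and the UploadData staging table.
INSERT INTO main_table (name, dob, city)
SELECT DISTINCT u.name, u.dob, u.city    -- DISTINCT also drops duplicates within the upload itself
FROM UploadData AS u
WHERE NOT EXISTS (
    SELECT 1
    FROM main_table AS m
    WHERE m.name <=> u.name              -- <=> is MySQL's NULL-safe equality,
      AND m.dob  <=> u.dob               -- so two NULLs count as a match
      AND m.city <=> u.city
);
```

With plain = instead of <=>, a row that has NULL in any of the three columns would never match an existing row and would be inserted again on every import.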

0

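Add a composite primary key (or a unique constraint) that spans Column A, Column B and Column C together.

This prevents duplicate rows, while still allowing duplicate values within any single column.

Note: A table can have only one primary key, and there is a limit on the number of columns a composite key can contain (16 in MySQL).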
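
A minimal sketch of this approach, assuming the target table is called main_table with columns name, dob and city, and that the new rows sit in a staging table called UploadData (names borrowed from the other answer, not from the question). It uses a unique key rather than a primary key so that an existing id primary key can stay in place; uq_person is a made-up index name.

```sql
-- Sketch only: table, column and index names are assumptions.
-- The ALTER fails if main_table already contains duplicate (name, dob, city) rows.
ALTER TABLE main_table
    ADD UNIQUE KEY uq_person (name, dob, city);

-- IGNORE turns duplicate-key errors into warnings, so rows that
-- already exist are skipped instead of aborting the whole import.
INSERT IGNORE INTO main_table (name, dob, city)
SELECT name, dob, city
FROM UploadData;
```

Two caveats: a MySQL unique index treats NULLs as distinct, so rows with NULL in any of the three columns are not deduplicated, and INSERT IGNORE also downgrades some other errors to warnings, so it is worth checking SHOW WARNINGS after the import.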
