3

I've got a .txt file that looks like this:

id        nm        lat        lon        countryCode
5555555  London    55.876456   99.546231   UK

I need to parse each field and add it to a SQLite database. So far I've managed to load the id, name and countryCode columns into my db, but I'm struggling to find a way to parse the lat and lon of each record individually.

I tried regex, but had no luck. I also thought about writing a parser that checks whether the last non-whitespace char is a letter, to determine that the string is lat and not lon, but I have no idea how to implement it correctly. Can I solve this with regex, or should I use a custom parser? If so, how?

3
  • I think I might do something like this: stackoverflow.com/questions/8113782/… Commented Dec 23, 2016 at 2:46
  • Why not just split the data rows by space since they're all in the same order column-wise? All you really need to do is go line by line and do id, nm, lat, lon, cc = line.split() Commented Dec 23, 2016 at 2:46
  • 1
    You can do that in 1 line using pandas. df = pandas.read_csv('file_path', sep='\t') And then insert the entire dataframe into your SQLite db. Commented Dec 23, 2016 at 2:50

3 Answers

5

You can do that with pandas like this:

import pandas as pd
import sqlite3

con = sqlite3.connect('path/new.db')
con.text_factory = str

df = pd.read_csv('file_path', sep='\t')
df.to_sql('table_01', con)

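If there are bad lines and you can afford to skip them, then use this (on pandas 1.3 and later, pass on_bad_lines='skip' instead, since error_bad_lines is deprecated there and was removed in pandas 2.0):

df = pd.read_csv('file_path', sep='\t', error_bad_lines=False)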


3

Looking at the text file, it looks like it's always the same format for each line. As such, why not just split like this:

for line in lines:
    id, nm, lat, lon, code = line.split()
    # Insert into SQLite db

With split() you don't have to worry about how much whitespace there is between each token of the string.
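To show the whole round trip, here is a minimal sketch that splits each row and inserts it into SQLite. It assumes the first line is a header, that every data row has exactly five whitespace-separated tokens, and that lat and lon should be stored as floats. The table name `places` and the helper names `parse_rows` and `load_rows` are made up for illustration, and an in-memory database stands in for a real file.

```python
import sqlite3

# Sample text copied from the question; a real script would read the .txt file.
SAMPLE = """id        nm        lat        lon        countryCode
5555555  London    55.876456   99.546231   UK
"""

def parse_rows(text):
    """Yield (id, nm, lat, lon, code) tuples, skipping the header and blank lines."""
    lines = text.splitlines()
    for line in lines[1:]:  # lines[0] is assumed to be the header row
        if not line.strip():
            continue
        id_, nm, lat, lon, code = line.split()
        yield int(id_), nm, float(lat), float(lon), code

def load_rows(con, rows):
    """Create the (hypothetical) places table if needed and insert all rows."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS places "
        "(id INTEGER PRIMARY KEY, nm TEXT, lat REAL, lon REAL, countryCode TEXT)"
    )
    con.executemany("INSERT INTO places VALUES (?, ?, ?, ?, ?)", rows)
    con.commit()

con = sqlite3.connect(":memory:")
load_rows(con, parse_rows(SAMPLE))
stored = con.execute("SELECT id, nm, lat, lon, countryCode FROM places").fetchall()
print(stored)
```

Note that a row with more or fewer than five tokens (for example, a name containing a space) makes the unpacking raise ValueError, so this only fits files that really are this regular.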

1
  • id, nm, lat, lon, code = s is better for clarity and more pythonic. Commented Dec 23, 2016 at 2:49
1

Using str.split:

txt = '5555555  London    55.876456   99.546231   UK'
(id, nm, lat, lon, countryCode) = txt.split()
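A plain split() assumes no field contains whitespace. If a name can contain spaces (the sample in the question does not show this, so it is an assumption), one possible sketch is to peel the id off the left and the last three fields off the right, leaving whatever remains as the name. The helper name `parse_line` and the second input row are made up for illustration.

```python
def parse_line(line):
    """Split one data row into (id, nm, lat, lon, code); nm may contain spaces."""
    # The first token is the id; everything after it is kept together for now.
    id_, rest = line.strip().split(None, 1)
    # The last three tokens are lat, lon and the country code; the rest is the name.
    nm, lat, lon, code = rest.rsplit(None, 3)
    return int(id_), nm, float(lat), float(lon), code

print(parse_line('5555555  London    55.876456   99.546231   UK'))
# Hypothetical row with a two-word name, not taken from the question's file.
print(parse_line('1234567  New York  40.712776  -74.005974   US'))
```

This keeps lat and lon apart purely by position, so no regex or character inspection is needed as long as the column order is fixed.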
