I am trying to extract some information from one table and store it in another table using SQLite and Python. Table 1 contains a list of websites in the form of (www.abc.com). I am trying to extract the (abc) part from each row and store it in Table 2, which also maintains a count for each site. If the site already exists in Table 2, the script should just increment its count.

Here is the code I have:

p = re.compile('^.+\.([a-zA-Z]+)\..+$')
for row in c.execute('SELECT links FROM table1'):
    link = p.match(row[0])

    if link.group(1):
        print(link.group(1))
        c.execute('SELECT EXISTS(SELECT 1 FROM table2 WHERE site_name = ?)', (link.group(1), ))

When I run the script, the loop body runs only once, and then I get:

Traceback (most recent call last):
  File "test.py", line 43, in <module>
      link = p.match(row[0])
TypeError: expected string or buffer

If I comment out the c.execute line, all the site names are printed properly. I am new to Python and Sqlite, so I am not sure what the problem is.

Any help will be great, thanks in advance.

2
  • Why is that c.execute line there? You're iterating over a cursor, and then you tell that cursor to do a different query in the middle of the one you're iterating. What did you want to happen there?
    – abarnert
    Commented Sep 11, 2014 at 23:02
  • Don't know about SQLite stuff, but your regex requires at least 3 parts with 2 periods. You could be more flexible if it were ^(?:.+\.)?([a-zA-Z]+)\..+$
    – user557597
    Commented Sep 11, 2014 at 23:11

2 Answers

1

The problem is that you're iterating over a cursor whose rows contain a single string:

for row in c.execute('SELECT links FROM table1'):

… but then, in the middle of the iteration, you change it into a cursor whose rows consist of a single number:

    c.execute('SELECT EXISTS(SELECT 1 FROM table2 WHERE site_name = ?)', (link.group(1), ))

So, when you get the next row, it's going to be something like (1,) instead of ('www.abc.com',), so p.match(row[0]) is passing the number 1 to match, and it's complaining that 1 is not a string or buffer. (EXISTS returns 0 when the site isn't in table2 yet, so you may see (0,) instead, with the same result.)


For future reference, it's really helpful to debug things by looking at the intermediate values. Whether you run in the debugger, or just add print(row) calls and the like to log what's going on, you'd know that it works the first time through the loop, but that it fails the second time, and that row looked like (1,) when it failed. That would make it much easier for you to track down the problem (or allow you to ask a better question on SO, because obviously you still won't be able to find everything yourself).
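
If you want to see the cursor being hijacked for yourself, something like the following minimal sketch should reproduce it. It uses an in-memory database and made-up sample rows; the hits column in table2 is an assumption, since the question doesn't say what the count column is called. The isinstance check is only there so the demo records the bad row instead of crashing on it.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE table1 (links TEXT)")
c.execute("CREATE TABLE table2 (site_name TEXT, hits INTEGER)")  # 'hits' is an assumed name
c.executemany("INSERT INTO table1 VALUES (?)",
              [("www.abc.com",), ("www.def.com",)])

seen = []  # every row the for loop actually hands us
for row in c.execute("SELECT links FROM table1"):
    seen.append(row)
    if isinstance(row[0], str):
        # Re-using the same cursor replaces the result set being iterated.
        c.execute("SELECT EXISTS(SELECT 1 FROM table2 WHERE site_name = ?)",
                  ("abc",))

print(seen)
```

The second row the loop sees comes from the EXISTS query rather than from table1, and www.def.com never shows up at all.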


You could fix this in (at least) three ways, in increasing order of "goodness if appropriate":

  • Fetch all of the values from the first query, then loop over those, so your second query doesn't get in the way.
  • Use a separate cursor for each query, instead of reusing the same one.
  • Don't make the second query in the first place—it's a SELECT query and you aren't doing anything with the rows, so what good is it doing?
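
Here's a rough sketch that combines the first two options and then does what the question is actually after (incrementing a count). It is not the only way to do it. The table layout, the hits column name, and the sample rows are all assumptions made for illustration; adapt them to your real schema.

```python
import re
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (links TEXT)")
conn.execute("CREATE TABLE table2 (site_name TEXT PRIMARY KEY, hits INTEGER NOT NULL)")
conn.executemany("INSERT INTO table1 VALUES (?)",
                 [("www.abc.com",), ("www.def.com",), ("www.abc.com",)])

p = re.compile(r'^.+\.([a-zA-Z]+)\..+$')

read_cur = conn.cursor()   # only used for the outer SELECT
write_cur = conn.cursor()  # separate cursor for the per-row statements

# fetchall() pulls every row up front, so nothing below can disturb the loop.
for (link_text,) in read_cur.execute("SELECT links FROM table1").fetchall():
    m = p.match(link_text)
    if m is None:
        continue  # skip links the regex can't handle
    site = m.group(1)

    # Try to bump the counter; if no row was updated, the site is new.
    write_cur.execute("UPDATE table2 SET hits = hits + 1 WHERE site_name = ?",
                      (site,))
    if write_cur.rowcount == 0:
        write_cur.execute("INSERT INTO table2 (site_name, hits) VALUES (?, 1)",
                          (site,))

conn.commit()
counts = dict(conn.execute("SELECT site_name, hits FROM table2"))
print(counts)
```

Either fetchall() or the separate cursor would be enough on its own; both are shown so you can see what each looks like. Note that the UPDATE-then-INSERT pattern also makes the SELECT EXISTS query unnecessary.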
0

The inner execute is probably stepping on your cursor iterator state. Try creating a second cursor object for that query.
