Python: convert variables into correct format for DataFrame

Question

I have 3 variables that I would like to use to build my dataset, but since they are in a weird shape/format, I have had no success so far. I'm quite new to this and would really appreciate any help!

The 3 variables I have are:

print(newspaper)

['Bolero']
['Schweizer Illustrierte Style']
['Bolero']

print(title)

['Schönheit und Tragik']
['magie pur']
['Das sind unsere Favoriten']

print(pubDate)

['2007-01-01']
['2007-01-01']
['2007-01-01']

It seems like all the variables are lists of lists, but I'm not quite sure. Since the data is scraped from a private website, I can't post the entire code here, but I hope this is enough for you to assess what the problem is with the variable format.
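One way to check the shape: a genuine list of lists prints on a single line with nested brackets, whereas three separate lines like the output above would come from three separate print calls. The sketch below uses the question's values; the loop is an assumption about how the scraping code runs, not something the question confirms.

```python
# A genuine list of lists prints on ONE line with nested brackets
nested = [["Bolero"], ["Schweizer Illustrierte Style"], ["Bolero"]]
print(nested)
# -> [['Bolero'], ['Schweizer Illustrierte Style'], ['Bolero']]

# Printing three one-element lists separately (e.g. inside a loop)
# produces three lines, like the output in the question
for newspaper in nested:
    print(newspaper)

# After the loop, the variable only holds the LAST one-element list
print(newspaper)
# -> ['Bolero']
```

If the variable is reassigned inside a loop like this, it never holds all three values at once.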

What I would like to have is a dataset in this format:

Newspaper	Title	PubDate
Bolero	Schönheit und Tragik	2007-01-01
Schweizer Illustrierte Style	magie pur	2007-01-01
Bolero	Das sind unsere Favoriten	2007-01-01

Vikas Bhandary · Accepted Answer · 2021-02-05 18:37:07Z


First, you need to convert each list of lists into a flat list.

Following the linked approach, you can flatten a list of lists with the following function:

# Flatten one level of nesting: [['a'], ['b']] -> ['a', 'b']
flatten = lambda t: [item for sublist in t for item in sublist]

Now you can create the DataFrame from the flattened lists.

import pandas as pd

data = {"Newspaper": flatten(newspaper), "Title": flatten(title), "PubDate": flatten(pubDate)}
df = pd.DataFrame.from_dict(data)
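Putting the answer together end to end with the sample values from the question. This is a minimal sketch that assumes each variable really is a list of one-element lists; if the real variables have a different shape, flatten will behave differently.

```python
import pandas as pd

# Flatten one level of nesting: [['a'], ['b']] -> ['a', 'b']
flatten = lambda t: [item for sublist in t for item in sublist]

# Assumed input shape: each variable is a list of one-element lists
# (sample values taken from the question)
newspaper = [["Bolero"], ["Schweizer Illustrierte Style"], ["Bolero"]]
title = [["Schönheit und Tragik"], ["magie pur"], ["Das sind unsere Favoriten"]]
pubDate = [["2007-01-01"], ["2007-01-01"], ["2007-01-01"]]

# Each dict key becomes a column; each flattened list fills that column
data = {"Newspaper": flatten(newspaper), "Title": flatten(title), "PubDate": flatten(pubDate)}
df = pd.DataFrame.from_dict(data)
print(df)
```

With these inputs the result has three rows and the columns Newspaper, Title and PubDate, matching the table requested in the question.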


I tried that, but then it separated every single letter instead of separating the 3 words. Any idea why? I get something like this: ['B', 'o', 'l', 'e', 'r', 'o'] ['S', 'c', 'h', 'w', 'e', 'i', 'z', 'e', 'r', ' ', 'I', 'l', 'l', 'u', 's', 't', 'r', 'i', 'e', 'r', 't', 'e', ' ', 'S', 't', 'y', 'l', 'e'] ['B', 'o', 'l', 'e', 'r', 'o']

newbieeeee_e
Commented Feb 5, 2021 at 19:29
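The single letters are what flatten produces when its input is a flat list of strings rather than a list of lists: each "sublist" is then a string, and iterating over a string yields its characters. Below is a hedged sketch of a variant that keeps strings whole; the name flatten_safe is hypothetical and not part of the answer.

```python
flatten = lambda t: [item for sublist in t for item in sublist]

# A flat list of strings: here each "sublist" is actually a str
print(flatten(["Bolero"]))
# -> ['B', 'o', 'l', 'e', 'r', 'o']

def flatten_safe(t):
    """Flatten one level, but keep strings whole instead of splitting them."""
    result = []
    for element in t:
        if isinstance(element, (list, tuple)):
            result.extend(element)  # nested list: take its items
        else:
            result.append(element)  # string or other scalar: keep as-is
    return result

print(flatten_safe([["Bolero"], ["magie pur"]]))  # -> ['Bolero', 'magie pur']
print(flatten_safe(["Bolero"]))                   # -> ['Bolero']
```

This only prevents the splitting. If the variable is reassigned on every loop iteration, it still holds a single article's value.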
Can you check the type of the variable newspaper by running type(newspaper)? If it does not return list, try converting it explicitly with newspaper = list(newspaper) and run the code again. Hope it helps.

Vikas Bhandary
Commented Feb 6, 2021 at 17:07
type(newspaper) returns this: <class 'list'> <class 'list'> <class 'list'>, so it seems to me like the variable consists of 3 lists. I tried newspaper = list(newspaper), but the output remained the same. Any other advice?

newbieeeee_e
Commented Feb 7, 2021 at 1:08
Okay, that's weird! What is the output of print(newspaper[0]) and print(dir(newspaper))? I would also suggest checking the documentation of the scraping package you are using for an equivalent function that returns the required format.

Vikas Bhandary
Commented Feb 7, 2021 at 17:29
I used np_elem = article_elem.find('div', class_='so_txt') to get the elements from the HTML file and newspaper = pd.Series(np_elem.text) to extract the text. The output of print(newspaper[0]) is Bolero Schweizer Illustrierte Style Bolero, and the output of the other is something really long, starting with: ['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', ... Is this what you expected?

newbieeeee_e
Commented Feb 7, 2021 at 19:48
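The three <class 'list'> lines and the per-article find(...) call suggest the code runs once per article inside a loop, so newspaper is reassigned on each iteration and only ever holds one article's text. A minimal sketch of the usual fix, under that assumption: create the lists once before the loop, append a plain string on each iteration, and build the DataFrame once after the loop. The articles data is a hypothetical stand-in for the scraped elements; in the real scraper each value would come from something like np_elem.text.

```python
import pandas as pd

# Hypothetical stand-in for the scraped article elements:
# one (newspaper, title, pubDate) tuple per article
articles = [
    ("Bolero", "Schönheit und Tragik", "2007-01-01"),
    ("Schweizer Illustrierte Style", "magie pur", "2007-01-01"),
    ("Bolero", "Das sind unsere Favoriten", "2007-01-01"),
]

# Create the lists ONCE, before the loop
newspapers, titles, pub_dates = [], [], []

for np_text, title_text, date_text in articles:
    # In the real scraper these would be e.g. np_elem.text.strip();
    # append the plain string instead of wrapping it in pd.Series
    newspapers.append(np_text.strip())
    titles.append(title_text.strip())
    pub_dates.append(date_text.strip())

# Build the DataFrame once, after all articles have been collected
df = pd.DataFrame({"Newspaper": newspapers, "Title": titles, "PubDate": pub_dates})
print(df)
```

Because each list already holds plain strings, no flattening step is needed in this version.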

