Do include a small example DataFrame, either as runnable code:
In [1]: df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])
or make it "copy and pasteable" using pd.read_clipboard(sep=r'\s\s+').
In [2]: df
Out[2]:
   A  B
0  1  2
1  1  3
2  4  6
Test it yourself to make sure it works and reproduces the issue.
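As a rough sketch of the copy-and-paste route, the snippet below parses the printed frame with the same separator. It uses pd.read_csv on an in-memory string as a stand-in for pd.read_clipboard, so that it runs without a clipboard; the name `pasted` is only for illustration.

```python
import io

import pandas as pd

# Text as it would be copied from the question (columns separated by 2+ spaces).
pasted = """\
   A  B
0  1  2
1  1  3
2  4  6
"""

# Stand-in for pd.read_clipboard(sep=r'\s\s+'); a regex separator needs the python engine.
df = pd.read_csv(io.StringIO(pasted), sep=r"\s\s+", engine="python")
print(df)
```

Because the data rows have one more field than the header, the first field is taken as the index, which is why the printed frame round-trips.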
You can format the text for Stack Overflow by highlighting and using Ctrl+K (or prepend four spaces to each line), or place three backticks (```) above and below your code with your code unindented.
I really do mean small. The vast majority of example DataFrames could be fewer than 6 rows and 6 columns,[citation needed] and I bet I can do it in 5x3. Can you reproduce the error with df = df[relevant_columns].head()? If not, fiddle around to see if you can make up a small DataFrame which exhibits the issue you are facing.
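A minimal sketch of that trimming step, assuming a made-up wide frame called `big` and an illustrative choice of `relevant_columns`:

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame standing in for your real data.
big = pd.DataFrame(np.arange(200).reshape(20, 10), columns=list("abcdefghij"))

relevant_columns = ["a", "c"]      # only the columns the problem involves
df = big[relevant_columns].head()  # first five rows of those columns
print(df)
```

If the issue no longer shows up on the trimmed frame, that in itself is a useful clue about which rows or columns matter.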
But every rule has an exception, the obvious one being for performance issues (in which case definitely use %timeit and possibly %prun to profile your code), where you should generate:
df = pd.DataFrame(np.random.randn(100000000, 10))
Consider using np.random.seed so we have the exact same frame. Having said that, "make this code fast for me" is not strictly on topic for the site.
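For instance, a seeded version might look like the sketch below. The row count is cut to 1,000 purely so the example runs quickly, and `df_again` exists only to show that the seed makes the frame repeatable.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fix the seed so everyone generates the same values
# 1,000 rows only to keep the sketch quick; scale up to the size that shows the slowdown.
df = pd.DataFrame(np.random.randn(1000, 10))

# Re-seeding and regenerating gives an identical frame.
np.random.seed(0)
df_again = pd.DataFrame(np.random.randn(1000, 10))
print(df.equals(df_again))
```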
For getting runnable code, df.to_dict is often useful, with the different orient options for different cases. In the example above, I could have grabbed the columns and values from df.to_dict('split').
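A short sketch of that round trip, using the small frame from above (the name `rebuilt` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=["A", "B"])

d = df.to_dict("split")  # dict with 'index', 'columns' and 'data' keys
print(d)

# The keys line up with the DataFrame constructor's keyword arguments.
rebuilt = pd.DataFrame(data=d["data"], index=d["index"], columns=d["columns"])
```

Pasting the printed dict into a question gives readers something they can turn straight back into the frame.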
Write out the outcome you desire (similarly to above):
In [3]: iwantthis
Out[3]:
   A  B
0  1  5
1  4  6
Explain where the numbers come from:
The 5 is the sum of the B column for the rows where A is 1.
Do show the code you've tried:
In [4]: df.groupby('A').sum()
Out[4]:
   B
A
1  5
4  6
But say what's incorrect:
The A column is in the index rather than a column.
Aside: the answer here is to use df.groupby('A', as_index=False).sum().
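Put together as one runnable sketch, the attempt and the fix can be compared against the desired frame (the names `attempt` and `fixed` are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=["A", "B"])
iwantthis = pd.DataFrame([[1, 5], [4, 6]], columns=["A", "B"])

attempt = df.groupby("A").sum()                # A ends up in the index
fixed = df.groupby("A", as_index=False).sum()  # A stays a regular column

print(attempt)
print(fixed)
```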
If it's relevant that you have Timestamp columns, e.g. you're resampling or something, then be explicit and apply pd.to_datetime to them for good measure.
df['date'] = pd.to_datetime(df['date']) # this column ought to be date.
Sometimes this is the issue itself: they were strings.
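A small sketch of that check, assuming a hypothetical frame whose 'date' column arrived as strings:

```python
import pandas as pd

# Hypothetical frame whose 'date' column was read in as strings.
df = pd.DataFrame({"date": ["2020-01-01", "2020-01-02"], "value": [1, 2]})

before = df["date"].dtype                # not a datetime dtype yet
df["date"] = pd.to_datetime(df["date"])  # convert explicitly
after = df["date"].dtype

print(before, after)
```

Printing the dtype before and after is a quick way to show readers (and yourself) what the column really held.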