Timeline for answer to "How to make good reproducible pandas examples" by Andy Hayden
Current License: CC BY-SA 4.0
Post Revisions
42 events
| when | what | action | by | license | comment |
|---|---|---|---|---|---|
| Apr 18 at 0:59 | history | edited | wjandrea | CC BY-SA 4.0 | link `np.random.seed` |
| Feb 16 at 17:34 | history | edited | wjandrea | CC BY-SA 4.0 | +link on "not strictly on topic for the site" |
| Feb 16 at 17:26 | history | edited | wjandrea | CC BY-SA 4.0 | no sense getting `.head()` of irrelevant columns |
| Sep 1, 2025 at 19:48 | history | edited | wjandrea | CC BY-SA 4.0 | Add points about number of columns and length of scalars. Generalize "relevant DataFrame" → "relevant data". Minor clarification about "split". |
| Dec 29, 2023 at 0:22 | history | edited | wjandrea | CC BY-SA 4.0 | Avoid `SyntaxWarning: invalid escape sequence '\s'`. |
| Dec 11, 2023 at 21:32 | history | edited | wjandrea | CC BY-SA 4.0 | Clarify "Test it yourself." |
| Nov 23, 2023 at 16:39 | history | edited | wjandrea | CC BY-SA 4.0 | Mention Pandas 1.0 changes too. Clarify version numbers. |
| Oct 22, 2023 at 17:52 | history | edited | wjandrea | CC BY-SA 4.0 | Mention what `%prun` does |
| Sep 13, 2023 at 14:54 | history | edited | wjandrea | CC BY-SA 4.0 | Add `pd.show_versions()` as an alternative to `session_info`. |
| Sep 9, 2023 at 20:09 | history | edited | wjandrea | CC BY-SA 4.0 | Add `session_info`, following from revision 13. |
| Sep 8, 2023 at 16:20 | history | edited | wjandrea | CC BY-SA 4.0 | Move version point from "ugly" to "bad" |
| Sep 7, 2023 at 17:47 | history | edited | wjandrea | CC BY-SA 4.0 | Move code formatting help to its own bullet and link the guide. Cover `to_dict`. Add link about "entire stack trace". Add point about version, following from revision 13. Other minor changes. Remove unnecessary CSV link. |
| Sep 7, 2023 at 16:44 | history | rollback | wjandrea | | Rollback to Revision 12 |
| Aug 30, 2023 at 10:40 | history | suggested | Brian Tran | CC BY-SA 4.0 | provide guideline to get session information |
| Aug 29, 2023 at 9:35 | review | Suggested edits | | | |
| Feb 27, 2023 at 19:36 | comment | added | flywire | | As shown in the chat, `StringIO()` and `pd.read_csv()` seem useful for including CSV data in code without requiring the CSV to be transformed. A row-oriented list is also demonstrated. |
| Feb 26, 2023 at 2:32 | comment | added | wjandrea | | @flywire Did you reply to the wrong person? I'm still not really sure what you're talking about. |
| Feb 26, 2023 at 2:18 | comment | added | flywire | | @AndyHayden I provided a df. Feel free to demonstrate the dictionary approach, but I don't think it changes my original comment. |
| Feb 26, 2023 at 0:07 | comment | added | wjandrea | | @flywire Well, that's only one way of constructing a df. Other ways include a list of dicts, which is more like a CSV, or `pd.read_csv()` of course. What I meant was that if you print it, you get a table, like a CSV (but more readable). |
| Feb 25, 2023 at 20:27 | comment | added | flywire | | Look at the OP's question and consider the data compared to a CSV. Again, e.g.: `df = pd.DataFrame({'num_legs': [2, 4, 8, 0], 'num_wings': [2, 0, 0, 0], 'num_specimen_seen': [10, 2, 1, 8]}, index=['falcon', 'dog', 'spider', 'fish'])` |
| Feb 24, 2023 at 15:27 | comment | added | wjandrea | | @flywire Sorry, what are you talking about exactly? If you have a question about parsing a CSV into a df, you can post the CSV; nobody's arguing against that. Andy's saying, if you have data, you need to post it; you can't just put a filename in your code and expect us to assume what the contents are. And I'm not sure what you mean about columns vs rows; CSVs and DFs are actually laid out the same in that respect... |
| Feb 14, 2023 at 1:11 | comment | added | flywire | | "The Ugly: Don't link to a CSV." lol, even worse are all these answers. Nobody has presented a good way of converting an actual CSV to a df; CSV has properties in columns, but DataFrames have properties in rows. |
| Jan 27, 2023 at 18:03 | history | edited | wjandrea | CC BY-SA 4.0 | Link to specific magics. Improve formatting: avoid footnotes and tons of italics; use consistent quote formatting. Other minor improvements like grammar. |
| Jan 27, 2023 at 17:14 | history | edited | wjandrea | CC BY-SA 4.0 | Clarify hatnote and add link to MRE. |
| Sep 28, 2022 at 19:44 | history | edited | wjandrea | CC BY-SA 4.0 | Improve grammar and formatting (including reducing overused italics). Update IPython docs link. Remove noise. |
| Sep 19, 2022 at 7:02 | comment | added | Eelco van Vliet | | Concerning `read_csv`: you can use `StringIO` to import the data from a string. That way you can mimic it as: `import pandas as pd; from io import StringIO; text = "Product,Perc,Storage,Price\nAzure,(2.4%,Server,£540\nAWS,,Server,£640\nGCP,,Server,£540\n"; data = pd.read_csv(StringIO(text))` |
| Apr 3, 2022 at 17:29 | history | edited | wjandrea | CC BY-SA 4.0 | Simplify formatting and grammar in notes for readability. |
| Mar 23, 2022 at 17:32 | history | edited | wjandrea | CC BY-SA 4.0 | Fix formatting for CommonMark |
| Sep 13, 2021 at 20:07 | history | edited | ddejohn | CC BY-SA 4.0 | small grammar fixes and minor correction |
| Jul 24, 2021 at 20:43 | history | edited | Peter Mortensen | CC BY-SA 4.0 | Active reading [<https://en.wikipedia.org/wiki/Pandas_%28software%29> <https://en.wikipedia.org/wiki/Comma-separated_values> <https://en.wikipedia.org/wiki/Sentence_clause_structure#Run-on_sentences> <http://stackoverflow.com/legal/trademark-guidance> (the last section)]. Expanded. |
| Jan 13, 2021 at 18:04 | comment | added | philosofool | | I think it should be mentioned that `df.head().to_dict()` often produces a minimal representation of the dataset, which can then be copied and pasted as code into the question. When it doesn't, it's usually because there are too many columns. In that case, a slice of the columns using `df.columns[...]` or `df.select_dtypes` will be very helpful. |
| Feb 8, 2020 at 15:45 | history | edited | MarianD | CC BY-SA 4.0 | added 22 characters in body |
| Jun 10, 2019 at 20:39 | history | edited | TylerH | CC BY-SA 4.0 | Updating some information and fixing spelling and capitalization |
| Jun 10, 2019 at 11:26 | comment | added | U13-Forward | | Ugh, I always use `pd.read_clipboard()`; when there are spaces, I do: `pd.read_clipboard(sep=r'\s{2,}', engine='python')` :P |
| Dec 27, 2018 at 20:45 | comment | added | Andy Hayden | | @MarianD The reason that `\s\s+` is so popular is that there is often a single space, e.g. in a column name, but multiple consecutive spaces are rarer, and pandas output nicely puts at least two between columns. Since this is just for toy/small datasets, it's pretty powerful and covers the majority of cases. Note: tab-separated data would be a different story, though Stack Overflow replaces tabs with spaces; if you have a TSV, just use `\t`. |
| Dec 26, 2018 at 22:32 | comment | added | MarianD | | Why `pd.read_clipboard(sep='\s\s+')`, and not the simpler `pd.read_clipboard()` (with the default `'\s+'`)? The first needs at least 2 whitespace characters, which may cause problems if there is only 1 (e.g. see such a case in @JohnE's answer). |
| Aug 24, 2017 at 10:32 | history | edited | coldspeed95 | CC BY-SA 3.0 | added 3 characters in body |
| May 23, 2017 at 11:54 | history | edited | URL Rewriter Bot | | replaced http://stackoverflow.com/ with https://stackoverflow.com/ |
| Dec 9, 2016 at 17:50 | comment | added | user5359531 | | The `pd.read_clipboard(sep='\s\s+')` suggestion does not seem to work if you're using Python on a remote server, which is where a lot of large data sets live. |
| Apr 13, 2016 at 17:32 | comment | added | zelusp | | +1 for the `pd.read_clipboard(sep='\s\s+')` tip. When I post SO questions that need a special but easily shared dataframe, like this one, I build it in Excel, copy it to my clipboard, then instruct SOers to do the same. Saves so much time! |
| Nov 25, 2013 at 4:58 | vote | accept | Marius | | |
| Nov 23, 2013 at 6:19 | history | answered | Andy Hayden | CC BY-SA 3.0 | |
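
Several comments in this timeline describe ways to put data into a question as code: `StringIO` with `pd.read_csv()` (Sep 19, 2022 and Feb 27, 2023), `df.head().to_dict()` (Jan 13, 2021), and a "two or more whitespace" separator for `pd.read_clipboard()` (2016 to 2019). The sketch below puts them side by side, assuming a reasonably recent pandas. The CSV is the one from the Sep 19, 2022 comment with its line breaks restored; the `printed` table is a made-up example. Because `pd.read_clipboard()` needs a real clipboard, the separator is demonstrated with `pd.read_csv()` on a string instead; `read_clipboard` passes `sep` on to `read_csv`, so the parsing should behave the same.

```python
from io import StringIO

import pandas as pd

# 1. CSV text kept inside the code (data from the Sep 19, 2022 comment,
#    line breaks restored). The backslash after the opening quotes keeps
#    the string from starting with an empty line.
csv_text = """\
Product,Perc,Storage,Price
Azure,(2.4%,Server,£540
AWS,,Server,£640
GCP,,Server,£540
"""

# read_csv accepts any file-like object, so StringIO stands in for a file.
# The two empty "Perc" fields are read as NaN.
df = pd.read_csv(StringIO(csv_text))

# 2. df.head().to_dict(): a plain dict literal that can be printed, pasted
#    into a question, and passed straight back to pd.DataFrame.
snippet = df.head().to_dict()
print(snippet)
rebuilt = pd.DataFrame(snippet)

# 3. Parsing a printed DataFrame (hypothetical example table).
#    A raw string avoids the "invalid escape sequence '\s'" warning.
printed = """\
        num legs  num_wings
falcon         2          2
dog            4          0
"""
# r"\s\s+" splits only on runs of 2+ whitespace characters, so the single
# space inside "num legs" is kept; the default r"\s+" would split it in two.
# The header row has one field fewer than the data rows, so pandas uses the
# first column (falcon, dog) as the index.
parsed = pd.read_csv(StringIO(printed), sep=r"\s\s+", engine="python")
print(parsed)
```

The default `to_dict()` orientation keeps the index in the literal; `to_dict("list")` gives a shorter literal, but the rebuilt frame then gets a default 0-based index.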