
Timeline for answer to "How to make good reproducible pandas examples" by Andy Hayden

Current License: CC BY-SA 4.0


42 events
when what by license comment
Apr 18 at 0:59 history edited wjandrea CC BY-SA 4.0
link np.random.seed
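The answer links `np.random.seed` because randomly generated example data only reproduces for other readers when the generator is seeded. A minimal sketch, assuming made-up column names `A` and `B` and an arbitrary seed of 0 (neither comes from the answer):

```python
import numpy as np
import pandas as pd

# Seed NumPy's global random generator so every run draws the same numbers.
np.random.seed(0)
df1 = pd.DataFrame(np.random.randint(0, 10, size=(4, 2)), columns=["A", "B"])

# Re-seeding with the same value reproduces exactly the same DataFrame.
np.random.seed(0)
df2 = pd.DataFrame(np.random.randint(0, 10, size=(4, 2)), columns=["A", "B"])

print(df1.equals(df2))  # True
```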
Feb 16 at 17:34 history edited wjandrea CC BY-SA 4.0
+link on "not strictly on topic for the site"
Feb 16 at 17:26 history edited wjandrea CC BY-SA 4.0
no sense getting `.head()` of irrelevant columns
Sep 1, 2025 at 19:48 history edited wjandrea CC BY-SA 4.0
Add points about number of columns and length of scalars. Generalize "relevant DataFrame" → "relevant data". Minor clarification about "split".
Dec 29, 2023 at 0:22 history edited wjandrea CC BY-SA 4.0
Avoid "SyntaxWarning: invalid escape sequence '\s' ".
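That warning comes from writing `'\s'` inside an ordinary string literal; spelling the regex as a raw string (`r'\s\s+'`) keeps the backslashes and compiles cleanly. A small sketch that compiles both spellings and reports what the compiler warns about, using only the standard library (the helper `escape_warnings` is hypothetical):

```python
import warnings

# Source text containing an ordinary string literal: sep = '\s\s+'
bad_source = "sep = '\\s\\s+'"
# Source text containing a raw string literal:       sep = r'\s\s+'
good_source = "sep = r'\\s\\s+'"


def escape_warnings(source):
    """Compile `source` and return the names of any warning categories emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [w.category.__name__ for w in caught]


print(escape_warnings(bad_source))   # SyntaxWarning (DeprecationWarning before Python 3.12)
print(escape_warnings(good_source))  # []
```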
Dec 11, 2023 at 21:32 history edited wjandrea CC BY-SA 4.0
Clarify "Test it yourself."
Nov 23, 2023 at 16:39 history edited wjandrea CC BY-SA 4.0
Mention Pandas 1.0 changes too. Clarify version numbers.
Oct 22, 2023 at 17:52 history edited wjandrea CC BY-SA 4.0
Mention what `%prun` does
Sep 13, 2023 at 14:54 history edited wjandrea CC BY-SA 4.0
Add `pd.show_versions()` as an alternative to `session_info`.
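`pd.show_versions()` prints the installed pandas version together with its dependencies and platform details. A minimal sketch that captures that report as text so it can be pasted into a question (capturing it is only a convenience here; calling `pd.show_versions()` directly prints the same report):

```python
import contextlib
import io

import pandas as pd

# Redirect stdout so the printed report ends up in a string.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    pd.show_versions()

report = buf.getvalue()
print(report)  # paste this into the question
```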
Sep 9, 2023 at 20:09 history edited wjandrea CC BY-SA 4.0
Add session_info, following from revision 13.
Sep 8, 2023 at 16:20 history edited wjandrea CC BY-SA 4.0
Move version point from "ugly" to "bad"
Sep 7, 2023 at 17:47 history edited wjandrea CC BY-SA 4.0
Move code formatting help to its own bullet and link the guide. Cover `to_dict`. Add link about "entire stack trace". Add point about version, following from revision 13. Other minor changes. Remove unnecessary CSV link.
Sep 7, 2023 at 16:44 history rollback wjandrea
Rollback to Revision 12
Aug 30, 2023 at 10:40 history suggested Brian Tran CC BY-SA 4.0
provide guideline to get session information
Aug 29, 2023 at 9:35 review Suggested edits
Aug 30, 2023 at 10:40
Feb 27, 2023 at 19:36 comment added flywire As shown in the chat, StringIO() and pd.read_csv() seem useful for including csv data in code without requiring the csv to be transformed. A row-oriented list is also demonstrated.
Feb 26, 2023 at 2:32 comment added wjandrea @flywire Did you reply to the wrong person? I'm still not really sure what you're talking about.
Feb 26, 2023 at 2:18 comment added flywire @AndyHayden I provided a df. Feel free to demonstrate the dictionary approach but I don't think it changes my original comment.
Feb 26, 2023 at 0:07 comment added wjandrea @flywire Well, that's only one way of constructing a df. Other ways include a list of dicts, which is more like a CSV, or pd.read_csv() ofc. What I meant was if you print it, you get a table, like a CSV (but more readable).
Feb 25, 2023 at 20:27 comment added flywire Look at the OP's question and consider the data compared to a CSV. Again, e.g.: df = pd.DataFrame({'num_legs': [2, 4, 8, 0], 'num_wings': [2, 0, 0, 0], 'num_specimen_seen': [10, 2, 1, 8]}, index=['falcon', 'dog', 'spider', 'fish'])
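The two constructions discussed in these comments produce the same frame. A sketch using the data from flywire's comment, built once as a dict of columns and once as a row-oriented list of dicts (which reads more like a CSV):

```python
import pandas as pd

index = ["falcon", "dog", "spider", "fish"]

# Column-oriented: a dict mapping each column name to its list of values.
df_cols = pd.DataFrame(
    {"num_legs": [2, 4, 8, 0], "num_wings": [2, 0, 0, 0], "num_specimen_seen": [10, 2, 1, 8]},
    index=index,
)

# Row-oriented: one dict per row; column order follows the first dict's keys.
df_rows = pd.DataFrame(
    [
        {"num_legs": 2, "num_wings": 2, "num_specimen_seen": 10},
        {"num_legs": 4, "num_wings": 0, "num_specimen_seen": 2},
        {"num_legs": 8, "num_wings": 0, "num_specimen_seen": 1},
        {"num_legs": 0, "num_wings": 0, "num_specimen_seen": 8},
    ],
    index=index,
)

print(df_cols.equals(df_rows))  # True
```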
Feb 24, 2023 at 15:27 comment added wjandrea @flywire Sorry, what are you talking about exactly? If you have a question about parsing a CSV into a df, you can post the CSV; nobody's arguing against that. Andy's saying, if you have data, you need to post it; you can't just put a filename in your code and expect us to assume what the contents are. And I'm not sure what you mean about columns vs rows; CSVs and DFs are actually laid out the same in that respect...
Feb 14, 2023 at 1:11 comment added flywire "The Ugly: Don't link to a CSV." lol, even worse are all these answers. Nobody has presented a good way of converting an actual CSV to a df; CSVs have properties in columns, but DataFrames have properties in rows.
Jan 27, 2023 at 18:03 history edited wjandrea CC BY-SA 4.0
Link to specific magics. Improve formatting: avoid footnotes and tons of italics; use consistent quote formatting. Other minor improvements like grammar.
Jan 27, 2023 at 17:14 history edited wjandrea CC BY-SA 4.0
Clarify hatnote and add link to MRE.
Sep 28, 2022 at 19:44 history edited wjandrea CC BY-SA 4.0
Improve grammar and formatting (including reducing overused italics). Update IPython docs link. Remove noise.
Sep 19, 2022 at 7:02 comment added Eelco van Vliet Concerning read_csv: you can use StringIO to import the data from a string. In that way you can mimic this as: import pandas as pd; from io import StringIO; text = """ Product,Perc,Storage,Price Azure,(2.4%,Server,£540 AWS,,Server,£640 GCP,,Server,£540 """; data = pd.read_csv(StringIO(text))
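The comment box flattened that code onto one line. A sketch of the same `StringIO` plus `pd.read_csv` approach, assuming each record was originally on its own line (that layout is inferred from the column count, not shown in the comment):

```python
from io import StringIO

import pandas as pd

# The CSV text from the comment, one record per line.
text = """Product,Perc,Storage,Price
Azure,(2.4%,Server,£540
AWS,,Server,£640
GCP,,Server,£540
"""

# StringIO makes the string look like a file, so read_csv parses it directly.
data = pd.read_csv(StringIO(text))
print(data)
```

Empty fields such as the `Perc` values for AWS and GCP come back as NaN.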
Apr 3, 2022 at 17:29 history edited wjandrea CC BY-SA 4.0
Simplify formatting and grammar in notes for readability.
Mar 23, 2022 at 17:32 history edited wjandrea CC BY-SA 4.0
Fix formatting for CommonMark
Sep 13, 2021 at 20:07 history edited ddejohn CC BY-SA 4.0
small grammar fixes and minor correction
Jul 24, 2021 at 20:43 history edited Peter Mortensen CC BY-SA 4.0
Active reading [<https://en.wikipedia.org/wiki/Pandas_%28software%29> <https://en.wikipedia.org/wiki/Comma-separated_values> <https://en.wikipedia.org/wiki/Sentence_clause_structure#Run-on_sentences> <http://stackoverflow.com/legal/trademark-guidance> (the last section)]. Expanded.
Jan 13, 2021 at 18:04 comment added philosofool I think it should be mentioned that df.head().to_dict() often produces a minimal representation of the dataset which can then be used to copy and paste code for the question. When it doesn't, it's usually because there are too many columns. If there are too many columns, a slice of the columns using df.columns[...] or df.select_dtypes will be very helpful.
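A minimal sketch of that suggestion, using a hypothetical DataFrame `df` whose columns (`id`, `name`, `price`, `notes`) are made up to stand in for real data:

```python
import pandas as pd

# Hypothetical stand-in for a large DataFrame that can't be posted in full.
df = pd.DataFrame({
    "id": range(100),
    "name": [f"item{i}" for i in range(100)],
    "price": [i * 1.5 for i in range(100)],
    "notes": ["irrelevant"] * 100,
})

# Keep only the relevant columns and a few rows, then convert to a dict
# ({column: {index: value}}) that can be pasted into a question.
sample = df[["id", "price"]].head(3).to_dict()
print(sample)

# select_dtypes is another way to cut down the columns, e.g. numeric only.
numeric_sample = df.select_dtypes("number").head(3).to_dict()

# Readers rebuild the sample by passing the pasted dict to pd.DataFrame.
rebuilt = pd.DataFrame(sample)
```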
Feb 8, 2020 at 15:45 history edited MarianD CC BY-SA 4.0
added 22 characters in body
Jun 10, 2019 at 20:39 history edited TylerH CC BY-SA 4.0
Updating some information and fixing spelling and capitalization
Jun 10, 2019 at 11:26 comment added U13-Forward Ugh, I always use pd.read_clipboard(); when there are spaces, I do: pd.read_clipboard(sep='\s{2,}', engine='python') :P
Dec 27, 2018 at 20:45 comment added Andy Hayden @MarianD The reason \s\s+ is so popular is that a single space often appears inside a value, e.g. in a column name, whereas runs of several spaces are rarer, and pandas output conveniently puts at least two spaces between columns. Since this is just for toy/small datasets, it handles the majority of cases. Note: tab-separated data would be a different story (though Stack Overflow replaces tabs with spaces); if you have a TSV, just use \t.
Dec 26, 2018 at 22:32 comment added MarianD Why pd.read_clipboard(sep='\s\s+') and not the simpler pd.read_clipboard() (with the default '\s+')? The first needs at least 2 whitespace characters, which may cause problems if there is only 1 (e.g., see @JohnE's answer).
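The difference between the two separators shows up on a value that contains a single space. `pd.read_clipboard` passes the clipboard text on to `pd.read_csv`, so this sketch reproduces the behavior with `StringIO` instead of the clipboard. The table text is hypothetical:

```python
import re
from io import StringIO

import pandas as pd

# Hypothetical table text: "red fox" contains one space,
# while columns are separated by two or more spaces.
text = """animal   legs
red fox  4
spider   8
"""

# \s+ splits on every whitespace run, so "red fox" is broken in two...
print(re.split(r"\s+", "red fox  4"))    # ['red', 'fox', '4']
# ...while \s\s+ only splits on runs of two or more whitespace characters.
print(re.split(r"\s\s+", "red fox  4"))  # ['red fox', '4']

# Regex separators are handled by the Python parsing engine; passing
# engine="python" explicitly avoids pandas' fallback warning.
df = pd.read_csv(StringIO(text), sep=r"\s\s+", engine="python")
print(df)
```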
Aug 24, 2017 at 10:32 history edited coldspeed95 CC BY-SA 3.0
added 3 characters in body
May 23, 2017 at 11:54 history edited URL Rewriter Bot
replaced http://stackoverflow.com/ with https://stackoverflow.com/
Dec 9, 2016 at 17:50 comment added user5359531 the pd.read_clipboard(sep='\s\s+') suggestion does not seem to work if you're using Python on a remote server, which is where a lot of large data sets live.
Apr 13, 2016 at 17:32 comment added zelusp +1 for the pd.read_clipboard(sep='\s\s+') tip. When I post SO questions that need a special but easily shared dataframe, like this one, I build it in Excel, copy it to my clipboard, then instruct SOers to do the same. Saves so much time!
Nov 25, 2013 at 4:58 vote accept Marius
Nov 23, 2013 at 6:19 history answered Andy Hayden CC BY-SA 3.0