Timeline for answer to How to make good reproducible pandas examples by Andy Hayden

Current License: CC BY-SA 4.0

Post Revisions

42 events

when	what		by	license	comment
Apr 18 at 0:59	history	edited	wjandrea	CC BY-SA 4.0	link np.random.seed
Feb 16 at 17:34	history	edited	wjandrea	CC BY-SA 4.0	+link on "not strictly on topic for the site"
Feb 16 at 17:26	history	edited	wjandrea	CC BY-SA 4.0	no sense getting `.head()` of irrelevant columns
Sep 1, 2025 at 19:48	history	edited	wjandrea	CC BY-SA 4.0	Add points about number of columns and length of scalars. Generalize "relevant DataFrame" → "relevant data". Minor clarification about "split".
Dec 29, 2023 at 0:22	history	edited	wjandrea	CC BY-SA 4.0	Avoid "SyntaxWarning: invalid escape sequence '\s' ".
Dec 11, 2023 at 21:32	history	edited	wjandrea	CC BY-SA 4.0	Clarify "Test it yourself."
Nov 23, 2023 at 16:39	history	edited	wjandrea	CC BY-SA 4.0	Mention Pandas 1.0 changes too. Clarify version numbers.
Oct 22, 2023 at 17:52	history	edited	wjandrea	CC BY-SA 4.0	Mention what `%prun` does
Sep 13, 2023 at 14:54	history	edited	wjandrea	CC BY-SA 4.0	Add `pd.show_versions()` as an alternative to `session_info`.
Sep 9, 2023 at 20:09	history	edited	wjandrea	CC BY-SA 4.0	Add session_info, following from revision 13.
Sep 8, 2023 at 16:20	history	edited	wjandrea	CC BY-SA 4.0	Move version point from "ugly" to "bad"
Sep 7, 2023 at 17:47	history	edited	wjandrea	CC BY-SA 4.0	Move code formatting help to its own bullet and link the guide. Cover `to_dict`. Add link about "entire stack trace". Add point about version, following from revision 13. Other minor changes. Remove unnecessary CSV link.
Sep 7, 2023 at 16:44	history	rollback	wjandrea		Rollback to Revision 12
S Aug 30, 2023 at 10:40	history	suggested	Brian Tran	CC BY-SA 4.0	provide guideline to get session information
Aug 29, 2023 at 9:35	review	Suggested edits
S Aug 30, 2023 at 10:40
Feb 27, 2023 at 19:36	comment	added	flywire		As shown in the chat, `StringIO()` and pd.read_csv() seem useful for including csv data in code without requiring the csv to be transformed. A row-oriented list is also demonstrated.
Feb 26, 2023 at 2:32	comment	added	wjandrea		@flywire Did you reply to the wrong person? I'm still not really sure what you're talking about.
Feb 26, 2023 at 2:18	comment	added	flywire		@AndyHayden I provided a df. Feel free to demonstrate the dictionary approach but I don't think it changes my original comment.
Feb 26, 2023 at 0:07	comment	added	wjandrea		@flywire Well, that's only one way of constructing a df. Other ways include a list of dicts, which is more like a CSV, or `pd.read_csv()` ofc. What I meant was if you print it, you get a table, like a CSV (but more readable).
Feb 25, 2023 at 20:27	comment	added	flywire		Look at the OPs question and consider the data compared to a csv. Again, eg: df = pd.DataFrame({'num_legs': [2, 4, 8, 0], 'num_wings': [2, 0, 0, 0], 'num_specimen_seen': [10, 2, 1, 8]}, index=['falcon', 'dog', 'spider', 'fish'])
Feb 24, 2023 at 15:27	comment	added	wjandrea		@flywire Sorry, what are you talking about exactly? If you have a question about parsing a CSV into a df, you can post the CSV; nobody's arguing against that. Andy's saying, if you have data, you need to post it; you can't just put a filename in your code and expect us to assume what the contents are. And I'm not sure what you mean about columns vs rows; CSVs and DFs are actually laid out the same in that respect...
Feb 14, 2023 at 1:11	comment	added	flywire		The Ugly: Don't link to a CSV. lol, even worse are all these answers. Nobody has presented a good way of converting actual csv to df, csv has properties in columns but DataFrames have properties in rows.
Jan 27, 2023 at 18:03	history	edited	wjandrea	CC BY-SA 4.0	Link to specific magics. Improve formatting: avoid footnotes and tons of italics; use consistent quote formatting. Other minor improvements like grammar.
Jan 27, 2023 at 17:14	history	edited	wjandrea	CC BY-SA 4.0	Clarify hatnote and add link to MRE.
Sep 28, 2022 at 19:44	history	edited	wjandrea	CC BY-SA 4.0	Improve grammar and formatting (including reducing overused italics). Update IPython docs link. Remove noise.
Sep 19, 2022 at 7:02	comment	added	Eelco van Vliet		Concerning read_csv: you can use StringIO to import the data from a string. In that way you can mimic this as: `import pandas as pd; from io import StringIO; text = """ Product,Perc,Storage,Price Azure,(2.4%,Server,£540 AWS,,Server,£640 GCP,,Server,£540 """; data = pd.read_csv(StringIO(text))`
Apr 3, 2022 at 17:29	history	edited	wjandrea	CC BY-SA 4.0	Simplify formatting and grammar in notes for readability.
Mar 23, 2022 at 17:32	history	edited	wjandrea	CC BY-SA 4.0	Fix formatting for CommonMark
Sep 13, 2021 at 20:07	history	edited	ddejohn	CC BY-SA 4.0	small grammar fixes and minor correction
Jul 24, 2021 at 20:43	history	edited	Peter Mortensen	CC BY-SA 4.0	Active reading [<https://en.wikipedia.org/wiki/Pandas_%28software%29> <https://en.wikipedia.org/wiki/Comma-separated_values> <https://en.wikipedia.org/wiki/Sentence_clause_structure#Run-on_sentences> <http://stackoverflow.com/legal/trademark-guidance> (the last section)]. Expanded.
Jan 13, 2021 at 18:04	comment	added	philosofool		I think it should be mentioned that df.head().to_dict() often produces a minimal representation of the dataset which can then be used to copy and paste code for the question. When it doesn't, it's usually because there are too many columns. If there are too many columns, a slice of the columns using `df.columns[...]` or `df.select_dtypes` will be very helpful.
Feb 8, 2020 at 15:45	history	edited	MarianD	CC BY-SA 4.0	added 22 characters in body
Jun 10, 2019 at 20:39	history	edited	TylerH	CC BY-SA 4.0	Updating some information and fixing spelling and capitalization
Jun 10, 2019 at 11:26	comment	added	U13-Forward		Ugh, I always use `pd.read_clipboard()`; when there are spaces, I do: `pd.read_clipboard(sep='\s+{2,}', engine='python')` :P
Dec 27, 2018 at 20:45	comment	added	Andy Hayden		@MarianD the reason that \s\s+ is so popular is that there is often one space, e.g. in a column name, but multiple spaces are rarer, and pandas output nicely puts at least two between columns. Since this is just for toy/small datasets, it's pretty powerful and covers the majority of cases. Note: tab-separated data would be a different story, though Stack Overflow replaces tabs with spaces; if you have a TSV, just use \t.
Dec 26, 2018 at 22:32	comment	added	MarianD		Why `pd.read_clipboard(sep='\s\s+')`, and not a simpler `pd.read_clipboard()` (with the default `'\s+'`)? The first needs at least 2 whitespace characters, which may cause problems if there is only 1 (e.g. see such in @JohnE's answer).
Aug 24, 2017 at 10:32	history	edited	coldspeed95	CC BY-SA 3.0	added 3 characters in body
May 23, 2017 at 11:54	history	edited	URL Rewriter Bot		replaced http://stackoverflow.com/ with https://stackoverflow.com/
Dec 9, 2016 at 17:50	comment	added	user5359531		the `pd.read_clipboard(sep='\s\s+')` suggestion does not seem to work if you're using Python on a remote server, which is where a lot of large data sets live.
Apr 13, 2016 at 17:32	comment	added	zelusp		+1 for the `pd.read_clipboard(sep='\s\s+')` tip. When I post SO questions that need a special but easily shared dataframe, like this one I build it in excel, copy it to my clipboard, then instruct SOers to do the same. Saves so much time!
Nov 25, 2013 at 4:58	vote	accept	Marius
Nov 23, 2013 at 6:19	history	answered	Andy Hayden	CC BY-SA 3.0
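
Several comments above mention reading pasted text with `StringIO` and `pd.read_csv()`, the `\s\s+` separator, and `df.head().to_dict()`. The sketch below illustrates those three ideas together. The table contents and variable names are hypothetical, it reads from a string instead of the clipboard so it also works on a remote server, and it passes `engine='python'` explicitly because the separator is a regular expression.

```python
from io import StringIO

import pandas as pd

# Hypothetical toy table: columns are separated by two or more spaces,
# while the column name "num legs" contains a single space.
text = """\
name    num legs  num_wings
falcon         2          2
dog            4          0
spider         8          0
"""

# A raw string avoids the "invalid escape sequence '\s'" SyntaxWarning.
# Requiring at least two whitespace characters keeps "num legs" intact.
df = pd.read_csv(StringIO(text), sep=r"\s\s+", engine="python")

# A dict representation that can be pasted into a question as code.
as_dict = df.head().to_dict()

# Anyone can rebuild the same small frame from that dict.
rebuilt = pd.DataFrame(as_dict)
print(rebuilt)
```

With the default `'\s+'` separator, the header would split into four names instead of three, which is the trade-off discussed in the comments from December 2018.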
