This is to mainly to expand on @AndyHayden's answer by providing examples of how you can create sample dataframes. Pandas and (especially) numpy give you a variety of tools for this such that you can generally create a reasonable facsimile of any real dataset with just a few lines of code.
After importing numpy and pandas, be sure to provide a random seed if you want folks to be able to exactly reproduce your data and results.
df = pd.DataFrame({
# some ways to create random data
'a':np.random.randn(6),
'b':np.random.choice( [5,7,np.nan], 6),
'c':np.random.choice( ['panda','python','shark'], 6),
# some ways to create systematic groups for indexing or groupby
# this is similar to r's expand.grid(), see note 2 below
'd':np.repeat( range(3), 2 ),
'e':np.tile( range(2), 3 ),
# a date range and set of random dates
'f':pd.date_range('1/1/2011', periods=6, freq='D'),
'g':np.random.choice( pd.date_range('1/1/2011', periods=365,
freq='D'), 6, replace=False)
})
np.repeat and np.tile (columns d and e) are very useful for creating groups and indices in a very regular way. For 2 columns, this can be used to easily duplicate r's expand.grid() but is also more flexible in ability to provide a subset of all permutations. However, for 3 or more columns the syntax quickly becomes unwieldy.
- For a more direct replacement for r's
expand.grid() see the itertools solution in the pandas cookbook or the np.meshgrid solution shown here. Those will allow any number of dimensions.
- You can do quite a bit with
np.random.choice. For example, in column g, we have a random selection of 6 dates from 2011. Additionally, by setting replace=False we can assure these dates are unique -- very handy if we want to use this as an index with unique values.
stocks = pd.DataFrame({
'ticker':np.repeat( ['aapl','goog','yhoo','msft'], 25 ),
'date':np.tile( pd.date_range('1/1/2011', periods=25, freq='D'), 4 ),
'price':(np.random.randn(100).cumsum() + 10) })
>>> stocks.head(5)
date price ticker
0 2011-01-01 9.497412 aapl
1 2011-01-02 10.261908 aapl
2 2011-01-03 9.438538 aapl
3 2011-01-04 9.515958 aapl
4 2011-01-05 7.554070 aapl
>>> stocks.groupby('ticker').head(2)
date price ticker
0 2011-01-01 9.497412 aapl
1 2011-01-02 10.261908 aapl
25 2011-01-01 8.277772 goog
26 2011-01-02 7.714916 goog
50 2011-01-01 5.613023 yhoo
51 2011-01-02 6.397686 yhoo
75 2011-01-01 11.736584 msft
76 2011-01-02 11.944519 msft