Given: a numpy array created from a string:

xy = np.array('4.9 3.5; 5.1 3.2; 4.7 3.1; 4.6 3.0; 5.0 5.4')

First off: is there a specific name for this construct?

Here is the datatype:

In [25]: xy
Out[25]:
array('4.9 3.5; 5.1 3.2; 4.7 3.1; 4.6 3.0; 5.0 5.4',
      dtype='|S43')

What does |S43 mean?

OK, enough about internals. Here is the real question: how do we use the generated array?

In [31]: cov(xy)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-31-6d999a60c1da> in <module>()
----> 1 cov(xy)

  .. 
TypeError: cannot perform reduce with flexible type

That result contrasts with the more standard usage of np.array:

In [33]: xy = np.array([[4.9, 3.5],[5.1, 3.2],[ 4.7, 3.1],[ 4.6, 3.0],[ 5.0, 5.4]], dtype=float)

In [35]: cov(xy)
Out[35]:
array([[ 0.98 ,  1.33 ,  1.12 ,  1.12 , -0.28 ],
       [ 1.33 ,  1.805,  1.52 ,  1.52 , -0.38 ],
       [ 1.12 ,  1.52 ,  1.28 ,  1.28 , -0.32 ],
       [ 1.12 ,  1.52 ,  1.28 ,  1.28 , -0.32 ],
       [-0.28 , -0.38 , -0.32 , -0.32 ,  0.08 ]])

So how can the string form of numpy.array be used to get that same result?

Update: my mistake, I was mixing up numpy.array with numpy.matrix. The latter does support the string syntax. See my answer below.

  • The |S43 means your type is a string of 43 characters. Commented Nov 1, 2016 at 13:42
  • dtype='|S43' indicates that the array holds fixed-width byte strings of 43 characters each (here, a single 43-character string). In other words, it is storing everything as a string, not as numbers. Commented Nov 1, 2016 at 13:42
  • You can't compute the covariance of a string. You have to use numbers (int, float ...) for computation. Commented Nov 1, 2016 at 13:49
  • Can't compute cov of a string: yeah, no kidding. The assumption was that numpy performs the conversion. Maybe I am mixing up R with numpy; checking... Commented Nov 1, 2016 at 13:52
  • The numpy array doesn't perform the conversion. Numpy arrays are generic types to store data of the same type. The type can be a string. In your case you create an array that contains one element (one string). Commented Nov 1, 2016 at 14:03
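What the comments describe can be checked directly: passing a string to np.array gives a zero-dimensional array holding exactly one element. The sketch below assumes Python 3, where a str produces a Unicode ('U') dtype; only a bytes object reproduces the '|S43' shown in the question.

```python
import numpy as np

s = '4.9 3.5; 5.1 3.2; 4.7 3.1; 4.6 3.0; 5.0 5.4'

# Python 3: a str gives a Unicode dtype ('U', 4 bytes per character)
xy_unicode = np.array(s)
# A bytes object gives the byte-string dtype 'S', matching '|S43' above
xy_bytes = np.array(s.encode('ascii'))

print(len(s))                                             # 43
print(xy_bytes.dtype)                                     # |S43
print(xy_unicode.dtype.kind, xy_unicode.dtype.itemsize)   # U 172
# ndim 0 and shape (): one element, no rows or columns to compute a covariance over
print(xy_unicode.ndim, xy_unicode.shape)                  # 0 ()
```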

3 Answers

The problem: I was mixing numpy.array with numpy.matrix.

In [47]: np.matrix('1 2 3; 4 5 6')
Out[47]:
matrix([[1, 2, 3],
        [4, 5, 6]])

1 Comment

Yes, this input style was added to np.matrix to give MATLAB users something familiar. Append .A to get a plain array. Of course, it's only useful for toy examples.
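Combining the answer and the comment gives the covariance from the question using the MATLAB-style string directly. This is a sketch; it assumes that recent NumPy versions may emit a PendingDeprecationWarning for np.matrix, which is silenced here only to keep the output clean.

```python
import warnings
import numpy as np

s = '4.9 3.5; 5.1 3.2; 4.7 3.1; 4.6 3.0; 5.0 5.4'

with warnings.catch_warnings():
    # np.matrix is discouraged in newer NumPy and may warn on construction
    warnings.simplefilter('ignore', PendingDeprecationWarning)
    m = np.matrix(s)  # ';' separates rows, spaces separate columns

xy = m.A          # .A returns the data as a plain 2-D ndarray
c = np.cov(xy)    # each row is treated as one variable, so the result is 5x5

print(type(xy).__name__, xy.shape)  # ndarray (5, 2)
print(c.shape)                      # (5, 5)
```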

You need to parse the string into a usable format before passing it to numpy.array. Try this:

import numpy

# original string
xy_str = '4.9 3.5; 5.1 3.2; 4.7 3.1; 4.6 3.0; 5.0 5.4'

# split into rows on '; ', split each row on whitespace, convert to float, pass to numpy.array
xy = numpy.array([list(map(float, v.split())) for v in xy_str.split('; ')])
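As an alternative to the list comprehension above (a different technique, not the answer's own), np.loadtxt accepts a list of strings and treats each one as a line of text, so splitting on ';' is enough. This sketch assumes the columns are whitespace-separated, which loadtxt's default delimiter handles, including the leading space left after each ';'.

```python
import numpy as np

xy_str = '4.9 3.5; 5.1 3.2; 4.7 3.1; 4.6 3.0; 5.0 5.4'

# Each ';'-separated chunk becomes one "line"; the default delimiter splits on whitespace
xy = np.loadtxt(xy_str.split(';'))

print(xy.shape)          # (5, 2)
print(np.cov(xy).shape)  # (5, 5)
```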

Convert the string into a list of lists like what's in your correct example.

import numpy as np

orig_xy_str = '4.9 3.5; 5.1 3.2; 4.7 3.1; 4.6 3.0; 5.0 5.4'
new_xy = np.array([vals.split(' ') for vals in orig_xy_str.split('; ')], dtype=float)

>>> np.cov(new_xy)
array([[ 0.98 ,  1.33 ,  1.12 ,  1.12 , -0.28 ],
       [ 1.33 ,  1.805,  1.52 ,  1.52 , -0.38 ],
       [ 1.12 ,  1.52 ,  1.28 ,  1.28 , -0.32 ],
       [ 1.12 ,  1.52 ,  1.28 ,  1.28 , -0.32 ],
       [-0.28 , -0.38 , -0.32 , -0.32 ,  0.08 ]])

If you have no control over the initial input (as you say, you are "given a numpy array created from a string"), first convert the array back to a string with orig_xy_str = str(xy).
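Starting from the 0-d array itself, the stored string can also be pulled out with .item(). The sketch below assumes the array may hold either a str ('U' dtype) or bytes ('S' dtype) and decodes bytes first; parse_string_array is a hypothetical helper name, and the parsing reuses the split-and-convert approach from this answer.

```python
import numpy as np

def parse_string_array(arr):
    """Hypothetical helper: turn a 0-d array holding 'a b; c d; ...' into a 2-D float array."""
    raw = arr.item()            # the single stored element
    if isinstance(raw, bytes):  # 'S' dtype arrays return bytes, not str
        raw = raw.decode('ascii')
    # rows are separated by ';', columns by whitespace; dtype=float converts the strings
    return np.array([row.split() for row in raw.split(';')], dtype=float)

xy = np.array('4.9 3.5; 5.1 3.2; 4.7 3.1; 4.6 3.0; 5.0 5.4')
new_xy = parse_string_array(xy)

print(new_xy.shape)          # (5, 2)
print(np.cov(new_xy).shape)  # (5, 5)
```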
