6

If I wanted to remove characters such as .!,'"^-# from an array of strings, how would I go about this while retaining all alphabetical and numeric characters?

Allowed alphabetical characters should also include letters with diacritical marks, such as à or ç.

4
  • 1
    stackoverflow.com/questions/737475/… this will answer 50% of your question Commented Feb 12, 2012 at 3:21
  • 2
    Normally we talk about a String having characters and an Array having different data, e.g. objects, numbers, or strings. Do you really have an Array (perhaps an array of strings?) or just a String? Commented Feb 12, 2012 at 4:34
  • -1. Not so much for not trying to work it out, but because the question is nonsensical for the reason given by Phrogz. Commented Feb 12, 2012 at 8:32
  • A hash of arrays of strings. Nice. (What?) Commented Feb 12, 2012 at 17:47

5 Answers

18

You should use a regex with the correct character property. In this case, you can invert the Alnum class (Alphabetic and numeric character):

"◊¡ Marc-André !◊".gsub(/\p{^Alnum}/, '') # => "MarcAndré"

For more complex cases, say you also wanted to keep punctuation, you can build a set of acceptable characters like:

"◊¡ Marc-André !◊".gsub(/[^\p{Alnum}\p{Punct}]/, '') # => "¡Marc-André!"

For all character properties, you can refer to the doc.
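As a further sketch of the same idea, the set of kept characters can be extended with other properties. Here whitespace is kept as well, and the cleanup is applied to every element of an array with map. The sample array and variable names are made up for illustration.

```ruby
# Hypothetical sample data, not taken from the question.
names = ["Marc-André!", "ça va?", "#42 cats"]

# Keep letters, digits and whitespace; drop everything else.
cleaned = names.map { |s| s.gsub(/[^\p{Alnum}\p{Space}]/, '') }

p cleaned
```

Because map and gsub both return new objects, the original strings in names are left untouched.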


2 Comments

Note that you're answering how to gsub on a string, but the title and description use the word "array".
@Phrogz: Indeed. Hopefully the OP knows how to do a map.
3
string.gsub(/[^[:alnum:]]/, "")
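A small sketch of why the POSIX bracket expression suits the question better than a hand-written ASCII range: on a UTF-8 string, [[:alnum:]] also covers accented letters, whereas a-zA-Z0-9 does not. The sample string is invented for illustration and assumes precomposed accented characters.

```ruby
# Hypothetical sample string with accented letters and a currency symbol.
s = "Ça coûte 10 €, déjà!"

# ASCII-only class: accented letters are stripped along with the punctuation.
ascii_only = s.gsub(/[^a-zA-Z0-9]/, '')

# POSIX bracket expression: accented letters are kept.
unicode = s.gsub(/[^[:alnum:]]/, '')

p ascii_only
p unicode
```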


3

The following will work for an array:

z = ['asfdå', 'b12398!', 'c98347']
z.each { |s| s.gsub!(/[^[:alnum:]]/, '') } # parentheses avoid the "ambiguous first argument" warning
puts z.inspect

I borrowed Jeremy's suggested regex.

1 Comment

Thanks Phrogz, I totally missed that. :)
1

You might consider a regular expression.

http://www.regular-expressions.info/ruby.html

I'm assuming that you're using Ruby, since you tagged your post with it. You could go through the array, test each element against a regexp, and remove or keep it depending on whether it matches.

A regexp you might use might go something like this:

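[^.!,^#-]

That matches any character that is not one of the characters inside the brackets. The hyphen goes last so that it is taken literally; written as ^-# it would be read as an (invalid) range. However, I suggest that you look up regular expressions; you might find a better solution once you know their syntax and usage.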
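To strip just the characters listed in the question, a non-negated character class can be passed to gsub. This is a minimal sketch; the sample words and the variable name unwanted are invented for illustration.

```ruby
# Characters from the question: . ! , ' " ^ - #
# The hyphen is escaped so it is literal, and ^ is not first so it does not negate.
unwanted = /[.!,'"#^\-]/

# Hypothetical sample data.
words = ["it's", "well-known!", "a^b", "#tag", "plain"]

stripped = words.map { |w| w.gsub(unwanted, '') }
p stripped
```

Note that this removes only the listed characters, so any other symbol (for example ? or @) would survive; a negated alphanumeric class is the safer choice when everything non-alphanumeric should go.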


1

If you truly have an array (as you state) and it is an array of strings (I'm guessing), e.g.

foo = [ "hello", "42 cats!", "yöwza" ]

then I can imagine that you either want to update each string in the array with a new value, or that you want a modified array that only contains certain strings.

If the former (you want to 'clean' every string in the array) you could do one of the following:

foo.each{ |s| s.gsub!(/\p{^Alnum}/, '') }     # Change every string in place…
bar = foo.map{ |s| s.gsub(/\p{^Alnum}/, '') } # …or make an array of new strings
#=> [ "hello", "42cats", "yöwza" ]

If the latter (you want to select a subset of the strings where each matches your criteria of holding only alphanumerics) you could use one of these:

# Select only those strings that contain ONLY alphanumerics
bar = foo.select{ |s| s =~ /\A\p{Alnum}+\z/ }
#=> [ "hello", "yöwza" ]

# Shorthand method for the same thing
bar = foo.grep(/\A\p{Alnum}+\z/)
#=> [ "hello", "yöwza" ]

In Ruby, regular expressions of the form /\A………\z/ require the entire string to match, as \A anchors the regular expression to the start of the string and \z anchors to the end.
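The difference from the line anchors ^ and $ shows up with multi-line strings. A minimal sketch, using an invented sample string:

```ruby
# Hypothetical two-line string: the first line is alphanumeric, the second is not.
multi = "hello\n!!!"

# ^ and $ anchor to line boundaries, so the first line alone is enough to match.
line_anchored = !!(multi =~ /^\p{Alnum}+$/)

# \A and \z anchor to the whole string, so the second line makes the match fail.
string_anchored = !!(multi =~ /\A\p{Alnum}+\z/)

p line_anchored
p string_anchored
```

This is why the select and grep examples use \A and \z: with ^ and $ a string would pass as soon as any single line in it was purely alphanumeric.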

