Sorted strings filter

Question

Input a list of strings a and a string s for search keyword. Find out all strings in a which contains s as subsequence. And sort them in the following order:

Exactly equals to s
Starts with s
Contains s as substring (continuous subsequence)
Contains s as subsequence

Detail

When two strings belongs to the same sorting group, you may sort them in any order you prefer.
String matching is case sensitive. "A" and "a" are different characters.
All strings will only contain printable ASCII (#32~#126).
All strings will not have leading or trailing whitespaces.
All strings will be non-empty.
List a does not contain duplicate strings.

Example

When the list is ["center","encounter","enter","enterprise","event"], and the search target is "enter", output should be ["enter","enterprise","center","encounter"]. "event" is not included in the output as it doesn't contain "enter" as subsequence.

Test cases

["center","encounter","enter","enterprise","event"]
"enter"
-> ["enter","enterprise","center","encounter"]

["celebration","cooperation","generation","operation","ratio"]
"ratio"
-> ["ratio","celebration","cooperation","generation","operation"]

["combination","explanation","international","nation","national","nomination","notation"]
"nation"
-> ["nation","national","combination","explanation","international","nomination","notation"]

["ever","every","here","very","where"]
"everywhere"
-> []

["interaction","traditional","train","training","transformation"]
"train"
-> ["train","training","interaction","traditional","transformation"]

["condition","confusion","construction","contribution","information","organization","recommendation","transportation"]
"onion"
-> ["condition","confusion","construction","contribution","organization","recommendation"]

["...","---",".-.-.-","..--","-..-"]
"--"
-> ["---","..--",".-.-.-","-..-"]

["#","##","###","####","#####"]
"####"
-> ["####","#####"]

["Another", "example", "with spaces", "and also", "question marks", "...??"]
"a"
-> ["and also", "example", "with spaces", "question marks"]

["/.\\", "...", "><", "[[]]", "~.~", ".1.2", "_[(("]
"."
-> ["...", ".1.2", "/.\\", "~.~"]

["(())", "()()", "((()))", "(())()", "()(())", "()()()"]
"(())"
-> ["(())", "(())()", "((()))", "()(())", "()()()"]

["]["]
"]["
-> ["]["]

["\\", "\\\\", "\\\\\\"] # Input is encoded as JSON, while "\\" means a string with a single backslash
"\\"
-> ["\\", "\\\\", "\\\\\\"]

Output from your program may be different from above test cases, as the order of words in same group is not required.

Rules

This is code-golf, shortest code wins.

Input / Output

Input / Output are flexible. For example, you may use any reasonable ways including but not limited to:

You may I/O string as
- Your languages built-in string in ASCII or any ASCII compatible encoding (e.g. UTF-8);
- Your languages built-in string in any codepage that supports all printable ASCII characters (e.g. UTF-16);
- NUL terminated array of characters;
- array of integers, each integer is the ASCII value of character
- 0 terminated integer array;
You may I/O the array of string as
- A collection (OrderedSet, LinkedList, Array, ...; or HashSet only for input) of strings
- A character (or ASCII value) matrix with NUL (0) padding at the ending to each short ones;
  - Output matrix may have unnecessarily extra 0 padding;
- Line break (CR / LF / CRLF) separated single string;
- JSON encoded array of string

I think a regex based solution would pass all test cases. Yet it would be invalid unless special characters are escaped properly. So maybe you should either add a test case that makes something like that fail ... or restrict the input to letters only. — Arnauld
– Arnauld, Commented Dec 15, 2022 at 8:10

u-ndefined · Accepted Answer · 2022-12-15 12:24:19Z

Vyxal, 13 bytes

≬ṗ⁰cFsµ₌ǎṗJ⁰ḟ

Try it online! Input as a then s. Inefficient approach.

≬             # a 3-element lambda:
 ṗ            #   get all subsets
   c          #   does it contain
  ⁰           #   the last input (s)?
    F         # filter according to lambda
     s        # sort alphabetically (handles ordering of 1 and 2)
      µ       # a sorting lambda:
       ₌      #   parallel apply:
        ǎ     #     substrings (handles 1, 2, and 3)
         ṗ    #     subsets (handles 4)
          J   #   join together
            ḟ #   find the first occurrence of
           ⁰  #   last input

Shaggy · Accepted Answer · 2022-12-15 15:39:37Z

Japt, 16 bytes

Outputs a 2D-array; add 1 byte if that isn't permitted. Works in theory for all tests cases but there is a bug in Japt that prevents it from being able to handle mismatched square brackets in strings within an input array.

fÈà øVÃüøV ÔËñbV

Try it

fÈà øVÃüøV ÔËñbV     :Implicit input of array U & string V
f                    :Filter U by
 È                   :Passing each element through the following function
  à                  :  Combinations
    øV               :  Contains V?
      Ã              :End filter
       ü             :Group & sort by
        øV           :  Contains V?
           Ô         :Reverse
            Ë        :Map
             ñ       :  Sort by
              bV     :    First index of V

Neil · Accepted Answer · 2022-12-15 11:03:06Z

Retina, 85 bytes

~(L$`^.+
$&¶mN$$`^(.$*$\$&)?.$*¶-$$#1¶mO$$`^(.$*?)$\$&.$*¶$$1
\G.
$\$&.$*
^
0A`¶O`¶G`

Try it online! Takes the keyword as the first line and the search strings on the remaining lines. Explanation:

~(`

Evaluate the output of the rest of the program on the original input.

L$`^.+

Match just the keyword but replace the entire input with...

$&¶mN$$`^(.$*$\$&)?.$*¶-$$#1¶mO$$`^(.$*?)$\$&.$*¶$$1

... the keyword followed by two sort commands (see below).

\G.
$\$&.$*

Escape the keyword and insert .* between each escaped character.

^
0A`¶O`¶G`

Prefix two commands to the keyword and turn the keyword into a filter command.

For the example of the input keyword of train, the program that gets evaluated is as follows:

0A`
O`
G`t.*r.*a.*i.*n.*
mN$`^(.*train)?.*
-$#1
mN$`^(.*?)train.*
$.1

Explanation:

0A`

Delete the keyword.

O`

Sort the input. (This ensures that train appears before training.)

G`t.*r.*a.*i.*n.*

Filter for strings that have train as a subsequence.

mN$`^(.*train)?.*
-$#1

Sort strings that contain train first.

mO$`^(.*?)train.*
$1

Sort strings that contain train by the (length of the) prefix of the word train.

U13-Forward · Accepted Answer · 2022-12-20 04:11:27Z

2

Python 3, 104 bytes

lambda a,s:sorted([i for i in a if all(j in i for j in s)],key=lambda x:(x==s)-(x[:len(s)]==s)-(x in s))

I used sorted to order the result of the nested list comprehension.

Try it online!

edited Dec 20, 2022 at 4:11

answered Dec 20, 2022 at 2:55

U13-Forward

2,03110 silver badges35 bronze badges

\$\begingroup\$ This doesn't seem to give the correct output. For example, the first string in the first example should be enter instead of center. \$\endgroup\$

chunes
– chunes

2022-12-20 03:53:02 +00:00
Commented Dec 20, 2022 at 3:53
\$\begingroup\$ @chunes It is in different order. But the results are the same. \$\endgroup\$

U13-Forward
– U13-Forward

2022-12-20 03:54:25 +00:00
Commented Dec 20, 2022 at 3:54
\$\begingroup\$ @chunes The OP mentioned in his question that "Output from your program may be different from above test cases, as the order of words in same group is not required." \$\endgroup\$

U13-Forward
– U13-Forward

2022-12-20 03:55:03 +00:00
Commented Dec 20, 2022 at 3:55
1

\$\begingroup\$ Words within the same group can be in any order, but the groups themselves must be in a particular order. enter should come before center because enter is in the 'exactly equal` group whereas center is in the substring group. \$\endgroup\$

chunes
– chunes

2022-12-20 03:58:14 +00:00
Commented Dec 20, 2022 at 3:58
\$\begingroup\$ @chunes Edited mine, it's now by order :) \$\endgroup\$

U13-Forward
– U13-Forward

2022-12-20 04:11:42 +00:00
Commented Dec 20, 2022 at 4:11

| Show 3 more comments

Neil · Accepted Answer · 2022-12-15 10:23:24Z

Charcoal, 50 bytes

Ｆθ«≔ηζＦιＦ¬⌕ζκ≔Φζνζ¿¬ζ⊞υ⟦¬№ιη⌕ιηι⟧»≔⟦⟧εＷ⁻υε⊞ε⌊ιＥε⊟ι

Try it online! Link is to verbose version of code. Explanation:

Ｆθ«

Loop through the list of strings.

≔ηζ

Start with none of the letters of the search keyword found.

Ｆι

Loop through the letters in the current string.

Ｆ¬⌕ζκ

If the search keyword starts with the current letter, then...

≔Φζνζ

... remove the first letter from the copy of the search keyword.

¿¬ζ

If the search keyword is now empty, then...

⊞υ⟦¬№ιη⌕ιηι⟧

... push a list of three terms to the predefined empty list: i) a flag that is zero if the keyword is contained in the string rather than just a subsequence ii) the position of the keyword in the string iii) the string.

»≔⟦⟧εＷ⁻υε⊞ε⌊ι

Sort the matches.

Ｅε⊟ι

Output just the strings.

Arnauld · Accepted Answer · 2022-12-20 09:27:04Z

1

JavaScript (ES10), 126 bytes

-7 thanks to @tsh

Expects (array)(string).

a=>s=>[`^${q=s.replace(/\W/g,'\\$&')}$`,"^"+q,q,q.replace(/\\?./g,'.*$&')].flatMap(e=>a.filter(s=>a[0+s]?0:a[0+s]=s.match(e)))

Try it online!

Escaping characters in the regular expressions costs 41 34 bytes ... :'(

edited Dec 20, 2022 at 9:27

answered Dec 15, 2022 at 22:00

Arnauld

206k21 gold badges187 silver badges670 bronze badges

1

\$\begingroup\$ a=>s=>[`^${q=s.replace(/\W/g,'\\$&')}$`,"^"+q,q,q.replace(/\\?./g,'.*$&')].flatMap(e=>a.filter(s=>a[s]?0:a[s]=s.match(e))) \$\endgroup\$

tsh
– tsh

2022-12-16 09:30:33 +00:00
Commented Dec 16, 2022 at 9:30
1

\$\begingroup\$ Fail ["0"], "0" \$\endgroup\$

l4m2
– l4m2

2022-12-20 04:26:54 +00:00
Commented Dec 20, 2022 at 4:26
1

\$\begingroup\$ And maybe can be fixed by adding 4 bytes a[s] -> a[0+s] \$\endgroup\$

tsh
– tsh

2022-12-20 06:24:36 +00:00
Commented Dec 20, 2022 at 6:24

Add a comment |

l4m2 · Accepted Answer · 2022-12-23 11:54:43Z

1

JavaScript (Node.js), 121 bytes

s=>a=>a.filter(a=>g(a,s),g=([c,...x],y)=>c?g(x,y.slice(c==y[0])):!y).sort((x,y)=>(g=i=>s==i?-1:i.indexOf(s)>>>0)(x)-g(y))

Try it online!

no regex is shorter

JavaScript (Node.js), 133 126 125 bytes

s=>a=>a.filter(x=>x.match(s.replace(/(\w)|./g,(c,d)=>'.*'+[d||'\\'+c]))).sort((x,y)=>(g=i=>s==i?-1:i.indexOf(s)>>>0)(x)-g(y))

Try it online!

Sort equality then the first occurrence

edited Dec 23, 2022 at 11:54

answered Dec 15, 2022 at 8:43

l4m2

32.6k2 gold badges27 silver badges116 bronze badges

\$\begingroup\$ @tsh Outputting [] should be intended \$\endgroup\$

l4m2
– l4m2

2022-12-15 09:03:48 +00:00
Commented Dec 15, 2022 at 9:03

Add a comment |

Kevin Cruijssen · Accepted Answer · 2022-12-27 09:39:39Z

05AB1E, 16 bytes

ʒæIå}ΣηR¬œ€Œ˜«Ik

Try it online. (Way too slow for a test suite of all test cases.)

Explanation:

ʒ             # Filter the first (implicit) input-list by:
 æ            #  Get the powerset of the current word
  Iå          #  Check if the second input is in this list
}Σ            # After the filter, sort by:
  η           #  Get the prefixes of the current word
   R          #  Reverse this list
    ¬œ        #  Get the permutations of the current word as well
      €Œ      #  Get the substrings of each permutation
        ˜     #  Flatten this list
         «    #  Merge the two lists together
          Ik  #  Get the index of the second input in this list
              # (after which the filtered and sorted list of words is output implicitly)

Try just ηR¬œ€Œ˜« online to see how its order corresponds to the order of \$1\$ through \$4\$.

Roman Czyborra · Accepted Answer · 2022-12-28 14:54:54Z

0

Haskell, 137 bytes

import Data.List
o s a=map snd.filter((<0).fst)$sort[(-length(
 takeWhile($x)[isSubsequenceOf s,isInfixOf s,isPrefixOf s,(==)s]),x)|x<-a]

Try it online!

edited Dec 28, 2022 at 14:54

answered Dec 28, 2022 at 14:49

Roman Czyborra

1,6588 silver badges16 bronze badges

Add a comment |

Sean · Accepted Answer · 2022-12-28 18:42:20Z

Raku, 98 bytes

{$^s;@^a.grep(/<{S:g[(\W)|.]='\\'x?$0~$/~'.*'
given $s}>/).sort:{first :k,$_~~*,$s,/^$s/,/$s/,!0}}

Try it online!

I'm dismayed to learn that Raku has lost Perl 5's quotemeta function and \Q quoting construct. Reproducing the concept cost many bytes.

Anyway...

The functions arguments are passed in the placeholder variables @^a (the list of strings) and $^s (the search keyword).
The grep call on @^a filters the list to those elements in which the search keyword appears as a subsequence. This is done by transforming the search keyword into a new string by prefixing every non-word character with a backslash and appending a .* to every character.
Then sort sorts the remaining strings. The sort function is first :k, $_ ~~ *, $s, /^$s/, /$s/, !0. This returns the index (due to the :k flag) of the first of $s, /^$s/, /$s/, or !0 (a shorter way to write True) against which the current word smart-matches. The current word matches the string $s if it's the same as the search keyword, matches the regex /^$s/ if the search keyword appears at the start of the string, matches the regex /$s/ if the search keyword appears anywhere inside the string, and always matches True.

lesobrod · Accepted Answer · 2023-06-15 06:45:54Z

0

Wolfram Language (Mathematica), 113 bytes

(s=#2;SortBy[Select[#1,StringContainsQ[#,s]&],(d=Last@LongestCommonSequencePositions[s,#];{Length@d,First@d})&])&

Try it online!

Thank you for the interesting challenge, golf it on Mathematica is hard, but I tried (-_-)
LongestCommonSequencePositions is one of longest function names in WM 8)

answered Jun 15, 2023 at 6:45

lesobrod

3,4481 gold badge8 silver badges28 bronze badges

Add a comment |

Stack Exchange Network

Sorted strings filter

Detail

Example

Test cases

Rules

Input / Output

11 Answers 11

Vyxal, 13 bytes

Japt, 16 bytes

Retina, 85 bytes

Python 3, 104 bytes

Charcoal, 50 bytes

JavaScript (ES10), 126 bytes

JavaScript (Node.js), 121 bytes

JavaScript (Node.js), 133 126 125 bytes

05AB1E, 16 bytes

Haskell, 137 bytes

Raku, 98 bytes

Wolfram Language (Mathematica), 113 bytes

Hot Network Questions

Sorted strings filter

Detail

Example

Test cases

Rules

Input / Output

11 Answers 11

Vyxal, 13 bytes

Japt, 16 bytes

Retina, 85 bytes

Python 3, 104 bytes

Charcoal, 50 bytes

JavaScript (ES10), 126 bytes

JavaScript (Node.js), 121 bytes

JavaScript (Node.js), 133 126 125 bytes

05AB1E, 16 bytes

Haskell, 137 bytes

Raku, 98 bytes

Wolfram Language (Mathematica), 113 bytes

Related

Hot Network Questions