13

I would like to create a fast/optimized fully expandable function that counts the number of spaces in an argument:

\documentclass{article}
\begin{document}
\countspaces{ A B } % Should return 3 (1 is ok too if leading and trailing spaces are removed)
\countspaces{A \mycommand B} % Should return 2 (\mycommand is not expanded)
\countspaces{A {a b c} B} % Should return 2 (spaces inside groups are not counted)
\end{document}

Explicit spaces should be counted too. How to achieve that?

3
  • 3
    FYI, {A \mycommand B} will only return 1, because the "space" after \mycommand is a terminator to the name, not a cat-10 space. Commented Mar 12 at 19:20
  • How do you want implicit spaces (e.g., \@sptoken) to be handled? As a space, or not? Commented Mar 12 at 19:26
  • Two versions with and without implicit spaces would be amazing. Commented Mar 12 at 19:31

6 Answers

12

You cannot count the real number of spaces at the macro level (in the expand processor), because spaces are treated specially by the token processor and some of them are never sent on to the expand processor: for example, runs of multiple spaces, or spaces that terminate a control sequence name. The question is whether you want to count the exact number of spaces (ignoring spaces in {...}). If so, you need to read the input from a file under a special catcode regime. If not, you can try the following macro, which is based only on TeX primitives and plain TeX macros:

\newcount\tmpnum
\def\countspaces#1{\tmpnum=-1 \countspacesA #1 {\end} }
\def\countspacesA #1 {\ifx\end#1\countspacesB 
   \else \advance\tmpnum by1 \expandafter\countspacesA\fi}
\def\countspacesB{\the\tmpnum}

\countspaces{ A B } % prints 3
\countspaces{A \mycommand B} % prints 1 (space after \mycommand is removed by token processor)
\countspaces{A {a b c} B} % prints 2 (spaces inside groups are not counted)

\bye
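
The tokenizer behaviour described above can be checked directly. The following is a minimal plain TeX sketch, not part of the original answer; the macro names \first to \fourth are hypothetical. It compares token lists with \ifx to show which spaces survive tokenization:

% Runs of spaces in the source collapse into a single space token.
\def\first{A   B}
\def\second{A B}
\message{\ifx\first\second same\else different\fi}% writes "same"

% A space that ends a control word produces no token at all:
% \third and \fourth both contain just the two tokens \relax and X.
\def\third{\relax X}
\def\fourth{\relax%
X}
\message{\ifx\third\fourth same\else different\fi}% writes "same"

\bye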

If somebody needs an expandable \countspaces (this version uses \numexpr, so it requires e-TeX):

\def\countspaces#1{\countspacesA{-1}#1 {\end} }
\def\countspacesA #1#2 {\ifx\end#2#1%
   \else \expandafter\countspacesA\expandafter{\the\numexpr#1+1\expandafter}\fi}

\countspaces{ A B } % expands to 3
\countspaces{A \mycommand B} % expands to 1 (space after \mycommand is removed by token processor)
\countspaces{A {a b c} B} % expands to 2 (spaces inside groups are not counted)

\bye
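
As a quick check of expandability, the macro can be used inside \edef. This is only a sketch: \result is a hypothetical name, an e-TeX-enabled engine (such as pdfTeX) is assumed because of \numexpr, and the lines are meant to be placed before the \bye of the expandable listing above.

% Each round of \countspacesA replaces the counter argument by its
% value plus one, so nothing but digits is left when {\end} is reached.
\edef\result{\countspaces{ A B }}
\message{\meaning\result}% writes "macro:->3"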
3
  • This command \countspaces is not expandable. For example, \edef\testA{\countspaces{ A B }}\testA gives 0. Commented Mar 12 at 23:50
  • 1
    @wipet it was (at least according to the question's history) always part of the assignment that this should be fully expandable. Commented Mar 13 at 21:47
  • @wipet The demand for expandability is in the first sentence of the question: "I would like to create a fast/optimized fully expandable function that counts the number of spaces in an argument:". Commented Mar 17 at 21:24
7

Only explicit spaces

This code is modelled after the internals of \str_count_spaces:n as defined by l3str. The changes are the marker (here \countspaces@mark) and the check for the end of the loop (which is faster than \ifx-based tests). The result is a macro that counts spaces expandably as fast as possible in e-TeX, though its argument must not contain the token \countspaces@mark.

\documentclass{article}

\makeatletter
\newcommand\countspaces[1]
  {%
    \the\numexpr
      \countspaces@#1       % these spaces are on purpose
        \countspaces@mark7  %
        \countspaces@mark6  %
        \countspaces@mark5  %
        \countspaces@mark4  %
        \countspaces@mark3  %
        \countspaces@mark2  %
        \countspaces@mark1  %
        \countspaces@mark0  %
        \countspaces@mark-1 %
      \countspaces@stop
    \relax
  }
\long\def\countspaces@#1 #2 #3 #4 #5 #6 #7 #8 #9 % <- spaces on purpose
  {%
    \countspaces@ifmark#9\countspaces@stop\countspaces@mark
    9+\countspaces@
  }
\def\countspaces@stop#1\countspaces@stop{}
\long\def\countspaces@ifmark#1\countspaces@mark{}
\makeatother

\begin{document}
\countspaces{ A B } % Should return 3 (1 is ok too if leading and trailing spaces are removed)
\countspaces{A \mycommand B} % Should return 1 (TeX tokenisation rules)
\countspaces{A {a b c} B} % Should return 2 (spaces inside groups are not counted)
\end{document}
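
To see why grabbing nine space-delimited arguments at a time yields the right number, here is a hedged trace for two inputs, followed by a small check with \edef. The macro names \three and \eight are hypothetical, and the lines are meant to go in the body of the document above.

% \countspaces{ A B }: the nine space-delimited arguments are
%   #1=<empty> #2=A #3=B #4=<empty> #5=\countspaces@mark7 ... #9=\countspaces@mark3
% \countspaces@ifmark removes the marker and leaves the 3, then
% \countspaces@stop discards the rest: \numexpr 3\relax = 3.
%
% \countspaces{a b c d e f g h i} (8 spaces): in the first round #9=i,
% which carries no marker, so the round contributes 9+ and loops again.
% The second round ends with #9=\countspaces@mark-1,
% giving \numexpr 9+-1\relax = 8.
\edef\three{\countspaces{ A B }}
\edef\eight{\countspaces{a b c d e f g h i}}
\typeout{\three, \eight}% writes "3, 8" to the log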

Also implicit spaces

Just using classic TeX you can't get around a token-by-token loop here. etl (disclaimer: of which I'm the author) defines a fast one (we could get a bit faster here and there if we defined that loop completely on our own, but that'd be a bit excessive for this answer). See the etl documentation on tokens which shouldn't be in the argument (basically a bunch of package-private functions and markers used by etl internally).

\documentclass{article}

\usepackage{etl}

\ExplSyntaxOn
\cs_new:Npn \vincent_count_spaces:n
  {
    \the\numexpr % <- could use `\int_eval:w` but then we'd need 3 expansions
    \c_zero_int
    \etl_act:nnnnnn
      \__vincent_count_spaces:nN
      { +\c_one_int \use_none:n }
      \use_none:nn
      { \scan_stop: \use_none:nn }
      {}
  }
\cs_new:Npn \__vincent_count_spaces:nN #1#2
  { \if_meaning:w \c_space_token #2 +\c_one_int \fi: }
\cs_new_eq:NN \countspaces \vincent_count_spaces:n
\ExplSyntaxOff

\begin{document}
\countspaces{ A B } % Should return 3 (1 is ok too if leading and trailing spaces are removed)
\countspaces{A \mycommand B} % Should return 1 (TeX tokenisation rules)
\countspaces{A {a b c} B} % Should return 2 (spaces inside groups are not counted)

\makeatletter
\countspaces{A {a b c} \@sptoken B} % Should return 3
\makeatother
\end{document}
7

New version taking into account the comments of jps and egreg.

The code below counts the spaces. For example, A {a b c} B is interpreted as 0+1+0*(0+1+1)+1 and then evaluated with \int_eval:n as 2.
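
A minimal sketch of that arithmetic, using the e-TeX primitive \numexpr in place of \int_eval:n (an assumption made only to keep the example short):

\documentclass{article}
\begin{document}
% A {a b c} B  ->  0 +1 +0*(0 +1 +1) +1
% The group contributes 0*(...), so the spaces inside it cancel out.
\the\numexpr 0+1+0*(0+1+1)+1\relax % typesets 2
\end{document}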

The command \countspaces takes an optional star. It lets you decide, case by case, whether commands such as \space, \,, \ , \@sptoken or \! should be taken into account.

\documentclass[border=6pt]{standalone}
\usepackage{tabularray}
\usepackage{xcolor}
\usepackage{fvextra}
\ExplSyntaxOn
\cs_new:Npn \__Vincent_count_spaces_aux_i:n #1
  {
    \str_case:ne {#1}
      {
        { ~ } { + 1 }
        { \c_left_brace_str } { + 0 * ( 0 }
        { \c_right_brace_str } { ) }
      }
  }
\makeatletter
\cs_new:Npn \__Vincent_count_spaces_aux_ii:nn #1#2
  {
    \tl_if_single_token:nT {#2}
      {
        \token_if_control_word:NT #2 { - 1 }
        \IfBooleanT {#1}
          {
            \token_case_meaning:NnT #2
              {
                \space {}
                \, {}
%                \  {}%choose whether \  should be counted twice
                \  { - 1 }%choose whether \  should be counted twice
                \@sptoken {}
                \! {}
              }
              { + 1 }
          }
      }
  }
\makeatother
\NewExpandableDocumentCommand \countspaces { s m }
  {
    \int_eval:n
      {
        0
        \str_map_function:nN {#2} \__Vincent_count_spaces_aux_i:n
        \tl_map_tokens:nn {#2} { \__Vincent_count_spaces_aux_ii:nn {#1} }
      }
  }
\ExplSyntaxOff
\newcommand{\test}[1]{\Verb*{#1} & \edef\testA{\countspaces{#1}}\testA & \edef\testB{\countspaces*{#1}}\testB\\}
\begin{document}
\begin{tblr}[expand=\test]{
  colspec=lcc,
  hline{2}=1pt,
  row{odd}=lightgray,
  row{1}=gray
}
input & \Verb{\countspaces} & \Verb{\countspaces*}\\
\test{ A B }% Should return 3 (1 is ok too if leading and trailing spaces are removed)
\test{A \mycommand B}% Should return 2 (\mycommand is not expanded)
\test{A {a b c} B}% Should return 2 (spaces inside groups are not counted)
\test{a}
\test{\,}
\test{\@}
\test{\ }
\test{\@ a}
\test{ \,a}
\test{ C { k l {m n } } X Y }
\test{\x\y}
\test{ A \space B \, C \ D}
\test{ A \space B \, C \ D { \space} \! E }
\end{tblr}
\end{document}


6
  • 1
    Because you use \str_map_function:nN this would count any spaces that are part of macro names and counts a space for every control sequence in the argument. Commented Mar 12 at 22:15
  • @jps Thank you for the additional comment. It was not clear to me what the specific behavior should be for such cases. For example, it was asked that \countspaces{A \mycommand B} % Should return 2. Commented Mar 12 at 22:27
  • 2
    @matexmatics There is just one space in A \mycommand B; true, the code in the answer produces 2 in this case, but it also would with \countspaces{\x\y}. Commented Mar 12 at 22:54
  • @egreg Thank you for the comments. See the updated answer. Commented Mar 12 at 23:37
  • 2
    Sorry, but now \countspaces{\@ a} returns 0 instead of 1. Commented Mar 13 at 7:55
5

Here, I set the space character to catcode 12 before absorbing the argument, and then count the spaces using \numexpr. But, as matexmatics noted, the result is not expandable unless the catcode of the space is changed in advance of the expansion, as in

{\catcode`\ =12 \edef\z{\countspaces{ A B C }}\z}

which yields the correct answer of 4. Here is the full MWE:

\documentclass{article}
\let\endcsp\empty
\makeatletter
\catcode`\_=10 %
\catcode`\ =12_%
\def\countspaces{\catcode`\ =12_\csp}%
\def\csp#1{\the\numexpr0\cspaux#1\endcsp\relax\catcode`\ =10_}%
\def\cspaux#1{\ifx#1\endcsp\else\ifx#1 +1\fi\expandafter\cspaux\fi}%
\catcode`\ =10_%
\catcode`\_=12 %
\makeatother
\begin{document}
\countspaces{ A B } Should return 3 (1 is ok too if leading and trailing spaces are removed)

\countspaces{A \mycommand B} Should return 2 (mycommand is not expanded)

\countspaces{A {a b c} B} Should return 2 (spaces inside groups are not counted)
\end{document}


2
  • 1
    This command \countspaces is not expandable. Commented Mar 12 at 23:41
  • @matexmatics I have edited to show how to make it expandable, by performing a catcode change in advance of the expansion. Commented Mar 13 at 2:11
5

If you're able to use LuaTeX, then it is possible to exactly count the number of spaces in an argument fully-expandably. The trick is that with LuaTeX, we're able to change catcodes in an expansion-only context, whereas I believe that this is impossible in other engines.

\documentclass{article}


%%%%%%%%%%%%%%%%%%%%%%
%%% Implementation %%%
%%%%%%%%%%%%%%%%%%%%%%

\ExplSyntaxOn
    %% Define a new catcode table that is almost like verbatim, except that
    %% the braces still have their ordinary meaning as grouping characters.
    \cctab_const:Nn \c__example_verbish_cctab {
        \cctab_select:N \c_other_cctab
        \char_set_catcode_group_begin:N \{
        \char_set_catcode_group_end:N \}
    }
\ExplSyntaxOff

\usepackage{luacode}
\begin{luacode*}
    -- Constants
    local verbish_cctab = luatexbase.registernumber("c__example_verbish_cctab")
    local chardef = token.command_id("char_given")

    -- Define the Lua function to count spaces in an argument
    local function count_spaces()
        -- Push the new catcode table
        local current_catcode_table = tex.catcodetable
        tex.catcodetable = verbish_cctab

        -- Scan the argument as a string
        local argument = token.scan_string()

        -- Delete any brace groups
        argument = argument:gsub("%b{}", "")

        -- Count the number of spaces in the argument
        local space_count = argument:count(" ")

        -- Pop the catcode table to restore the previous state
        tex.catcodetable = current_catcode_table

        -- Return the count like a TeX \chardef token (slightly faster)
        token.put_next(token.create(space_count, chardef))

        -- -- Return the count as a sequence of digit tokens (slightly slower)
        -- tex.sprint(-2, space_count)
    end

    -- Expose the Lua function as a TeX macro
    do
        local target_csname = "countspaces"
        local index = luatexbase.new_luafunction(target_csname)
        lua.get_functions_table()[index] = count_spaces
        token.set_lua(target_csname, index, "global")
    end
\end{luacode*}


%%%%%%%%%%%%%%%%%%%%%
%%% Demonstration %%%
%%%%%%%%%%%%%%%%%%%%%

\newcommand{\Header}[1]{%
    \vrule height 20pt depth 0pt width 0pt\relax%
    \hskip0.5\tabcolsep\relax%
    \clap{\bfseries#1}%
}
\def\Test#1{#1}

\pagestyle{empty}
\begin{document}
    \begin{tabular}{rl}
        \Header{Test cases given in the question}                  \\
        |\verb| A B ||          & \the\countspaces{ A B }          \\
        |\verb|A \mycommand B|| & \the\countspaces{A \mycommand B} \\
        |\verb|A {a b c} B||    & \the\countspaces{A {a b c} B}    \\

        \Header{My own test cases}                             \\
        |\verb|||             & \the\countspaces{}             \\
        |\verb| ||            & \the\countspaces{ }            \\
        |\verb|  ||           & \the\countspaces{  }           \\
        |\verb|   ||          & \the\countspaces{   }          \\
        |\verb|\@ \relax  #|| & \the\countspaces{\@ \relax  #} \\

        \Header{Unclear if this is the desired output or not} \\
        |\verb|a\ b|| & \the\countspaces{a\ b}                \\

        %% The command is semi-verbatim, so it works less well if inside of
        %% the argument to another macro (but still works in most cases).
        \Header{Inside a macro argument}                                         \\
        Top-level:   |\verb|A {a b c} B|| & \the\countspaces{A {a b c} B}        \\
        In argument: |\verb|A {a b c} B|| & \Test{\the\countspaces{A {a b c} B}} \\
        Top-level:   |\verb|\relax||      & \the\countspaces{\relax}             \\
        In argument: |\verb|\relax||      & \Test{\the\countspaces{\relax}}      \\
        Top-level:   |\verb|\relax ||     & \the\countspaces{\relax }            \\
        In argument: |\verb|\relax ||     & \Test{\the\countspaces{\relax }}     \\
        Top-level:   |\verb|      ||      & \the\countspaces{      }             \\
        In argument: |\verb|      ||      & \Test{\the\countspaces{      }}      \\

        \Header{Fully-expandable}                                                         \\
        |\verb| A B ||          & \edef\tmp{\the\countspaces{ A B }         }\meaning\tmp \\
        |\verb|A \mycommand B|| & \edef\tmp{\the\countspaces{A \mycommand B}}\meaning\tmp \\
        |\verb|A {a b c} B||    & \edef\tmp{\the\countspaces{A {a b c} B}   }\meaning\tmp \\
        |\verb|\relax||         & \edef\tmp{\the\countspaces{\relax}        }\meaning\tmp \\
        |\verb|\relax ||        & \edef\tmp{\the\countspaces{\relax }       }\meaning\tmp \\
        |\verb|||               & \edef\tmp{\the\countspaces{}              }\meaning\tmp \\
        |\verb| ||              & \edef\tmp{\the\countspaces{ }             }\meaning\tmp \\
        |\verb|  ||             & \edef\tmp{\the\countspaces{  }            }\meaning\tmp \\
        |\verb|   ||            & \edef\tmp{\the\countspaces{   }           }\meaning\tmp \\
    \end{tabular}
\end{document}


3

Since it may be useful for others, here is a tokmap-based solution that counts explicit spaces at all levels of nesting. It is fully expandable.

\input tokmap

\long\def\countspaces#1{\the\numexpr 0\tokmap\countspacesA{#1}\relax}
\long\def\countspacesA#1{\ifx\tokmap@space#1+1\fi}

\countspaces{ A{B C\@sptoken} }  % 3

\bye

If you want to count implicit spaces as well, simply add another \ifx branch that tests for \@sptoken.
