
Could somebody help me work out how to abstract the recursion out of this code, please? I'm trying to write something that takes a pandoc structure and subdivides the tree whenever it finds a sentence, so:

<span>This sentence. Should split.</span>

becomes:

<span>
  <span { sentence }>This sentence.</span>
  <span { sentence }> Should split.</span>
</span>

I have a working implementation, but the recursive part is a bit repetitive:

segment :: [Inline] -> [Inline]
segment [] = []
segment [el] = [el]
segment els@(head:tail) =
  case head of
    Emph ch -> map (Emph . toList) (segment ch) ++ segment tail
    Strong ch -> map (Strong . toList) (segment ch) ++ segment tail
    Strikeout ch -> map (Strikeout . toList) (segment ch) ++ segment tail
    Superscript ch -> map (Superscript . toList) (segment ch) ++ segment tail
    Subscript ch -> map (Subscript . toList) (segment ch) ++ segment tail
    SmallCaps ch -> map (SmallCaps . toList) (segment ch) ++ segment tail
    Quoted qt ch -> map (Quoted qt . toList) (segment ch) ++ segment tail
    Cite cs ch -> map (Cite cs . toList) (segment ch) ++ segment tail
    Span attr ch -> map (Span attr . toList) (segment ch) ++ segment tail
    _ ->
      maybe [sentence els] (\i -> (sentence $ take i els) : segment (drop i els))
        $ (1 +) <$> List.findIndex split els
  where
    sentence = Span ("sentence", [], [])

split :: Inline -> Bool
split (Str str) = endsWith "." str || endsWith "?" str || endsWith "!" str
split _         = False

endsWith :: String -> String -> Bool
endsWith suff str = suff `List.isSuffixOf` str

toList :: a -> [a]
toList = (: [])

I'm trying to work out a way to pass in some f that models the part that splits the structure (i.e. the bottom bit), but I'm struggling: the function has to be [Inline] -> [Inline], yet it also seems to need to be Inline -> [Inline] -> [Inline], because each recursive step needs the original constructor in order to preserve the structure of the document, i.e.:

<span><strong>This sentence. Should split.</strong></span>

becomes:

<span>
  <strong><span { sentence }>This sentence.</span></strong>
  <strong><span { sentence }> Should split.</span></strong>
</span>
  • You can probably avoid any manual recursion by using walk from Text.Pandoc.Walk (in package pandoc-types).
    – tarleb
    Commented Dec 2, 2017 at 14:17
  • That one walks into the Image and Link constructors, which are omitted here. One of transformOf (each . inlinePrePlate) and transformOf (inlinePrePlate . each) could do that job, though.
    – Gurkenglas
    Commented Dec 4, 2017 at 7:44

2 Answers


pandoc-lens provides a matching Plated instance for Inline, but plate targets one child Inline at a time and we want to change how many children there are, so we'll need an inlinePrePlate :: Traversal' Inline [Inline] that targets the whole child list. Let's submit that. Usually template could have done our job, but we don't want to target the [Inline] arguments of the Link and Image constructors.

segment :: [Inline] -> [Inline]
segment [] = []
segment [el] = [el]
segment els@(head:tail) = if has inlinePrePlate head
  then traverseOf inlinePrePlate (map (\x -> [x]) . segment) head ++ segment tail
  else case break1 split els of
    (sent, segs) -> Span ("sentence", [], []) sent : segment segs

break1 :: (a -> Bool) -> [a] -> ([a], [a])
break1 f [] = ([], [])
break1 f (x:xs) | f x = ([x], xs)
break1 f (x:xs) = let (y,ys) = break1 f xs in (x:y,ys)

(I don't understand the part of your post that talks about an f.)

segment [el] = [el] is superfluous, right?

No sentence contains anything that has children, right?

It appears to me that

<span>This sentence. Should split.</span>

instead becomes:

<span> <span { sentence }>This sentence.</span> </span>
<span> <span { sentence }> Should split.</span> </span>
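For anyone who wants to try this without pandoc-lens installed, here is a minimal sketch that depends only on base. It makes several assumptions that are not part of the answer above: Inline is a cut-down stand-in for pandoc's type (only a few constructors, String instead of whatever text type your pandoc-types version uses), inlinePrePlate is hand-written in van Laarhoven style rather than imported, and hasChildren is a hypothetical replacement for has inlinePrePlate.

```haskell
import Data.Functor.Const (Const (..))
import Data.List (isSuffixOf)
import Data.Monoid (Any (..))

-- Stand-in for pandoc's types (assumption: only a few constructors kept).
type Attr = (String, [String], [(String, String)])

data Inline
  = Str String
  | Space
  | Emph [Inline]
  | Strong [Inline]
  | Span Attr [Inline]
  deriving (Eq, Show)

-- Hand-written traversal over the whole child list of a container.
-- Leaves (Str, Space) are returned untouched.
inlinePrePlate :: Applicative f => ([Inline] -> f [Inline]) -> Inline -> f Inline
inlinePrePlate f (Emph ch) = Emph <$> f ch
inlinePrePlate f (Strong ch) = Strong <$> f ch
inlinePrePlate f (Span attr ch) = Span attr <$> f ch
inlinePrePlate _ x = pure x

-- Hypothetical stand-in for `has inlinePrePlate`: True if the traversal
-- visits a child list at all.
hasChildren :: Inline -> Bool
hasChildren = getAny . getConst . inlinePrePlate (const (Const (Any True)))

split :: Inline -> Bool
split (Str str) = any (`isSuffixOf` str) [".", "?", "!"]
split _ = False

-- Like break, but the first matching element stays in the first half.
break1 :: (a -> Bool) -> [a] -> ([a], [a])
break1 _ [] = ([], [])
break1 f (x : xs)
  | f x = ([x], xs)
  | otherwise = let (ys, zs) = break1 f xs in (x : ys, zs)

segment :: [Inline] -> [Inline]
segment [] = []
segment [el] = [el]
segment els@(x : xs)
  -- In the list applicative, each one-element child list yields one copy
  -- of the container, so the container is repeated once per sentence.
  | hasChildren x = inlinePrePlate (map (: []) . segment) x ++ segment xs
  | otherwise = case break1 split els of
      (sent, rest) -> Span ("sentence", [], []) sent : segment rest
```

In GHCi, segment [Emph [Str "One.", Space, Str "Two."], Space, Str "Three."] should give two Emph nodes, each holding a single sentence Span, followed by a sentence Span for the remaining text. That matches the observation above that the container itself is duplicated rather than kept as one parent.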

I'll start with something unrelated to recursion: A first cleanup step might be to reduce the repetition in segment:

segment els@(head:tail) =
  let constr = case head of
                 Emph ch        -> Just (Emph, ch)
                 Strong ch      -> Just (Strong, ch)
                 Strikeout ch   -> Just (Strikeout, ch)
                 Superscript ch -> Just (Superscript, ch)
                 Subscript ch   -> Just (Subscript, ch)
                 SmallCaps ch   -> Just (SmallCaps, ch)
                 Quoted qt ch   -> Just (Quoted qt, ch)
                 Cite cs ch     -> Just (Cite cs, ch)
                 Span attr ch   -> Just (Span attr, ch)
                 _              -> Nothing
      sentence = Span ("", ["sentence"], [])
  in case constr of
    Just (inlnConstr, ch) -> map (inlnConstr . toList) (segment ch) ++ segment tail
    _               ->
      maybe [sentence els] (\i -> (sentence $ take i els) : segment (drop i els))
        $ (1 +) <$> List.findIndex split els

Note that this cuts away the repeated (segment ch) and ++ segment tail. The toList has to stay: segment returns a flat [Inline], while each constructor expects a list of children, so every segment must first be wrapped in a singleton list.

Next would be the splitting, which could be done using break instead of findIndex, drop, and take:

      case break split els of
        (_, [])    -> [sentence els]
        (xs, y:ys) -> sentence (xs ++ [y]) : segment ys
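Putting both steps together, a self-contained sketch might look like the following. It rests on assumptions of mine: Inline is a cut-down stand-in for pandoc's type (a few constructors, String for text) so the block needs only base, and unwrap is a hypothetical helper name for the constructor lookup, following the Maybe ([Inline] -> Inline, [Inline]) shape suggested in the comments.

```haskell
import Data.List (isSuffixOf)

-- Stand-in for pandoc's types (assumption: only a few constructors kept).
type Attr = (String, [String], [(String, String)])

data Inline
  = Str String
  | Space
  | Emph [Inline]
  | Strong [Inline]
  | Span Attr [Inline]
  deriving (Eq, Show)

-- Hypothetical helper: for a container, return the function that rebuilds
-- it together with its children; Nothing for leaves.
unwrap :: Inline -> Maybe ([Inline] -> Inline, [Inline])
unwrap (Emph ch) = Just (Emph, ch)
unwrap (Strong ch) = Just (Strong, ch)
unwrap (Span attr ch) = Just (Span attr, ch)
unwrap _ = Nothing

split :: Inline -> Bool
split (Str str) = any (`isSuffixOf` str) [".", "?", "!"]
split _ = False

segment :: [Inline] -> [Inline]
segment [] = []
segment [el] = [el]
segment els@(x : xs) =
  case unwrap x of
    -- Each segment is wrapped in a singleton list before the container
    -- is rebuilt around it.
    Just (wrap, ch) -> map (wrap . (: [])) (segment ch) ++ segment xs
    Nothing ->
      case break split els of
        (_, []) -> [sentence els]
        (before, end : rest) -> sentence (before ++ [end]) : segment rest
  where
    sentence = Span ("", ["sentence"], [])
```

The only per-constructor code left is in unwrap, so supporting another container is a one-line change there. Applied to [Strong [Str "One.", Space, Str "Two."], Space, Str "Three."], this should produce two Strong nodes with one sentence Span each, then a sentence Span for the trailing text.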

The last part of your post might be better asked on Stack Overflow, as there is no working code for it yet.

  • ch is not in scope by the time of its use.
    – Gurkenglas
    Commented Nov 29, 2017 at 23:43
  • To be fair you can then have constr be Maybe ([Inline] -> Inline, [Inline]) and have it return Just (Constructor, ch), at least that then separates the repetition in the recursive part. Still doesn't feel that elegant though.
    – danbroooks
    Commented Nov 30, 2017 at 8:46
  • Thanks, should be fixed now. (Getting a good code review on one's code review is actually quite awesome).
    – tarleb
    Commented Dec 2, 2017 at 14:09
  • toList is not id as OP defines it as (:[]).
    – Gurkenglas
    Commented Dec 4, 2017 at 16:28
