I'd be wary of solutions that build the HTML by hand. Work with the DOM and let it write out the HTML for you. HAP (Html Agility Pack) can do that.
Don't create one-time-use extension methods. It appears you created an extension method for strings to encode your titles. If you can't use it anywhere else in your code, it doesn't belong as an extension. I'd argue that it should be a regular static method of your Header class, as it might be specific to how you want your headers encoded. In this context, it is confusing to see that call there.
The logic in your headers to get the "sibling number" and TOC prefix is more complicated than it needs to be. The GetTocNumber() method in particular is very confusing at a glance; I had a hard time figuring out what it was doing, and the string reversal at the end really killed it. Both could be done more simply. In fact, with some refactoring, they could be calculated once, on construction.
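To illustrate what "calculated once, on construction" could look like, here is a minimal sketch that leaves the HTML parts out. SectionNode is a hypothetical stand-in for your header class, and I'm assuming the root carries an empty prefix so top-level entries come out as "1", "2", and so on.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a header: each node works out its sibling
// number and section prefix once, in the constructor, from its parent.
public sealed class SectionNode
{
    private readonly List<SectionNode> _children = new List<SectionNode>();

    public SectionNode Parent { get; private set; }
    public int EntryNumber { get; private set; } // 1-based position among siblings
    public string Section { get; private set; }  // e.g. "2.1"; empty for the root

    public SectionNode(SectionNode parent)
    {
        Parent = parent;
        if (parent == null)
        {
            // Assumption: the root is not a real entry, so it has no number.
            EntryNumber = 0;
            Section = "";
            return;
        }

        // Registering with the parent first means the new child count is
        // exactly this node's sibling number, so no later lookup is needed.
        parent._children.Add(this);
        EntryNumber = parent._children.Count;
        Section = parent.Section.Length == 0
            ? EntryNumber.ToString()
            : parent.Section + "." + EntryNumber;
    }
}

public static class SectionDemo
{
    public static void Main()
    {
        var root = new SectionNode(null);
        var first = new SectionNode(root);
        var second = new SectionNode(root);
        var nested = new SectionNode(second);

        Console.WriteLine(first.Section);  // 1
        Console.WriteLine(second.Section); // 2
        Console.WriteLine(nested.Section); // 2.1
    }
}
```

Because the prefix is built from the parent's already-computed prefix, there is no walking back up the tree and no string reversal.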
That confusion leads into the critical thing that's missing in most of these methods: comments. There are not a lot of useful ones in there. Your comments should explain what is happening in the code that can't be determined at first glance. The code really should be self-documenting; when it isn't, you need to say what it's doing in comments. But no one cares that the next line will add some item to a list. You should be saying things like, "we need to ensure we don't have an empty list because..." or at least explaining why some actions are needed.
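As a small illustration of the difference, here is a made-up method (CollectTitles and its blank-title rule are hypothetical, not taken from your code) with one comment that only restates the code and one that explains why a step exists.

```csharp
using System;
using System.Collections.Generic;

public static class CommentStyle
{
    // Hypothetical helper: gathers the titles that should appear in a TOC.
    public static List<string> CollectTitles(IEnumerable<string> titles)
    {
        var result = new List<string>();
        foreach (var title in titles)
        {
            // Useful: says WHY. A blank title would render as an empty,
            // unclickable TOC entry, so we skip it (assumed requirement).
            if (string.IsNullOrWhiteSpace(title))
                continue;

            // Noise: "add the title to the list" would only restate the line below.
            result.Add(title.Trim());
        }
        return result;
    }
}
```

A call such as CommentStyle.CollectTitles(new[] { " Intro ", "", "Usage" }) keeps the two non-blank titles, trimmed.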
I did a lot more than I thought I would, but I would rewrite it more like this:
P.S. I don't know what your HTML looks like, so I don't know how the nesting actually worked. But this should give you an idea of how it could be better implemented (IMHO).
    using System;
    using System.Collections.Generic;
    using System.Collections.ObjectModel;
    using System.Linq;
    using HtmlAgilityPack;

    //Does this really need to create instances of this class?
    public static class TocParserEx
    {
        //Does this really need to be an instance method?
        public static string InsertToc(string html)
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            //only place the TOC if there is a TOC section labeled
            var tocPlaceholder = doc.DocumentNode
                .DescendantNodes()
                .OfType<HtmlTextNode>()
                .FirstOrDefault(t => t.Text == "{TOC}");
            if (tocPlaceholder != null)
            {
                var newToc = HtmlNode.CreateNode(@"<div class=""toc"">
                    <div class=""toc-title"">Contents [<a class=""toc-showhide"" href=""#"">hide</a>]</div>
                    <div class=""toc-list""></div>
                </div>");
                tocPlaceholder.ParentNode.ReplaceChild(newToc, tocPlaceholder);
                AddHeaderAnchors(doc.DocumentNode, Header.Root);
                //the last div in the TOC is the (empty) toc-list container
                AddTocEntries(Header.Root, newToc.Descendants("div").Last());
            }
            return doc.DocumentNode.WriteTo();
        }

        /// <summary>
        /// Adds anchors to headers found in the node to the parent header.
        /// </summary>
        /// <param name="root">The root node which contains the headers</param>
        /// <param name="parentHeader">The parent header</param>
        private static void AddHeaderAnchors(HtmlNode root, Header parentHeader)
        {
            // Find all child headers one level below the parent (h1 under the root, h2 under an h1, ...)
            var headerName = "h" + (parentHeader.Level + 1);
            var headers = root.ChildNodes
                .Where(e => Header.IsHeader(e) && e.Name == headerName)
                .Select(e => Header.FromNode(e, parentHeader))
                .ToList();
            foreach (var header in headers)
            {
                var replacement = HtmlNode.CreateNode(String.Format("<a name=\"{0}\"/>", header.Id));

                //populate any subheaders
                AddHeaderAnchors(header.Node, header);

                //replace the found header with the wrapper
                header.Node.ParentNode.ReplaceChild(replacement, header.Node);
                replacement.AppendChild(header.Node);
            }
        }

        /// <summary>
        /// Adds the child headers to the TOC section.
        /// </summary>
        /// <param name="rootHeader">The header which contains the sections to be added</param>
        /// <param name="tocSection">The TOC section to add to</param>
        private static void AddTocEntries(Header rootHeader, HtmlNode tocSection)
        {
            var ul = tocSection.AppendChild(HtmlNode.CreateNode("<ul/>"));
            foreach (var header in rootHeader.Children)
            {
                var entry = ul.AppendChild(CreateTocEntry(header));
                if (header.Children.Any())
                {
                    //nest the sub-list inside this entry's <li>
                    AddTocEntries(header, entry);
                }
            }
        }

        private static HtmlNode CreateTocEntry(Header header)
        {
            return HtmlNode.CreateNode(String.Format(@"<li>
                <a href=""#{0}"">{1} {2}</a>
            </li>", header.Id, header.Section, header.Title));
        }
    }
    //this class really should be lightweight
    public class Header
    {
        //note: Root is shared static state, so headers accumulate across calls
        public static Header Root { get { return _root; } }
        private static readonly Header _root = new Header();

        public string Title { get; private set; }
        public string Tag { get; private set; }
        public string Id { get; private set; }
        public int Level { get; private set; }
        public HtmlNode Node { get; private set; }
        public Header Parent { get; private set; }
        public ReadOnlyCollection<Header> Children { get { return _children.AsReadOnly(); } }
        public int EntryNumber { get; private set; }
        public string Section { get; private set; }

        private List<Header> _children;

        //the root is a fake "h0" so that top-level headers (h1) are its children
        private Header() : this(HtmlNode.CreateNode("<h0/>"), null) { }

        private Header(HtmlNode node, Header parent)
        {
            Title = node.InnerText;
            Tag = node.Name;
            Id = EncodeTitle(Title) + ShortGuid.NewGuid();
            Level = Int32.Parse(Tag.Substring(1));
            Node = node;
            Parent = parent ?? _root;
            _children = new List<Header>();
            if (parent == null)
            {
                EntryNumber = 1;
                Section = "1";
            }
            else
            {
                //the sibling number is simply our position in the parent's list
                parent._children.Add(this);
                EntryNumber = parent.Children.Count;
                Section = parent.Section + "." + EntryNumber;
            }
        }

        public static Header FromNode(HtmlNode node, Header parent)
        {
            if (parent == null)
                return _root;
            if (node == null)
                throw new ArgumentNullException("node");
            return new Header(node, parent);
        }

        public static bool IsHeader(HtmlNode node)
        {
            //anchored so only names that are exactly "h" plus one digit match
            return System.Text.RegularExpressions.Regex.IsMatch(node.Name, @"^h\d$");
        }

        private static string EncodeTitle(string title)
        {
            //encode the title (whatever your logic is)
            return String.Concat(title.Where(Char.IsLetterOrDigit));
        }
    }
And the "HTML" I tested it on:
    <p>{TOC}</p>
    <h1>This is a title!!!</h1>
    <h1>Here's another title!!!</h1>