8
\$\begingroup\$

This is posted in response to this request made in a discussion of my "C-ish" answer to a recent Code Review question regarding a Python solution to a Leetcode challenge. It was reported that the OP's Python code functioned correctly, but did not rate well in its memory used. (The entire Q&A of that Python posting can be found where this linked request appears.)


For completeness, here is that OP's version of the Leetcode problem statement:

Definition: A valid IP address is defined to consist of exactly four integers separated by dots. Each integer is inclusive between 0 and 255.

  • For example, '0.1.2.201' and '192.168.1.1' are valid IP addresses,
    while '0.011.255.245', '192.168.1.312' and '[email protected]' are all invalid IP addresses.

Goal: Given a string s containing only digits, return all possible valid IP addresses that can be formed by inserting dots in s. It is not allowed to remove/reorder digits in s, however the output can be returned in any order.

Constraints:

  1. \$1\le s.\operatorname{length}\le20\$
  2. All of the characters of \$s\$ are digits.

The first version of my answer proposed, in text, an alternative approach to solving the Leetcode challenge (perhaps being less cavalier about Python's "black box" memory/string allocations.) The updated version (added second half) was inspired by a comment mentioning "templates." (Thank you @Reinderien!)

"The proof is in the pudding", so posted here is the current 'testbed' evolution of a possible C language submission to Leetcode. For yesterday's answer, I'd used one extravagant array of 81x32bit elements and a second array of offsets and counts. This has been upgraded to a single 2D array of 16 bit elements. Each 'row' (major dimension) contains all the "octal templates" to be applied to a candidate digit string of fixed length (4 to 12 digits). Elements on the minor dimension ('cols') are a 'zero-terminated' collection of values (binary bits expressed in readable octal), each representing a distinct and unique "template" for apportioning of 'n' ASCII digits to each of the 4 octets of a legal IPv4 address.

My motivation for posting this is to receive feedback on the ("template") approach I have taken. The code is congested with explanatory comments for sharing here on Stack Exchange. A 'toy' project such as a Leetcode challenge would normally only be seen by the coder and by Leetcode's compiler.

A simple "driver" (main()) accepts an entry, does NO verification that it is a contiguous digit string, then hands it to genIPv4s() that attempts to make as many legal addresses as it can.

Consider the 8-digit candidate string "01212301".
Brute forcing will try to assemble 3x3x3x3 (81) arrangements.
Knowing there are 19 templates for any 8 digit candidate, we see 75% of that effort is predictably futile.

Recursive code (with short-circuiting) improves on this, able to 'early return' when it recognises 01.2xxx and 012.1xxx are futile (hopeless) variations that need no further probing.

The approach below contains 19 templates for any 8 digit candidate. These templates can be used to simply and quickly "pre-screen" their utility (low cost). Among those 19 octal templates are several 0xxx2 and 0xxx3 entries that correlate with, 2 digits and 3 digits respectively to makeup the FIRST octet.

Important: The simplest access to sequential fields of bits is to let the 'rightmost' bits correspond to the 'leftmost' position in an array (like an IPv4 octet). In this example, b2, b1 and b0 are interpreted as "how many digits (1, 2 or 3) to use to fill the leftmost octet of the IPv4 string." Take time to absorb this, please...
Next, recognise that only b1 and b0 of any trio of bits is used (ie. 0b001 to 0b011). At 'useful' 2 bits for each of 4 octets, a single 8-bit byte would suffice. However, the attraction of C's base8 octal being "built-in" is more attractive than scrimping to use only base4 bit regions. The octal based templates in source code can be readily recognised and understood.
Just have to remember what is "L-to-R" and what is "R-to-L". Eliminating this mirroring isn't hard but I didn't bother with that extra nicety in this code.

There are two more ways potential "execution time" savings have made by using templates.

The code contains a few prototypes of simple early detection trying to avoid the work of assembling an octet like ".01." only to discover conversion to an integer then back to a string does not yield the same string. This is easy to do with either end of string, and would be possible (but more difficult) for the inner two octets. Likewise, 1st or 4th octets that would be greater than "299" are very easy to detect and short-circuit. (Adopters are welcome to tighten that up so any 1st or 4th octet greater than "255" would short-circuit without any form of atoi().) The few oct#_0XX() functions herein provide a glimpse toward other simple "early detection" possibilities; this is not all that can be done, nor optimised for "least amount of code reproduction".

Please note that sequentially populating the four octets of an IPv4 string means sequentially using source string digits (validating-as-we-go OR validating-after-assembly). The predetermined template allows the code to pre-screen for 'invalidity' of even the 4th octet BEFORE any significant work has been done.

A final (significant?) advantage of using templates has been included in the code...
Consider a candidate string of 8 digits...
There are 19 templates (arrangements) of 8 digits that could be valid IPv4 strings.
Imagine early detection of, say, ".456" as the invalid 4th octet from this template.
NOW, it is trivial to scan to the end of this collection of templates, somehow 'erasing' any that call for 3 digits in the 4th octet. (For 8 digits, there would be 6 more ill-fated templates coming-up.) When a given candidate string of digits presenting difficulties can be detected, an attempt is made to "compress out of existence" other templates having the same characteristic.
Note: Processing of 1 to 19 templates appropriate to the current candidate's "length" is performed on a temporary COPY of those templates, renewed appropriately for each new candidate string.

There's a lot happening here...
The code is full of explanatory comments of dubious utility and tracing print statements for demonstration purposes only.

Thank you for any/all feedback.
(FWIW: I've no intention of signing-up for Leetcode.)

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

typedef unsigned int u32;
typedef unsigned short u16;

// NB: with max of 19, dimension 20 ensures each 'row' is 0 terminated
// Allows simple dectection of end of a flavour's collection (and more!)
const u16 templates[][20] = { // 3x3x3x3 => 81 templates
    // 4 digits -  1 template
    {   01111, },

    // 5 digits -  4 templates
    {   01112, 01121, 01211, 02111, },

    // 6 - 10 templates
    {   01113, 01122, 01131, 01212, 01221,
        01311, 02112, 02121, 02211, 03111, },

    // 7 - 16
    {   01123, 01132, 01213, 01222, 01231, 01312, 01321, 02113,
        02122, 02131, 02212, 02221, 02311, 03112, 03121, 03211, },

    // 8 - 19
    {   01133, 01223, 01232, 01313, 01322, 01331, 02123, 02132, 02213,
        02222,
        02231, 02312, 02321, 03113, 03122, 03131, 03212, 03221, 03311, },

    // 9 - 16
    {   01233, 01323, 01332, 02133, 02223, 02232, 02313, 02322,
        02331, 03123, 03132, 03213, 03222, 03231, 03312, 03321, },

    // 10 - 10
    {   01333, 02233, 02323, 02332, 03133,
        03223, 03232, 03313, 03322, 03331, },

    // 11 - 4
    {   02333, 03233, 03323, 03332, },

    // 12 - 1 template
    {   03333, },
};

void erase( u16 *tpd, int (*fnc)( u16 ) ) {
// Template is to be screened out, try to screen out same others that follow.
// Current template is overwritten, so caller's pointer must remain same for next attempt.
// May leave abandoned original templates beyond relocated sentinel value. Who cares?

    u16 *tps = tpd + 1;// point to next in collection
    while( (*tpd = *tps++) != 0 ) // copy until exhausted. (May overwrite erasing others)
        tpd += !fnc( *tpd ); // only increment if transplanted something different
// NB: 'Compresses' WORKING COPY buffer, not original
}

// octet 1
int oct1_01x( u16 t ) { return (t & 02) == 02; } // 0b010 works for 2 and 3
int oct1_011( u16 t ) { return (t & 03) == 03; } // 0b011 works for 3

// octet 4
int oct4_010( u16 t ) { return (t & 03000) == 02000; }
int oct4_011( u16 t ) { return (t & 03000) == 03000; }

void genIPv4s( char *s, u32 l ) { // digit string, its length
printf( "'%s' - length %u\n", s, l ); // tracing

    u16 wrk[ sizeof templates[ 0 ] ]; // working copy of this flavour's templates
    memcpy( wrk, templates[ l - 4 ], sizeof wrk );

    for( u16 *tp = wrk; *tp; ) { // each template for 'l' digits

        // skip pre-emptive screening for len = 4, 5. nothing to be gained.
        if( l > 5 ) {
            u16 started = *tp; // template bit pattern BEFORE possible compress/erasure

// WARNING: 'comma operator' used in this block. 
            if( s[l-2] == '0' && oct4_010( *tp ) )
printf( "%o ... %s\n", *tp, "rejecting because last octet would be '0x'" ),
                erase( tp, oct4_010 );

            if( s[l-3] >= '3' && oct4_011( *tp ) )
printf( "%o ... %s\n", *tp, "rejecting because last octet would be '>299'" ),
                erase( tp, oct4_011 );

            if( s[l-3] == '0' && oct4_011( *tp ) )
printf( "%o ... %s\n", *tp, "rejecting because last octet would be '0xx'" ),
                erase( tp, oct4_011 );

            // screen for "0x" or "0xx" in 1st octet
            if( s[0] == '0' && oct1_01x( *tp ) )
printf( "%o ... %s\n", *tp, "rejecting because first octet leading zero" ),
                erase( tp, oct1_01x );

            // crude screen for "[3-9]xx". won't catch "257". Do work and catch later
            if( s[0] >= '3' && oct1_011( *tp ) )
printf( "%o ... %s\n", *tp, "rejecting because first octet would be >299" ),
                erase( tp, oct1_011 );

            if( started != *tp ) // didn't work! switched template!
                continue; // go back and start over with this guy
        }
printf( "%o ... ", *tp ); // tracing

        char *ps = s; // beginning of digit string
        char obuf[ 3+1 + 3+1 + 3+1 + 3+1 ] = {0}, *po = obuf; // IPv4 assembly area
        int ok = 1; // optimism (could be bool)

        // for each octet
        for( u16 tmpl = *tp; ok && tmpl; tmpl >>= 3 ) { // R-to-L thru pattern
            u32 nd = tmpl & 7; // # of digits in this octet

            // easy, coarse-grain filtering before forming bad octet
            if( ( nd > 1 && *ps == '0' ) // 2 or 3 digits with leading zero? Bad!
            ||  ( nd > 2 && *ps > '2' ) ) { // 3xx, 4xx, ...? Bad!
printf( "Got '%s' but '%.*s' won't work", obuf, nd, ps ); // tracing
                ok = 0;
                break;
            }

            memcpy( po, ps, nd ); // copy 1-3 digits
            ok = strtol( po, NULL, 0 ) < 256; // binary value in range
            ps += nd, po += nd, *po++ = '.'; // advance pointers and append '.'
        }
        po = ok ? po - 1 : obuf; // terminate, or clobber
        *po = '\0'; // always terminated
puts( obuf ); // for tracing, good or bad...
        // for Leetcode would only "remember" okay IPv4 strings
        tp++; // NB: on to next template...
    }
}

int main( void ) {
    char buf[ 64 ]; // big enough?

    while( printf( "?: " ) && fgets( buf, sizeof buf, stdin ) ) {
        u32 l = (u32)strcspn( buf, "\n" );
        buf[ l ] = '\0'; // trim the LF
        if( l == 0 ) break;
        if( l < 4 || l > 12 )
            printf( "length (%u) must be 4 to 12 digits\n", l );
        else
            genIPv4s( buf, l ); // no verification of all digits, only digits.
    }

    return 0;
}

Sample output for 3 slightly different 8 digit candidates:

?: 02312301
'02312301' - length 8
1133 ... rejecting because first octet leading zero  // Consider how many templates have been 'erased'
1331 ... 0.231.230.1
2231 ... rejecting because last octet would be '0x'
3131 ... rejecting because last octet would be '>299'
?: 12312301
'12312301' - length 8     // for comparison
1133 ... 123.123.0.1
1223 ... 123.12.30.1
1232 ... Got '12.' but '312' won't work   // example of "futile" partial effort aborted
1313 ... 123.1.230.1
1322 ... 12.31.230.1
1331 ... 1.231.230.1
2123 ... rejecting because last octet would be '0x'
3113 ... rejecting because last octet would be '>299'
?: 12121212
'12121212' - length 8    // All templates used
1133 ... 121.212.1.2
1223 ... 121.21.21.2
1232 ... 12.121.21.2
1313 ... 121.2.121.2
1322 ... 12.12.121.2
1331 ... 1.212.121.2
2123 ... 121.21.2.12
2132 ... 12.121.2.12
2213 ... 121.2.12.12
2222 ... 12.12.12.12
2231 ... 1.212.12.12
2312 ... 12.1.212.12
2321 ... 1.21.212.12
3113 ... 121.2.1.212
3122 ... 12.12.1.212
3131 ... 1.212.1.212
3212 ... 12.1.21.212
3221 ... 1.21.21.212
3311 ... 1.2.121.212
?:
```
\$\endgroup\$
5
  • \$\begingroup\$ Doh! In main() should change to u32 l = (u32)strspn( buf, "0123456789" ); to simply use whatever contiguous digits appear at the beginning of the user's input. If someone enters "1234apple\n" then work with "1234"... Doh!... \$\endgroup\$
    – Fe2O3
    Commented Jan 21 at 9:31
  • 1
    \$\begingroup\$ hey there hematite, i think you have couple of problems in your code, first in while( (*tpd = *tps++) != NULL ) you are comparing an integer to a pointer (NULL), second is strtol( po, 0, NULL ) the 3rd param is the base of the number, i hope you can fix this please, because im unable to run your code for testing and debugging \$\endgroup\$
    – HellBoy
    Commented Jan 21 at 14:42
  • \$\begingroup\$ @HellBoy Yes. Thank you. The first was just dumb, written in haste and 'invisible' because the program worked. The last was from my laziness, presuming I remembered the parameter sequence to strtol()... Sh*t happens, eh? My (older) compiler knew what I meant, and didn't flag these issues... \$\endgroup\$
    – Fe2O3
    Commented Jan 21 at 19:23
  • 1
    \$\begingroup\$ @Fe2O3 "for any/all feedback." looks like the goal: A request for any and all feedback. Consider leading with that. As it is now, it takes a long read to know your review goal. \$\endgroup\$
    – chux
    Commented Jan 22 at 7:29
  • \$\begingroup\$ Hey @chux !! Thank you for the tip! Thing is, my long-winded explanation and exposition of how this got to be what it is seems (to me) to be central to this posting. While I did write "any/all feedback", pointing out where const or static are absent wouldn't address what I feel is my main concern: "When/where and how might anyone else ever use an approach similar to this outside of a 'single user' code-site challenge?" The lists of possible transformations feel similar to one slice of an Enigma device... Nothing new, I'm sure, but fun to code nonetheless... :-) Thank you again... \$\endgroup\$
    – Fe2O3
    Commented Jan 22 at 8:07

2 Answers 2

4
\$\begingroup\$

Alternate

Just for fun, a non-template recursive approach.

Additional buffer memory size: 24 bytes.

Also a more expansive test suite.

Easy to expand to more than 4 octets.

#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

static int leet93r(char *buffer, size_t current, const char *s,
    int octets_to_add) {
  if (octets_to_add <= 0) {  // Any need to add more octets?
    if (s[0]) {
      return 0;
    }
    buffer[current] = '\0';
    printf("\"%s\"\n", buffer);
    return 1;
  }

  if (current > 0) {  // Is a separator needed?
    buffer[current++] = '.';
  }

  octets_to_add--;

  // Octet value of 0 only allowed with 1 digit.
  if (*s == '0') {
    buffer[current++] = *s++;
    return leet93r(buffer, current, s, octets_to_add);
  }

  //octets_to_add--;
  int count = 0;
  int value = 0;
  while (*s) {
    value = value * 10 + (*s - '0');
    if (value > 255) {
      break;
    }

    buffer[current++] = *s++;
    count += leet93r(buffer, current, s, octets_to_add);
  }

  return count;
}

#define leet93r_LENGTH_MIN 1 // This could be as large as 4.
#define leet93r_LENGTH_MAX 20 // This could be as small as 12.
#define leet93r_OCTET_N 4
#define leet93r_BUFFER_SIZE (leet93r_LENGTH_MAX + (leet93r_OCTET_N - 1) + 1)

int leet93(const char *s) {
  // Better to call `is...()` functions with a certain unsigned char value.
  const unsigned char *us = (const unsigned char*) s;

  int length = 0;
  while (isdigit(us[length])) {
    length++;
    if (length > leet93r_LENGTH_MAX) { // Too long?
      return 0;
    }
  }

  if (length < leet93r_LENGTH_MIN) { // Too short?
    return 0;
  }

  if (us[length] != '\0') { // Non-numeric?
    return 0;
  }

  char buf[leet93r_BUFFER_SIZE] = "";
  return leet93r(buf, 0, s, leet93r_OCTET_N);
}

int leet93_test(int expected, const char *s) {
  int count = leet93(s);
  printf("Combination count:%2d, expected count:%2d of \"%s\"\n\n", //
      count, expected, s);
  return count != expected;
}

int main() {
  int failures = 0;
  failures += leet93_test(0, "");
  failures += leet93_test(0, "123456789012345678901234567890");
  failures += leet93_test(0, "192168@11");

  failures += leet93_test(1, "02312301");
  failures += leet93_test(19, "12121212");

  failures += leet93_test(4, "012201");
  failures += leet93_test(9, "19216811");
  failures += leet93_test(0, "0011255245");
  failures += leet93_test(2, "1921681312");

  failures += leet93_test(0, "000");
  failures += leet93_test(1, "0000");
  failures += leet93_test(0, "00000");

  printf("Failures count: %d\n", failures);
  return failures ? EXIT_FAILURE : EXIT_SUCCESS;
}

Output

Combination count: 0, expected count: 0 of ""

Combination count: 0, expected count: 0 of "123456789012345678901234567890"

Combination count: 0, expected count: 0 of "192168@11"

"0.231.230.1"
Combination count: 1, expected count: 1 of "02312301"

"1.2.121.212"
"1.21.21.212"
"1.21.212.12"
"1.212.1.212"
"1.212.12.12"
"1.212.121.2"
"12.1.21.212"
"12.1.212.12"
"12.12.1.212"
"12.12.12.12"
"12.12.121.2"
"12.121.2.12"
"12.121.21.2"
"121.2.1.212"
"121.2.12.12"
"121.2.121.2"
"121.21.2.12"
"121.21.21.2"
"121.212.1.2"
Combination count:19, expected count:19 of "12121212"

"0.1.2.201"
"0.1.220.1"
"0.12.20.1"
"0.122.0.1"
Combination count: 4, expected count: 4 of "012201"

"1.92.168.11"
"19.2.168.11"
"19.21.68.11"
"19.216.8.11"
"19.216.81.1"
"192.1.68.11"
"192.16.8.11"
"192.16.81.1"
"192.168.1.1"
Combination count: 9, expected count: 9 of "19216811"

Combination count: 0, expected count: 0 of "0011255245"

"192.168.13.12"
"192.168.131.2"
Combination count: 2, expected count: 2 of "1921681312"

Combination count: 0, expected count: 0 of "000"

"0.0.0.0"
Combination count: 1, expected count: 1 of "0000"

Combination count: 0, expected count: 0 of "00000"

Failures count: 0

Rev 2

OP also wanted to reduce the probe count.

Update leaves the recursive function early if the remaining_length < octets_to_add or remaining_length > 3*octets_to_add.

Below is some cobbled together code (it deserves later clean-up) that meets that additional goal.

int probe = 0;

static int leet93r(int remaining, char *buffer, size_t current, const char *s,
    int octets_to_add) {

  if (octets_to_add <= 0) {  // Any need to add more octets?
    probe++;
    if (s[0]) {
      return 0;
    }
    buffer[current] = '\0';
    printf("\"%s\"\n", buffer);
    return 1;
  }

//  int remaining = length - current;
  if (remaining < octets_to_add || remaining > octets_to_add*3) {
    return 0;
  }

  if (current > 0) {  // Is a separator needed?
    buffer[current++] = '.';
  }

  octets_to_add--;

  // Octet value of 0 only allowed with 1 digit.
  if (*s == '0') {
    probe++;
    buffer[current++] = *s++;
    remaining--;
    return leet93r(remaining, buffer, current, s, octets_to_add);
  }

  //octets_to_add--;
  int count = 0;
  int value = 0;
  while (*s) {
    probe++;
    value = value * 10 + (*s - '0');
    if (value > 255) {
      break;
    }

    remaining--;
    buffer[current++] = *s++;
    count += leet93r(remaining, buffer, current, s, octets_to_add);
  }

  return count;
}

#define leet93r_LENGTH_MIN 1 // This could be as large as 4.
#define leet93r_LENGTH_MAX 20 // This could be as small as 12.
#define leet93r_OCTET_N 4
#define leet93r_BUFFER_SIZE (leet93r_LENGTH_MAX + (leet93r_OCTET_N - 1) + 1)

int leet93(const char *s) {
  // Better to call `is...()` functions with a certain unsigned char value.
  const unsigned char *us = (const unsigned char*) s;

  int length = 0;
  while (isdigit(us[length])) {
    length++;
    if (length > leet93r_LENGTH_MAX) { // Too long?
      return 0;
    }
  }

  if (length < leet93r_LENGTH_MIN) { // Too short?
    return 0;
  }

  if (us[length] != '\0') { // Non-numeric?
    return 0;
  }

  char buf[leet93r_BUFFER_SIZE] = "";
  return leet93r(length, buf, 0, s, leet93r_OCTET_N);
}
```
\$\endgroup\$
4
  • \$\begingroup\$ Nice! Feels "light weight" and zippy! Kool, with a Kapital 'K'... :-) Still 'probes' all 81 prospects for a 12 digit string, though. I'm mucking with IPv6 version. 81 becomes 729... Brute force is brutal!! :-) Thanks for posting (for posterity.) Nice and clean!! :-) \$\endgroup\$
    – Fe2O3
    Commented Jan 25 at 20:13
  • 1
    \$\begingroup\$ Hmm, idea for a 2nd edition takes into account the string length. Fails early when remaining_length < octets_to_add or remaining_length > 3*octets_to_add. \$\endgroup\$
    – chux
    Commented Jan 25 at 21:22
  • \$\begingroup\$ Yes. When length = 12, grabbing 1 or 2 means 11 or 10 left over. 3 more octets will only hold 9, maximum... Some like gardening. 'Pruning' in algorithms is almost as delightful! :-) Cheers! \$\endgroup\$
    – Fe2O3
    Commented Jan 25 at 21:26
  • 1
    \$\begingroup\$ Some pruning code posted. \$\endgroup\$
    – chux
    Commented Jan 25 at 21:37
3
\$\begingroup\$

This is a self-answer after seeking out some bad coding and some micro-optimisations


Get current

The two problems reported by @HellBoy would have been corrected much earlier if the OP had been using a modern (more stringent) compiler. Don't go on living in the past.
(Thanks extended to @HellBoy!)


DRY reduces hazards

The block of code (and functions) that are pre-screening a candidate digit string and its current template(s) could be improved.

While this project is for the author's eyes only, this section of code is hard to read.
Multiple unusual (tracing) invocations of printf() and the use of the comma operator merely to avoid using braces. Tut, tut... What you do on your own time is your own affair, but...

void chkCmprs( int (*fnc)( u16 ), u16 *pd, char *str ) { // renamed & another parameter
    if( fnc( *pd ) ) {
        // Single instance of "tracing" output
        // Marked and indented. Easier to read
/**/    printf( "%o ... rejecting: %s\n", *pd, str );
        for( u16 *ps = pd; (*pd = *++ps) != 0; pd += !fnc( *pd ) ) {/**/}
    }
}
...
// Two sample calls
            if( s[0] == '0' )
                chkCmprs( oct1_01x, tp, "1st octet would be '0x' or '0xx'" );

            if( s[l-2] == '0' )
                chkCmprs( oct4_010, tp, "4th octet would be '0x'" );

DRY! This revision removes the critical repeat of calling the oct#_???() function.


Micro-optimise for speed

Having recognised that 'length = 4' will always yield one legal IPv4 string (with no chance for an invalid leading '0'), perhaps this special case can be dealt with as special:

void genIPv4s( char *s, u32 l ) {
/**/printf( "'%s' - length %u\n", s, l );

    if( l == 4 ) {
        char out4[ sizeof "#.#.#.#" ]; // exclusive for special IPv4
        sprintf( out4, "%c.%c.%c.%c", s[0], s[1], s[2], s[3] ); // create string
/**/    printf( "%o ... %s\n", 01111, out4 ); // tracing OR add to collection
        return;
    }

Note how the allocation of the required bytes for the buffer is handled. Let the compiler workout how much is required (rather than risking human frailty getting the numbers wrong.)

The obuf[] used below should implement a '12 digit' version of this sizeof technique.

Then go back and remove (or, better, simply // comment out) the first "length = 4" row of templates (there's just 1 template and 19 implicit 0s), and tweak the following line:

memcpy( wrk, templates[ l - 5 ], sizeof wrk ); // 1st template group for length = 5

Sidenote: in this case, casually using sprintf() from the ?printf() family of functions may be using a bazooka to kill a fly. While they are fantastic functions written by wizards, their generality & flexibility come at a cost (analogous to Python's many 'black box' conveniences).
Eg: finding the (several?) '%' character(s), then dispatching each to the appropriate 'handler'.
For raw speed, specific assignment statements like
obuf[0] = s[0];
obuf[2] = s[1];
...
obuf[1] = obuf[3] = obuf[5] = '.';
would likely execute more quickly as they are 'built-for-purpose' and, therefore, extremely restricted.


Micro-micro-optimise for speed

Replace strtol() with strtoul().

The problem deals only with positive integers. The 'black box' of the former possibly uses extra processing to account for a sign character. The project knows there will only be positive values...

Further development:
The maximum decimal value for any octet is 255, while the source digits are 1, 2 or 3 ASCII characters. It's tempting to consider reinventing ASCII to int conversion specific to this challenge to avoid, what in the olden days were costly *10 operations. (Modern processors are blindingly fast!)

Just to mention another dusty concept that could be used: Binary Coded Decimal.

Some trivial subraction and shifting and OR'ing (especially involving a union) could quickly collect 2 or 3 ASCII digits into a packed BCD representation to compare against 0x10 and 0x255. This would prove no leading zero for 2 digit octets, and ensure the ceiling hasn't been exceeded for 3 digit octets. All without involving (used to be 'expensive') multiplication. (With only 3 digits in play, even a 32 bit 'unpacked BCD' could be derived and compared to a constant without shifting or OR'ing.) Something to think about.


Scope (needs mentioning)

templates[][] is declared at file scope but only referenced within genIPv4s().

This is forgivable (perhaps) in this tiny, toy project, but would be objectionable were this part of a larger project, or with more functions performing different tasks.

Just something to keep in mind.


Need for (even more) speed

The suggested chkCmprs() improves (may shorten) execution time by possibly compressing-out the current and subsequent templates that are preordained to fail to yield valid IPv4 strings. (Possibly) fewer rough cuts are formed when some don't pass final QC.

"For speed and simplicity", there is an 'cheap'n'dirty' test that 3 digits destined to fill the 1st or 4th octet will be > '299'. This is okay, as far as it goes, but (without doing the work) a 4th octet ("xx.285") is allowed through (work being done) until it is detected and the output built is rejected. And, this foolishness may be repeated when applying subsequent templates.

When a template calls for 3 digits in the 1st octet it is guaranteed a subsequent template will call for 3 digits in the 4th octet. Preventing a "leading zero" is trivial. The more costly atoi()-ish evaluation of 3 digit 'bookend' octets need only be done once, and the consequences 'remembered' for each iteration trying another template. The reader is challenged to implement, using a two 3-state 'flags' capable of indicating "unknown | le255good | gt255bad", optimisation of the "pre-screening" squeezing the most performance out of this 'toy' project.

A great many real world projects present opportunities and/or requirements for ultra high-performance but delicate processing.


Wood and trees

Inside the major loop, pre-screening occurs when a template is considered for the (unchanging) string of digits passed to the function. If an unworkable characteristic of a template is discovered, this one and subsequent templates with the same characteristic are "compressed" out of the array.

Consider: with "len = 4" out of the picture, every "length flavour" group of templates will call for a 2 digit 1st & 4th octet. The "leading zero" is useful. Every "flavour" "len >= 6" will call for at least one 3 digit 1st and 4th octet. This is predictable.

Consider promoting the "pre-screening" of templates to happen once prior to entering the major loop. Done properly, the (potentially shrunken) collection of templates to be applied inside the loop will have been 'optimised' to not require that picket fence of if() conditions to be applied over-and-over. (chkCmpr() would be changed to srchNelim().)

When s = "0xxxx..", pro-actively squelch templates calling for 2 or 3 digits in the 1st octet.
When s = "..xxx0x", pro-actively squelch templates calling for 2 digits in the 4th octet.
Do the detection and possible elimination ONCE outside the loop.

"DRY" applies to source code AND to the efficient execution of an algorithm.

\$\endgroup\$
5
  • \$\begingroup\$ I see the template idea in novel, but looks like non-trivial work to expand to 6 or 8 octets. \$\endgroup\$
    – chux
    Commented Jan 25 at 21:53
  • \$\begingroup\$ IPv4 tops out at ~4.3billion addresses, with (as shown) 81 templates. Length 4 or 12 need try "fitting into" only 1 template each. Easy knowledge of (worst case) length 8 need only try "fitting into" 19 different templates. All are better than brute force attempting 81 arrangements. IPv6 lifts the ceiling from (3^4) 81 to (3^6) 729 permutations. Ouch!! IPv6 worst case (length = 12) works with 141 templates (instead of 729 'probing' trials). ... This old man STILL thinks multiplication requires 'n' machine cycles (1960's ALUs)... [1/2] \$\endgroup\$
    – Fe2O3
    Commented Jan 25 at 22:07
  • \$\begingroup\$ [2/2] Improving "pre-screening" for 'bad' interior octets in order to cull unworkable templates for "this particular candidate digit string" is an interesting exercise to fill otherwise vacant hours. :-) (Don't want to even think about IPv8... Shudder!!!) :-) \$\endgroup\$
    – Fe2O3
    Commented Jan 25 at 22:09
  • \$\begingroup\$ @chux After lots of pondering, seems to me the biggest advantage of templates (with opportunities to 'cull') comes from easy validation of the last (4th) octet... This can be determined before assembling interior octets. "No point in doing work if it will fail during the last steps." Almost makes one want to 'build' from each end, to meet in the middle... :-) How bizarre, how bizzare \$\endgroup\$
    – Fe2O3
    Commented Jan 25 at 22:52
  • \$\begingroup\$ @chux ... or maybe "middle out" might hold promise... :-) \$\endgroup\$
    – Fe2O3
    Commented Jan 28 at 8:30

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.