This is posted in response to this request made in a discussion of my "C-ish" answer to a recent Code Review question regarding a Python solution to a Leetcode challenge. It was reported that the OP's Python code functioned correctly, but did not rate well in its memory used. (The entire Q&A of that Python posting can be found where this linked request appears.)
For completeness, here is that OP's version of the Leetcode problem statement:
Definition: A valid IP address is defined to consist of exactly four integers separated by dots. Each integer is inclusive between 0 and 255.
- For example,
'0.1.2.201'
and'192.168.1.1'
are valid IP addresses,
while'0.011.255.245'
,'192.168.1.312'
and'[email protected]'
are all invalid IP addresses.Goal: Given a string
s
containing only digits, return all possible valid IP addresses that can be formed by inserting dots ins
. It is not allowed to remove/reorder digits ins
, however the output can be returned in any order.Constraints:
- \$1\le s.\operatorname{length}\le20\$
- All of the characters of \$s\$ are digits.
The first version of my answer proposed, in text, an alternative approach to solving the Leetcode challenge (perhaps being less cavalier about Python's "black box" memory/string allocations.) The updated version (added second half) was inspired by a comment mentioning "templates." (Thank you @Reinderien!)
"The proof is in the pudding", so posted here is the current 'testbed' evolution of a possible C language submission to Leetcode. For yesterday's answer, I'd used one extravagant array of 81x32bit elements and a second array of offsets and counts. This has been upgraded to a single 2D array of 16 bit elements. Each 'row' (major dimension) contains all the "octal templates" to be applied to a candidate digit string of fixed length (4 to 12 digits). Elements on the minor dimension ('cols') are a 'zero-terminated' collection of values (binary bits expressed in readable octal), each representing a distinct and unique "template" for apportioning of 'n' ASCII digits to each of the 4 octets of a legal IPv4 address.
My motivation for posting this is to receive feedback on the ("template") approach I have taken. The code is congested with explanatory comments for sharing here on Stack Exchange. A 'toy' project such as a Leetcode challenge would normally only be seen by the coder and by Leetcode's compiler.
A simple "driver" (main()
) accepts an entry, does NO verification that it is a contiguous digit string, then hands it to genIPv4s()
that attempts to make as many legal addresses as it can.
Consider the 8-digit candidate string "01212301"
.
Brute forcing will try to assemble 3x3x3x3 (81) arrangements.
Knowing there are 19 templates for any 8 digit candidate, we see 75% of that effort is predictably futile.
Recursive code (with short-circuiting) improves on this, able to 'early return' when it recognises 01.2xxx
and 012.1xxx
are futile (hopeless) variations that need no further probing.
The approach below contains 19 templates for any 8 digit candidate. These templates can be used to simply and quickly "pre-screen" their utility (low cost). Among those 19 octal templates are several 0xxx2
and 0xxx3
entries that correlate with, 2 digits and 3 digits respectively to makeup the FIRST octet.
Important: The simplest access to sequential fields of bits is to let the 'rightmost' bits correspond to the 'leftmost' position in an array (like an IPv4 octet). In this example, b2
, b1
and b0
are interpreted as "how many digits (1, 2 or 3) to use to fill the leftmost octet of the IPv4 string." Take time to absorb this, please...
Next, recognise that only b1
and b0
of any trio of bits is used (ie. 0b001
to 0b011
). At 'useful' 2 bits for each of 4 octets, a single 8-bit byte would suffice. However, the attraction of C's base8 octal being "built-in" is more attractive than scrimping to use only base4 bit regions. The octal based templates in source code can be readily recognised and understood.
Just have to remember what is "L-to-R" and what is "R-to-L". Eliminating this mirroring isn't hard but I didn't bother with that extra nicety in this code.
There are two more ways potential "execution time" savings have made by using templates.
The code contains a few prototypes of simple early detection trying to avoid the work of assembling an octet like ".01." only to discover conversion to an integer then back to a string does not yield the same string. This is easy to do with either end of string, and would be possible (but more difficult) for the inner two octets. Likewise, 1st or 4th octets that would be greater than "299"
are very easy to detect and short-circuit. (Adopters are welcome to tighten that up so any 1st or 4th octet greater than "255"
would short-circuit without any form of atoi()
.) The few oct#_0XX()
functions herein provide a glimpse toward other simple "early detection" possibilities; this is not all that can be done, nor optimised for "least amount of code reproduction".
Please note that sequentially populating the four octets of an IPv4 string means sequentially using source string digits (validating-as-we-go OR validating-after-assembly). The predetermined template allows the code to pre-screen for 'invalidity' of even the 4th octet BEFORE any significant work has been done.
A final (significant?) advantage of using templates has been included in the code...
Consider a candidate string of 8 digits...
There are 19 templates (arrangements) of 8 digits that could be valid IPv4 strings.
Imagine early detection of, say, ".456"
as the invalid 4th octet from this template.
NOW, it is trivial to scan to the end of this collection of templates, somehow 'erasing' any that call for 3 digits in the 4th octet. (For 8 digits, there would be 6 more ill-fated templates coming-up.) When a given candidate string of digits presenting difficulties can be detected, an attempt is made to "compress out of existence" other templates having the same characteristic.
Note: Processing of 1 to 19 templates appropriate to the current candidate's "length" is performed on a temporary COPY of those templates, renewed appropriately for each new candidate string.
There's a lot happening here...
The code is full of explanatory comments of dubious utility and tracing print statements for demonstration purposes only.
Thank you for any/all feedback.
(FWIW: I've no intention of signing-up for Leetcode.)
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
typedef unsigned int u32;
typedef unsigned short u16;
// NB: with max of 19, dimension 20 ensures each 'row' is 0 terminated
// Allows simple dectection of end of a flavour's collection (and more!)
const u16 templates[][20] = { // 3x3x3x3 => 81 templates
// 4 digits - 1 template
{ 01111, },
// 5 digits - 4 templates
{ 01112, 01121, 01211, 02111, },
// 6 - 10 templates
{ 01113, 01122, 01131, 01212, 01221,
01311, 02112, 02121, 02211, 03111, },
// 7 - 16
{ 01123, 01132, 01213, 01222, 01231, 01312, 01321, 02113,
02122, 02131, 02212, 02221, 02311, 03112, 03121, 03211, },
// 8 - 19
{ 01133, 01223, 01232, 01313, 01322, 01331, 02123, 02132, 02213,
02222,
02231, 02312, 02321, 03113, 03122, 03131, 03212, 03221, 03311, },
// 9 - 16
{ 01233, 01323, 01332, 02133, 02223, 02232, 02313, 02322,
02331, 03123, 03132, 03213, 03222, 03231, 03312, 03321, },
// 10 - 10
{ 01333, 02233, 02323, 02332, 03133,
03223, 03232, 03313, 03322, 03331, },
// 11 - 4
{ 02333, 03233, 03323, 03332, },
// 12 - 1 template
{ 03333, },
};
void erase( u16 *tpd, int (*fnc)( u16 ) ) {
// Template is to be screened out, try to screen out same others that follow.
// Current template is overwritten, so caller's pointer must remain same for next attempt.
// May leave abandoned original templates beyond relocated sentinel value. Who cares?
u16 *tps = tpd + 1;// point to next in collection
while( (*tpd = *tps++) != 0 ) // copy until exhausted. (May overwrite erasing others)
tpd += !fnc( *tpd ); // only increment if transplanted something different
// NB: 'Compresses' WORKING COPY buffer, not original
}
// octet 1
int oct1_01x( u16 t ) { return (t & 02) == 02; } // 0b010 works for 2 and 3
int oct1_011( u16 t ) { return (t & 03) == 03; } // 0b011 works for 3
// octet 4
int oct4_010( u16 t ) { return (t & 03000) == 02000; }
int oct4_011( u16 t ) { return (t & 03000) == 03000; }
void genIPv4s( char *s, u32 l ) { // digit string, its length
printf( "'%s' - length %u\n", s, l ); // tracing
u16 wrk[ sizeof templates[ 0 ] ]; // working copy of this flavour's templates
memcpy( wrk, templates[ l - 4 ], sizeof wrk );
for( u16 *tp = wrk; *tp; ) { // each template for 'l' digits
// skip pre-emptive screening for len = 4, 5. nothing to be gained.
if( l > 5 ) {
u16 started = *tp; // template bit pattern BEFORE possible compress/erasure
// WARNING: 'comma operator' used in this block.
if( s[l-2] == '0' && oct4_010( *tp ) )
printf( "%o ... %s\n", *tp, "rejecting because last octet would be '0x'" ),
erase( tp, oct4_010 );
if( s[l-3] >= '3' && oct4_011( *tp ) )
printf( "%o ... %s\n", *tp, "rejecting because last octet would be '>299'" ),
erase( tp, oct4_011 );
if( s[l-3] == '0' && oct4_011( *tp ) )
printf( "%o ... %s\n", *tp, "rejecting because last octet would be '0xx'" ),
erase( tp, oct4_011 );
// screen for "0x" or "0xx" in 1st octet
if( s[0] == '0' && oct1_01x( *tp ) )
printf( "%o ... %s\n", *tp, "rejecting because first octet leading zero" ),
erase( tp, oct1_01x );
// crude screen for "[3-9]xx". won't catch "257". Do work and catch later
if( s[0] >= '3' && oct1_011( *tp ) )
printf( "%o ... %s\n", *tp, "rejecting because first octet would be >299" ),
erase( tp, oct1_011 );
if( started != *tp ) // didn't work! switched template!
continue; // go back and start over with this guy
}
printf( "%o ... ", *tp ); // tracing
char *ps = s; // beginning of digit string
char obuf[ 3+1 + 3+1 + 3+1 + 3+1 ] = {0}, *po = obuf; // IPv4 assembly area
int ok = 1; // optimism (could be bool)
// for each octet
for( u16 tmpl = *tp; ok && tmpl; tmpl >>= 3 ) { // R-to-L thru pattern
u32 nd = tmpl & 7; // # of digits in this octet
// easy, coarse-grain filtering before forming bad octet
if( ( nd > 1 && *ps == '0' ) // 2 or 3 digits with leading zero? Bad!
|| ( nd > 2 && *ps > '2' ) ) { // 3xx, 4xx, ...? Bad!
printf( "Got '%s' but '%.*s' won't work", obuf, nd, ps ); // tracing
ok = 0;
break;
}
memcpy( po, ps, nd ); // copy 1-3 digits
ok = strtol( po, NULL, 0 ) < 256; // binary value in range
ps += nd, po += nd, *po++ = '.'; // advance pointers and append '.'
}
po = ok ? po - 1 : obuf; // terminate, or clobber
*po = '\0'; // always terminated
puts( obuf ); // for tracing, good or bad...
// for Leetcode would only "remember" okay IPv4 strings
tp++; // NB: on to next template...
}
}
int main( void ) {
char buf[ 64 ]; // big enough?
while( printf( "?: " ) && fgets( buf, sizeof buf, stdin ) ) {
u32 l = (u32)strcspn( buf, "\n" );
buf[ l ] = '\0'; // trim the LF
if( l == 0 ) break;
if( l < 4 || l > 12 )
printf( "length (%u) must be 4 to 12 digits\n", l );
else
genIPv4s( buf, l ); // no verification of all digits, only digits.
}
return 0;
}
Sample output for 3 slightly different 8 digit candidates:
?: 02312301
'02312301' - length 8
1133 ... rejecting because first octet leading zero // Consider how many templates have been 'erased'
1331 ... 0.231.230.1
2231 ... rejecting because last octet would be '0x'
3131 ... rejecting because last octet would be '>299'
?: 12312301
'12312301' - length 8 // for comparison
1133 ... 123.123.0.1
1223 ... 123.12.30.1
1232 ... Got '12.' but '312' won't work // example of "futile" partial effort aborted
1313 ... 123.1.230.1
1322 ... 12.31.230.1
1331 ... 1.231.230.1
2123 ... rejecting because last octet would be '0x'
3113 ... rejecting because last octet would be '>299'
?: 12121212
'12121212' - length 8 // All templates used
1133 ... 121.212.1.2
1223 ... 121.21.21.2
1232 ... 12.121.21.2
1313 ... 121.2.121.2
1322 ... 12.12.121.2
1331 ... 1.212.121.2
2123 ... 121.21.2.12
2132 ... 12.121.2.12
2213 ... 121.2.12.12
2222 ... 12.12.12.12
2231 ... 1.212.12.12
2312 ... 12.1.212.12
2321 ... 1.21.212.12
3113 ... 121.2.1.212
3122 ... 12.12.1.212
3131 ... 1.212.1.212
3212 ... 12.1.21.212
3221 ... 1.21.21.212
3311 ... 1.2.121.212
?:
```
main()
should change tou32 l = (u32)strspn( buf, "0123456789" );
to simply use whatever contiguous digits appear at the beginning of the user's input. If someone enters"1234apple\n"
then work with"1234"
... Doh!... \$\endgroup\$while( (*tpd = *tps++) != NULL )
you are comparing an integer to a pointer (NULL), second isstrtol( po, 0, NULL )
the 3rd param is the base of the number, i hope you can fix this please, because im unable to run your code for testing and debugging \$\endgroup\$const
orstatic
are absent wouldn't address what I feel is my main concern: "When/where and how might anyone else ever use an approach similar to this outside of a 'single user' code-site challenge?" The lists of possible transformations feel similar to one slice of an Enigma device... Nothing new, I'm sure, but fun to code nonetheless...:-)
Thank you again... \$\endgroup\$