2
\$\begingroup\$

I have to reproduce some basic functions in C and one of them is strlcat.

After reading the man and testing the strlcat from <bsd/string.h> library, I wrote the following code which seems to be a working copy of the function:

#include <string.h>

size_t  ft_strlcat(char *dst, const char *src, size_t size)
{
    size_t  lsize;
    size_t  dsize;

    lsize = 0;
    dsize = strlen(dst);
    while (*dst && size > 0 && size--)
    {
        dst++;
        lsize++;
    }
    while (*src && size > 1 && size--)
        *dst++ = *src++;
    if (size == 1)
        *dst = '\0';
    return (dsize + lsize);
}

Is this implementation missing something?

\$\endgroup\$
2
  • 4
    \$\begingroup\$ This would be easier to review if you linked to the man page on the internet and quoted the most important parts here. It seems like strlcat keeps up to the first l characters from the combined destination and source strings. There are oddities here that may or may not be intended. \$\endgroup\$ Commented Nov 16, 2016 at 19:54
  • \$\begingroup\$ Here is the man: freebsd.org/cgi/… the most important part is the return value, it has to be right \$\endgroup\$ Commented Nov 16, 2016 at 20:26

4 Answers 4

6
\$\begingroup\$

Using FreeBSD strlcat as a guide.

  1. "length" with C strings typical does not include the null character. "size" usually does include the null character. Suggest name change

     // dsize = strlen(dst);
     dlen = strlen(dst);
    
  2. Code walks the array and repeatedly adjust 3 variables. Could be done adjusting only 2.

    while (*dst && size > 0 && size--) {
        dst++;
        lsize++;
    }
    // vs.
    size_t remaining_size = size;
    while (*dst && remaining_size > 0) {
      dst++;
      remaining_size--;
    }
    size_t lsize = size - remaining_size;
    
  3. The below could use memcpy() or strcpy(). These functions are typically optimized for speedy execution, especially long strings. The key performance assessment should be based on long strings and with sufficient space. Little need to optimize for undersized values of size.

     while (*src && size > 1 && size--)
       *dst++ = *src++;
     // vs.
     memcpy(dst, src, ...);
    
  4. Code does not handle overlapping strings. Should use restrict to indicate that.

     size_t ft_strlcat(char * restrict dst, const char * restrict src, size_t size);
    
  5. Pedantic code would test for overflow in dsize + lsize, but let us skip that corner case for now.

  6. Initialize when declaring, if possible.

     // size_t  lsize;
     // lsize = 0;
     size_t  lsize = 0;
    

Some lightly tested code follows.

size_t ft2_strlcat(char *restrict dst, const char *restrict src, size_t size) {
  size_t s_length = strlen(src);
  size_t remaining_size = size;

  while (*dst && remaining_size > 0) {
    dst++;
    remaining_size--;
  }
  // Length of destination is the lesser of `size` and 
  // offset of the null character, if any. 
  size_t d_length = size - remaining_size;

  if (d_length < remaining_size) {
    // We know there is at least room to write the \0

    size_t copy_length = remaining_size - 1;
    if (s_length < copy_length) {
      copy_length = s_length;
    }
    memcpy(dst, src, copy_length);
    dst[copy_length] = '\0';
  }
  return d_length + s_length;
}
\$\endgroup\$
7
  • \$\begingroup\$ I do like how you simplified and optimized the code but there is one problem. The return value is not right, you return the src length + dst length, the man says "The strlcpy() and strlcat() functions return the total length of the string they tried to create." So there should be returned the len of the new string and if no string is created it should return the src len. \$\endgroup\$ Commented Nov 16, 2016 at 20:39
  • \$\begingroup\$ @UnixCode Disagree. For strlcat() that means the initial length of dst plus the length of src.. Provide link to your man page. \$\endgroup\$ Commented Nov 16, 2016 at 20:43
  • \$\begingroup\$ freebsd.org/cgi/… \$\endgroup\$ Commented Nov 16, 2016 at 20:46
  • \$\begingroup\$ I just tested the code you provided with the original strlcat from bsd and the return value differs. Please check maybe I am worng \$\endgroup\$ Commented Nov 16, 2016 at 20:47
  • \$\begingroup\$ The issue is this "Note however, that if strlcat() traverses size characters without finding a NUL, the length of the string is considered to be size and the destination string will not be NUL-terminated (since there was no space for the NUL)." I'll look at that later. \$\endgroup\$ Commented Nov 16, 2016 at 20:50
4
\$\begingroup\$

Minor loop edits

Your first loop has a bit of a strange condition:

    while (*dst && size > 0 && size--)
    {
        dst++;
        len++;
    }

If size > 0 is true then size-- must always be true as well. So instead of adding size-- as a third condition, I would put it in the loop:

    while (*dst && size > 0)
    {
        dst++;
        len++;
        size--;
    }

Your second loop has the same kind of construct:

    while (*src && size > 1 && size--)
        *dst++ = *src++;

In this case, since the condition is size > 1, we can just simplify to this:

    while (*src && size-- > 1)
        *dst++ = *src++;

This change wasn't possible for the first loop because size-- > 0 would allow size to reach -1, which would be bad because size is an unsigned type, and you use its value in the next loop.

\$\endgroup\$
0
3
\$\begingroup\$

Always, always, always check for buffer overflows! One slipped through this code review.

One problem I don’t see others having mentioned, and that remains in your revised version, is that in the conditional,

while (*dst && size > 0 && size--)

and in the edited version:

while (*dst && size > 0)

You dereference *dst before checking that you are at the end of the array. This potentially causes the program to read the element one past the end of the buffer, which is Undefined Behavior. An example of what might go wrong is, if the buffer ends at a page boundary, one past the end could read the first byte of an unmapped page of memory and cause a segmentation fault.

As for the algorithm: On a modern CPU, copying a known number of bytes is more optimal than checking each character of a string and branching. I suggest you find the length of both src and dest (use strnlen_s() if you are allowed to, to avoid overrunning a possibly-unterminated input string, and otherwise reimplement that), calculate the number of bytes of src to copy and where to copy to with a bit of arithmetic, and use memcpy() (or memmove() if you really, truly need to allow the strings to overlap). That is both faster and safer.

\$\endgroup\$
0
\$\begingroup\$

The following code seems to be working fine (updated!):

size_t  ft_strlcat(char *dst, const char *src, size_t size)
{
    size_t  len;
    size_t  slen;

    len = 0;
    slen = strlen(src);
    while (*dst && size > 0) // added @JS1 edit
    {
        dst++;
        len++;
        size--;
    }
    while (*src && size-- > 1) //added @JS1 edit
        *dst++ = *src++;
    if (size == 1 || *src == 0) // **VERY IMPORTANT:** read update below
        *dst = '\0';
    return (slen + len);
}

I also renamed the variables as suggested. I tried to implement memcpy but that needs 2 extra variables and an other strlen call.

Tests:

My function

./a.out abc efgh 0

Concatenated string:abc Return value:4

./a.out abc efgh 5

Concatenated string:abce Return value:7

./a.out abc efgh 7

Concatenated string:abcefg Return value:7

./a.out abc efgh 8

Concatenated string:abcefgh Return value:7

./a.out abc efgh 9

Concatenated string:abcefgh Return value:7

Bsd function:

/a.out abc efgh 0

Concatenated string:abc Return value:4

./a.out abc efgh 5

Concatenated string:abce Return value:7

/a.out abc efgh 7

Concatenated string:abcefg Return value:7

./a.out abc efgh 8

Concatenated string:abcefgh Return value:7

./a.out abc efgh 9

Concatenated string:abcefgh Return value:7

It seems right now, please if you have any other suggestions or more simple methods post it.

Update:

After running tests this with big sizes: ./a.out abc efgh 102910 | cat -e

the output was strange:

Concatenated string:abcefghM-^? $

Return value:7 $

This was fixed by forcing the Terminator char by adding the condition *src == 0 to the if statement

\$\endgroup\$
2
  • \$\begingroup\$ Man used: freebsd.org/cgi/… \$\endgroup\$ Commented Nov 16, 2016 at 21:26
  • \$\begingroup\$ The while (*src && size-- > 1) *dst++ = *src++; does not need to test *src in the loop as size and slen are known prior and an adjustment to size could first be determined before the loop. \$\endgroup\$ Commented Nov 16, 2016 at 23:21

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.