Skip to main content
added 62 characters in body
Source Link
Sparr
  • 6.1k
  • 17
  • 14

Bitstring Family Trees

This challenge is reproduced from memory and my own solution, from a challenge that was posted in the job-application section of http://itasoftware.com before they were bought by Google. I reached out to ITA and Google a few years ago, after the acquisition, to ask to re-post this here (and on codegolf.com when it existed) and never heard back from them.

A bitstring is a string of 1s and 0s. Bitstrings reproduce asexually through a mutation-prone process, producing a child that is a copy of its parent but with each bit flipped with 25% probability. Starting with a list containing one bitstring, we repeatedly select one bitstring from the list at random, produce its child, and add that child to the list. This produces a tree where each bitstring has one parentlist of bitstrings, except for the originaleach of which (the root ofexcept the treefirst), and each bitstring may have 0 or more children has somewhere earlier in the list a parent from which it was mutated.

Now, the challenge. Your program will be presented with a list of bitstrings produced as described above, but the order of the list will be shuffled. You are to calculate the most likelyleast improbable family tree for the given bitstrings. If there are two or more such trees, choose any of them.

Your input can be in any useful format, including as a list/array of lists/strings as a function parameter or in a variable, already existing on the stack for a stack based language, or from stdin with delimiters but not operators, so four four-bit strings might be "1010\n1001\n1011\n0010" or "[1010,1001,1011,0010]" or even "4 1010100110110010".

Your output can be in any unambiguous format. The canonical format is a list of integers, where the nth integer in the list is the index of the nth provided bitstring's parent in the original list, and a sentinel value for the root entry. Another acceptable form could be an actual tree data structure. Either of these might be returned from a function, printed to stdout, left in a variable, or left on the stack of a stack based language.

The above two provisions should be interpreted with the context that this challenge is not about golfing the input and output code. It's about golfing the algorithmic logic.

For the example input above of 1010,1001,1011,0010 the most likely family tree is that the first entry is the root, the last two are children of the first, and the second is a child of the third, all three mutations involving a single bit flip out of four bits.

With the challenge I will provide a few data sets of different sizes (10 10-bit strings, 100 100-bit strings, maybe bigger) with their solutions.

Bitstring Family Trees

This challenge is reproduced from memory and my own solution, from a challenge that was posted in the job-application section of http://itasoftware.com before they were bought by Google. I reached out to ITA and Google a few years ago, after the acquisition, to ask to re-post this here (and on codegolf.com when it existed) and never heard back from them.

A bitstring is a string of 1s and 0s. Bitstrings reproduce asexually through a mutation-prone process, producing a child that is a copy of its parent but with each bit flipped with 25% probability. Starting with a list containing one bitstring, we repeatedly select one bitstring from the list at random, produce its child, and add that child to the list. This produces a tree where each bitstring has one parent, except for the original (the root of the tree), and each bitstring may have 0 or more children.

Now, the challenge. Your program will be presented with a list of bitstrings produced as described above, but the order of the list will be shuffled. You are to calculate the most likely family tree for the given bitstrings.

Your input can be in any useful format, including as a list/array of lists/strings as a function parameter or in a variable, already existing on the stack for a stack based language, or from stdin with delimiters but not operators, so four four-bit strings might be "1010\n1001\n1011\n0010" or "[1010,1001,1011,0010]" or even "4 1010100110110010".

Your output can be in any unambiguous format. The canonical format is a list of integers, where the nth integer in the list is the index of the nth provided bitstring's parent in the original list, and a sentinel value for the root entry. Another acceptable form could be an actual tree data structure. Either of these might be returned from a function, printed to stdout, left in a variable, or left on the stack of a stack based language.

The above two provisions should be interpreted with the context that this challenge is not about golfing the input and output code. It's about golfing the algorithmic logic.

For the example input above of 1010,1001,1011,0010 the most likely family tree is that the first entry is the root, the last two are children of the first, and the second is a child of the third, all three mutations involving a single bit flip out of four bits.

With the challenge I will provide a few data sets of different sizes (10 10-bit strings, 100 100-bit strings, maybe bigger) with their solutions.

Bitstring Family Trees

This challenge is reproduced from memory and my own solution, from a challenge that was posted in the job-application section of http://itasoftware.com before they were bought by Google. I reached out to ITA and Google a few years ago, after the acquisition, to ask to re-post this here (and on codegolf.com when it existed) and never heard back from them.

A bitstring is a string of 1s and 0s. Bitstrings reproduce asexually through a mutation-prone process, producing a child that is a copy of its parent but with each bit flipped with 25% probability. Starting with a list containing one bitstring, we repeatedly select one bitstring from the list at random, produce its child, and add that child to the list. This produces a list of bitstrings, each of which (except the first) has somewhere earlier in the list a parent from which it was mutated.

Now, the challenge. Your program will be presented with a list of bitstrings produced as described above, but the order of the list will be shuffled. You are to calculate the least improbable family tree for the given bitstrings. If there are two or more such trees, choose any of them.

Your input can be in any useful format, including as a list/array of lists/strings as a function parameter or in a variable, already existing on the stack for a stack based language, or from stdin with delimiters but not operators, so four four-bit strings might be "1010\n1001\n1011\n0010" or "[1010,1001,1011,0010]" or even "4 1010100110110010".

Your output can be in any unambiguous format. The canonical format is a list of integers, where the nth integer in the list is the index of the nth provided bitstring's parent in the original list, and a sentinel value for the root entry. Another acceptable form could be an actual tree data structure. Either of these might be returned from a function, printed to stdout, left in a variable, or left on the stack of a stack based language.

The above two provisions should be interpreted with the context that this challenge is not about golfing the input and output code. It's about golfing the algorithmic logic.

For the example input above of 1010,1001,1011,0010 the most likely family tree is that the first entry is the root, the last two are children of the first, and the second is a child of the third, all three mutations involving a single bit flip out of four bits.

With the challenge I will provide a few data sets of different sizes (10 10-bit strings, 100 100-bit strings, maybe bigger) with their solutions.

Source Link
Sparr
  • 6.1k
  • 17
  • 14

Bitstring Family Trees

This challenge is reproduced from memory and my own solution, from a challenge that was posted in the job-application section of http://itasoftware.com before they were bought by Google. I reached out to ITA and Google a few years ago, after the acquisition, to ask to re-post this here (and on codegolf.com when it existed) and never heard back from them.

A bitstring is a string of 1s and 0s. Bitstrings reproduce asexually through a mutation-prone process, producing a child that is a copy of its parent but with each bit flipped with 25% probability. Starting with a list containing one bitstring, we repeatedly select one bitstring from the list at random, produce its child, and add that child to the list. This produces a tree where each bitstring has one parent, except for the original (the root of the tree), and each bitstring may have 0 or more children.

Now, the challenge. Your program will be presented with a list of bitstrings produced as described above, but the order of the list will be shuffled. You are to calculate the most likely family tree for the given bitstrings.

Your input can be in any useful format, including as a list/array of lists/strings as a function parameter or in a variable, already existing on the stack for a stack based language, or from stdin with delimiters but not operators, so four four-bit strings might be "1010\n1001\n1011\n0010" or "[1010,1001,1011,0010]" or even "4 1010100110110010".

Your output can be in any unambiguous format. The canonical format is a list of integers, where the nth integer in the list is the index of the nth provided bitstring's parent in the original list, and a sentinel value for the root entry. Another acceptable form could be an actual tree data structure. Either of these might be returned from a function, printed to stdout, left in a variable, or left on the stack of a stack based language.

The above two provisions should be interpreted with the context that this challenge is not about golfing the input and output code. It's about golfing the algorithmic logic.

For the example input above of 1010,1001,1011,0010 the most likely family tree is that the first entry is the root, the last two are children of the first, and the second is a child of the third, all three mutations involving a single bit flip out of four bits.

With the challenge I will provide a few data sets of different sizes (10 10-bit strings, 100 100-bit strings, maybe bigger) with their solutions.