2
$\begingroup$

I have a data set with 5 binary variables:

a b c d e
1 0 0 1 0
0 1 0 1 1
0 1 1 0 0
0 0 0 1 0
1 1 1 0 0
0 1 1 0 1
1 0 1 0 0
1 0 0 1 1
0 1 0 1 1
0 0 1 1 0

I am only interested in the percentages of occurrence:

           | a  | b  | c  | d  | e
occurrence | .4 | .5 | .5 | .6 | .4

But I would like to visualize the data in such a way that I can see the overlap, or lack of overlap, among all the different groups.

Any ideas?

$\endgroup$
2
  • $\begingroup$ so something like a frequency graph for all possible combinations? like a, b, c, d, e, ab, ac.....? $\endgroup$ Commented Oct 1, 2020 at 16:39
  • $\begingroup$ Well, yes. But that's the problem, how to visualize 32 combinations. $\endgroup$ Commented Oct 1, 2020 at 16:47

3 Answers

3
$\begingroup$

If you have richer data (i.e. more than 10 rows), you will want an UpSet plot. UpSet plots present set-overlap information in an intuitive way, like a Venn diagram, but are more useful for 4+ categories.
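To make the idea concrete, here is a minimal sketch of the numbers an UpSet plot displays, computed for the ten rows in the question. It uses only the standard library and prints a rough text rendering (one row of membership marks per observed combination, plus its count); it is not the output of any UpSet plotting package, and the names `combo_counts`, `set_sizes` and `render` are made up for this illustration.

```python
from collections import Counter

names = "abcde"
rows = [
    (1, 0, 0, 1, 0), (0, 1, 0, 1, 1), (0, 1, 1, 0, 0), (0, 0, 0, 1, 0),
    (1, 1, 1, 0, 0), (0, 1, 1, 0, 1), (1, 0, 1, 0, 0), (1, 0, 0, 1, 1),
    (0, 1, 0, 1, 1), (0, 0, 1, 1, 0),
]

# How often each exact combination of variables occurs (the "intersection sizes")
combo_counts = Counter(rows)

# How often each variable occurs on its own (the "set sizes")
set_sizes = [sum(column) for column in zip(*rows)]

def render(combo_counts, names):
    """Return text lines: membership marks per combination, then a bar and the count."""
    lines = [" ".join(names) + "  count"]
    for combo, count in combo_counts.most_common():
        marks = " ".join("x" if flag else "." for flag in combo)
        lines.append(f"{marks}  {'#' * count} {count}")
    return lines

print("\n".join(render(combo_counts, names)))
print("set sizes:", dict(zip(names, set_sizes)))
```

Only combinations that actually occur get a row, which is why this kind of display stays readable even though 5 variables allow 32 combinations.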

Some references which may give you some ideas and implementation in R:

$\endgroup$
1
$\begingroup$

Since the combinations are known, we can use some knowledge of binary numbers to come up with a frequency plot.

Basically: convert each row's binary string to an integer and build a frequency plot from the integer values.

import numpy as np
import pandas as pd
from itertools import product
import matplotlib.pyplot as plt

# test data: one row for each of the 32 possible combinations
# (in Python 3, map() returns an iterator, so build the list explicitly)
combs = np.array(list(product([0, 1], repeat=5)))
# store in dataframe
df = pd.DataFrame(data={'a': combs[:, 0], 'b': combs[:, 1], 'c': combs[:, 2], 'd': combs[:, 3], 'e': combs[:, 4]})
# concatenate the binary sequences to strings
df['concatenate'] = df[list('abcde')].astype(str).apply(''.join, axis=1)

# to convert binary strings to integers
def int2(x):
    return int(x, 2)

# every combination has a unique value
df['unique_values'] = df['concatenate'].apply(int2)

# prepare labels for the frequency plot
variables = list('abcde')
labels = []
for combination in df.concatenate:
    tmp = ''.join([variables[i] for i, x in enumerate(combination) if x != '0'])
    labels.append(tmp)

fig, ax = plt.subplots()
counts, bins, patches = ax.hist(df.unique_values, bins=32, rwidth=0.8)

# turn off the default x tick labels (they are replaced by the annotations below)
plt.tick_params(
    axis='x',          # changes apply to the x-axis
    which='both',      # both major and minor ticks are affected
    top=False,         # ticks along the top edge are off
    labelbottom=False)

# calculate the bin centers
bin_centers = 0.5 * np.diff(bins) + bins[:-1]
ax.set_xticks(bin_centers)
for label, x in zip(labels, bin_centers):
    # replace integer mapping with the labels
    ax.annotate(str(label), xy=(x, 0), xycoords=('data', 'axes fraction'),
        xytext=(0, -5), textcoords='offset points', va='top', ha='center', rotation=30)

plt.show()
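
As a complement, the same binary encoding can be applied to the ten rows from the question without any plotting. This is only a sketch: it uses bit shifts instead of string concatenation, and the helper names `encode` and `label` are invented here for illustration.

```python
from collections import Counter

names = "abcde"
rows = [
    (1, 0, 0, 1, 0), (0, 1, 0, 1, 1), (0, 1, 1, 0, 0), (0, 0, 0, 1, 0),
    (1, 1, 1, 0, 0), (0, 1, 1, 0, 1), (1, 0, 1, 0, 0), (1, 0, 0, 1, 1),
    (0, 1, 0, 1, 1), (0, 0, 1, 1, 0),
]

def encode(row):
    """Read the row as a binary number, first column = most significant bit."""
    value = 0
    for bit in row:
        value = (value << 1) | bit
    return value

def label(value, names=names):
    """Turn an encoded integer back into the letters whose bits are set."""
    n = len(names)
    letters = [ch for i, ch in enumerate(names) if (value >> (n - 1 - i)) & 1]
    return "".join(letters) or "(none)"

# Frequency of each encoded combination
counts = Counter(encode(row) for row in rows)

for value, count in sorted(counts.items()):
    print(f"{value:2d}  {value:05b}  {label(value):6s} {count}")
```

For instance, the first row `1 0 0 1 0` encodes to 18 and is labelled `ad`.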


$\endgroup$
1
  • $\begingroup$ +1 Indeed, this is a possibility, but I was looking for something simpler to visualize. $\endgroup$ Commented Oct 3, 2020 at 16:14
1
$\begingroup$

With Wolfram Language you may use AbsoluteCorrelation.

With

t = {
     {1, 0, 0, 1, 0}, {0, 1, 0, 1, 1}, 
     {0, 1, 1, 0, 0}, {0, 0, 0, 1, 0}, 
     {1, 1, 1, 0, 0}, {0, 1, 1, 0, 1}, 
     {1, 0, 1, 0, 0}, {1, 0, 0, 1, 1}, 
     {0, 1, 0, 1, 1}, {0, 0, 1, 1, 0}
    }

Then

MatrixForm[ac = AbsoluteCorrelation[t]] 


Here the diagonal entries are the marginal column frequencies and the off-diagonal entries are the joint frequencies. For example, ac[[1,1]] shows that variable a occurs with frequency 0.4, and ac[[1,2]] (row 1, column 2) shows that variable a occurs jointly with variable b with frequency 0.1.
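
For readers without Mathematica, the quantities described above (the fraction of rows in which two variables are both 1) can be sketched in plain Python. This is an illustration of the pairwise joint frequencies only, not a reimplementation of AbsoluteCorrelation, and the variable name `joint` is made up here.

```python
names = "abcde"
rows = [
    (1, 0, 0, 1, 0), (0, 1, 0, 1, 1), (0, 1, 1, 0, 0), (0, 0, 0, 1, 0),
    (1, 1, 1, 0, 0), (0, 1, 1, 0, 1), (1, 0, 1, 0, 0), (1, 0, 0, 1, 1),
    (0, 1, 0, 1, 1), (0, 0, 1, 1, 0),
]

n = len(rows)
columns = list(zip(*rows))  # one tuple of 0/1 values per variable

# joint[i][j] = fraction of rows where variable i and variable j are both 1;
# the diagonal joint[i][i] is simply the marginal frequency of variable i.
joint = [
    [sum(x * y for x, y in zip(ci, cj)) / n for cj in columns]
    for ci in columns
]

print("  " + "   ".join(names))
for name, line in zip(names, joint):
    print(name, " ".join(f"{v:.1f}" for v in line))
```

The matrix is symmetric, so only one triangle carries information, and it says nothing about overlaps among three or more variables.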

This can be visualised with MatrixPlot or ArrayPlot.

MatrixPlot[
 ac 
 , FrameTicks -> {Transpose@{Range@5, CharacterRange["a", "e"]}}
 , PlotLegends -> Automatic]


Hope this helps.

$\endgroup$
2
  • $\begingroup$ But isn't this a pairwise correlation? So, the matrix has 25 elements which are pair combinations of occurrence but has no information beyond pairwise. $\endgroup$ Commented Oct 7, 2020 at 6:18
  • $\begingroup$ @myradio Correct. I took your "overlap" to mean joint frequency. You only need the upper or lower triangle of the matrix since it is symmetric. $\endgroup$ Commented Oct 7, 2020 at 9:56
