The ID3 algorithm

09 Aug 2022 - tsp
Last update 11 Aug 2022
Reading time 87 mins

TL;DR:

Introduction

The ID3 algorithm is an older but still useful algorithm for building discrete decision trees. It’s a greedy machine learning algorithm that generates a decision tree out of a training dataset. ID3 has many different areas of application in data mining - for example it can be used to build a decision tree for medical diagnosis, medical therapy selection, discovering hidden classification rules in teaching from mass evaluations of students, determining the factors that influence the purchasing behavior of customers, performing coarse weather predictions, or deciding in agriculture if a system should water plants or if one should plant at a given time under given conditions. The iterative dichotomiser - as the name suggests - tries to determine which feature drives a decision most (i.e. which leads to the best segmentation of one’s target set).

So what is a decision tree anyway? Basically it’s a tree that one can traverse from top to bottom to try to answer a given question based on usually binary or discrete attributes. Building it is a supervised learning task that requires a training data set from which one first builds a model. Usually one splits the available data into a training and a verification set (for example by random partitioning) to gain information on how well the trained tree performs when applied to known situations with a known outcome.
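The random partitioning mentioned above could look like the following minimal sketch. The function name `split_dataset`, the 30% verification share and the fixed seed are assumptions chosen for illustration only, they are not taken from the implementation described later.

```python
import random


def split_dataset(rows, verification_fraction=0.3, seed=None):
    """Randomly partition rows into a (training, verification) pair of lists."""
    rng = random.Random(seed)
    shuffled = list(rows)          # copy so the caller's data stays untouched
    rng.shuffle(shuffled)
    n_verify = int(round(len(shuffled) * verification_fraction))
    # The first n_verify shuffled rows are held back for verification
    return shuffled[n_verify:], shuffled[:n_verify]


# Hypothetical usage with ten dummy records
training, verification = split_dataset(range(10), verification_fraction=0.3, seed=1)
```

The tree is then built from `training` only, while `verification` is used to check how often the tree reproduces the known outcome.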

Building the tree is a pretty expensive operation in which one iteratively reduces the attribute space. For each iteration one first separates the remaining training data of the given subtree into all result classes and calculates the information gain of all attributes (i.e. one calculates how well each attribute separates / correlates with the given class association in the given subtree). This of course means that a full non-pruned tree over $n$ binary attributes would have up to $n$ levels and up to $2^n$ leaves - for non-binary attributes the tree grows even faster.

To achieve this goal ID3 uses two basic metrics:

Shannon entropy measures the amount of uncertainty or information content in a set. In case $S$ is the dataset for which entropy is calculated and $X$ is the set of classes in $S$ as well as $p(x)$ the proportion of the number of elements in class $x$ in $S$:

[ H(S) = \sum_{x \in X} - p(x) * \log_2 p(x) \\ p(x) = \frac{n(x)}{N} ]

Note that I’ve used $n(x)$ as the number of elements that are assigned to class $x$ and $N$ as the total number of elements in $S$. To perform this calculation in a meaningful way one uses the convention $0 * \log_2(0) = 0$ (mathematically $\log_2(0)$ is undefined) which basically just means that one ignores empty classes - which is basically also already implied by the $\sum_{x \in X}$.
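The entropy definition above can be sketched in a few lines of Python. The function name `shannon_entropy` is an assumption for illustration. Only classes that actually occur are counted, which corresponds to ignoring empty classes.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    counts = Counter(labels)       # n(x) for every class x that occurs
    total = len(labels)            # N
    # Empty classes never show up in counts, so log2(0) is never evaluated
    return sum(-(c / total) * log2(c / total) for c in counts.values())


# 9 positive and 5 negative records, as in the example dataset of this article
example_entropy = shannon_entropy(["Yes"] * 9 + ["No"] * 5)
print(example_entropy)
```

For 9 "Yes" and 5 "No" records this gives roughly 0.9403, the root entropy reported in the example output further down.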

Information gain. The information gain describes how well a given attribute $A$ separates the classes. The (discrete) attribute values will be denoted as $a \in A$. The set $S_a \subseteq S$ contains all elements of $S$ that have the attribute value $a$ assigned. The term $\mid S \mid$ denotes the number of elements in a given set. So the term

[ \frac{\mid S_a \mid}{\mid S \mid} ]

gives the empirical probability of the attribute value $a$ occurring in the set $S$.

[ G(S, A) = H(S) - \sum_{a \in A} \frac{\mid S_a \mid}{\mid S \mid} * H(S_a) ]

Thus one basically calculates the information gain as the difference between the entropy of the full data set and the sum of the entropies of the split data sets weighted by their probability. This is done for each remaining attribute on each iteration while building the tree (thus along a single path from the root to a leaf there will be $n + (n-1) + \ldots + 1 = O(n^2)$ gain evaluations for $n$ attributes, since the number of candidates only decreases linearly). One then selects the attribute with maximum information gain. If there are attributes with equal information gain one usually randomly selects one of those attributes and performs the split.
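As a minimal sketch of this selection step, the following Python snippet evaluates $G(S, A)$ for every attribute of the small weather table that this article uses as example data. The helper names (`load_rows`, `entropy`, `information_gain`) are assumptions for illustration and not the API of the implementation described later.

```python
from collections import Counter, defaultdict
from math import log2

# The 14-record weather table from the example section of this article
DATA = """\
Outlook Temperature Humidity Wind Decision
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
"""


def load_rows(text):
    """Parse the whitespace separated table into a list of dicts."""
    header, *lines = text.strip().splitlines()
    names = header.split()
    return [dict(zip(names, line.split())) for line in lines]


def entropy(rows, target):
    """H(S) over the classes found in the target column."""
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return sum(-(c / n) * log2(c / n) for c in counts.values())


def information_gain(rows, attribute, target):
    """G(S, A) = H(S) - sum over a of |S_a| / |S| * H(S_a)."""
    subsets = defaultdict(list)
    for r in rows:
        subsets[r[attribute]].append(r)    # build S_a for every value a
    remainder = sum(len(s) / len(rows) * entropy(s, target)
                    for s in subsets.values())
    return entropy(rows, target) - remainder


rows = load_rows(DATA)
candidates = ("Outlook", "Temperature", "Humidity", "Wind")
gains = {a: information_gain(rows, a, "Decision") for a in candidates}
best = max(gains, key=gains.get)           # ties: first candidate wins here
print(gains, best)
```

Note that `max` simply keeps the first candidate on ties instead of picking one at random.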

In case the information gain is very small one might also simply prune the tree at this position and record the probabilities of all result classes. Looking at the resulting tree one can then also determine which attributes do not contribute to the decision.

The resulting tree specifies at each node which attribute is split on, with as many branches as there are attribute values. If one wants to extend the algorithm to continuous attributes one can still use ID3 by binning the values into buckets first (i.e. by building classes - for example using an approach such as k-means or other clustering algorithms - or by simply separating into classes of equal size). There are better methods to tackle problems with continuous attributes though.
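The simplest of the binning approaches named above, separating into classes of roughly equal size, could be sketched as follows. The function names and the temperature readings are made up for illustration.

```python
from bisect import bisect_right


def equal_frequency_edges(values, bins):
    """Return bins - 1 cut points so that each bin holds about the same count."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // bins] for i in range(1, bins)]


def to_bin(value, edges):
    """Map a continuous value to a discrete bin index 0 .. len(edges)."""
    return bisect_right(edges, value)


# Hypothetical continuous temperature readings
temps = [12.5, 30.1, 18.0, 25.3, 21.7, 15.2, 28.4, 19.9, 23.0]
edges = equal_frequency_edges(temps, 3)
labels = [to_bin(t, edges) for t in temps]
print(edges, labels)
```

The resulting bin indices can then be treated like any other discrete attribute value.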

Popular extensions to the ID3 algorithm are:

Many software implementations currently implement either ID3, CART or minor variations of those algorithms.

Confidence intervals used

The Wilson interval used here is an improvement over the simple Wald interval, which builds a confidence interval from a normal approximation of the binomial distribution of one result outcome versus all other outcomes.

The simple Wald interval uses the normal approximation known from the binomial distribution:

[ p_{l,u} = \hat{p} \pm z * \sqrt{\frac{\hat{p} * (1 - \hat{p})}{n}} \\ z = \Phi^{-1}(1 - \frac{\alpha}{2}) ]

Here $\hat{p}$ is the observed fraction of the outcome, $n$ the number of samples and $\Phi^{-1}$ the quantile function of the standard normal distribution for the significance level $\alpha$. Using the Wald interval requires one to clamp the results to the $[0,1]$ interval at the end, and its actual coverage usually falls below the nominal confidence level (i.e. it results in an interval with a smaller confidence level than requested). A simple improvement would be the Agresti-Coull interval - but a much better alternative is the Wilson score interval. It does not yield limits symmetric around $\hat{p}$ - but it largely avoids the problems of overshooting and undercoverage, behaves well for small samples and skewed observations, and unlike the exact Clopper-Pearson interval it can be calculated directly by a simple formula. The derivation of this interval is nicely described on Wikipedia so I’m going to skip it here. The main result is the formula used:

[ p_{l,u} = \frac{1}{1 + \frac{z^2}{n}} \left(\hat{p} + \frac{z^2}{2n}\right) \pm \frac{z}{1 + \frac{z^2}{n}} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n} + \frac{z^2}{4n^2}} ]

This is what I’m going to use to determine the confidence intervals.
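The Wilson formula translates almost directly into code. The following is a minimal sketch, not the implementation used for the outputs in this article. The function name is hypothetical and the default $z = 2.58$ (roughly a 99% confidence level) is an assumption, chosen because it appears to reproduce the limits printed in the example trees.

```python
from math import sqrt


def wilson_interval(successes, n, z=2.58):
    """Wilson score interval (lower, upper) for a binomial proportion.

    z = 2.58 is an assumed default (about 99% confidence), not a documented
    setting of the implementation described in this article.
    """
    if n <= 0:
        raise ValueError("n has to be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z / denom * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    # Clamp only to absorb floating point rounding at the borders
    return max(0.0, center - half), min(1.0, center + half)


# 5 of 14 records and 4 of 4 records, as found in the example dataset
low_5_14, high_5_14 = wilson_interval(5, 14)
low_4_4, high_4_4 = wilson_interval(4, 4)
print(low_5_14, high_5_14, low_4_4, high_4_4)
```

Even when all 4 of 4 records agree, the lower limit stays far below 100%, which is why the terminals of small subtrees still carry wide intervals.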

Simple example data

For demonstration purposes I’m using a pretty simple dataset - one that’s often found on the net when ID3 trees are discussed - that describes whether someone went outside depending on environmental conditions:

Outlook   Temperature  Humidity  Wind    Decision
Sunny     Hot          High      Weak    No
Sunny     Hot          High      Strong  No
Overcast  Hot          High      Weak    Yes
Rain      Mild         High      Weak    Yes
Rain      Cool         Normal    Weak    Yes
Rain      Cool         Normal    Strong  No
Overcast  Cool         Normal    Strong  Yes
Sunny     Mild         High      Weak    No
Sunny     Cool         Normal    Weak    Yes
Rain      Mild         Normal    Weak    Yes
Sunny     Mild         Normal    Strong  Yes
Overcast  Mild         High      Strong  Yes
Overcast  Hot          Normal    Weak    Yes
Rain      Mild         High      Strong  No

Building the decision tree for the test dataset is pretty straightforward. First one has to determine the Shannon entropy for the unfiltered data. I’m also calculating the target probabilities and their confidence intervals at every level. The idea behind the latter is that one might only be interested in a result with a given probability and thus might want to terminate the tree search or prune the tree later on.

Entropy: 0.9402859586706309, gain: 0.2467498197744391
Decision = No: 35.71429% [12.73086%, 67.90469%]
Decision = Yes: 64.28571% [32.09531%, 87.26914000000001%]

As one can see the example dataset is somewhat balanced and the confidence intervals are also massively overlapping.

The next step is to calculate the branching ratios as well as the entropy for each subtree in case one would split for each of the attributes:

Possible gain for Outlook: 0.2467498197744391
Possible gain for Temperature: 0.029222565658954647
Possible gain for Humidity: 0.15183550136234142
Possible gain for Wind: 0.04812703040826932
Selected maximum gain 0.2467498197744391 for candidate Outlook

As one can see the information gain differs radically between the various attributes. The largest gain is achieved by selecting the outlook (0.246750) followed by the humidity (0.151836). Wind condition (0.048127) and temperature (0.029223) provide the least information. The first branch will thus be formed at the outlook attribute.

The algorithm has decided to branch at the outlook column since its information gain is the maximum encountered. Then the algorithm recurses into the first possible attribute value for the outlook. The first attribute value is overcast:

Inside subtree Outlook = Overcast
Branching candidates Temperature, Humidity, Wind
Our entropy is already 0 - finishing up

Outlook = Overcast:
| | Terminal:
| | Decision = No: 0.0% [0.0%, 62.46387%]
| | Decision = Yes: 100.0% [37.53613%, 100.0%]

As one can see the entropy is zero - the decision is always yes. The width of the confidence interval is determined by the number of cases that support this decision, even though we only encountered records containing this decision. Since the entropy is zero and thus the information gain of any further step would also vanish, this creates a terminal and stops further splitting.

The case is different for an outlook of rain. The algorithm again calculates entropy and information gain for all possible remaining attributes:

Inside subtree Outlook = Rain
Branching candidates Temperature, Humidity, Wind
Element count: 5, Shanon entropy: 0.9709505944546686
Possible gain for Temperature: 0.01997309402197489
Possible gain for Humidity: 0.01997309402197489
Possible gain for Wind: 0.9709505944546686
Selected maximum gain 0.9709505944546686 for candidate Wind

At the end it selects branching at the wind column.

Outlook = Rain:
| | Entropy: 0.9709505944546686, gain: 0.9709505944546686
| | Decision = No: 40.0% [8.2521%, 83.16892%]
| | Decision = Yes: 60.0% [16.83108%, 91.7479%]
| | Branching on Wind
| | Wind = Weak:
| | | | Terminal:
| | | | Decision = No: 0.0% [0.0%, 68.93252%]
| | | | Decision = Yes: 100.0% [31.067479999999996%, 100.0%]
| | Wind = Strong:
| | | | Terminal:
| | | | Decision = No: 100.0% [23.10429%, 100.0%]
| | | | Decision = Yes: 0.0% [-0.0%, 76.89571%]

Recursing into weak and strong subtrees will yield nodes with zero entropy again - thus there will be terminal nodes.

At the end the algorithm will have generated a simple decision tree only two levels deep. The topmost level will split at the outlook column - for overcast no more recursion will be required. In case of rainy outlook the algorithm will choose to further discriminate based on wind condition and produce terminals for strong and weak wind predictions. In case of sunny outlook on the other hand it will discriminate based on humidity and produce terminals for high and normal humidity.

| Entropy: 0.9402859586706309, gain: 0.2467498197744391
| Decision = No: 35.71429% [12.73086%, 67.90469%]
| Decision = Yes: 64.28571% [32.09531%, 87.26914000000001%]
| Branching on Outlook
| Outlook = Sunny:
| | | Entropy: 0.9709505944546686, gain: 0.9709505944546686
| | | Decision = No: 60.0% [16.83108%, 91.7479%]
| | | Decision = Yes: 40.0% [8.2521%, 83.16892%]
| | | Branching on Humidity
| | | Humidity = High:
| | | | | Terminal:
| | | | | Decision = No: 100.0% [31.067479999999996%, 100.0%]
| | | | | Decision = Yes: 0.0% [0.0%, 68.93252%]
| | | Humidity = Normal:
| | | | | Terminal:
| | | | | Decision = No: 0.0% [-0.0%, 76.89571%]
| | | | | Decision = Yes: 100.0% [23.10429%, 100.0%]
| Outlook = Overcast:
| | | Terminal:
| | | Decision = No: 0.0% [0.0%, 62.46387%]
| | | Decision = Yes: 100.0% [37.53613%, 100.0%]
| Outlook = Rain:
| | | Entropy: 0.9709505944546686, gain: 0.9709505944546686
| | | Decision = No: 40.0% [8.2521%, 83.16892%]
| | | Decision = Yes: 60.0% [16.83108%, 91.7479%]
| | | Branching on Wind
| | | Wind = Weak:
| | | | | Terminal:
| | | | | Decision = No: 0.0% [0.0%, 68.93252%]
| | | | | Decision = Yes: 100.0% [31.067479999999996%, 100.0%]
| | | Wind = Strong:
| | | | | Terminal:
| | | | | Decision = No: 100.0% [23.10429%, 100.0%]
| | | | | Decision = Yes: 0.0% [-0.0%, 76.89571%]

In case of this example the algorithm built a tree that leads to decisions with probability 1.0 - i.e. it provides a decision whose point estimator looks like a sure conclusion. Note that the confidence intervals are still not point-like due to the limited amount of data that has been used to build the tree. In the general case there might be different probabilities for both outcomes - one can then decide if one wants to work with the point estimators, the confidence intervals or wants to specify any threshold, add an inconclusive outcome, etc.
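The whole recursion for the weather example can be condensed into a short sketch. This is a simplified illustration under stated assumptions, not the implementation described later in this article: all names are hypothetical, the tree is a plain nested dict, ties are resolved by taking the first candidate instead of a random one, and no confidence intervals or pruning are included.

```python
from collections import Counter, defaultdict
from math import log2

DATA = """\
Outlook Temperature Humidity Wind Decision
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
"""


def load_rows(text):
    header, *lines = text.strip().splitlines()
    names = header.split()
    return [dict(zip(names, line.split())) for line in lines]


def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return sum(-(c / n) * log2(c / n) for c in counts.values())


def partition(rows, attribute):
    """Group rows by the value they have for the given attribute."""
    subsets = defaultdict(list)
    for r in rows:
        subsets[r[attribute]].append(r)
    return subsets


def information_gain(rows, attribute, target):
    parts = partition(rows, attribute).values()
    return entropy(rows, target) - sum(
        len(s) / len(rows) * entropy(s, target) for s in parts)


def build_tree(rows, attributes, target):
    """Recursively build a nested dict; leaves store the class counts."""
    if entropy(rows, target) == 0 or not attributes:
        return {"leaf": dict(Counter(r[target] for r in rows))}
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {"split": best,
            "branches": {value: build_tree(subset, remaining, target)
                         for value, subset in partition(rows, best).items()}}


def classify(tree, record):
    """Walk down the tree; raises KeyError for attribute values never seen."""
    while "split" in tree:
        tree = tree["branches"][record[tree["split"]]]
    return max(tree["leaf"], key=tree["leaf"].get)


rows = load_rows(DATA)
tree = build_tree(rows, ["Outlook", "Temperature", "Humidity", "Wind"], "Decision")
answer = classify(tree, {"Outlook": "Sunny", "Temperature": "Hot",
                         "Humidity": "High", "Wind": "Weak"})
print(tree["split"], answer)
```

The sketch splits on Outlook first, then on Humidity for sunny and on Wind for rainy outlook, matching the structure of the tree printed above.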

The tree looks different when one does not include Humidity:

| Entropy: 0.9402859586706309, gain: 0.2467498197744391
| Decision = No: 35.71429% [12.73086%, 67.90469%]
| Decision = Yes: 64.28571% [32.09531%, 87.26914000000001%]
| Branching on Outlook
| Outlook = Sunny:
| | | Entropy: 0.9709505944546686, gain: 0.5709505944546686
| | | Decision = No: 60.0% [16.83108%, 91.7479%]
| | | Decision = Yes: 40.0% [8.2521%, 83.16892%]
| | | Branching on Temperature
| | | Temperature = Hot:
| | | | | Terminal:
| | | | | Decision = No: 100.0% [23.10429%, 100.0%]
| | | | | Decision = Yes: 0.0% [-0.0%, 76.89571%]
| | | Temperature = Mild:
| | | | | Entropy: 1.0, gain: 1.0
| | | | | Decision = No: 50.0% [6.1549%, 93.8451%]
| | | | | Decision = Yes: 50.0% [6.1549%, 93.8451%]
| | | | | Branching on Wind
| | | | | Wind = Weak:
| | | | | | | Terminal:
| | | | | | | Decision = No: 100.0% [13.06097%, 100.0%]
| | | | | | | Decision = Yes: 0.0% [-0.0%, 86.93902999999999%]
| | | | | Wind = Strong:
| | | | | | | Terminal:
| | | | | | | Decision = No: 0.0% [-0.0%, 86.93902999999999%]
| | | | | | | Decision = Yes: 100.0% [13.06097%, 100.0%]
| | | Temperature = Cool:
| | | | | Terminal:
| | | | | Decision = No: 0.0% [-0.0%, 86.93902999999999%]
| | | | | Decision = Yes: 100.0% [13.06097%, 100.0%]
| Outlook = Overcast:
| | | Terminal:
| | | Decision = No: 0.0% [0.0%, 62.46387%]
| | | Decision = Yes: 100.0% [37.53613%, 100.0%]
| Outlook = Rain:
| | | Entropy: 0.9709505944546686, gain: 0.9709505944546686
| | | Decision = No: 40.0% [8.2521%, 83.16892%]
| | | Decision = Yes: 60.0% [16.83108%, 91.7479%]
| | | Branching on Wind
| | | Wind = Weak:
| | | | | Terminal:
| | | | | Decision = No: 0.0% [0.0%, 68.93252%]
| | | | | Decision = Yes: 100.0% [31.067479999999996%, 100.0%]
| | | Wind = Strong:
| | | | | Terminal:
| | | | | Decision = No: 100.0% [23.10429%, 100.0%]
| | | | | Decision = Yes: 0.0% [-0.0%, 76.89571%]

In case one decides to remove the Outlook attribute, which had the highest initial gain, it gets more interesting - then one can see that not every case leads to a 100% conclusion. For example in case one has high humidity but only mild temperature there is a 50% chance for yes and a 50% chance for no. The same is the case for high humidity, hot temperature and weak wind and in a third case for normal humidity. Such inconclusive outcomes are of course much more likely when one builds trees over complex datasets with enough entries. For complex problems a tree with exact conclusions is usually a sign of overfitting (though of course it is also a possible correct outcome).

| Entropy: 0.9402859586706309, gain: 0.15183550136234142
| Decision = No: 35.71429% [12.73086%, 67.90469%]
| Decision = Yes: 64.28571% [32.09531%, 87.26914000000001%]
| Branching on Humidity
| Humidity = High:
| | | Terminal:
| | | Decision = No: 57.14286% [18.93662%, 88.38595000000001%]
| | | Decision = Yes: 42.85714% [11.614049999999999%, 81.06338%]
| Humidity = Normal:
| | | Entropy: 0.5916727785823275, gain: 0.19811742113040343
| | | Decision = No: 14.285709999999998% [1.6956700000000002%, 61.69146%]
| | | Decision = Yes: 85.71429% [38.30854%, 98.30433%]
| | | Branching on Wind
| | | Wind = Weak:
| | | | | Terminal:
| | | | | Decision = No: 0.0% [0.0%, 62.46387%]
| | | | | Decision = Yes: 100.0% [37.53613%, 100.0%]
| | | Wind = Strong:
| | | | | Entropy: 0.9182958340544896, gain: 0.2516291673878229
| | | | | Decision = No: 33.33333% [4.03207%, 85.6121%]
| | | | | Decision = Yes: 66.66667% [14.3879%, 95.96793%]
| | | | | Branching on Temperature
| | | | | Temperature = Hot:
| | | | | | | Terminal:
| | | | | | | Not possible according to known data
| | | | | Temperature = Mild:
| | | | | | | Terminal:
| | | | | | | Decision = No: 0.0% [-0.0%, 86.93902999999999%]
| | | | | | | Decision = Yes: 100.0% [13.06097%, 100.0%]
| | | | | Temperature = Cool:
| | | | | | | Terminal:
| | | | | | | Decision = No: 50.0% [6.1549%, 93.8451%]
| | | | | | | Decision = Yes: 50.0% [6.1549%, 93.8451%]

Description of the implementation

So what does one require for a simple implementation of ID3? Note that this is one of the simplest implementations I could come up with and it does not offer much in terms of performance enhancements. It has been designed to tackle some specific problems though, so the code is able to process data stored in CSV files as well as in various database backends (the latter have been excluded from the open sourced version for a myriad of different reasons).

Datastore: CSV datastore

First one will need a way to access the data. This is what I’m going to call the datastore layer. The datastore layer will provide access to the subsets of data - and might be implemented either as a simple iterator or using some SQL database logic. The methods required are:

For the sake of simplicity I’ve created a simple CSV/TSV datastore first, for which one has to set the attribute types manually by selecting the specific attribute descriptor class. The datastore is created by specifying a filename as well as the column index and the type (discrete labels or continuous attributes) of each attribute. For continuous attributes one has to specify the classes since they won’t be determined automatically.

The CSV datastore also allows one to automatically build the list of attributes in case one really wants to consume all fields.

The ID3 tree

Now to the ID3 tree itself. This is basically a direct implementation of the algorithm described above. It only needs to know which attributes it should use as input and which attribute contains the target classes. Since attributes are already selected by the datastore configuration the ID3 tree builder is not required to receive any additional information.

The output then is a tree that splits at each level by one attribute - for each of the discrete values there exists one subtree. On each level the tree records:

The ID3 tree is built by a simple utility function that only requires some minimal arguments:

So to take the decision at a given level the task can be split into multiple steps:

Some nice output for the ID3 tree

Since I also wanted to further process the tree I’ve decided to support various outputs on the small CLI utility or inside the Jupyter notebook:

Applying to real world data

Now that I’ve described the algorithm and its implementation let’s apply it to some real world data. There are a number of nice datasets that can be used as examples. I’ve decided on the following ones:

Classifying edible and poisonous mushrooms

There is a section about Agaricus and Lepiota inside the Audubon Society Field Guide from which Jeff Schlimmer has extracted 22 different characteristics of 8124 mushrooms from those two families to allow one to train decision trees that decide if mushrooms are edible or not. Please note that this should not be used as a guide. There is no guarantee this decision tree won’t kill you.

Training for the decision tree takes a while due to the huge number of parameters. The attributes provided are:

As one can see the algorithm recurses only into a maximum of four attributes and then terminates since all other properties do not provide any more information gain. Even though the problem is pretty simple the calculation took nearly a minute to come to a conclusion:

| Entropy: 0.9968038285222955, gain: 0.9054400254210326
| Edible = EDIBLE: 53.327000000000005% [51.921870000000006%, 54.726870000000005%]
| Edible = POISONOUS: 46.672999999999995% [45.27313%, 48.07813%]
| Branching on Odor
| Odor = ALMOND:
| | | Terminal:
| | | Edible = EDIBLE: 100.0% [98.36314%, 100.0%]
| | | Edible = POISONOUS: 0.0% [0.0%, 1.63686%]
| Odor = ANISE:
| | | Terminal:
| | | Edible = EDIBLE: 100.0% [98.36314%, 100.0%]
| | | Edible = POISONOUS: 0.0% [0.0%, 1.63686%]
| Odor = NONE:
| | | Entropy: 0.20192168248430362, gain: 0.1370967382781343
| | | Edible = EDIBLE: 96.84874% [96.03266%, 97.50132%]
| | | Edible = POISONOUS: 3.15126% [2.49868%, 3.9673399999999996%]
| | | Branching on Spore print color
| | | Spore print color = PURPLE:
| | | | | Terminal:
| | | | | Not possible according to known data
| | | Spore print color = BROWN:
| | | | | Terminal:
| | | | | Edible = EDIBLE: 100.0% [99.54983%, 100.0%]
| | | | | Edible = POISONOUS: 0.0% [0.0%, 0.45017%]
| | | Spore print color = BLACK:
| | | | | Terminal:
| | | | | Edible = EDIBLE: 100.0% [99.53473000000001%, 100.0%]
| | | | | Edible = POISONOUS: 0.0% [0.0%, 0.46527%]
| | | Spore print color = CHOCOLATE:
| | | | | Terminal:
| | | | | Edible = EDIBLE: 100.0% [87.82137%, 100.0%]
| | | | | Edible = POISONOUS: 0.0% [0.0%, 12.17863%]
| | | Spore print color = GREEN:
| | | | | Terminal:
| | | | | Edible = EDIBLE: 0.0% [0.0%, 8.46263%]
| | | | | Edible = POISONOUS: 100.0% [91.53737%, 100.0%]
| | | Spore print color = WHITE:
| | | | | Entropy: 0.3809465857053901, gain: 0.2434890145144485
| | | | | Edible = EDIBLE: 92.59259% [89.48345%, 94.83559%]
| | | | | Edible = POISONOUS: 7.4074100000000005% [5.16441%, 10.516549999999999%]
| | | | | Branching on Habitat
| | | | | Habitat = WOODS:
| | | | | | | Entropy: 0.7219280948873623, gain: 0.7219280948873623
| | | | | | | Edible = EDIBLE: 20.0% [8.576920000000001%, 39.98319%]
| | | | | | | Edible = POISONOUS: 80.0% [60.01681%, 91.42308%]
| | | | | | | Branching on Gill size
| | | | | | | Gill size = NARROW:
| | | | | | | | | Terminal:
| | | | | | | | | Edible = EDIBLE: 0.0% [-0.0%, 17.2194%]
| | | | | | | | | Edible = POISONOUS: 100.0% [82.7806%, 100.0%]
| | | | | | | Gill size = BROAD:
| | | | | | | | | Terminal:
| | | | | | | | | Edible = EDIBLE: 100.0% [54.58366%, 100.0%]
| | | | | | | | | Edible = POISONOUS: 0.0% [0.0%, 45.41634%]
| | | | | Habitat = MEADOWS:
| | | | | | | Terminal:
| | | | | | | Not possible according to known data
| | | | | Habitat = GRASSES:
| | | | | | | Terminal:
| | | | | | | Edible = EDIBLE: 100.0% [97.74096%, 100.0%]
| | | | | | | Edible = POISONOUS: 0.0% [-0.0%, 2.25904%]
| | | | | Habitat = PATHS:
| | | | | | | Terminal:
| | | | | | | Edible = EDIBLE: 100.0% [85.73315000000001%, 100.0%]
| | | | | | | Edible = POISONOUS: 0.0% [-0.0%, 14.26685%]
| | | | | Habitat = URBAN:
| | | | | | | Terminal:
| | | | | | | Not possible according to known data
| | | | | Habitat = LEAVES:
| | | | | | | Entropy: 0.6840384356390417, gain: 0.6840384356390417
| | | | | | | Edible = EDIBLE: 81.81818% [69.11085%, 90.0505%]
| | | | | | | Edible = POISONOUS: 18.181820000000002% [9.9495%, 30.889149999999997%]
| | | | | | | Branching on Cap Color
| | | | | | | Cap Color = WHITE:
| | | | | | | | | Terminal:
| | | | | | | | | Edible = EDIBLE: 0.0% [0.0%, 45.41634%]
| | | | | | | | | Edible = POISONOUS: 100.0% [54.58366%, 100.0%]
| | | | | | | Cap Color = YELLOW:
| | | | | | | | | Terminal:
| | | | | | | | | Edible = EDIBLE: 0.0% [0.0%, 45.41634%]
| | | | | | | | | Edible = POISONOUS: 100.0% [54.58366%, 100.0%]
| | | | | | | Cap Color = BROWN:
| | | | | | | | | Terminal:
| | | | | | | | | Edible = EDIBLE: 100.0% [87.82137%, 100.0%]
| | | | | | | | | Edible = POISONOUS: 0.0% [0.0%, 12.17863%]
| | | | | | | Cap Color = GRAY:
| | | | | | | | | Terminal:
| | | | | | | | | Not possible according to known data
| | | | | | | Cap Color = RED:
| | | | | | | | | Terminal:
| | | | | | | | | Not possible according to known data
| | | | | | | Cap Color = PINK:
| | | | | | | | | Terminal:
| | | | | | | | | Not possible according to known data
| | | | | | | Cap Color = PURPLE:
| | | | | | | | | Terminal:
| | | | | | | | | Not possible according to known data
| | | | | | | Cap Color = GREEN:
| | | | | | | | | Terminal:
| | | | | | | | | Not possible according to known data
| | | | | | | Cap Color = BUFF:
| | | | | | | | | Terminal:
| | | | | | | | | Not possible according to known data
| | | | | | | Cap Color = CINNAMON:
| | | | | | | | | Terminal:
| | | | | | | | | Edible = EDIBLE: 100.0% [78.28708%, 100.0%]
| | | | | | | | | Edible = POISONOUS: 0.0% [0.0%, 21.71292%]
| | | | | Habitat = WASTE:
| | | | | | | Terminal:
| | | | | | | Edible = EDIBLE: 100.0% [96.64929%, 100.0%]
| | | | | | | Edible = POISONOUS: 0.0% [0.0%, 3.35071%]
| | | Spore print color = YELLOW:
| | | | | Terminal:
| | | | | Edible = EDIBLE: 100.0% [87.82137%, 100.0%]
| | | | | Edible = POISONOUS: 0.0% [0.0%, 12.17863%]
| | | Spore print color = ORANGE:
| | | | | Terminal:
| | | | | Edible = EDIBLE: 100.0% [87.82137%, 100.0%]
| | | | | Edible = POISONOUS: 0.0% [0.0%, 12.17863%]
| | | Spore print color = BUFF:
| | | | | Terminal:
| | | | | Edible = EDIBLE: 100.0% [87.82137%, 100.0%]
| | | | | Edible = POISONOUS: 0.0% [0.0%, 12.17863%]
| Odor = PUNGENT:
| | | Terminal:
| | | Edible = EDIBLE: 0.0% [-0.0%, 2.53426%]
| | | Edible = POISONOUS: 100.0% [97.46574%, 100.0%]
| Odor = CREOSOTE:
| | | Terminal:
| | | Edible = EDIBLE: 0.0% [0.0%, 3.35071%]
| | | Edible = POISONOUS: 100.0% [96.64929%, 100.0%]
| Odor = FOUL:
| | | Terminal:
| | | Edible = EDIBLE: 0.0% [0.0%, 0.30722%]
| | | Edible = POISONOUS: 100.0% [99.69278%, 100.0%]
| Odor = FISHY:
| | | Terminal:
| | | Edible = EDIBLE: 0.0% [0.0%, 1.14242%]
| | | Edible = POISONOUS: 100.0% [98.85758%, 100.0%]
| Odor = SPICY:
| | | Terminal:
| | | Edible = EDIBLE: 0.0% [0.0%, 1.14242%]
| | | Edible = POISONOUS: 100.0% [98.85758%, 100.0%]
| Odor = MUSTY:
| | | Terminal:
| | | Edible = EDIBLE: 0.0% [0.0%, 12.17863%]
| | | Edible = POISONOUS: 100.0% [87.82137%, 100.0%]

Classifying symptoms and disease

To get a little bit more into the medical regime and illustrate what the idea of applying decision trees in diagnosis might look like (still keep in mind this is just an example, the data is not handled in any way completely enough) I’ve also gathered another dataset from Kaggle. This dataset contains a list of diseases as diagnosed by a medical professional as well as a list of symptoms. The dataset would also allow one to weight the symptoms according to severity - I’ve not used this here for demonstration purposes. The same decision tree algorithm as above can be applied to a simply modified dataset that just contains binary attributes - has a patient shown a given symptom or not.

The datasource currently contains only around 4900 entries for 131 symptoms which of course feels pretty limited - and indeed it is. But it should illustrate the idea behind applying decision trees in the world of medicine pretty well - and also some limits when training ID3 trees directly. The calculation using this direct inefficient implementation takes around 19 hours …

| Entropy: 5.357552004618081, gain: 0.8458372108970891
| fatigue = NO:
| | | Entropy: 4.786255977156568, gain: 0.8175104414185426
| | | vomiting = NO:
| | | | | Entropy: 4.2634667101159796, gain: 0.7724397067592355
| | | | | skin_rash = YES:
| | | | | | | Entropy: 2.3817253589307477, gain: 0.7884441273192315
| | | | | | | itching = YES:
| | | | | | | | | Entropy: 1.161378479448699, gain: 0.7287131042890482
| | | | | | | | | stomach_pain = NO:
| | | | | | | | | | | Entropy: 0.7742433029172697, gain: 0.48546076074591343
| | | | | | | | | | | burning_micturition = NO:
| | | | | | | | | | | | | Entropy: 0.3227569588973982, gain: 0.3227569588973982
| | | | | | | | | | | | | loss_of_appetite = NO:
| | | | | | | | | | | | | | | Disease = Fungal infection: 100.0% [93.51585%, 100.0%]
| | | | | | | | | | | | | loss_of_appetite = YES:
| | | | | | | | | | | | | | | Disease = Chicken pox: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | burning_micturition = YES:
| | | | | | | | | | | | | Disease = Drug Reaction: 100.0% [64.32109%, 100.0%]
| | | | | | | | | stomach_pain = YES:
| | | | | | | | | | | Disease = Drug Reaction: 100.0% [93.11334%, 100.0%]
| | | | | | | itching = NO:
| | | | | | | | | Entropy: 1.838026124503779, gain: 0.7870913537395843
| | | | | | | | | joint_pain = NO:
| | | | | | | | | | | Entropy: 1.5013353868059924, gain: 0.8506573567612395
| | | | | | | | | | | blister = NO:
| | | | | | | | | | | | | Entropy: 1.1386865525783176, gain: 0.48654136697818307
| | | | | | | | | | | | | pus_filled_pimples = NO:
| | | | | | | | | | | | | | | Entropy: 2.2359263506290326, gain: 0.863120568566631
| | | | | | | | | | | | | | | nodal_skin_eruptions = YES:
| | | | | | | | | | | | | | | | | Disease = Fungal infection: 100.0% [64.32109%, 100.0%]
| | | | | | | | | | | | | | | nodal_skin_eruptions = NO:
| | | | | | | | | | | | | | | | | Entropy: 1.9219280948873623, gain: 0.9709505944546687
| | | | | | | | | | | | | | | | | blackheads = NO:
| | | | | | | | | | | | | | | | | | | Entropy: 1.584962500721156, gain: 0.9182958340544894
| | | | | | | | | | | | | | | | | | | stomach_pain = NO:
| | | | | | | | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | | | | | | | high_fever = NO:
| | | | | | | | | | | | | | | | | | | | | | | Disease = Psoriasis: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | high_fever = YES:
| | | | | | | | | | | | | | | | | | | | | | | Disease = Impetigo: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | stomach_pain = YES:
| | | | | | | | | | | | | | | | | | | | | Disease = Drug Reaction: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | blackheads = YES:
| | | | | | | | | | | | | | | | | | | Disease = Acne: 100.0% [64.32109%, 100.0%]
| | | | | | | | | | | | | pus_filled_pimples = YES:
| | | | | | | | | | | | | | | Disease = Acne: 100.0% [93.87389999999999%, 100.0%]
| | | | | | | | | | | blister = YES:
| | | | | | | | | | | | | Disease = Impetigo: 100.0% [94.19448%, 100.0%]
| | | | | | | | | joint_pain = YES:
| | | | | | | | | | | Disease = Psoriasis: 100.0% [94.19448%, 100.0%]
| | | | | skin_rash = NO:
| | | | | | | Entropy: 3.9828871664512895, gain: 0.6468749738357373
| | | | | | | headache = NO:
| | | | | | | | | Entropy: 3.7560383874069343, gain: 0.6991724211329374
| | | | | | | | | swelling_joints = NO:
| | | | | | | | | | | Entropy: 3.648994047474087, gain: 0.6239247592651546
| | | | | | | | | | | dizziness = NO:
| | | | | | | | | | | | | Entropy: 3.4860352643091366, gain: 0.6908574161896694
| | | | | | | | | | | | | high_fever = NO:
| | | | | | | | | | | | | | | Entropy: 3.2946870140830082, gain: 0.6952585028600295
| | | | | | | | | | | | | | | constipation = NO:
| | | | | | | | | | | | | | | | | Entropy: 3.336579880077256, gain: 0.7747942362434017
| | | | | | | | | | | | | | | | | bladder_discomfort = NO:
| | | | | | | | | | | | | | | | | | | Entropy: 3.5758257945180882, gain: 0.7590191722627639
| | | | | | | | | | | | | | | | | | | continuous_sneezing = NO:
| | | | | | | | | | | | | | | | | | | | | Entropy: 4.506890595608519, gain: 0.7219280948873625
| | | | | | | | | | | | | | | | | | | | | itching = YES:
| | | | | | | | | | | | | | | | | | | | | | | Entropy: 1.9182958340544893, gain: 0.9182958340544893
| | | | | | | | | | | | | | | | | | | | | | | nodal_skin_eruptions = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | Disease = Fungal infection: 100.0% [64.32109%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | nodal_skin_eruptions = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 1.5, gain: 1.0
| | | | | | | | | | | | | | | | | | | | | | | | | stomach_pain = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | | | | | | | | | | | | | nausea = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Hepatitis B: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | nausea = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Chronic cholestasis: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | stomach_pain = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Drug Reaction: 100.0% [64.32109%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | itching = NO:
| | | | | | | | | | | | | | | | | | | | | | | Entropy: 4.251629167387823, gain: 0.6500224216483548
| | | | | | | | | | | | | | | | | | | | | | | diarrhoea = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 4.021928094887362, gain: 0.7219280948873619
| | | | | | | | | | | | | | | | | | | | | | | | | chest_pain = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 3.875, gain: 0.5435644431995956
| | | | | | | | | | | | | | | | | | | | | | | | | | | yellowish_skin = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 3.6644977792004623, gain: 0.5916727785823288
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | shivering = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 3.584962500721156, gain: 0.6500224216483542
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | indigestion = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 3.321928094887362, gain: 0.7219280948873619
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | obesity = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 3.0, gain: 0.8112781244591329
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | neck_pain = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 2.584962500721156, gain: 0.6500224216483541
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | burning_micturition = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 2.321928094887362, gain: 0.7219280948873621
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | muscle_wasting = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 2.0, gain: 0.8112781244591329
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | stiff_neck = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 1.584962500721156, gain: 0.9182958340544894
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | joint_pain = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pain_during_bowel_movements = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Acne: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pain_during_bowel_movements = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Dimorphic hemmorhoids(piles): 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | joint_pain = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Psoriasis: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | stiff_neck = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Arthritis: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | muscle_wasting = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = AIDS: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | burning_micturition = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Urinary tract infection: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | neck_pain = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | loss_of_balance = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Osteoarthristis: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | loss_of_balance = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Cervical spondylosis: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | obesity = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | weight_loss = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Varicose veins: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | weight_loss = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Diabetes: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | indigestion = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | acidity = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Peptic ulcer diseae: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | acidity = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Migraine: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | shivering = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Allergy: 100.0% [64.32109%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | yellowish_skin = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | nausea = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Alcoholic hepatitis: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | nausea = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Hepatitis C: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | chest_pain = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | | | | | | | | | | | | | stomach_pain = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Heart attack: 100.0% [64.32109%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | stomach_pain = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = GERD: 100.0% [64.32109%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | diarrhoea = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 1.5, gain: 1.0
| | | | | | | | | | | | | | | | | | | | | | | | | sunken_eyes = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | | | | | | | | | | | | | yellowish_skin = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Hyperthyroidism: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | | | yellowish_skin = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = hepatitis A: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | | | sunken_eyes = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | | | Disease = Gastroenteritis: 100.0% [64.32109%, 100.0%]
| | | | | | | | | | | | | | | | | | | continuous_sneezing = YES:
| | | | | | | | | | | | | | | | | | | | | Disease = Allergy: 100.0% [94.19448%, 100.0%]
| | | | | | | | | | | | | | | | | bladder_discomfort = YES:
| | | | | | | | | | | | | | | | | | | Disease = Urinary tract infection: 100.0% [94.48318%, 100.0%]
| | | | | | | | | | | | | | | constipation = YES:
| | | | | | | | | | | | | | | | | Disease = Dimorphic hemmorhoids(piles): 100.0% [94.48318%, 100.0%]
| | | | | | | | | | | | | high_fever = YES:
| | | | | | | | | | | | | | | Entropy: 0.9274479232123118, gain: 0.558629373452199
| | | | | | | | | | | | | | | cough = NO:
| | | | | | | | | | | | | | | | | Entropy: 0.28639695711595625, gain: 0.28639695711595625
| | | | | | | | | | | | | | | | | blister = NO:
| | | | | | | | | | | | | | | | | | | Disease = AIDS: 100.0% [94.48318%, 100.0%]
| | | | | | | | | | | | | | | | | blister = YES:
| | | | | | | | | | | | | | | | | | | Disease = Impetigo: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | cough = YES:
| | | | | | | | | | | | | | | | | Entropy: 0.9182958340544896, gain: 0.9182958340544896
| | | | | | | | | | | | | | | | | chills = NO:
| | | | | | | | | | | | | | | | | | | Disease = Bronchial Asthma: 100.0% [64.32109%, 100.0%]
| | | | | | | | | | | | | | | | | chills = YES:
| | | | | | | | | | | | | | | | | | | Disease = Pneumonia: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | dizziness = YES:
| | | | | | | | | | | | | Entropy: 0.8404914014731815, gain: 0.5096374678020158
| | | | | | | | | | | | | neck_pain = NO:
| | | | | | | | | | | | | | | Entropy: 1.5219280948873621, gain: 0.9709505944546685
| | | | | | | | | | | | | | | chest_pain = NO:
| | | | | | | | | | | | | | | | | Entropy: 0.9182958340544896, gain: 0.9182958340544896
| | | | | | | | | | | | | | | | | lethargy = NO:
| | | | | | | | | | | | | | | | | | | Disease = Cervical spondylosis: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | lethargy = YES:
| | | | | | | | | | | | | | | | | | | Disease = Hypothyroidism: 100.0% [64.32109%, 100.0%]
| | | | | | | | | | | | | | | chest_pain = YES:
| | | | | | | | | | | | | | | | | Disease = Hypertension: 100.0% [64.32109%, 100.0%]
| | | | | | | | | | | | | neck_pain = YES:
| | | | | | | | | | | | | | | Disease = Cervical spondylosis: 100.0% [94.19448%, 100.0%]
| | | | | | | | | swelling_joints = YES:
| | | | | | | | | | | Entropy: 1.0, gain: 0.8492647594126546
| | | | | | | | | | | stiff_neck = NO:
| | | | | | | | | | | | | Entropy: 0.28639695711595625, gain: 0.28639695711595625
| | | | | | | | | | | | | muscle_weakness = NO:
| | | | | | | | | | | | | | | Disease = Osteoarthristis: 100.0% [94.48318%, 100.0%]
| | | | | | | | | | | | | muscle_weakness = YES:
| | | | | | | | | | | | | | | Disease = Arthritis: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | stiff_neck = YES:
| | | | | | | | | | | | | Disease = Arthritis: 100.0% [94.19448%, 100.0%]
| | | | | | | headache = YES:
| | | | | | | | | Entropy: 1.6359061660790049, gain: 0.8525666663983983
| | | | | | | | | loss_of_balance = NO:
| | | | | | | | | | | Entropy: 1.1386865525783176, gain: 0.575779260731362
| | | | | | | | | | | acidity = NO:
| | | | | | | | | | | | | Entropy: 2.2516291673878226, gain: 0.9182958340544893
| | | | | | | | | | | | | chills = NO:
| | | | | | | | | | | | | | | Entropy: 1.5, gain: 1.0
| | | | | | | | | | | | | | | weakness_of_one_body_side = NO:
| | | | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | | | chest_pain = NO:
| | | | | | | | | | | | | | | | | | | Disease = Migraine: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | chest_pain = YES:
| | | | | | | | | | | | | | | | | | | Disease = Hypertension: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | weakness_of_one_body_side = YES:
| | | | | | | | | | | | | | | | | Disease = Paralysis (brain hemorrhage): 100.0% [64.32109%, 100.0%]
| | | | | | | | | | | | | chills = YES:
| | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | continuous_sneezing = NO:
| | | | | | | | | | | | | | | | | Disease = Malaria: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | continuous_sneezing = YES:
| | | | | | | | | | | | | | | | | Disease = Common Cold: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | acidity = YES:
| | | | | | | | | | | | | Disease = Migraine: 100.0% [94.19448%, 100.0%]
| | | | | | | | | loss_of_balance = YES:
| | | | | | | | | | | Entropy: 0.3095434291503252, gain: 0.3095434291503252
| | | | | | | | | | | nausea = NO:
| | | | | | | | | | | | | Disease = Hypertension: 100.0% [93.87389999999999%, 100.0%]
| | | | | | | | | | | nausea = YES:
| | | | | | | | | | | | | Disease = (vertigo) Paroymsal  Positional Vertigo: 100.0% [47.40685%, 100.0%]
| | | vomiting = YES:
| | | | | Entropy: 3.4990336640731607, gain: 0.8507115768962774
| | | | | nausea = NO:
| | | | | | | Entropy: 2.8781892225870314, gain: 0.8236948259200888
| | | | | | | abdominal_pain = NO:
| | | | | | | | | Entropy: 2.367635889995596, gain: 0.849308608237843
| | | | | | | | | chest_pain = NO:
| | | | | | | | | | | Entropy: 1.8180959929710643, gain: 0.8525666663983981
| | | | | | | | | | | diarrhoea = NO:
| | | | | | | | | | | | | Entropy: 1.457518749639422, gain: 0.6387068973726207
| | | | | | | | | | | | | altered_sensorium = NO:
| | | | | | | | | | | | | | | Entropy: 2.807354922057604, gain: 0.8631205685666311
| | | | | | | | | | | | | | | headache = NO:
| | | | | | | | | | | | | | | | | Entropy: 2.321928094887362, gain: 0.7219280948873621
| | | | | | | | | | | | | | | | | stomach_pain = NO:
| | | | | | | | | | | | | | | | | | | Entropy: 2.0, gain: 0.8112781244591329
| | | | | | | | | | | | | | | | | | | yellowish_skin = NO:
| | | | | | | | | | | | | | | | | | | | | Entropy: 1.584962500721156, gain: 0.9182958340544894
| | | | | | | | | | | | | | | | | | | | | loss_of_appetite = NO:
| | | | | | | | | | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | | | | | | | | | sunken_eyes = NO:
| | | | | | | | | | | | | | | | | | | | | | | | | Disease = Heart attack: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | | | sunken_eyes = YES:
| | | | | | | | | | | | | | | | | | | | | | | | | Disease = Gastroenteritis: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | | | loss_of_appetite = YES:
| | | | | | | | | | | | | | | | | | | | | | | Disease = Peptic ulcer diseae: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | | | yellowish_skin = YES:
| | | | | | | | | | | | | | | | | | | | | Disease = Alcoholic hepatitis: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | stomach_pain = YES:
| | | | | | | | | | | | | | | | | | | Disease = GERD: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | headache = YES:
| | | | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | | | loss_of_balance = NO:
| | | | | | | | | | | | | | | | | | | Disease = Paralysis (brain hemorrhage): 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | loss_of_balance = YES:
| | | | | | | | | | | | | | | | | | | Disease = (vertigo) Paroymsal  Positional Vertigo: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | altered_sensorium = YES:
| | | | | | | | | | | | | | | Disease = Paralysis (brain hemorrhage): 100.0% [93.87389999999999%, 100.0%]
| | | | | | | | | | | diarrhoea = YES:
| | | | | | | | | | | | | Entropy: 0.3095434291503252, gain: 0.3095434291503252
| | | | | | | | | | | | | chills = NO:
| | | | | | | | | | | | | | | Disease = Gastroenteritis: 100.0% [93.87389999999999%, 100.0%]
| | | | | | | | | | | | | chills = YES:
| | | | | | | | | | | | | | | Disease = Malaria: 100.0% [47.40685%, 100.0%]
| | | | | | | | | chest_pain = YES:
| | | | | | | | | | | Entropy: 1.1586048283017796, gain: 0.8426433989885903
| | | | | | | | | | | cough = NO:
| | | | | | | | | | | | | Entropy: 0.3095434291503252, gain: 0.3095434291503252
| | | | | | | | | | | | | stomach_pain = NO:
| | | | | | | | | | | | | | | Disease = Heart attack: 100.0% [93.87389999999999%, 100.0%]
| | | | | | | | | | | | | stomach_pain = YES:
| | | | | | | | | | | | | | | Disease = GERD: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | cough = YES:
| | | | | | | | | | | | | Entropy: 0.3227569588973982, gain: 0.3227569588973982
| | | | | | | | | | | | | chills = NO:
| | | | | | | | | | | | | | | Disease = GERD: 100.0% [93.51585%, 100.0%]
| | | | | | | | | | | | | chills = YES:
| | | | | | | | | | | | | | | Disease = Tuberculosis: 100.0% [47.40685%, 100.0%]
| | | | | | | abdominal_pain = YES:
| | | | | | | | | Entropy: 1.4362406790693445, gain: 0.8566594912242682
| | | | | | | | | yellowish_skin = NO:
| | | | | | | | | | | Entropy: 0.2974722489192896, gain: 0.2974722489192896
| | | | | | | | | | | swelling_of_stomach = NO:
| | | | | | | | | | | | | Disease = Peptic ulcer diseae: 100.0% [94.19448%, 100.0%]
| | | | | | | | | | | swelling_of_stomach = YES:
| | | | | | | | | | | | | Disease = Alcoholic hepatitis: 100.0% [47.40685%, 100.0%]
| | | | | | | | | yellowish_skin = YES:
| | | | | | | | | | | Entropy: 0.847584679824574, gain: 0.46899559358928133
| | | | | | | | | | | itching = YES:
| | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | loss_of_appetite = NO:
| | | | | | | | | | | | | | | Disease = Jaundice: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | loss_of_appetite = YES:
| | | | | | | | | | | | | | | Disease = Chronic cholestasis: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | itching = NO:
| | | | | | | | | | | | | Entropy: 0.3095434291503252, gain: 0.3095434291503252
| | | | | | | | | | | | | loss_of_appetite = NO:
| | | | | | | | | | | | | | | Disease = Alcoholic hepatitis: 100.0% [93.87389999999999%, 100.0%]
| | | | | | | | | | | | | loss_of_appetite = YES:
| | | | | | | | | | | | | | | Disease = hepatitis A: 100.0% [47.40685%, 100.0%]
| | | | | nausea = YES:
| | | | | | | Entropy: 2.297472248919289, gain: 0.9995003941817583
| | | | | | | muscle_pain = NO:
| | | | | | | | | Entropy: 1.4362406790693445, gain: 0.8566594912242681
| | | | | | | | | yellowish_skin = NO:
| | | | | | | | | | | Entropy: 0.5689955935892812, gain: 0.33125121848110783
| | | | | | | | | | | loss_of_balance = NO:
| | | | | | | | | | | | | Entropy: 1.584962500721156, gain: 0.9182958340544894
| | | | | | | | | | | | | itching = YES:
| | | | | | | | | | | | | | | Disease = Chronic cholestasis: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | itching = NO:
| | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | blurred_and_distorted_vision = NO:
| | | | | | | | | | | | | | | | | Disease = (vertigo) Paroymsal  Positional Vertigo: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | blurred_and_distorted_vision = YES:
| | | | | | | | | | | | | | | | | Disease = Hypoglycemia: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | loss_of_balance = YES:
| | | | | | | | | | | | | Disease = (vertigo) Paroymsal  Positional Vertigo: 100.0% [93.87389999999999%, 100.0%]
| | | | | | | | | yellowish_skin = YES:
| | | | | | | | | | | Entropy: 0.5907239186406502, gain: 0.4854607607459134
| | | | | | | | | | | dark_urine = NO:
| | | | | | | | | | | | | Disease = Chronic cholestasis: 100.0% [93.87389999999999%, 100.0%]
| | | | | | | | | | | dark_urine = YES:
| | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | high_fever = NO:
| | | | | | | | | | | | | | | Disease = Hepatitis D: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | high_fever = YES:
| | | | | | | | | | | | | | | Disease = Hepatitis E: 100.0% [47.40685%, 100.0%]
| | | | | | | muscle_pain = YES:
| | | | | | | | | Entropy: 1.1522290399012944, gain: 0.9994730201859836
| | | | | | | | | yellowing_of_eyes = NO:
| | | | | | | | | | | Entropy: 0.2974722489192896, gain: 0.2974722489192896
| | | | | | | | | | | skin_rash = YES:
| | | | | | | | | | | | | Disease = Dengue: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | skin_rash = NO:
| | | | | | | | | | | | | Disease = Malaria: 100.0% [94.19448%, 100.0%]
| | | | | | | | | yellowing_of_eyes = YES:
| | | | | | | | | | | Disease = hepatitis A: 100.0% [94.19448%, 100.0%]
| fatigue = YES:
| | | Entropy: 4.087113833003859, gain: 0.9011019245114962
| | | loss_of_appetite = NO:
| | | | | Entropy: 3.4394507096117723, gain: 0.8817069873806092
| | | | | high_fever = NO:
| | | | | | | Entropy: 2.7185962248540316, gain: 0.9914266810680207
| | | | | | | irritability = NO:
| | | | | | | | | Entropy: 1.9047143071995363, gain: 0.9824740868386415
| | | | | | | | | increased_appetite = NO:
| | | | | | | | | | | Entropy: 1.596184996778472, gain: 0.6731080737015489
| | | | | | | | | | | obesity = NO:
| | | | | | | | | | | | | Entropy: 3.0, gain: 1.0
| | | | | | | | | | | | | yellowish_skin = NO:
| | | | | | | | | | | | | | | Entropy: 2.0, gain: 1.0
| | | | | | | | | | | | | | | chills = NO:
| | | | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | | | cough = NO:
| | | | | | | | | | | | | | | | | | | Disease = Varicose veins: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | cough = YES:
| | | | | | | | | | | | | | | | | | | Disease = Bronchial Asthma: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | chills = YES:
| | | | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | | | continuous_sneezing = NO:
| | | | | | | | | | | | | | | | | | | Disease = Pneumonia: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | continuous_sneezing = YES:
| | | | | | | | | | | | | | | | | | | Disease = Common Cold: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | yellowish_skin = YES:
| | | | | | | | | | | | | | | Entropy: 2.0, gain: 1.0
| | | | | | | | | | | | | | | itching = YES:
| | | | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | | | vomiting = NO:
| | | | | | | | | | | | | | | | | | | Disease = Hepatitis B: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | vomiting = YES:
| | | | | | | | | | | | | | | | | | | Disease = Jaundice: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | itching = NO:
| | | | | | | | | | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | | | | | | | | | vomiting = NO:
| | | | | | | | | | | | | | | | | | | Disease = Hepatitis C: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | | | | | vomiting = YES:
| | | | | | | | | | | | | | | | | | | Disease = Hepatitis D: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | obesity = YES:
| | | | | | | | | | | | | Disease = Varicose veins: 100.0% [94.19448%, 100.0%]
| | | | | | | | | increased_appetite = YES:
| | | | | | | | | | | Disease = Diabetes: 100.0% [94.48318%, 100.0%]
| | | | | | | irritability = YES:
| | | | | | | | | Entropy: 1.5844996446144277, gain: 0.9241335419915457
| | | | | | | | | abnormal_menstruation = NO:
| | | | | | | | | | | Disease = Hypoglycemia: 100.0% [94.48318%, 100.0%]
| | | | | | | | | abnormal_menstruation = YES:
| | | | | | | | | | | Entropy: 0.9994730201859836, gain: 0.9994730201859836
| | | | | | | | | | | depression = NO:
| | | | | | | | | | | | | Disease = Hyperthyroidism: 100.0% [94.48318%, 100.0%]
| | | | | | | | | | | depression = YES:
| | | | | | | | | | | | | Disease = Hypothyroidism: 100.0% [94.19448%, 100.0%]
| | | | | high_fever = YES:
| | | | | | | Entropy: 2.381155648699536, gain: 0.9656361333706103
| | | | | | | chest_pain = NO:
| | | | | | | | | Entropy: 1.6826392037546638, gain: 0.9402859586706308
| | | | | | | | | chills = NO:
| | | | | | | | | | | Entropy: 1.1547717145751624, gain: 0.8452282854248372
| | | | | | | | | | | itching = YES:
| | | | | | | | | | | | | Entropy: 0.3095434291503252, gain: 0.3095434291503252
| | | | | | | | | | | | | skin_rash = YES:
| | | | | | | | | | | | | | | Disease = Chicken pox: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | | | skin_rash = NO:
| | | | | | | | | | | | | | | Disease = Jaundice: 100.0% [93.87389999999999%, 100.0%]
| | | | | | | | | | | itching = NO:
| | | | | | | | | | | | | Entropy: 0.3095434291503252, gain: 0.3095434291503252
| | | | | | | | | | | | | vomiting = NO:
| | | | | | | | | | | | | | | Disease = Bronchial Asthma: 100.0% [93.87389999999999%, 100.0%]
| | | | | | | | | | | | | vomiting = YES:
| | | | | | | | | | | | | | | Disease = Jaundice: 100.0% [47.40685%, 100.0%]
| | | | | | | | | chills = YES:
| | | | | | | | | | | Disease = Typhoid: 100.0% [94.74452%, 100.0%]
| | | | | | | chest_pain = YES:
| | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | muscle_pain = NO:
| | | | | | | | | | | Disease = Pneumonia: 100.0% [94.19448%, 100.0%]
| | | | | | | | | muscle_pain = YES:
| | | | | | | | | | | Disease = Common Cold: 100.0% [94.19448%, 100.0%]
| | | loss_of_appetite = YES:
| | | | | Entropy: 2.806836027747821, gain: 0.943622285167955
| | | | | malaise = NO:
| | | | | | | Entropy: 1.6854277290691868, gain: 0.9241335419915458
| | | | | | | coma = NO:
| | | | | | | | | Entropy: 1.1522290399012944, gain: 0.8488843249236633
| | | | | | | | | vomiting = NO:
| | | | | | | | | | | Entropy: 0.2974722489192896, gain: 0.2974722489192896
| | | | | | | | | | | abdominal_pain = NO:
| | | | | | | | | | | | | Disease = Hepatitis C: 100.0% [94.19448%, 100.0%]
| | | | | | | | | | | abdominal_pain = YES:
| | | | | | | | | | | | | Disease = Hepatitis D: 100.0% [47.40685%, 100.0%]
| | | | | | | | | vomiting = YES:
| | | | | | | | | | | Entropy: 0.3095434291503252, gain: 0.3095434291503252
| | | | | | | | | | | skin_rash = YES:
| | | | | | | | | | | | | Disease = Dengue: 100.0% [47.40685%, 100.0%]
| | | | | | | | | | | skin_rash = NO:
| | | | | | | | | | | | | Disease = Hepatitis D: 100.0% [93.87389999999999%, 100.0%]
| | | | | | | coma = YES:
| | | | | | | | | Disease = Hepatitis E: 100.0% [94.48318%, 100.0%]
| | | | | malaise = YES:
| | | | | | | Entropy: 1.9995975337661407, gain: 0.9998646331239298
| | | | | | | yellowing_of_eyes = NO:
| | | | | | | | | Entropy: 1.0, gain: 1.0
| | | | | | | | | nausea = NO:
| | | | | | | | | | | Disease = Chicken pox: 100.0% [94.19448%, 100.0%]
| | | | | | | | | nausea = YES:
| | | | | | | | | | | Disease = Dengue: 100.0% [94.19448%, 100.0%]
| | | | | | | yellowing_of_eyes = YES:
| | | | | | | | | Entropy: 0.9994730201859836, gain: 0.9994730201859836
| | | | | | | | | chest_pain = NO:
| | | | | | | | | | | Disease = Hepatitis B: 100.0% [94.19448%, 100.0%]
| | | | | | | | | chest_pain = YES:
| | | | | | | | | | | Disease = Tuberculosis: 100.0% [94.48318%, 100.0%]
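
The entropy and gain figures in the dump can be checked by hand. Take the `stiff_neck = NO` node deep in the `fatigue = NO` branch: it splits on `joint_pain` and reports an entropy of about 1.585 and a gain of about 0.918. The short sketch below reproduces both numbers. It assumes that the three diseases remaining at that node (Acne, Dimorphic hemmorhoids(piles), Psoriasis) carry equal weight, which the dump suggests but does not state. The helper `entropy` is written for this illustration only and is not taken from the notebook.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a class distribution given as sample counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Assumed class weights at the node: Acne, Dimorphic hemmorhoids(piles), Psoriasis
parent = [1, 1, 1]

# Splitting on joint_pain: NO keeps Acne and piles, YES keeps only Psoriasis
branches = [[1, 1], [1]]

total = sum(parent)
# Weighted entropy that remains after the split
remainder = sum(sum(b) / total * entropy(b) for b in branches)
gain = entropy(parent) - remainder

print("Entropy:", entropy(parent))  # log2(3), roughly 1.585
print("Gain:", gain)                # log2(3) - 2/3, roughly 0.918
```

The same arithmetic explains the many `Entropy: 1.0, gain: 1.0` lines: two equally weighted classes have exactly one bit of entropy, and a binary attribute that separates them perfectly removes all of it.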

Short demonstration in Python

The code used to generate the trees on this page is available as a Jupyter notebook in a GitHub Gist as well as in PDF format.
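
The core of the algorithm also fits into a few lines. The following is a minimal sketch and not the notebook's code: the function names `entropy`, `information_gain`, `id3` and `classify` are chosen for this illustration, and the three toy rows are hand-built to mirror one small subtree of the dump above rather than taken from the actual dataset. It leaves out pruning, the confidence intervals shown in the dump and any handling of attribute values not seen during training.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction achieved by splitting on the given attribute."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attributes):
    """Return a class label (leaf) or a tuple (attribute, {value: subtree})."""
    # Leaf: the node is pure or no attribute is left, so use the majority class
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the attribute with the highest information gain
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    children = {}
    for value in sorted(set(row[best] for row in rows)):
        pairs = [(r, lab) for r, lab in zip(rows, labels) if r[best] == value]
        children[value] = id3([r for r, _ in pairs], [lab for _, lab in pairs], remaining)
    return (best, children)

def classify(tree, row):
    """Walk the tree from the root; raises KeyError for unseen attribute values."""
    while isinstance(tree, tuple):
        attribute, children = tree
        tree = children[row[attribute]]
    return tree

# Hypothetical toy data (not rows from the real dataset)
rows = [
    {"joint_pain": "NO", "pain_during_bowel_movements": "NO"},
    {"joint_pain": "NO", "pain_during_bowel_movements": "YES"},
    {"joint_pain": "YES", "pain_during_bowel_movements": "NO"},
]
labels = ["Acne", "Dimorphic hemmorhoids(piles)", "Psoriasis"]

tree = id3(rows, labels, ["joint_pain", "pain_during_bowel_movements"])
print(tree)
print(classify(tree, {"joint_pain": "YES", "pain_during_bowel_movements": "NO"}))
```

In this toy set both attributes yield the same gain, so which one ends up at the root depends only on the order of the attribute list. The classification of the three rows is the same either way.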

This article is tagged: Programming, Statistics, Math, Data Mining



Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)

