Hello there ā, As we know that Each and every data structure and algorithm is created or developed to solve some specific problem so that we can improve the time and space complexity of the problem. For Example, When we started Learning sorting algorithms to sort a large amount of data in a particular manner, we first came to know about brute force algorithm, then insertion sort, then quicksort algorithm, and then merge sort algorithm. Thus After facing some inefficiency in the brute force algorithm, we got the Insertion sort algorithm, then we got quick sort, then merge sort to sort the data. Similarly, today I am going to tell you some useful and interesting concepts of Trie Data Structure which are created to remove the problem of searching when a bunch of strings is given.
Trie is generally defined as a prefix tree or search tree for finding a specific key from a set of keys. the key can be taken as āStringā. Trie is basically first described in a computer context by RenĆ© de la Briandais in 1959. He proposed this data structure when he found some inconsistency while searching a string from a given set of strings.
Basically Trie can be described as āA tree like data structure, where each node contains some alphabetic character, and all the characters of a word are linked with each other in a Depth First search form. ā ( As shown in the figure below ) For example, If i have given some words = [āappleā, āapeā ,ātreeā] , then below is the prefix tree for the words.
Prefix tree Representation.
Donāt worry? I will explain, how the tree is being created.
Before diving deep into Trie, Let us first understand that why do we need this kind of the data structure?
Letās take an example, suppose you are having some set of strings likeā¦. string = [āapppleā, āappplerā, āappā, ātreeā, ātrieā, āarticleā ]. Now I was asked to extract all those words which start with ap.
So What will be our normal approach? We will start traversing each and every word of the set and compare it with the given word. If I find that if the given word is matching with some words of set, then we print them as our answer.
So what is happening here?
- We are executing a loop over the given set of words.
- Now, we are comparing the first word of the set with the given word (ap).
- If that word matched with any prefix of the set word, then we include that word into our answer.
String Comparison
Here ( In the above picture ), we start with the word apple. We compare its first letter with the given word. As we can see apple starts with ap, so we include it in the answer. Similarly, we can check for other words also. So in the end, we get answer \= [āapppleā, āappplerā, āappā].
In the above approach, we are comparing each word with the given word. So this complete processās time complexity will be as follows:
- Firstly length of the searched word (m).
- Second, the total length of the array (n)because we are taking each and every element present in that array.
- So Time complexity = O(n*m)
Did you notice one thing? Here, some strings like apple, apple, and app are having the same prefix app. Similarly, tree and trie are also having the same prefix tr.
So, when some strings are having the same prefix, then why do need to, again and again, compare that same part?
Here, Trie comes into the picture. Trie helps us to avoid this type of unnecessary thing. What does it do? It stores each and every character as a node ( As shown in the prefix tree Representation picture ), and thus it avoids the same prefix comparison again and again. So Letās Understand the trieās structure.
Basic Structure of Trie
A Trie is nothing, but a tree only. A tree contains some nodes. Similarly, trie also contains the node ( Each character of the string is considered as a node ).
A Basic Node contains the following things:
- Node value ( Optional )
- Children
- IsEnd ( A Boolean Flag to show the end of a word )
An important is that the number of child nodes in a trie depends completely upon the total number of values possible. For example, the total number of child nodes is directly connected to the total number of letters possible. Here, we considering only for lowercase english letters. So, there are 26 letters, so the total number of child nodes will be 26.
The code for the above node structure will be as follows ( In Python )
Trie Node Structure
Here, for making children nodes, we are taking a dictionary that stores the key (keys are always unique ) and its corresponding value.
Inserting a word into the trie
One Important thing to note that trie always start with an empty tree node (as shown in prefix tree picture )
Insertion of Apple in trie
As shown here, initially we have a root node that has children = {}. Now we have to insert character a as a first child of it.
So, firstly we will check if a is already a child of root or not.
If not, then a new node will be created with key = āaā. So, root.children[āaā] = TrieNode().
Now, we are inserting āpā as a child of āaā. So,
TrieNode(a).children[āpā] = TrieNode().
Similarly, for other characters also. Thus we insert a new word inside a trie.
Below is the code implementation for Insertion in the trie ( In Python).
Trie Code for Insertion a word
Letās store another word āapplerā inside the tree and see how it behaves.
Insertion of another word.
See, when we insert appler inside the tree, it used the complete prefix = āappleā and created another node below it.
To represent the end of a word, we used isEnd flag.
Whenever we insert last character of any word, we make that nodeās isEnd flag as True ( As shown in the code).
Thus, you can understand how useful are they to reduce memory as well as time. Now, Insert all other words app, tree, trie, and article inside the trie.
We get the following tree.
Trie Representation
Thus, we can see how all the words are inserted inside the trie.
The word which contains the same prefix, they are stored only once.
Now, let's do some searching inside the trie. Suppose we have to find if any word present in the above list which starts with ap.
So, what will be the steps for searching? See below steps.
- start from the root node.
- Check if the current character of the word is present in the above nodeās child.
3. a. If the child is present, then go to the next character of the searching word ( Got to step 2 until the searched word length ).
3. b. If the child is not present, then return False ( No word starts with the given word ).
See, here we have to traverse till the length of the searching word [ O(m) ] in comparison to previous searching [O(n*m) ].
This is the magic of using Trie Data Structure.
Some Coding Challenges which you can solve on Leetcode related to Trie.
So, Where we use Trie Data Structure.
The most popular use of trie is used in Googleās search engine as shown below. Whenever we start typing something, we get results as shown below.
Autocompletion using trie structure
Some useful Resources
Thank You very much š.