Understanding and visualizing the TF-IDF concept
When we first hear the term TF-IDF, it sounds like a code word from a James Bond movie, and we get frightened about how we are ever going to crack it. In reality it is just a conversion technique; to be more precise, TF-IDF is used to convert strings into a numerical (vector) notation that a machine can understand.
As we are well aware, machines do not understand plain words. To make a machine understand English (or any other language), we need to convert the text into a form it can work with, and TF-IDF is just a tool for that conversion.
Let me take an example to show how this conversion happens.
Suppose we have the 3 sentences below.
1. Good Morning
2. Good Evening
3. Good Morning and Evening
We will be using TF-IDF to convert the above sentences into a numerical (vector) form that a model can use for its analysis.
Basically, TF-IDF is a combination of two terms:
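- TF (Term Frequency): how often a word appears within a single sentence (document).
- IDF (Inverse Document Frequency): how rare a word is across all the sentences (documents); words that appear everywhere get a low score.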
So now we will convert the above 3 sentences into TF and IDF values.
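One way to sketch this conversion in plain Python is shown below. It assumes one common textbook definition (TF = number of times a word occurs in a sentence divided by the sentence length, IDF = log of the total number of sentences divided by the number of sentences containing the word). Libraries often use smoothed variants, so their numbers can differ. The function names `term_frequency` and `inverse_document_frequency` are made up for this illustration.

```python
import math

# The three example sentences, lowercased and split on whitespace.
documents = ["Good Morning", "Good Evening", "Good Morning and Evening"]
tokenized = [doc.lower().split() for doc in documents]


def term_frequency(term, tokens):
    """Share of the words in one sentence that are `term` (assumed definition)."""
    return tokens.count(term) / len(tokens)


def inverse_document_frequency(term, all_tokens):
    """log(N / df): N sentences in total, df of them contain `term` (assumed definition)."""
    containing = sum(1 for tokens in all_tokens if term in tokens)
    return math.log(len(all_tokens) / containing)


# Every distinct word across the three sentences, in alphabetical order.
vocabulary = sorted({word for tokens in tokenized for word in tokens})

for word in vocabulary:
    tfs = [round(term_frequency(word, tokens), 2) for tokens in tokenized]
    idf = round(inverse_document_frequency(word, tokenized), 3)
    print(f"{word:8s} TF per sentence: {tfs}  IDF: {idf}")
```

Notice that "good" appears in all three sentences, so under this definition its IDF is log(3/3) = 0: a word that shows up everywhere tells us nothing about which sentence we are looking at.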
In the image above, we can see how the words have been converted into numerical form.
Hopefully this gives a better understanding of TF-IDF.
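For readers who want to reproduce numbers of this kind, here is a minimal sketch that multiplies TF by IDF to get one vector per sentence. It again assumes the unsmoothed textbook formula TF × log(N / df), which may not match the exact variant used elsewhere; the helper name `tfidf_vector` is hypothetical.

```python
import math

# The three example sentences, lowercased and split on whitespace.
documents = ["Good Morning", "Good Evening", "Good Morning and Evening"]
tokenized = [doc.lower().split() for doc in documents]

# Each vocabulary word becomes one column of the final vectors.
vocabulary = sorted({word for tokens in tokenized for word in tokens})


def tfidf_vector(tokens, all_tokens, vocab):
    """One TF-IDF number per vocabulary word for a single sentence (assumed formula)."""
    vector = []
    for word in vocab:
        tf = tokens.count(word) / len(tokens)                 # term frequency
        df = sum(1 for other in all_tokens if word in other)  # sentences containing the word
        idf = math.log(len(all_tokens) / df)                  # inverse document frequency
        vector.append(tf * idf)
    return vector


vectors = [tfidf_vector(tokens, tokenized, vocabulary) for tokens in tokenized]

print("columns:", vocabulary)
for doc, vector in zip(documents, vectors):
    print(f"{doc!r:28s}", [round(value, 3) for value in vector])
```

Each sentence ends up as a list of four numbers, one per word in the vocabulary, and that list is the vector a model can work with.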