I'm Vishnu.
What do i do for Living ?
I work as a Data Science Engineer.
What do I Do?
What do I work on?
Workflow of Analytics Project / System
SAAS--PAAS--IAAS
IBM Bluemix
Zero Infrastructure, Lower Risk
Lower cost and improved profitability
Easy and quick development, Monetize quickly
Reusable code and business logics
Integration with other web services
Setup and basics
Goal: Find the line such that distance from line to each point is minimized.
We will 'fit' the points with a line, so that an 'objective function' is minimized. The line we thus obtain would minimize the sum of squared residues (least squares).
A regression model where the dependent variable (DV) is categorical.
Logistic regression is technically a classification technique; do not get confused by the word 'Regression'
We will 'fit' the points with a line, so that an 'objective function' is minimized. The line we thus obtain would minimize the sum of squared residues (least squares).
Find k closest training examples, and poll their class values
k-NN is a type of instance-based learning , or lazy learning , where the function is only approximated locally and all computation is deferred until classification.
One of the simplest machine learning algorithms.
Find a model for class attribute as a function of the values of other attributes.
Goal: Build a tree; At each node, split the data on the basis of one attribute which provides the maximum split
> If Dt contains records that belong the same class yt, then t is a leaf node labeled as yt
> If Dt is an empty set, then t is a leaf node labeled by the default class, yd
> If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets. Recursively apply the procedure to each subset.
Decision Trees – Travel Time to Office
Ensemble classifier containing many decision trees and outputs the class that is the mode of the class's output by individual trees.
Apply Bayes’ theorem with the “naive” assumption of independence between every pair of features
Draw inferences from datasets consisting of input data without labeled responses. Clustering is used for exploratory data analysis to find hidden patterns or grouping in data