Blogs by Achintya Tripathi
My 6 months at Sabudh Foundation
Here's a timeline of my internship:
- 6th July 2020 - Introduction Session
We met our respective mentors and Dr. Sarab Anand, the head of the Foundation and our teacher.
- 7th July 2020 - 28th July 2020 - Introduction to Machine Learning
We started with the basics of statistics and maths required for getting started with Machine Learning.
We wrote Python scripts from scratch to understand the basics of statistics and Linear Regression, along with Lasso and Ridge Regression.
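To give a flavour of what "from scratch" means here, below is a minimal sketch of a one-feature linear regression with an optional Ridge (L2) or Lasso (L1) penalty on the slope. It is not the code from our notebooks: the function names, the single-feature setup, and the toy data are assumptions made to keep the example short.

```python
# Minimal sketch: one-feature linear regression with an optional L2 (Ridge)
# or L1 (Lasso) penalty on the slope. All names here are illustrative.

def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys, lam=0.0, penalty="ridge"):
    """Minimise sum((y - w*x - b)^2) + penalty(w); the intercept b is not penalised.

    penalty="ridge" adds lam * w**2, penalty="lasso" adds lam * abs(w).
    """
    x_bar, y_bar = mean(xs), mean(ys)
    # Centre the data so the intercept drops out of the slope estimate.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)

    if penalty == "ridge":
        # Closed form: the penalty inflates the denominator and shrinks w.
        w = s_xy / (s_xx + lam)
    elif penalty == "lasso":
        # Soft-thresholding: w becomes exactly 0 when |s_xy| <= lam / 2.
        shrunk = max(abs(s_xy) - lam / 2.0, 0.0)
        w = (shrunk if s_xy >= 0 else -shrunk) / s_xx
    else:
        raise ValueError("penalty must be 'ridge' or 'lasso'")

    b = y_bar - w * x_bar
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data lying exactly on y = 2x + 1

ols = fit_line(xs, ys)                                # no penalty
ridge = fit_line(xs, ys, lam=10.0, penalty="ridge")   # slope shrinks towards 0
lasso = fit_line(xs, ys, lam=100.0, penalty="lasso")  # slope is zeroed out
print("OLS:", ols, "Ridge:", ridge, "Lasso:", lasso)
```

With no penalty the fit recovers the line exactly; Ridge only shrinks the slope, while a large enough Lasso penalty sets it to zero, which is why Lasso is often used for feature selection.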
Linear Regression
Logistic Reg. Data gen. to modelling from Scratch
Part 2: L1 L2 Regularisation in Log. Regression
- 28th July 2020 - 3rd Aug 2020 - Model Evaluation and Selection
After learning the basics and implementing Linear Regression, we learned about model evaluation and selection using the confusion matrix, the bootstrap method, and other tools such as the z-test, the t-test, and the ROC curve.
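As a rough illustration of two of these ideas, here is a small sketch that builds a binary confusion matrix and then uses the bootstrap to put an interval around the accuracy. The labels are made-up toy data, and the helper names and the percentile-interval choice are assumptions, not something taken from our coursework.

```python
import random


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def bootstrap_accuracy_interval(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample the test rows with replacement, re-score each time."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy labels, invented for the example.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
lower, upper = bootstrap_accuracy_interval(y_true, y_pred)

print("tp, fp, fn, tn:", tp, fp, fn, tn)
print("precision:", precision, "recall:", recall)
print("accuracy:", accuracy(y_true, y_pred), "95% interval:", (lower, upper))
```

On a test set this small the interval comes out very wide, which is the point of the exercise: a single accuracy number says little without some idea of its spread.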
HighCharts EDA+Logistic Reg. Prediction
- 4th Aug 2020 - 25th Aug 2020 - Unsupervised Learning
After supervised learning we moved to Unsupervised Learning and learned about clustering techniques: K-means clustering, DBSCAN, hierarchical clustering, etc., and how to validate the clusters with the elbow curve and the Silhouette score.
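The sketch below shows, under simplifying assumptions, how K-means, the inertia used for an elbow curve, and the Silhouette score fit together. It is a plain Lloyd's-algorithm loop on hand-made 2D points, not the notebook we wrote; the function names, the random initialisation, and the data are illustrative.

```python
import math
import random


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm on a list of tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k of the data points
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster where it is
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances, the quantity plotted on an elbow curve."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def silhouette_score(points, labels):
    """Mean over all points of (b - a) / max(a, b)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # by convention a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Two well-separated toy blobs, invented for the example.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]

for k in range(1, 5):
    labels, centroids = kmeans(points, k)
    line = f"k={k} inertia={inertia(points, labels, centroids):.2f}"
    if k > 1:
        line += f" silhouette={silhouette_score(points, labels):.3f}"
    print(line)
```

Reading the output, the inertia drops sharply from k=1 to k=2 and only slowly afterwards (the "elbow"), and the Silhouette score should be highest at k=2, so both checks point to two clusters for this data.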
Kmeans from Scratch with Silhouette and elbow curve
- 26th Aug 2020 - 15th Sep 2020 - Text Analysis
We learned how NLP plays an important role in Text Analysis. We also learned about different vectorizers, such as GloVe and Word2Vec, and how to create one using cosine similarities or tf-idf vectorizers.
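As a small illustration of the tf-idf and cosine-similarity part, here is a sketch written with only the standard library. It is not the Gensim or GloVe code from the sessions: the whitespace tokeniser, the plain `tf * log(N / df)` weighting, and the example sentences are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for the example.
docs = [
    "the match ended in a draw",
    "the match ended after extra time",
    "the market closed higher today",
]
vecs = tfidf_vectors(docs)
sim_sports = cosine(vecs[0], vecs[1])  # share "match" and "ended"
sim_mixed = cosine(vecs[0], vecs[2])   # share only "the", which gets weight 0

print("sports vs sports:", round(sim_sports, 3))
print("sports vs market:", round(sim_mixed, 3))
```

A word like "the" appears in every document, so its idf is log(1) = 0 and it stops contributing to the similarity; this is the main thing tf-idf buys over raw word counts.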
Gensim Word2Vec usage with t-SNE plot
- 16th Sep 2020 - 15th Oct 2020 - Recommender Systems
Two months into the internship, we started solving real-world problems.
We formed groups of 3-4 people and created a News Recommender System where everything had to be done from scratch.
The data had to be collected using web scraping, and then we had to decide on the pipeline for tackling the problem.
We also did a proof of concept (POC) to decide which algorithms to include in the recommender and how to accommodate the users.
We had been tasked with the job of building two intelligent bots.
1. The article recommender: This bot selects articles to serve a user. The inputs to the bot are the corpus of new articles and a user profile, if available.
2. The user profiler: Once the user starts consuming news stories, (s)he leaves behind a clickstream of the form below:
The bot must extract user interests from such data, which can then be used to further personalise (her)his news feed.
The ultimate objective is to increase clickthrough and the frequency with which the user opens the app to consume stories. However, the objective in the first visit is to:
Reduce bias in data collection (example bias: stories that are served often and ranked higher have a higher likelihood of being consumed, i.e. of obtaining a clickthrough)
Learn as much as possible about the users on their first visit
Maximise coverage of the news corpus
https://github.com/achintyatripathi/Recommendation-Systems
- Oct 2020 - Dec 2020 - Dashboard for Proactive Policing Project
We started our major project, the Dashboard for the Proactive Policing Project. Work on the Dashboard continued alongside the other case studies and projects we did during the internship.
- Oct 2020 - Dec 2020 - BitCoin stock market data scraping
This was another side project that a few interns volunteered for, where we had to select a bitcoin exchange and try scraping trade book data, kline data, etc., using the websockets and APIs provided by the exchange. We had regular meetings, in addition to the general meeting, with our project mentors and session mentors. Nothing more can be shared, but it was a great experience: we learned more about using Python to collect data, about exception handling, and about how to counter the real-world problems that occur while collecting data.
- 16th Oct 2020 - 20th Oct 2020 - Graph Analysis
We got a master class on Graph Theory and how to use Neo4j.
- 21st Nov 2020 - 20th Dec 2020 - Neural Networks
We started with the basics of NNs and then moved to CNNs, and had a few master classes from Gurcharan Singh, Jr. Data Scientist at Sabudh Foundation.
Frame Level Speech Recognition with Neural Networks
Your job is to identify the phoneme state label for each frame in the test data set. It is important to note that utterances are of variable length. We are providing you code to load and parse the raw files into the expected format. For now we are only providing dev data files as the training file is very large.
We then moved on to RNNs with Dr. Paritosh, who gave a master class on using RNNs for time-series analysis and on advanced TensorFlow usage with GRU, LSTM, and the other types of RNN one can use.
- This is the link for the Proactive Policing Dashboard
This is the final major project we did. The link points to the live site deployed on Netlify.
Note: For a better experience, use Chrome to open it.
"Interning at Sabudh Foundation was a great experience. I got to learn about so many new technologies, as well as how important maths and statistics are in the field of ML/DS. This opportunity gave me a new perspective on how we can use ML/AI for the betterment of society."
-- Achintya Tripathi