LinkedIn - product

Silhouette

The silhouette value is a measure of how similar an object is to its own cluster compared to other clusters. The silhouette ranges from −1 to +1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. If most objects have a high value, then the clustering configuration is appropriate. If many points have a low or negative value, then the clustering configuration may have too many or too few clusters.

The silhouette can be calculated with any distance metric, such as the Euclidean distance or the Manhattan distance.
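As a rough sketch of the definition above, the snippet below assumes the standard formulation s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other points in its own cluster and b(i) is the smallest mean distance from i to the points of any other cluster. The formula is not spelled out in these notes, and the function names and toy data are illustrative only. It needs at least two clusters.

```python
import math

def euclidean(p, q):
    """Euclidean distance; any other metric (e.g. Manhattan) can be passed instead."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def silhouette_values(points, labels, dist=euclidean):
    """Return one silhouette value per point (0.0 for singleton clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            values.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values

# Toy data: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_values(points, [0, 0, 1, 1])
bad = silhouette_values(points, [0, 1, 0, 1])
```

With the sensible labelling every value in `good` is close to +1, while the deliberately wrong labelling in `bad` produces negative values.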

  • What do you think the distribution of the average number of location changes per week looks like? Describe it.

The distribution should be right-skewed. Most users do not travel, so for the majority the number of location changes per week is 0 or very small, and the median sits at or near 0. The mean is greater than the median because it is pulled up by the outliers, the few users who travel frequently.
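A tiny illustration of the mean-versus-median argument, using made-up counts (the numbers are hypothetical, not LinkedIn data):

```python
import statistics

# Hypothetical weekly location-change counts for 20 users:
# most never travel, a few travel a lot.
weekly_changes = [0] * 14 + [1, 1, 2, 3, 8, 15]

mean_changes = statistics.mean(weekly_changes)      # pulled up by the outliers
median_changes = statistics.median(weekly_changes)  # stays at the bulk of users
```

Here the median stays at 0 while the mean is 1.5, which is the signature of a right-skewed distribution.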

 

How to improve LinkedIn?

job search functionality

metrics: engagement, retention

initiatives: active candidates - application dashboard, applications shown in a timeline, self-updated status;

    passive candidates - save contact/recruiter; save company

 

  • Estimate the salesperson population

First, I want to know whether we are looking at LinkedIn in the US only or globally.

In the first step, I will lay out the whole structure, starting from the top box (the final objective) and going down to the leaves (pieces of information that are easier to estimate).

In the second step, I make assumptions at the leaves at the bottom and calculate the boxes of the tree upward, toward the top (the final objective).

  1. Salespeople in the USA who use LinkedIn (the final objective)
  2. 80% of salespeople use LinkedIn
  3. # of companies: 20M
  4. % of companies by size, small (< 50), medium (50 - 250), and large (> 250): small 85% = 17M, medium 14.9% = 2.98M, large 0.1% = 20k
  5. average # of salespeople per small, medium, and large company respectively: 1, 10, and 50
  6. total salespeople = 1 * 17M + 10 * 2.98M + 50 * 20k = 17M + 29.8M + 1M = 47.8M
  7. salespeople in the USA who use LinkedIn = 0.8 * 47.8M ≈ 38M
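The bottom-up calculation can be reproduced in a few lines. Every input below is one of the assumptions from the list above, not a measured figure, and the variable names are only illustrative:

```python
# Assumed inputs, copied from the estimation tree above.
linkedin_share = 0.80            # share of salespeople who use LinkedIn
total_companies = 20_000_000     # number of companies
size_share = {"small": 0.85, "medium": 0.149, "large": 0.001}
sales_per_company = {"small": 1, "medium": 10, "large": 50}

# Leaves first: companies per size bucket, then salespeople per bucket.
companies = {size: total_companies * share for size, share in size_share.items()}
total_sales = sum(companies[size] * sales_per_company[size] for size in companies)

# Top of the tree: apply the LinkedIn usage share.
on_linkedin = linkedin_share * total_sales

print(f"total salespeople: {total_sales / 1e6:.1f}M")
print(f"on LinkedIn:       {on_linkedin / 1e6:.1f}M")
```

Keeping the assumptions in named variables makes it easy to rerun the estimate when the interviewer challenges one of them.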

 

 

  • How would you classify job seekers among LinkedIn users?

If we classify job seekers by activity, they fall into active job seekers and passive job seekers.

We can also combine behavior data (# searches, # job views, # applications, # messages with recruiters), profile data (occupation, industry, years of experience), and connection data (# connections), and build a clustering model to segment job seekers.
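A minimal sketch of the clustering idea, assuming k-means with two clusters on behavior counts only. The notes do not name an algorithm, so k-means is just one possible choice. The feature values are invented, and a real pipeline would standardize the features and encode the profile fields first.

```python
import math

def kmeans(rows, k, iters=20):
    """Plain k-means; the first k rows seed the centroids (a simplification)."""
    centroids = [list(r) for r in rows[:k]]
    assign = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        assign = [
            min(range(k), key=lambda c: math.dist(row, centroids[c]))
            for row in rows
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [rows[i] for i, a in enumerate(assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids

# Hypothetical weekly counts per user:
# (# searches, # job views, # applications, # messages with recruiters)
users = [
    (12, 30, 5, 3),
    (0, 1, 0, 0),
    (10, 25, 4, 2),
    (1, 2, 0, 1),
    (15, 40, 6, 4),
    (0, 0, 0, 0),
]
segments, centers = kmeans(users, k=2)
```

On this toy data the heavy searchers end up in one cluster and the near-inactive users in the other, which lines up with the active/passive split described above.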

Login failure

I was asked a brainstorming question: many people fail to log in to LinkedIn because the email they registered with belonged to a former employer or former school and has since expired. She asked me: as a data scientist, how do I solve this problem?

My approach starts with checking the current status. I will look at the data and check how many users, or what proportion of users, are affected. The impact determines how much we want to spend on the solution. If a large proportion of users face this issue, we should invest more in helping the users who are affected now, and we should also come up with a solution that prevents the issue from happening in the future.

To help users who cannot log in because of an expired email account, we would build a new feature that asks those users questions such as last login date, last login device, and location (work experience, start date and end date, school…). Based on this information, together with the IP address or device ID, the feature searches for similar accounts in the system. We don't want to retrieve the wrong account, so the feature has to be very accurate, and we may ask for additional verification through the accounts of their connections.

To prevent this issue from happening in the future, we should encourage users to sign up with their personal email account and ask them to set security questions that we can use to retrieve the account.

 

Storyline

There is a LinkedIn Storyline feature in the app; some users are assigned to the treatment group (they get Storyline) and some are assigned to control.

  1. What are the metrics we would like to measure?

objective: Storyline aims to get users to return to LinkedIn or spend more time on it by viewing or creating stories.

metric:

High level:

Engagement metric - time spent in storyline

content creation: # of stories created

Any cons for this metric? Answer: it says nothing about content quality. To cover that, we can check the length of the post and the # of shares, # of comments, # of likes, and # of reads for a given post.

content consumption: # of stories viewed

Retention metric - days since last login

  2. How to find a metric to check whether LinkedIn is someone's primary platform for posting on social networks?

Change in post interval.

Use post frequency (post interval: daily, weekly, biweekly, or monthly poster) to do cohort analysis. We define a cohort as a group of users who have a similar post probability within the first week after they join.

passive users never post;

active users: post probability = # of sessions in which the user posts at least once divided by # of sessions

segment active users by: how frequently do they use LinkedIn? profile completion rate? how many connections? engagement (# likes, # shares, # comments)
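The post probability defined above can be computed per user from a session log. The sketch below assumes a hypothetical log of `(user_id, posts_in_session)` pairs; the real schema is not given in these notes.

```python
from collections import defaultdict

def post_probability(sessions):
    """sessions: iterable of (user_id, posts_in_session) pairs.

    Returns {user_id: sessions with at least one post / total sessions}.
    """
    total = defaultdict(int)
    with_post = defaultdict(int)
    for user_id, posts in sessions:
        total[user_id] += 1
        if posts > 0:
            with_post[user_id] += 1
    return {user: with_post[user] / total[user] for user in total}

# Hypothetical log: user "a" posts in 2 of 4 sessions, user "b" never posts.
sessions = [("a", 0), ("a", 2), ("a", 1), ("a", 0), ("b", 0), ("b", 0)]
probs = post_probability(sessions)
```

A user with probability 0 is a passive user in the sense above; the remaining users can then be bucketed into cohorts by similar probability.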

posted @ 2019-01-24 06:13  ffeng0312