您的位置：首页 > 其它

ecc研究更新：提问向导的影响

2020-04-22 09:00 337 查看

Welcome to this month’s installment of Stack Overflow research updates! Since November of last year, my colleague Donna Choi and I have been posting (over on Meta) bite-size updates about the quantitative and qualitative research we use to make decisions at Stack Overflow. As of this month, we are posting these updates here on the blog. ????

In March of this year, we launched the Ask Question Wizard, the biggest change to the question asking experience on Stack Overflow in our ten years’ existence. People have been using the wizard for about six months now, so let’s explore how many people have used it, what its impact has been, and where we’re going from here. We have seen some encouraging improvements in question quality, and we can use what we’ve learned to improve question asking more broadly.

这个向导是什么？

The wizard presents an alternate question-asking mode that we call guided mode, offering askers detailed instructions and direction about how to ask a question and what to include. We kept the original question-asking page intact, calling it traditional mode.

Before launching the wizard, we tested a final version of it in an A/B test. We found that question quality, which we define and explain in detail here, improved in an absolute sense by modest amounts for askers with reputation less than 111 (the asker population we included in that test). There was a 5.12% decrease in bad quality questions, and a 1.12% increase in good quality questions.

When the wizard launched more broadly, it was the default option for users with reputation less than 111, but such users could opt out if they wanted. Also, users with reputation above that threshold could opt in.

谁使用了向导？

Between when the wizard was launched and the beginning of this week, 777,644 questions have been asked using guided mode. What level of opting in/out do we see, specifically on people’s first questions?

A few percent of higher rep question askers opted in, and about 15% of users with reputation below 111 opted out. In this time period, 99.1% of first questions were asked by users with reputation less than 111, so using the wizard has been the main new question-asking experience over the past several months. Users are largely not opting out.

混杂因素和模型

It is quite challenging to measure what effect an intervention like this wizard has because users can opt in/out of it. We expect that the very characteristics that lead a user to have a hard time writing a high-quality question may make them more likely to opt in. A difference we see between using the wizard vs. not may be caused by a factor that predicts intervention (the wizard) rather than the intervention itself. This is an example of confounding, and is exactly why we typically use A/B tests when planning changes for our site. However, we would still like to see what we can learn from the last six months of data, which means we need to use methods appropriate for observational data. My academic background is astronomy, so this is a pretty familiar situation for me! (There are not a lot of randomized controlled trials in space.) I’ll focus on a few very important factors in this blog post, but predictors we explored included reputation, account age, user location, and more.

For this analysis, I am going to share results using Bayesian generalized linear multilevel modeling to understand the impact of the question wizard. Why use a Bayesian approach for this data? The main reasons are that only a very few users with higher rep ever used the wizard, and we’d like to explore whether the wizard only helps lower rep users, among other questions about when and for whom the wizard is helpful. Bayesian modeling provides a framework well-suited to those kinds of questions.

问题质量

First, let’s look at model results for question quality.

This plot shows results for a straightforward classification model predicting whether a question is good or bad trained using brms and Stan. I chose the reputation threshold at 11 (rather than, say, at 111) because the median reputation of a first time question asker is 8.1. A “new account” is one created within the past day. The size of these effects does change with the thresholds but these predictors having an impact at all is robust to such changes.

How can you interpret this plot? Reputation above the threshold, older accounts, and using the question wizard are all associated with positive improvements in question quality. Specifically, because of the modeling approach used here, we see that, for example, people with accounts older than one day write better first questions, controlling for other factors like reputation and using the wizard.

Quality?	Factor	Risk ratio	95% confidence interval (low)	95% confidence interval (high)
Bad	Created with wizard	0.82	0.79	0.85
Good	Created with wizard	1.06	1.03	1.10
Bad	New account	1.19	1.16	1.22
Good	New account	0.89	0.86	0.91
Bad	Rep <= 11	2.05	1.87	2.24
Good	Rep <= 11	0.59	0.55	0.64

Let’s walk through how to interpret these risk ratios, which are relative changes.

代表rep <= 11的人比起信誉较高的人，提出高质量的第一道问题的概率约为0.6倍（少40％），向质量低下的问题提出问题的概率约为2倍（多出100％），控制其他因素。
拥有新帐户（在过去一天内创建了帐户）的首次提问者，提出优质问题的可能性约为0.9倍（减少了10％），约为1.2倍（20倍）比拥有较旧帐户的人更有可能提出质量较差的第一个问题。

Controlling for these confounders, when a first time question asker uses the wizard, their question is about 6% more likely to be good quality and 20% less likely to be bad quality. We consider this a real success of the wizard, because when people ask better quality questions, they are more likely to get answers and have an overall more positive experience. We also know people who ask good quality questions are more likely to ask a question again. Confirming what we found during the A/B test, the wizard has a bigger impact on bad question quality than good question quality.

I explored whether we see evidence that the wizard only helps users with lower reputation or first-time question askers, and we don’t see evidence for that. We don’t have as strong evidence that the wizard can help users with higher reputation (we don’t have much data on this), but we can put some limits on how large the difference for higher and lower rep question askers would need to be for us to see a difference with the data we have.

评论嘉豪

We know from multiple sources of feedback and research that the comment p of Stack Overflow can be a pain point in using our site. Putting aside the issue of unfriendly or otherwise problematic comments, they are intended to be a venue for clarifying questions to improve the quality of a question. This means that from our perspective, in general and on average, the fewer comments, the better.

How has the question wizard affected the number of comments on first-time questions, and the number of unfriendly comments? Instead of a classification model, this uses a Poisson regression model.

People use several different categories to flag comments that are inappropriate for our site, such as rude/abusive and unfriendly/unwelcoming. Recently, my team, especially my colleague Jason Punyon, has used the human-generated flags to build a deep learning model to automatically detect unfriendly comments on Stack Overflow. We’ll share more soon about this model, how it works, and how we’re using it on our site to make Stack Overflow a more safe and inclusive community. For this analysis, I used the unfriendly comment model (not comments flagged by human beings, which we know are dramatically underflagged) to understand what impact the wizard has on unfriendly comments.

Comments?	Factor	Rate ratio	95% confidence interval (low)	95% confidence interval (high)
All	Good quality	0.85	0.85	0.86
Unfriendly	Good quality	0.28	0.25	0.30
All	Created with wizard	0.95	0.94	0.96
Unfriendly	Created with wizard	0.78	0.71	0.85
All	New account	1.05	1.04	1.06
Unfriendly	New account	1.18	1.10	1.27
All	Rep <= 11	1.06	1.03	1.08
Unfriendly	Rep <= 11	1.42	1.11	1.82

What do these coefficients mean? They are multiplicative factors, because this was Poisson regression.

代表rep <= 11会与6％的评论和不友好的评论约40％关联。
拥有一个新帐户（少于一天）与5％的评论相关第一个问题和不友好的评论增加大约20％。
写出高质量的第一个问题可以使该问题的评论减少15％，不友好的评论减少70％。< / li>

Using the wizard is associated with 5% fewer comments and over 20% fewer unfriendly comments. These factors and their impact come from models controlling for the other factors, so think about the wizard reducing unfriendly comments by over 20%, controlling for other factors, including reputation and quality of the question. We consider this another huge success of the wizard.

下一步

We are really happy to see that the question wizard has been having a positive influence on Stack Overflow, both in terms of question quality and interactions via comments. ???? This was the first time we had shipped changes of this magnitude for the question-asking workflow in a decade, and it is great to be able to celebrate its impact.

The question wizard has proven successful enough that we want to iterate on its design and see how we can better scaffold people with coding problems to either write a successful question or realize that they don’t need to ask a question at all because the answer is already available. We also need to revisit some of the technical decisions made when launching a two-mode question workflow, as some aspects of the current system are brittle and inappropriate for a growing system, especially when we consider the network beyond Stack Overflow.

What can you expect in the near future if you ask a question soon? We will be using A/B testing, so not every user will experience changes at the same time, but some of the changes we are working on so far include:

针对首次提问者的更多前期指导
设置对提出问题后会发生什么的期望
在起草问题时改进了“如何提出建议”的指导< / li>
将许多验证警告整合到单个查看界面中

For both technical and design reasons, we plan for the question workflow to change for all users, not only those with lower reputation. As we move forward, we can use both data and feedback from users to assess how successful changes are. How do we know when we are successful? We use both qualitative and quantitative research in making decisions about our site. This blog post is an example of the kind of quantitative analysis that we use, involving large samples of our users broadly. For more details on what’s coming to Stack Overflow soon, check out my colleague Meg’s blog post from earlier this week! from:https://stackoverflow.blog/2019/08/22/impact-of-ask-question-wizard/

点赞
收藏
分享
文章举报

cunehu1722 发布了90 篇原创文章 · 获赞 0 · 访问量 4114 私信关注

内容来自用户分享和网络整理，不保证内容的准确性，如有侵权内容，可联系管理员处理

标签：

相关文章推荐

新的分享

章节导航