I don’t think the problem is Simpson’s paradox. It’s that ProPublica (disclosure: I used to work there) used a different definition of fairness than you do. They define fair in terms of balanced false positive rates, that is, p(high risk | no re-arrest). You are graphing a different quantity, p(re-arrest | high risk). I don’t know of any clear way to decide which is “more fair.” There are good arguments for both.
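Both quantities can be computed side by side from each group's confusion matrix. The Python sketch below is minimal. The counts are made-up illustrative numbers, not COMPAS data, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one group (illustrative numbers only)."""
    high_rearrested: int      # labeled high risk, re-arrested
    high_not_rearrested: int  # labeled high risk, not re-arrested (false positives)
    low_rearrested: int       # labeled low risk, re-arrested
    low_not_rearrested: int   # labeled low risk, not re-arrested


def false_positive_rate(c: Counts) -> float:
    """p(high risk | no re-arrest): the quantity ProPublica compared across groups."""
    return c.high_not_rearrested / (c.high_not_rearrested + c.low_not_rearrested)


def positive_predictive_value(c: Counts) -> float:
    """p(re-arrest | high risk): the quantity in the graph."""
    return c.high_rearrested / (c.high_rearrested + c.high_not_rearrested)


# Hypothetical groups with different base rates of re-arrest (40% vs 20%).
group_a = Counts(high_rearrested=300, high_not_rearrested=200,
                 low_rearrested=100, low_not_rearrested=400)
group_b = Counts(high_rearrested=120, high_not_rearrested=80,
                 low_rearrested=80, low_not_rearrested=720)

for name, g in [("A", group_a), ("B", group_b)]:
    print(f"{name}: FPR={false_positive_rate(g):.2f}  PPV={positive_predictive_value(g):.2f}")
```

With these numbers both groups have the same p(re-arrest | high risk) of 0.60, yet their false positive rates are 0.33 and 0.10. This tension is well documented. When base rates differ, an imperfect classifier cannot in general have equal predictive values and equal error rates across groups at the same time.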
It doesn’t happen all the time, but some sources reach out to journalists at risk to their own safety or livelihood. Understanding how to minimize that risk begins with understanding how our information flows: who can see our communications, when, and under what circumstances? Of course, it’s not enough to guide tipsters toward meaningful protections; we also want to make sure their information is sound.
Hi! I’m leading a small team developing the Computational Journalism Workbench (cjworkbench.org). It is a collaborative platform for journalists to connect, analyze and visualize data, all without a line of code. Workbench users stack together pre-defined modules into complete data processing workflows, which can then be shared with colleagues and the public. We are looking for someone to lead our marketing efforts and bring key newsrooms on board as collaborators.
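One way to picture a "stack of modules" is as a list of functions applied in order. Each function takes a table and returns a new one. This Python sketch is hypothetical and is not Workbench's actual module API. The module names (`filter_rows`, `sort_by`, `select`) and the sample rows are illustrative only.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Module = Callable[[List[Row]], List[Row]]  # a module maps a table to a new table


def run_workflow(table: List[Row], modules: List[Module]) -> List[Row]:
    """Apply each module in order, feeding one module's output into the next."""
    for module in modules:
        table = module(table)
    return table


def filter_rows(column: str, value: object) -> Module:
    """Hypothetical module: keep only rows where `column` equals `value`."""
    return lambda table: [row for row in table if row.get(column) == value]


def sort_by(column: str, descending: bool = False) -> Module:
    """Hypothetical module: sort rows by `column`."""
    return lambda table: sorted(table, key=lambda row: row[column], reverse=descending)


def select(columns: List[str]) -> Module:
    """Hypothetical module: keep only the listed columns."""
    return lambda table: [{c: row[c] for c in columns} for row in table]


rows = [
    {"name": "Oakland", "state": "CA", "count": 12},
    {"name": "Reno", "state": "NV", "count": 7},
    {"name": "Fresno", "state": "CA", "count": 30},
]
workflow = [filter_rows("state", "CA"), sort_by("count", descending=True), select(["name", "count"])]
result = run_workflow(rows, workflow)
print(result)
```

The workflow is just data (a list of configured modules), so it can be saved, shared and re-run on new input. That property is what makes sharing a workflow with colleagues possible.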
I’m also deeply suspicious of the “journalist as storyteller” thing. Too many journalists are frustrated writers, and writing is not reporting. However... stories are core to how humans learn and understand. The trick is producing factual stories. https://twitter.com/jayrosen_nyu/status/964298395430150144
So it turns out you need a comparison between THREE groups to do a solid experiment: treatment, placebo, and nothing at all. Without that third group, you can't tell whether there is a "real" placebo effect or just regression to the mean. http://slatestarcodex.com/2018/01/31/powerless-placebos/
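A small simulation shows why the no-treatment group matters. Under the hypothetical model below, each person has a stable true severity and every measurement adds independent noise. Only people whose first measurement is bad enough enroll, and nobody receives treatment or placebo. Their average still "improves" at follow-up. The distributions, threshold, and seed are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_no_treatment(n=100_000, threshold=65.0, seed=0):
    """Hypothetical model: stable true severity plus independent measurement noise.
    No one receives any intervention."""
    rng = random.Random(seed)
    baseline, follow_up = [], []
    for _ in range(n):
        true_severity = rng.gauss(50, 10)
        first = true_severity + rng.gauss(0, 10)
        if first > threshold:  # only people who feel bad enough enroll
            baseline.append(first)
            # Same underlying severity, fresh measurement noise.
            follow_up.append(true_severity + rng.gauss(0, 10))
    return statistics.mean(baseline), statistics.mean(follow_up)


before, after = simulate_no_treatment()
print(f"enrolled mean at baseline: {before:.1f}, at follow-up: {after:.1f}")
```

A placebo arm would show this same drop. Without a no-treatment arm, that drop is easy to mistake for a placebo effect. The placebo effect is the difference between the placebo and no-treatment arms.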
At least some of the accounts affected by #TwitterPurge seem to be humans. This is the problem with doing anything at scale: nobody cares if you are 99.9% accurate. Only the mistakes get amplified. Transparency of process might help, but also tells people how to game the system.
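The arithmetic behind "nobody cares if you are 99.9% accurate" is simple: expected mistakes equal the number of accounts times the error rate. In this short sketch, one million affected accounts is a hypothetical figure, not the actual size of the purge.

```python
def expected_mistakes(accounts: int, accuracy: float) -> float:
    """Number of accounts expected to be misclassified at a given accuracy."""
    return accounts * (1 - accuracy)


# Hypothetical: one million accounts actioned at 99.9% accuracy.
print(round(expected_mistakes(1_000_000, 0.999)))  # 1000
```

Even at 99.9% accuracy, that is about a thousand real people wrongly caught, and each one is a potential complaint.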
Muck Rack makes it simple to find people, tweets, or articles that mention any name, keyword, company, hashtag, etc. We've compiled this guide to help you make the most of your search.
Selecting a term
Start by searching tweets, articles from media outlets, articles mentioned in tweets, and journalists' names, titles and bios. Some suggested searches:
Companies or Topics (e.g. iPhone, Microsoft)
Phrases (e.g. "cloud computing"): use quotes to keep the terms together
Twitter handles (e.g. @username): returns those who have mentioned or replied to that account
Names (e.g. "David Pogue")
Hashtags (e.g. #sxsw, #london2012)
Bio details (e.g. vegan, Olympics, father)
Muck Rack's Advanced Search allows for many boolean operators.
To find results that mention multiple specified terms, use AND or +. For example, to ensure each result mentions both Elon Musk and Mark Zuckerberg, search Musk AND Zuckerberg or Musk + Zuckerberg.
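The sketch below is a toy model of AND semantics: a result matches only if every term appears in it. It is not Muck Rack's implementation. The helper names are hypothetical, and the sketch assumes case-insensitive, whole-word matching.

```python
import re


def words(text):
    """Lowercased set of words (assumes case-insensitive matching by default)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def matches_all(text, terms):
    """AND / + semantics: every term must appear in the text."""
    found = words(text)
    return all(term.lower() in found for term in terms)


docs = [
    "Musk and Zuckerberg traded barbs online",
    "Zuckerberg unveils a new headset",
]
print([d for d in docs if matches_all(d, ["Musk", "Zuckerberg"])])
```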
Use the operators OR or , (comma) to broaden your search when you'd like results containing any of several terms. (This is the default behavior of our search when no operators are used.) For example, searching cake OR cookie or cake,cookie returns results that contain either cake or cookie.
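Here is a matching toy model of OR semantics. Both spellings, `cake OR cookie` and `cake,cookie`, split into the same list of alternatives, and a result matches if any one of them appears. This is an illustration only. It assumes uppercase OR is the operator and that matching is case-insensitive and whole-word, and the helper names are hypothetical.

```python
import re


def words(text):
    """Lowercased set of words in the text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def parse_or_query(query):
    """Split 'cake OR cookie' or 'cake,cookie' into ['cake', 'cookie']."""
    return [t.strip() for t in re.split(r"\s+OR\s+|,", query) if t.strip()]


def matches_any(text, terms):
    """OR / comma semantics: at least one term must appear."""
    found = words(text)
    return any(term.lower() in found for term in terms)


docs = ["Recipe: lemon cake", "A cookie jar for the office", "Bread basics"]
terms = parse_or_query("cake,cookie")
print([d for d in docs if matches_any(d, terms)])
```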
Use NOT or - to subtract results from your search. For example, searching Disney will yield results about the Walt Disney Company as well as Walt Disney World Resort. To exclude mentions of Disney World, search for Disney -World or Disney NOT World.
When using one of these operators with a phrase, enclose the phrase in quotation marks. For example, you can find results about smartphones that exclude Apple's iPhone 4S by searching smartphone -"iPhone 4S".
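This toy model covers exclusion, including exclusion of a quoted phrase. A phrase matches only when its words appear consecutively. It is an illustration rather than Muck Rack's implementation, and the helper names and case-insensitive matching are assumptions.

```python
import re


def tokens(text):
    """Lowercased list of words, in order."""
    return re.findall(r"[a-z0-9']+", text.lower())


def contains_phrase(text, phrase):
    """True if the words of `phrase` appear consecutively in `text`."""
    t, p = tokens(text), tokens(phrase)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


def include_exclude(text, include, exclude):
    """`include -exclude` / `include NOT exclude`: has the first, lacks the second."""
    return contains_phrase(text, include) and not contains_phrase(text, exclude)


print(include_exclude("New smartphone sales beat the iPhone 4S", "smartphone", "iPhone 4S"))
print(include_exclude("A smartphone that outsold the iPhone 5", "smartphone", "iPhone 4S"))
```

Because the excluded phrase must match as a unit, an article that mentions some other iPhone is still kept.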
Exact case matching or punctuation
If you're searching for a brand name or keyword that relies on specific punctuation marks or capitalization, you can find results that match your exact query by adding matchcase: before the keyword you're searching for, like matchcase:E*TRADE.
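The sketch below contrasts default matching with `matchcase:`. For illustration it assumes that the default search ignores case and punctuation, so E*TRADE, Etrade and e-trade all normalize to the same string. This guide does not document Muck Rack's actual normalization rules, and the helper names are hypothetical.

```python
import re


def normalize(s):
    """Assumed default behavior: lowercase and drop punctuation."""
    return re.sub(r"[^a-z0-9\s]", "", s.lower())


def default_match(text, term):
    """Case- and punctuation-insensitive substring match (assumption)."""
    return normalize(term) in normalize(text)


def matchcase(text, term):
    """matchcase: semantics as described above: the exact characters must appear."""
    return term in text


docs = ["E*TRADE cuts fees", "Etrade app review", "e-trade outage"]
print([d for d in docs if default_match(d, "E*TRADE")])
print([d for d in docs if matchcase(d, "E*TRADE")])
```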
Use parentheses to group multiple boolean phrases. For example, to find journalists talking about having fun in Disney World or Disneyland, search for ("disney world" OR disneyland) AND fun.
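Parentheses turn a query into a tree, with the grouped OR evaluated as one unit inside the AND. This toy evaluator writes the tree directly as nested tuples rather than parsing query strings. It is a hypothetical illustration, not Muck Rack's parser.

```python
import re


def tokens(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def has(text, phrase):
    """True if the words of `phrase` appear consecutively in `text`."""
    t, p = tokens(text), tokens(phrase)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


def evaluate(query, text):
    """`query` is a word/phrase string or a tuple ("AND" | "OR", sub1, sub2, ...)."""
    if isinstance(query, str):
        return has(text, query)
    op, *parts = query
    if op not in ("AND", "OR"):
        raise ValueError(f"unknown operator: {op}")
    results = (evaluate(part, text) for part in parts)
    return all(results) if op == "AND" else any(results)


# ("disney world" OR disneyland) AND fun
query = ("AND", ("OR", "disney world", "disneyland"), "fun")
docs = ["So much fun at Disneyland today", "Disney World was fun",
        "Disneyland lines are long", "Fun at the World Cup"]
print([d for d in docs if evaluate(query, d)])
```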
An asterisk can be used to search for any variation of a root word truncated by the asterisk. For example, searching for admin* will return results for administrator, administration, administer, administered, etc.
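A trailing asterisk behaves like a prefix match on whole words. This sketch translates such a term into a regular expression. It is an illustration of the behavior described, not Muck Rack's implementation, and `wildcard_to_regex` is a hypothetical name.

```python
import re


def wildcard_to_regex(term):
    """Turn 'admin*' into a case-insensitive regex matching whole words that start with 'admin'."""
    if term.endswith("*"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*\b", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


pattern = wildcard_to_regex("admin*")
print(pattern.findall("The administration told administrators to play badminton"))
```

The leading `\b` keeps the root anchored at the start of a word, so "badminton" is not matched.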
The NEAR operator works like AND but lets you control the distance between the words. To set the distance, add a forward slash and a number from 0 to 99, as in strawberries NEAR/10 "whipped cream", which requires strawberries to appear within 10 words of "whipped cream".
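Here is a toy model of `NEAR/n`. It assumes that distance counts the words separating the two terms, in either order. The guide does not say exactly how Muck Rack counts distance, so treat that rule and the helper names as hypothetical.

```python
import re


def tokens(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_positions(toks, phrase):
    """(start, end) word indexes of every occurrence of `phrase` in `toks`."""
    p = tokens(phrase)
    return [(i, i + len(p) - 1) for i in range(len(toks) - len(p) + 1) if toks[i:i + len(p)] == p]


def near(text, a, b, n):
    """True if a and b occur with at most n words between them, in either order."""
    if not 0 <= n <= 99:
        raise ValueError("distance must be between 0 and 99")
    toks = tokens(text)
    for a_start, a_end in phrase_positions(toks, a):
        for b_start, b_end in phrase_positions(toks, b):
            gap = b_start - a_end - 1 if b_start > a_end else a_start - b_end - 1
            if gap <= n:
                return True
    return False


text = "Fresh strawberries topped with a dollop of whipped cream"
print(near(text, "strawberries", "whipped cream", 10))
print(near(text, "strawberries", "whipped cream", 3))
```

Here five words separate the two terms, so the pair passes NEAR/10 but fails NEAR/3.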