Real People Silhouettes

I recently had the opportunity to talk to the Church lab about Open Humans, and in the course of preparing slides I needed to represent “people” somehow. The standard “person” icon felt irritatingly male-default (the one on men’s bathroom doors), and mixing it up with the stereotyped “dress equals woman” felt like a miserable propagation of gender binary. So I looked for some silhouettes of people (and I looked for them on OpenClipArt, where all artwork is supposed to be public domain).

I found these1 – and they’re better than icons. People, after all, vary. Humans aren’t a bunch of identical rubber stamps.2

Four silhouettes

You’ll probably think I’m too picky, but I was still dissatisfied. There was only one woman, and she still felt … extra feminized. I mean, it’s not offensive at all, she looks very normal. But the hair, the legs, the shoes… it still felt like a “performance of femininity” and not just “here is a person that happens to be female”.

Recently I saw that Inkscape has a trace tool and thought I could try my hand at making some “real people silhouettes” from real photos. So I searched Flickr and found this CC-BY photo by downstairsdev. From that I made these three figures:

Silhouettes of three people

Can you guess which of these are female? (Answer: all of them.3) Chris wasn’t sure if these were female or male when glancing at them. I think we are collectively inured to the exaggerations made by media: men and women aren’t as different as we make them out to be.

I’d like clip art with less gender-exaggeration to be available to all, so I’ve put them on Wikimedia Commons. I’ve marked these as CC0 because I think clip art really shouldn’t expect stuff like attribution.4 The photo was CC-BY (and not CC-BY-SA) so I think this licensing choice is allowed (let me know if I’m mistaken). (Sorry, I’ve requested deletion of the images per Sage’s correction below.)

I also made a nice “man with a cane” silhouette from this photo by ragesoss of his grandfather (going by the title there), but the photo was CC-BY-SA. As a result I wasn’t really sure what to do with it: I shared it with a CC-BY-SA license, but I really wished it could be CC0. Maybe Sage can change the license to CC-BY…. 😉 Update: thanks to Sage relicensing this, I’m able to share it as CC0.


1: All four were by rejon, links: Person Outline 1, Person Outline 2, Person Outline 3, Person Outline 4
2: Indeed, the fact humans vary — and that their risk preferences vary — is why something like Open Humans should exist. Some participants will be okay with publicly sharing data that others prefer to keep private.
3: If you crank up the curves – or the gamma on your computer – you can see the faces of the silhouetted folks in the original photo.
4: Well, to be fair, I decided to CC0 all my media shared on Wikimedia Commons.

Eight years and eight percent: Always giving more

(This is a joint blog post with Chris.)

Our tradition continues: to celebrate our eighth year of marriage Chris and I are giving 8% of our joint pretax income. (Each year we give 1% more.) This giving is made to organizations which we believe have the most concrete short term “estimated value” for helping others.

As people look forward to making resolutions for the coming year, we hope our own example helps inspire others to give – just as others have inspired us by giving more, despite financial pressures. Those who go ahead of us have blazed a trail we happily follow.

"Path Squiggles" by Dominic Alves

As in previous years, we are guided by the research performed by GiveWell. Efficiency in doing good should matter, and for this reason our money will be going to help the developing world. Money can do more immediate good for the global poor – each dollar can accomplish more – than it can do to ameliorate the lives of those in first-world poverty.

Almost all of our giving this year will go to GiveDirectly. GiveDirectly aims to distribute 90% of the money it receives directly to poor individuals in the developing world. Their methods have been developed in Kenya, where the M-Pesa mobile-phone-based money transfer system facilitates the transfer of cash. GiveDirectly had a great year, with high profile and supportive articles in the New York Times, NPR’s This American Life podcast, and even The Economist. Even better, these articles often introduce one of the central ideas behind GiveWell (which has recommended GiveDirectly as one of three top charities) – that we can try to target donations to do the most good for the most people, and that acknowledging this involves a dramatic rethinking of which charities we choose to support.

"Mobile Phone with Money in Kenya" by Erik (HASH) Hersman

There are many ways to make our lives meaningful. We have been fortunate to grow our family with our first child: a concrete meaning and joy, though a local one. We’ve also been especially fortunate to have had employment (past and present) where our skills are used to improve the world. A third path to meaning – one we hope others will join us in celebrating – is to give, to give more, and to give wisely.

May you find the happiness of giving in the new year!

CC0 all the media


I’ve released as CC0 all the pictures I’ve created and shared on Wikimedia Commons. I’ve been thinking about doing this for a while; Aaron’s death and — more specifically — Nina Paley’s release of Sita Sings the Blues as CC0 have pushed me into doing it. I’ve encountered the same issues she has — people ask me for permission due to legal concerns when I don’t think they need to. In particular, my chemical structure of DNA diagram has been a popular item for textbooks.

I study body-books

Theo Sanderson has made a text editor that checks if a body of text complies with using only the 1,000 most common English words. This was inspired by XKCD’s “Up-Goer Five” — a description of the Saturn V rocket created according to this rule. It reads like a Simple Wikipedia article (but even more extreme).

Anyway, I’ve seen a couple friends describe their job using this constraint, so I figured I’d try my hand at it. It’s surprisingly intelligible, and I think I like the kenning of “body-book” to describe a genome.
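A checker of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not how Theo Sanderson's editor works: the tiny `ALLOWED_WORDS` set is a hypothetical stand-in for the real 1,000-word list, and splitting hyphenated words into their parts is an assumption.

```python
import re

# Hypothetical stand-in for the real 1,000-word list (not reproduced here).
ALLOWED_WORDS = {"i", "study", "body", "books", "how", "to", "grow"}


def words_not_allowed(text, allowed=ALLOWED_WORDS):
    """Return the distinct words in `text` missing from `allowed`, in order of first use."""
    flagged = []
    # Assumption: a hyphenated word such as "body-books" is checked part by part.
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in allowed and word not in flagged:
            flagged.append(word)
    return flagged


print(words_not_allowed("I study body-books"))  # -> []
print(words_not_allowed("I study genomes"))     # -> ['genomes']
```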

I study body-books

Children often have bodies like their parents. One reason this is true is because we each have parts that tell our bodies how to grow. We get these parts from our parents, and they can be read like a book. I study these body-books.

Some body-books have words that cause people to grow in the same way. But sometimes people are different — even if their body-books have the same words — and so I also study what things make bodies different even if their body-books are the same.

We are able to study our body-books more than ever, because we can now read them very easily.

Another important thing about body-books: we think it will be possible to learn a lot from someone’s body-book, even if we aren’t able to do it now. Also, with computers it’s very easy to share body-books — and it’s very hard to hide them after they’re shared. This means if people give their body-books so others can study them, they might share things they didn’t know about and didn’t mean to share.

So another part of my job is making sure people learn this might happen. We want to share body-books with everyone so that everyone can study them, but only people who know the fears should share their body-books.

What can we do in Aaron’s wake?

The brother of my friend Noah Swartz committed suicide last Friday. I didn’t know Noah’s brother Aaron, so these are the terms I relate to it in. The Swartz family is close to many of my friends: Mako and Mika live in Aaron’s former apartment/offices, and I’ve met both of Noah’s brothers through them. Noah’s a quiet guy, but a geek in his own right: crazy good at strategy games and an occasional host for college radio.

Noah’s brother was Aaron Swartz. Aaron’s in the news a lot right now, and with good reason. He was brilliant and he was unfairly treated. The Swartz family and Aaron’s partner aren’t going to have a lot of privacy these days, but I’m not sure they want it. They’re angry and they want you to know that Aaron’s death wasn’t just about depression:

“Aaron’s death is not simply a personal tragedy. It is the product of a criminal justice system rife with intimidation and prosecutorial overreach. Decisions made by officials in the Massachusetts U.S. Attorney’s office and at MIT contributed to his death. The US Attorney’s office pursued an exceptionally harsh array of charges, carrying potentially over 30 years in prison, to punish an alleged crime that had no victims. Meanwhile, unlike JSTOR, MIT refused to stand up for Aaron and its own community’s most cherished principles.”

It is difficult to explain what Aaron was actually prosecuted for, what he was facing, and why it was horribly wrong. The best summary I’ve heard so far was aired this morning on WBUR:

It’s hard to know where to go from here. Here are some ideas.

  • Ask MIT for an apology. It’s too little and too late, but those who loved Aaron would like to see MIT acknowledge that its involvement in his prosecution was wrong.

  • Dedicate yourself to publishing Open Access. If you are in academia, you know what this is about. Aaron was convinced that knowledge is power, and our publications are purportedly our efforts to share knowledge. You may also wish to share copies of your pdfs on the web, and there is a Twitter movement advocating this (#pdftribute). I should note while this is common it is also technically illegal — an act of civil disobedience, albeit on a much smaller scale than Aaron’s alleged and unrealized liberation of JSTOR archives.

  • Give to GiveWell. Aaron believed we have a moral obligation to help others in the most efficient manner possible. He personally worked for structural change (he was a genius, so he had a reasonable chance of accomplishing this), but he was also a strong believer in GiveWell and in doing the greatest good by contributing to the developing world. My husband Chris and I donate a significant fraction of our income each year to GiveWell, and the Swartz family has asked that donations made in Aaron’s memory be made to that organization.

Finally, here are articles and links if you’d like to learn more about Aaron. I present these in chronological order.

Well, That’s Ironic

I’m lucky and grateful to have been recommended by George Church for Genome Technology’s Seventh Annual Young Investigators. The profile they wrote — “Madeleine Price Ball: Free the Data” — is really nice. Or at least it was, if I recall correctly. I talked about how important it is for scientists to share information freely (in particular, human genome and interpretation data).

How ironic is it that it’s behind a subscription block?

I had mixed feelings about the interview, as I knew this would happen. At least the GenomeWeb account doesn’t cost anything. It does, however, require a password containing at least one of each of the following: uppercase character, lowercase character, number, and punctuation. And… it does this all over “http”, not “https”. Since GenomeWeb is apparently encouraging you to send one of your favorite super-secure passwords all around the internets in plaintext, I’m reluctant to recommend making an account there.

Celebrating Seven Years with Seven Percent

(This is a joint blog post with Chris.)

Today is Giving Tuesday. It’s a great idea. Here in the US, something feels odd about following our national day of giving thanks (Thanksgiving) with the consumerism of Black Friday, Small Business Saturday and Cyber Monday. As we shop to find gifts for those we love, we feel it’s also important to celebrate giving to those we don’t know, who need it most. We hope this post inspires others to give more and to celebrate giving.

For several years now we’ve celebrated our wedding anniversary by giving a percentage of our yearly pre-tax income to charity, a percentage determined by the number of years we’ve been married. This year that percentage is 7%. Our 7th anniversary was October 29th, but we’ve waited for our favorite source for charity advice, GiveWell, to make their yearly recommendations. Luckily they did this yesterday, giving us the opportunity to post this today.

This year we are closely following GiveWell’s advice and giving 90% of the 7% to three charities: GiveDirectly, the Against Malaria Foundation (AMF), and the Schistosomiasis Control Initiative (SCI). (The remaining 10% will be decided later, and will probably be advocacy and other nonprofits that may not be highly effective, but are close to our hearts.)

Loiturerei village, Kenya. Taken by UK DFID, CC-BY-SA.

50% to GiveDirectly (3.5% of our annual income)

GiveDirectly is GiveWell’s only new recommendation this year, and we think it’s one of the most interesting charities out there. Its method is simply this: find the poorest people in Kenya (here’s how they do that) and give them money through the M-PESA money network.

There are all kinds of reasons why simply giving money to poor people directly might not be the best we can do (they might spend it on something we’d rather they didn’t, for example) but it does avoid the money’s impact being diluted by corruption or overhead. More importantly, GiveDirectly will be quantifying how much it helps. They will follow up with the recipients over the next year — using a randomized control trial for which they’ve pre-published the survey and analysis plan.

We’re hopeful that better interventions exist than GiveDirectly. But we want their project to succeed because it shares the commitment to measuring outcomes that we think is vital, and it can serve as a baseline to compare other charities to in the future (i.e. “Can you do something that creates more improvement to lives than GiveDirectly? Prove it.”).

30% to Against Malaria Foundation (2.1% of our annual income)

AMF distributes insecticide-treated nets for protecting against malaria infection. GiveWell estimates the cost per life saved is just under $2,500. Malaria is not usually fatal, so a fair amount of disability due to illness is also being prevented.

10% to Schistosomiasis Control Initiative (0.7% of our annual income)

GiveWell thinks that SCI — which concentrates on the “Neglected Tropical Diseases” (usually worms/parasites) — offers an extremely effective intervention at improving DALYs (see below). This is because the infections they focus on are readily treatable using very inexpensive drugs, yet often come with debilitating symptoms that don’t quite kill the “host”.
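For anyone checking the arithmetic in the headings above, here is a small sketch. The 7% total and the 50/30/10 split come from this post; the variable and function names are just illustrative, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

total_share = Fraction(7, 100)  # 7% of pre-tax income (from the post)

# Shares of that 7%, as given in the section headings.
split = {
    "GiveDirectly": Fraction(50, 100),
    "Against Malaria Foundation": Fraction(30, 100),
    "Schistosomiasis Control Initiative": Fraction(10, 100),
}


def share_of_income(split, total_share):
    """Convert each charity's share of the gift into a share of income."""
    return {name: part * total_share for name, part in split.items()}


shares = share_of_income(split, total_share)
for name, share in shares.items():
    print(f"{name}: {float(share * 100):.1f}% of income")

# Whatever is left of the 7% is the part still to be decided.
undecided = total_share - sum(shares.values())
print(f"Still to be decided: {float(undecided * 100):.1f}% of income")
```

The three named charities add up to 6.3% of income, which is the "90% of the 7%" mentioned earlier, leaving 0.7% for the remaining 10%.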

“For You!” By Nomadic Lass, CC-BY-SA.

Donating effectively

It’s hard to list all the reasons people choose to give, or do not. One issue we’ve seen raised is the belief that “charity doesn’t work”. We believe that simply isn’t true. It may be true for some — many — perhaps most! Government-managed foreign aid especially so: it’s only around 1% of the US budget and mainly goes to political allies. But there are non-governmental charities that demonstrate real improvements, and GiveWell supports these. Giving can work, but it’s important to find effective giving opportunities.

And for that reason, we waited for GiveWell’s latest recommendations. GiveWell looks for organizations that maximize the improvement to lives caused by each dollar you’re giving. This seems like it should be uncontroversial, but it’s not yet common to think about giving this way. Perhaps one reason for this is that it requires a way to measure outcomes and compare them against each other, and that’s very difficult. GiveWell is doing a fantastic job trying to do this all the same, though, using tools like the Disability-Adjusted Life Year (which is a measure of health that’s better than just measuring how long people live), randomized control trials, and the kind of statistics knowledge you have when you’re a charity review organization that was founded by a bunch of ex-quants. (A Businessweek article referred to GiveWell as Hedge Fund Analytics for Nonprofits.)

A second reason people are sometimes reluctant to think about donating effectively in this way is that for most of us, it’s going to involve donating to people far away instead of in our local communities. The price of living here in Boston, MA is very high, both for rent and food — in contrast, more than a third of the people in the world live on less than USD $2/day (most people don’t realize that this number is adjusted for the purchasing power of goods and services in the US!). When trying to decide whether to donate locally or globally, it’s clear that our money can do much more good in other countries than here in the US.

A third reason that people are reluctant to give to maximize outcomes is that we don’t have the same emotional connection to people across the world as we do to an individual call for help from someone that we can see. Counter-intuitively, studies such as this one show that people have a strong bias towards giving more money to help a single identifiable victim than to help many “statistical” victims. The Internet has helped to reduce the effects of this emotional bias, with sites like Kiva giving a name and face to the global poor. Perhaps GiveDirectly could benefit from adopting a Kiva-style interface itself.

Closing thoughts

Each year we ratchet up the amount we give, and this year has brought us a new financial development: our first child. When people learn about our annual tradition they wonder how it will scale — will we be doing this on our 20th? Our 50th? Our 101st? (We hope to have that last problem!) As Yogi Berra said, “It’s tough to make predictions, especially about the future.” We know the responsibilities of parenthood will demand more of our finances, and balancing that with wanting to help others will be a lifetime project. Tithing (10%) is a very common tradition, and we want to at least reach that. Maybe we can go beyond it. For now we’ll take it one step at a time, and try to give a little more each year.

Phineas and Name Uniqueness

It’s been a while since I posted to this personal blog: so long, in fact, that I have had a child! We named him “Phineas Charles Ball”. (Photos are on Flickr.) “Phineas” is a fairly unusual name (although it’s become more familiar lately), and this post is my exploration of how “weird” this name actually is, and of how name uniqueness trends have been developing over time.

As many of you already know, one of the most useful sources for analyzing baby name trends in the United States is the baby name data published by the Social Security Administration. These data have become especially high quality as social security numbers have become ubiquitous (at this point almost all children acquire one at birth). What you might not have realized is that some great raw data files are also available that go beyond what the website provides — the only limitation in these is that names used less than five times in a given year are not reported (for privacy reasons).
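For readers who want to try this themselves, here is a minimal sketch of loading one year of raw data. It assumes each line has the form `name,sex,count` with no header row; the function name `read_name_counts` and the sample rows are made up for illustration and are not real SSA figures.

```python
import csv
import io
from collections import defaultdict


def read_name_counts(lines):
    """Parse `name,sex,count` rows into a nested dict: {sex: {name: count}}."""
    counts = defaultdict(dict)
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        name, sex, count = row
        counts[sex][name] = int(count)
    return dict(counts)


# Made-up rows standing in for a real file; with a real file you would pass
# an open file object instead of this in-memory string.
sample = io.StringIO("Alice,F,120\nBeth,F,7\nCarl,M,95\nDov,M,5\n")
counts = read_name_counts(sample)
print(counts["F"])  # -> {'Alice': 120, 'Beth': 7}
```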

The first thing I wanted to plot was what most of us have noticed — qualitatively if not quantitatively — names have been becoming more unique. First I calculated the diversity as Shannon entropy. (I did a bit of a hack though: because I was limited to names seen 5 or more times, I only calculated the entropy of the most common 90% of names in a given year. This was close to the maximum possible — by 2011 nearly 1 in 10 girls has a name seen less than five times!)
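A rough sketch of that calculation, for the curious. Shannon entropy here is H = -Σ p_i log2(p_i), where p_i is the share of babies given name i. The sketch reads "the most common 90%" as: take names from most to least common until they cover 90% of all births, then compute the entropy over just those names. That reading, the function names, and the counts are assumptions for illustration, and this is not necessarily the exact procedure behind the plots in this post.

```python
import math


def top_fraction(counts, total, fraction=0.9):
    """Counts of the most common names, taken in descending order until
    they cover at least `fraction` of `total` births."""
    kept, covered = [], 0
    for count in sorted(counts, reverse=True):
        if covered >= fraction * total:
            break
        kept.append(count)
        covered += count
    return kept


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the distribution given by raw counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


# Made-up counts per name; `total` would also include births whose names
# are too rare to be reported.
counts = [50, 30, 15, 5]
total = 100
kept = top_fraction(counts, total)
print(kept)                   # -> [50, 30, 15]
print(shannon_entropy(kept))  # entropy in bits of the kept names
```

Higher entropy means births are spread more evenly over more names, which is the sense of "more unique" used here.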

Another way to slice this data is to try to answer this question: “How many names are needed to cover half the population?” (Or 10%. Or 90%.)

In 1950 you could cover half the male population with just 24 names — in 2010 you needed 139. As a child I remember sadly eyeing prelabeled personalized souvenirs, knowing I wouldn’t find my name among the items. (This is especially true because my first name isn’t the most common spelling.) Selling this sort of prelabeled paraphernalia has become a lot more difficult — many more names are needed to cover the same fraction of the population!
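The "how many names cover half the population?" question can be sketched like this. The function name and the counts are made up for illustration; `total` is assumed to be the full number of births, including names too rare to be reported.

```python
def names_needed(counts, total, fraction=0.5):
    """Smallest number of most-common names whose births add up to at least
    `fraction` of `total`. Returns None if the reported names never get there."""
    target = fraction * total
    covered = 0
    for rank, count in enumerate(sorted(counts, reverse=True), start=1):
        covered += count
        if covered >= target:
            return rank
    return None


# Made-up counts per name; 20 of the 120 births have unreported names.
counts = [50, 30, 15, 5]
total = 120
print(names_needed(counts, total, 0.5))  # -> 2
print(names_needed(counts, total, 0.9))  # -> None (reported names cover too little)
```

The `None` case matters in practice: because rare names are left out of the data, very high coverage targets may not be reachable from the reported names alone.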

Some observations…

  1. Name uniqueness hasn’t been increasing monotonically. Names seem to have become slightly less unique between 1910 and 1950. After 1950 uniqueness increased, and really took off in the mid-1980s.
  2. Girl names are more unique than boy names (you probably already noticed this). It may be interesting to note that boy names today are as unique as girl names were in the early 1990s.
  3. You should take the early data with a grain of salt: the total applicant data shows that not all US citizens received social security numbers (SSNs); especially few that were born before 1910. The program was created in 1935 and the legal uses of SSNs expanded gradually.

So Phineas’s name occurs in a context of increasing uniqueness: to have a rare name now is more common than it was when I was born, and much more common than when my parents were born. This particular name also happens to have become more popular lately. When we slice the data we find that in the latest years the uniqueness of “Phineas” is near 80th percentile — one in five boys has a rarer name. It’s a bit unusual, but it’s not a dramatic outlier.
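One way to put a number on "how many babies have a rarer name" is sketched below. It treats births missing from the data (names below the reporting threshold) as rarer than any reported name. The function name, the names, and the counts are all made up for illustration, and this may differ in detail from how the percentile above was computed.

```python
def share_with_rarer_name(counts, name, total):
    """Fraction of `total` births whose name was given less often than `name`.
    Births missing from `counts` are assumed to have rarer names."""
    own = counts[name]
    rarer_reported = sum(c for c in counts.values() if c < own)
    unreported = total - sum(counts.values())
    return (rarer_reported + unreported) / total


# Made-up counts; 5 of the 100 births have names too rare to be reported.
counts = {"Aiden": 60, "Ben": 20, "Cyrus": 10, "Dmitri": 5}
total = 100
print(share_with_rarer_name(counts, "Cyrus", total))  # -> 0.1
```

In this toy example one in ten babies has a rarer name than "Cyrus", which would put it near the 90th percentile of uniqueness.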

I’ll close with a list of famous Phineases: Phineas Gage (a famous case of frontal brain damage), Finny in “A Separate Peace”, P.T. Barnum (P. = Phineas!), and Phineas Flynn from the cartoon “Phineas and Ferb”. Also oft misremembered as Phineas: Phileas Fogg in Jules Verne’s “Around the World in Eighty Days”. Chris’s favorite find is Phineas Ball (1824-1894), waterworks engineer and mayor of Worcester, MA.

23andme’s First Patent

Update, June 1: 23andme has added an addendum to their announcement. In particular, the addendum clarifies and seems to promise that the patent will not be enforced with respect to performing interpretations: “Other entities can present information about the genetic associations covered in our patents without licensing fees.” This is reassuring news and it’s great to see 23andme outline such a limitation on patent enforcement! It allays my feared hypothetical situation regarding a “swiss cheese” effect on genome interpretation efforts (described below). — Madeleine

This morning I noticed a post from 23andme’s blog last night: Anne Wojcicki announced that 23andme expects to be awarded its first patent today. It touched on a lot of issues I care about, so I’ve written this personal post in response to it.

From what I understand, the 23andme patent seems to be a patent on genetic variant interpretation: specifically, on the interpretation of some variants (including one in the gene SGK1) as being associated with differences in an individual’s risk of developing Parkinson’s disease. Technical methods for determining the variants are listed, but they seem to be an enumeration of all extant methods for assessing genetic variants (including techniques used in whole genome sequencing).

In other words: this seems to be a patent regarding the reporting and usage of an observation that a naturally-occurring genetic variant is associated with a particular trait. As noted by 23andme’s announcement, these patents are controversial.

Patent Wars

While my first love is genetics, I am also a programmer — and in software, patents are very broadly hated by programmers. This American Life has an excellent episode documenting the tangled mess that is the software patent industry. It has become an arms race; even the most well-intentioned companies feel obligated to build up patent arsenals. Software patents are a different beast to biotechnology patents, but in some ways larger issues remain true: applied too broadly, in a field of rapid progress, patents have the potential to create a tangled web of litigation. The intended purpose of patents to protect innovation and encourage commercialization through exclusive access to innovation has instead become outright warfare.

A web of litigation in the mobile phone industry. ©2010 George Kokkinidis / Design Language, used with permission

I worry that this vision of patent warfare could exist in the realm of genome interpretation. The multitude of patents on the meaning of genetic variants seems to make the process of whole genome interpretation almost impossibly hazardous. I think it is vital to everybody that we are able to not merely return your “A’s, C’s, G’s, and T’s”, but also give you explanations like “you have A here, and according to these studies this means you are much less likely to be infected with stomach flu”. Will each one of those explanations run the risk of violating a patent? Will genome interpretations become like Swiss cheese as they must carefully avoid mentioning each of the patented genes (which are possibly the most important ones)? Is part of 23andme’s purpose here to build up its own arsenal of interpretations, as both defense and weapon against other interpretation efforts?

Will patents on the observed associations of genetic variants turn whole genome interpretation efforts into swiss cheese? Image credit: Madeleine Price Ball, CC-BY-SA

23andme is far from the first in this field (there are hundreds or thousands of patents like this one) and it is possible that they have no intention to engage in such wars. Nevertheless, as far as I am aware they have not released an assurance that the patent will not be used in this way (of course, neither has anyone else). In the software industry, some groups have made assurances regarding their patents — promises that the patents will only be used for defensive purposes (e.g. Twitter) or limits on their offensive uses (e.g. Red Hat). That said, such promises are easily broken.

Also troubling to me is the exact wording in the announcement itself:

“We believe patents should not be used to obstruct research or prevent individuals from knowing what’s in their genome. We believe that everyone has a right to know their genomes — their sequence of As, Ts, Cs, and Gs — and should be able to access them should they want to. This has been our guiding principle since day one, and 23andMe has pioneered the ability for individuals to have unfettered access to their genomes.”

I’m reading between the lines, but… if access to your genome means that you only have access to the uninterpreted sequence of A’s, T’s, C’s, and G’s — a completely unintelligible mess to the vast majority of humanity — then I think that falls short of “unfettered access”.

Patenting Nature

There is an important difference between software patents and gene interpretation patents. While software is clearly the product of design (hence the term “software engineer”), patents on the interpretations of genes are the product of discovery. Indeed, the word “discovery” dominates 23andme’s own announcement of the patent. As that announcement noted, whether this is patentable material is the subject of hot debate. Is this patenting a “law of nature”? While using the laws of nature is fundamental to any process, patent law has held that the “laws of nature” themselves are not patentable.

I am a researcher and not a lawyer, but I’ll try to summarize my understanding of the recent “Prometheus” case referenced by 23andme’s announcement. In a unanimous decision, the Supreme Court struck down the patentability of the act of monitoring the levels of a drug metabolite (the product of the drug as the body breaks it down) and the use of this information to adjust dosage of that drug. This correlation was held to be a “law of nature”, and therefore unpatentable. Some phrases from the decision that stood out to me were these:

“But to transform an unpatentable law of nature into a patent eligible application of such a law, a patent must do more than simply state the law of nature while adding the words ‘apply it.'”

“… the claimed processes are not patentable unless they have additional features that provide practical assurance that the processes are genuine applications of those laws rather than drafting efforts designed to monopolize the correlations.”

Patenting the observed naturally-occurring traits associated with a naturally occurring genetic variant strikes me as a very similar “law of nature”. Perhaps even more so: at least the drug itself was some level of non-natural engineering? This is far from resolved, however. The more relevant case (the “Myriad” case regarding a patent on BRCA variants and their associations with breast cancer risk) has been remanded to the Federal Circuit for reconsideration in light of the Prometheus case. I am optimistic that the act of reading and interpreting genetic variants will be held to be non-patentable, and that all my worries written here will be moot and forgotten… but this remains to be seen.

Cashing In On Crowdsourcing?

The discoveries made by 23andme have come from their “23andWe” program — a crowdsourcing of scientific research. A recent Nature Reviews Genetics article describes such programs as “participant centered initiatives” — “tools, programs and projects that empower participants to engage in the research process”. Crowdsourcing is a powerful tool to rapidly meet a goal, and an exciting consequence of the internet’s transformational facilitation of connecting and communicating. But it holds some darker questions: to what extent does such a program exist to benefit the participant — and to what extent is the participant used as a resource to benefit the organization? Although the lines might be fuzzy to draw, the ownership and profit from user-generated data has become a clear motivation for companies (c.f. Facebook).

The Personal Genome Project has a lot of overlap with 23andWe in style. We want to collect similar information from participants — we ask people (if they are willing) to share information regarding their health and traits, as well as genome data. But there is also a key difference between the two projects: we do not hold this data privately for our own research. We release the data publicly for all others to see, and this is something we are uniquely able to do due to our open consent process. We want everyone — including our participants — to have as much access to the data as we do, and the same potential to make interesting discoveries.

As such, I see Personal Genome Project participants as very much our “peers” in this research endeavor. For this reason I prefer to use the phrase “peer production” rather than “crowdsourcing” to describe some aspects of our work (a term that can also be applied to projects like Linux and Wikipedia): not merely a project that solicits participant contributions, but one that genuinely shares those contributions as freely as possible.