CS109: Statistics and Probability for Computer Science Majors

The last two terms, I took programming classes.  During spring term, I took a ‘theory’ class: it’s still about computer science, but the focus is not on how to program, but on learning the ideas that will lead to better programs later.  This class was an introductory probability/statistics course that was focused on computer science, so while we would learn the same general probabilistic principles that they teach in other probability courses, we would also go over the applications to programming. 

My Experience (3)

I had a good time in the course.  The professor was Mehran Sahami, my computer science advisor.  I took the class because he was teaching it.  I knew, because of my experiences with him in my first programming course, that he is a very charismatic and intelligent lecturer, and probability and machine learning are some of his favorite subjects.  If he wasn’t teaching it, I probably would have taken the next programming class, CS107 (I’ll be taking it this fall).

One reason I liked the course was that he just made it fun.  After the midterm course evaluations, one of the complaints that he got was that people couldn’t see where he was pointing when he would point to one of his powerpoint slides.  As a result of this, for the rest of the term he used a lightsaber that he was part of his Halloween costume one year when he worked at Google as a pointer.  At times, it would spontaneously turn on with a bout of light and a “vwoom!” before he would turn it off again. 

He also made the course material fun.  Because probability as a science was developed because of casinos and the gambling industry, he would occasionally bring gambles (I like to think of them as applied probability) into the classroom.  For instance, there’s the Monte Hall problem.  There are three doors (in our case, envelopes).  Behind one door is a prize; behind two doors is nothing.  One person picks a door.  Then, the game show host opens one of the door and reveals that nothing is behind it.  Should the person playing the game switch?  Yes (intuitively, because the game show host knows which door is the prize and cannot pick that door).  So, Mehran has three envelopes (one of which has cash in it), picks a volunteer from the class, opens one of them, and asks if they want to switch.  He actually repeated this during the admitted students weekend (all of the people who got accepted to Stanford can come before they decided where to go to college and sit in on classes, among other things), and the volunteer was a prospective frosh.  After the frosh opened their envelope (and got $20), Mehran revealed to the class the secret: all of the envelopes besides the one he revealed had cash in them.  The lesson?  “Sometimes, we make our own probability.”

It was also nice to take another math class.  I hadn’t taken a math class for a year, and even though CS109 was a computer science class, it was almost entirely math with computer science applications.  It was a little bit hard because it was the most rigorous math class I have ever taken: even when I took linear algebra at the University of Oregon, the pace was nowhere near as fast as in CS109.  As a result, I had to change the way I took notes (in previous math classes, there was time to do each example during the lecture, whereas this class had more than 100 people in it, so that would have been impossible, so I had to start taking notes on the general ideas rather than the particular examples), and I actually had to study my notes and the PDFs from the lectures after class.  I didn’t fully adapt until halfway through the course, so I got an 80% on the midterm and didn’t fully learn the material from the first half, but once I changed my study habits, I was able to keep up, and because the second half of the course leveraged the material from the first half, I had a good grasp of everything by the end.

Some Topics of Interest (1)

Utility: People have a sometimes-logical “utility function” that they use to make decisions.  In other words, their values.  For the sake of concreteness, let’s talk about money.  Most people don’t value every dollar equally.  In their utility functions, the value of money exponentially decays.  That is, if you had a choice between a 100% chance of getting $1 million versus a 10% chance of getting $10 million, most people prefer the certain million, and they would still prefer the certain million to a 10% chance of $20 million because the added value of $19 million isn’t worth the risk.  In other words, at high dollar amounts, people are risk adverse.  At low amounts, though, many people are risk preferring.  Many people are fine with a losing gamble if it’s only a few dollars because of the fun of gambling.

Breaking Vegas: As someone who has never read any “Get Rich Quick in Vegas!” books, I was interested to learn that there are certain algorithms that will actually result in you winning.  For instance, bet red on roulette.  Start out with one dollar.  If you win, quit (or start over with one dollar).  If you lose, double your bet.  As long as there are only a few dummy slots on roulette (ie, in a casino, there are 18 red, 18 black, and 2 dummies.  Or something), you can expect to have positive winnings.  Of course, the next part of the lecture was entitled “Vegas Breaks You.”  That algorithm relies on infinite credit, no maximum bet, and the casino not kicking you out.

The Layout of the Course (1)

In a general probability course, there is a type of problem that statisticians classify as a “coupon collecting” problem.  For instance, imagine that there are 10 different types of coupons.  How many coupons will you expect to collect in order to get at least one of every type?  If you collect 30 coupons, what is the probability that you have one of every type?  How does that probability change based on the number of types?  What if you only care about getting half of the types? 

In CS109, the professor briefly explained what “coupon collecting” was, and then he showed how it was related to computer science.  One way that computers store data is in a “hash table.”  In a hash table, you tell the computer to create a certain number of “buckets,” and you divide your data between those buckets.  This is a fairly efficient way to store (some types of) data because the process by which data is allocated to a certain bucket is randomized using a “hash” function, so each bucket will end up with about as much data as every other bucket. 

Analyzing a hash table is the same as analyzing a coupon collecting problem.  Each hash table bucket is like a coupon type.  It is inefficient if a hash table has a bunch of data in one bucket and no data in another bucket, so a computer scientist would want to compare the probability of having under or over utilized buckets based on the number of buckets – how does changing the number of coupon types affect the probability?

Because it was an introductory probability course, we had to do a basic overview of a bunch of topics (different types of probability distributions such as the normal distribution or the exponential distrubion, combinatorics, conditional probability, independence, joint probabilities, expected values, variance, confidence intervals, and different ways of predicting new data), so in much of the course, it seemed like each different lecture was starting anew with a different topic.  Each topic would build on previous topics (what is a given probability distrubiton would turn into what is the expectation and variance of a given probability distribution, which would turn into a way to derive the expectation and variance of a given distribution), but it still felt a little atomistic.

Towards the end, however, the course took a greater programming focus, and the course really came together.  The last unit was on machine learning, and we learned one way, for instance, of taking some emails and teaching a computer to classify them as spam or not spam (or taking risk factors and classify someone as having heart disease, or taking voting patterns and predict political party).  Rather than just having computer related math problems, we actually made a program that would take in this data and learn.  I guess I just learn best by doing, because after that unit, I felt like I had a really good grasp on how each of the topics related to each other.

The Final (1)

Finals week was from Thursday, June 4 to Wednesday, June 10.  I had one final before finals week, two finals on June 4, and this final on the very last slot for finals on June 10.  In other words, I had about five days with nothing academic but studying for my CS109 final. 

While I did participate in some end of year activities, I got a lot of studying in, and I ended up with an A in the final and an A in the course. 

Mehran (3)

I’ve been continuing my relationship with Mehran outside of the classroom.  Despite the hoards of people clamoring for his office hours, I have managed to secure a fairly consistent spot.  He’s a pretty interesting person.  Interestingly enough, he and my dad had one of the same summer jobs.  Bonus points if you can guess!

Because his office hours are so busy, it’s also interesting to hear some of the other conversations.  That is, Mehran has open office hours, so if two people want to talk at the same time, they’ll both be in the room at the same time asking questions.  It has actually been kind of helpful: there are a fair number of grad students who come in to talk to Mehran, so I’ve been getting more of an idea of what Stanford’s graduate CS programs look like. 

And, since Mehran is now an advisor and friend, there’s also the life advice.  For instance, my SLE section leader recommended that I read The Plague after a term of literary discussions with me, and Mehran mentioned that The Plague was his favorite book for a long while when I asked him if he had any books to recommend.  In other words, we work well together.

And to think – when I was coming in to Stanford, during New Student Orientation, I wasn’t even planning on taking a CS class!