Hi Elizabeth,
As you probably know, HLM stems from ANOVA, a statistical method used to evaluate data collected under experimental research designs. Thus, many examples assume a basic understanding of ANOVA and the related terminology. That background is often missing for those of us in the social sciences, where experimental designs are typically impractical, impossible, or unethical.
In these experimental designs the researcher is usually most interested in the effect of the stimulus, which is set at predetermined levels (e.g., no dose, 1 dose, 2 doses, etc.). These effects are specified as "fixed" because the researcher is only interested in the net effect of the stimulus at those predetermined levels. The effects specified as random are typically the confounding variables that need to be statistically controlled (age or sex, for example).
Ironically, in the social sciences the situation is very often the exact opposite. Let's begin by discussing exactly what the hierarchical method accomplishes. As you know, the technique accounts for the clustering of cases within larger units. In your case it is people nested within neighborhoods. It could be observations nested within people, or students nested within schools and schools nested within districts. There are very likely unobserved factors that tend to make observations similar within each cluster. One thing HLM accomplishes is that it accounts for this cluster-based similarity in the computation of standard errors. The point estimates change little relative to OLS regression, but the standard errors are usually larger, which means that OLS tends to understate standard errors (and overstate significance) when the data are clustered.
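If it helps to see that concretely, here is a minimal sketch in Python (statsmodels) comparing plain OLS with a random-intercept model on some made-up people-within-neighborhoods data. The data frame and column names (outcome, poverty, neighborhood) are placeholders for illustration, not anything from your study:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Fake data: 40 neighborhoods, 25 people each. Poverty varies both between
    # and within neighborhoods, and each neighborhood gets its own unobserved
    # "bump" that makes residents resemble one another.
    rng = np.random.default_rng(0)
    n_hoods, n_per = 40, 25
    hood = np.repeat(np.arange(n_hoods), n_per)
    poverty = rng.normal(0, 1, n_hoods)[hood] + rng.normal(0, 1, n_hoods * n_per)
    outcome = 0.5 * poverty + rng.normal(0, 1, n_hoods)[hood] + rng.normal(0, 1, n_hoods * n_per)
    df = pd.DataFrame({"neighborhood": hood, "poverty": poverty, "outcome": outcome})

    # OLS ignores the clustering entirely.
    ols_fit = smf.ols("outcome ~ poverty", data=df).fit()

    # A random-intercept model gives each neighborhood its own baseline, so the
    # standard errors reflect the cluster-based similarity.
    ri_fit = smf.mixedlm("outcome ~ poverty", data=df, groups=df["neighborhood"]).fit()

    print(ols_fit.summary())   # note the standard error on poverty
    print(ri_fit.summary())    # similar point estimate, usually a larger standard error

Don't read too much into the specific numbers; the point is only that the two models treat the clustering very differently.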
But this is only the tip of the iceberg. The really great thing about HLM is that it can simultaneously estimate a separate regression equation for each cluster. Let's return to the school example for a second. It may be that the effect of being poor is different in one school than it is in another. Maybe being poor is *only* a hindrance in a school full of wealthy kids. With a variable specified as fixed, you get the effect of that variable under the assumption that it is the same in all schools. When it is specified as random, the effect of that variable can be different in each school. In some schools it may affect the outcome and in others it may not; it may have a big effect in one school and a small one in another. The reported effect is then an average effect across all schools, weighted by how precisely each school's slope is estimated (larger clusters carry more weight). I should also note here that an effect can only be specified as random when it is nested within a larger aggregation: in a 2-level model only level 1 effects can be random, while in a 3-level model both level 1 and level 2 effects may be specified as random.
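Sticking with the school example, here is a hedged sketch of what a random slope looks like in the same Python/statsmodels framework. It assumes a data frame df with hypothetical columns outcome, poverty, and school (a cluster id), analogous to the one above:

    import statsmodels.formula.api as smf

    rs_fit = smf.mixedlm(
        "outcome ~ poverty",      # the fixed part: the average poverty slope
        data=df,
        groups=df["school"],      # level 2 clusters
        re_formula="~poverty",    # random intercept AND a school-specific poverty slope
    ).fit()

    print(rs_fit.summary())        # the poverty coefficient is the average slope;
                                   # the variance term for poverty is how much that
                                   # slope varies from school to school
    print(rs_fit.random_effects)   # each school's deviation from the average

If the estimated slope variance comes out essentially zero, that is a hint the effect could have been left fixed.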
The next step is that this variation in the effect of student-level poverty can then be modeled as a consequence of level 2 (school-level) factors (e.g. proportion of wealthy students, number of students, availability of technology/computers/internet). This is sometimes referred to as "slopes as outcomes" regression, meaning that the variation in the slopes of the level 1 random effects is itself predicted by level 2 variables.
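In the statsmodels sketch, the usual way to get at this is a cross-level interaction between the level 1 variable and the level 2 variable, with the level 1 slope still specified as random. Here pct_wealthy is just a hypothetical school-level column for illustration:

    import statsmodels.formula.api as smf

    cl_fit = smf.mixedlm(
        "outcome ~ poverty * pct_wealthy",   # main effects plus the cross-level interaction
        data=df,
        groups=df["school"],
        re_formula="~poverty",               # the poverty slope is still random
    ).fit()

    print(cl_fit.summary())   # the poverty:pct_wealthy term is the "slope as outcome" part

A meaningful interaction term (and a shrinking slope variance) is what tells you the school-level factor is explaining why the poverty effect differs across schools.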
So you have to think critically about which level 1 variables should have homogeneous effects across all of the level 2 units; these should be specified as fixed. Those effects that you think will differ from one level 2 unit to the next should be specified as random.
I would also add that random effects are particularly taxing on the model. Odds are that if you specify all effects as random the model will not converge, so choose wisely. I would recommend starting with a baseline model that has no predictors. From this you can get an estimate of the amount of variation in the outcome at each level. If there is not a significant amount of variation at level 2, then a hierarchical framework isn't really even necessary. Next add in your level 1 predictors; this is a random-intercept model because only the intercept is random and all the predictor effects are fixed. Note how the amount of variation at level 2 has probably gotten smaller. This is because you have accounted for compositional effects; it may be that schools are only different because they have different populations of students. Next add your level 2 variables. Then specify some effects as random. Lastly, create some cross-level interaction terms (a level 1 variable specified as a random effect X a level 2 variable), put them into the model, and see if they account for any of the variation in the random effect. At each step you can do a chi-square (likelihood-ratio) test or look at some other model fit criterion to see whether the changes to the model are improving fit. A rough version of this sequence is sketched below.
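Here is that sequence in Python/statsmodels, using the same hypothetical column names as above (your software will label things differently, but the logic carries over):

    import statsmodels.formula.api as smf
    from scipy import stats

    # 1. Null model: no predictors, just a random intercept per school.
    null = smf.mixedlm("outcome ~ 1", df, groups=df["school"]).fit(reml=False)
    between = null.cov_re.iloc[0, 0]      # level 2 (between-school) variance
    within = null.scale                   # level 1 (within-school) variance
    icc = between / (between + within)    # share of the variation sitting at level 2

    # 2. Random-intercept model with the level 1 predictor(s); the between-school
    #    variance usually shrinks once composition is accounted for.
    m1 = smf.mixedlm("outcome ~ poverty", df, groups=df["school"]).fit(reml=False)

    # 3. Add the level 2 predictor(s).
    m2 = smf.mixedlm("outcome ~ poverty + pct_wealthy", df,
                     groups=df["school"]).fit(reml=False)

    # 4. Let the poverty slope vary by school and compare fit with a
    #    likelihood-ratio (chi-square) test.
    m3 = smf.mixedlm("outcome ~ poverty + pct_wealthy", df,
                     groups=df["school"], re_formula="~poverty").fit(reml=False)
    lr = 2 * (m3.llf - m2.llf)
    p = stats.chi2.sf(lr, df=2)   # two added (co)variance terms; variance parameters sit
                                  # on a boundary, so treat this p-value as approximate

    # 5. The cross-level interaction step is the model sketched a couple of
    #    paragraphs above (outcome ~ poverty * pct_wealthy with a random slope).

This is a sketch of the logic rather than a recipe; the main thing is to add complexity one step at a time and check whether each step buys you anything.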
Lastly, I would remind you to be sensitive to how you go about mean-centering your level 1 variables. Choosing group- or grand-mean centering will have an effect on the point estimates you observe. There's a great article by Enders and Tofighi in the June 2007 issue of Psychological Methods that covers this issue if you are not yet comfortable with it.
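For what it's worth, the mechanics of the two choices are trivial once the decision is made; here is the pandas version, again with hypothetical column names:

    # Grand-mean centering: subtract the overall mean of poverty.
    df["poverty_grand"] = df["poverty"] - df["poverty"].mean()

    # Group-mean centering: subtract each school's own mean of poverty.
    df["poverty_group"] = df["poverty"] - df.groupby("school")["poverty"].transform("mean")

The hard part is the choice itself, which is exactly what the Enders and Tofighi piece walks through.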
Caveat: I am neither a statistician nor an economist. Consider this a jumping-off point rather than a definitive work on the subject.