I'm over a month in. Here are a few thoughts about my further adventures in using Standards Based Grading in AP Stats.

I've been grading my tests with rubric scoring for years (AP Stats folks would recognize my EPI = 2, etc.). It is so awesome to NOT have to figure out how to convert 1-2-3-4 into 70-80-90. That is a real time saver, and I love it.
I have to rewrite some of my assessments so that they are similar in length and difficulty. I'm also changing my file system: I now need files organized by standard, with multiple assessments for each standard, including enough extra versions that students can come in after school and reassess.
I was initially worried about AP questions that are very difficult and will produce mostly 1's and 2's. I think I'm going to call these "plus one" problems: after they are all graded, I'll add one to everyone's score. Hopefully that will make the assessment fairer. However, a question like this raises a concern about the power law (more on that later in the post). And what do I do with a 4 + 1 = 5?
I discovered the hard way that my school website makes sub-standards under a larger standard more confusing, not less (Easy Grade Pro recommends using sub-standards to simplify student communication). So I had to ditch this plan altogether. No more sub-standards.
As I looked ahead to my next "big" test (aka, a full-period test), I began to fret. I couldn't figure out a way to fit all the new standards, the old standards to retest, and the multiple choice into a one-period test. Quizzes to the rescue! That next big test is over a week away, so I'm going to make sure to give two quizzes this week. They should take only 15 minutes each. On each one, I plan to reassess one old standard and assess one new standard for the first time.
Overall grades feel fine so far. I set the bar at a 3.33 average for an A, and that feels just right. Likewise, 2.75 for a B and 2.0 for a C.
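To make those cutoffs concrete, here is a minimal sketch of the mapping from an average rubric score to a letter grade. It assumes each cutoff is inclusive (a 3.33 average earns an A) and that the overall average is a plain mean of the student's current score on each standard; `letter_grade` and the "below C" label are hypothetical, since the cutoffs below a C aren't specified.

```python
def letter_grade(average):
    """Map an average rubric score (1-4 scale) to a letter grade.

    Assumes the cutoffs are inclusive: 3.33 -> A, 2.75 -> B, 2.0 -> C.
    """
    if average >= 3.33:
        return "A"
    if average >= 2.75:
        return "B"
    if average >= 2.0:
        return "C"
    return "below C"  # hypothetical label: cutoffs below C are not specified


# Example: a hypothetical student's current score on each of four standards
current_scores = [4, 3, 3, 4]
avg = sum(current_scores) / len(current_scores)
print(avg, letter_grade(avg))  # 3.5 A
```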
My biggest mental dilemma right now is the power law. Each time I reassess a standard, a power curve fit to the student's scores determines their current level of learning. My concern is that the free-response questions vary in difficulty. If the last question is fairly straightforward, the power law will conclude that the student's level of understanding is high; if it is more challenging, the scores will be lower. In theory, proficient (level 2) should mean the same thing on every assessment, but in practice that seems tricky, and my textbook test bank has questions that add even more uncertainty. Part of me thinks an average would be fairer, with the lowest score dropped to account for growth over time. But as soon as I type this, I see more flaws. For example, a student might forget a topic and finish weak, and then that weak final score is the very one that gets dropped. In short, I haven't yet given enough assessments to see the full effect of the power law. I have tested the first topic twice, and when the power law has only two scores, the second score becomes the current score. Time will tell.
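To compare the two approaches side by side, here is a minimal sketch. The gradebook's exact formula isn't stated here, so this assumes a common description of the power law: fit a curve score = a · t^b to the scores in attempt order (t = 1, 2, ...) and read the current score off the curve at the latest attempt. The fit below uses least squares on the logs of both variables, which is one standard way to fit a power curve, not necessarily what Easy Grade Pro does. `power_law_score` and `drop_lowest_average` are hypothetical names.

```python
import math


def power_law_score(scores):
    """Fit score = a * t**b to scores at attempts t = 1..n and return the
    fitted value at the latest attempt.

    Assumption: least squares on log(score) vs. log(t). Scores must be
    positive (true for a 1-4 rubric), since log(0) is undefined.
    """
    n = len(scores)
    if n == 1:
        return float(scores[0])
    xs = [math.log(t) for t in range(1, n + 1)]
    ys = [math.log(s) for s in scores]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                # exponent of the power curve
    log_a = y_bar - b * x_bar    # log of the coefficient a
    return math.exp(log_a + b * xs[-1])


def drop_lowest_average(scores):
    """Mean of the scores after dropping the single lowest one."""
    if len(scores) == 1:
        return float(scores[0])
    return (sum(scores) - min(scores)) / (len(scores) - 1)


# With two scores, a two-parameter curve passes through both points exactly,
# so the current score is simply the second score.
print(power_law_score([2, 3]))

# A hypothetical student who forgot the topic on the last attempt
scores = [3, 4, 2]
print(power_law_score(scores))      # about 2.5
print(drop_lowest_average(scores))  # 3.5
```

The second example shows the flaw described above in numbers: dropping the lowest score discards the weak final attempt and reports 3.5, while the power curve weights the recent drop and lands near 2.5.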
