Computers

Google Analytics Nightmares

Every so often, I think I’m about to escape, but I get trapped again.

Let’s start from the beginning…

A couple of years ago I was brand new and fresh out of quitting teaching (this post is about a figurative IT nightmare; I still have literal teaching nightmares).  My job was performance testing the applications at my company.  In order to accurately model the load, I had an obvious question: “What does the usage in production look like?”  The answer was “Nobody knows.”  The company is neither small enough nor new enough for that answer to make any sense, but I had to dig anyway: “How can I find out?  Can I get some logs or something?”  I was then told to request access to the Google Analytics account, which supposedly held the answers.

I logged in and peeked around to find that it was in a state of serious neglect.  There were no filters in place.  Traffic was randomly segmented into different profiles using different tracking codes, not for any logical reason, but because they had added the JavaScript at different times and didn’t know what they were doing.  Most of the traffic was recorded as the unhelpful “(other).”  I’d later discover that the JavaScript was even worse.  There were pages with multiple, sequential setVar() calls.  It appeared in some places that they were also trying to use setVar to differentiate multiple views on a single page.

I needed some real numbers and this seemed to be the only way to get them, so I spent a day or two reading some Google Analytics documentation.  I found some things that I thought would clean up the data a lot so I started shopping them around.  Every person I talked to told me to talk to somebody else.  Eventually I found a person who worked with the usability group.  She wanted numbers as well.  She didn’t have any experience as a Google Analytics administrator, but she had admin access to the account.  Through her I was able to get some simple filters in place (everything to lowercase, removing the query strings, identifying default pages being reported twice as the directory root).
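For the curious, here is roughly what those three cleanups do to a request URI, written as plain Java rather than as actual Google Analytics filter settings.  It is only a sketch: the class name, the method name, and the list of default page names are assumptions I made up for illustration, not what was configured in the account.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch of the three cleanups described above.
// This is plain Java, not Google Analytics filter configuration.
class UriFilterSketch {
    // Assumed default-page names; a real site may use different ones.
    private static final Pattern DEFAULT_PAGE =
            Pattern.compile("/(index|default)\\.(html?|php|aspx?|jsp)$");

    static String normalize(String requestUri) {
        // 1. Drop the query string (everything from the first '?').
        int q = requestUri.indexOf('?');
        String path = (q >= 0) ? requestUri.substring(0, q) : requestUri;

        // 2. Lowercase, so /Page and /page are counted as one page.
        path = path.toLowerCase(Locale.ROOT);

        // 3. Fold a trailing default page into its directory root,
        //    so /dir/index.html and /dir/ are not reported twice.
        return DEFAULT_PAGE.matcher(path).replaceFirst("/");
    }

    public static void main(String[] args) {
        System.out.println(normalize("/Products/Index.html?session=abc"));
        System.out.println(normalize("/products/"));
    }
}
```

Both calls in main come out as /products/, which is the whole point: one page, one row in the report.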

As things began to clear up, more problems would become apparent.  Eventually my admin pal got tired of me telling her to add filters, so I was granted access so that I could add them myself.  Certain problems took longer to figure out than the initial issues, so I did more reading.  You probably see my mistake already, but I was new and I didn’t.  I got some help from Justin Cutroni by posing some questions in the comments on his site.  I read Brian Clifton’s book.  I read Avinash Kaushik’s book.  I also got my manager to send me to a Google Analytics Seminar for Success.  My primary work responsibility was still testing software, but tasks came around sporadically at that time, so I didn’t mind filling in the down time with the analytics stuff.  I was a math nerd before I was a computer nerd, so the wall of statistics was interesting.

My Google Analytics pal from the usability team and I went through and cleaned up the data as much as we could.  Many new profiles were created, lots of filters were added.  We pushed the development teams to correct the JavaScript.  We figured out what a deceptive little goon setVar really was.  Seriously, what the hell, Google?  It caused interactions to be recorded which completely subverted the filtering on different profiles.  The visit numbers would show thousands of visits and no page views.  Through testing we figured this out, but the documentation provided no good explanation about the side effects.

Then people started coming to us with questions about obtaining more data.  How should they implement the JavaScript?  Could we filter X?  On the technical side, this was perfectly easy.  I was capable of telling anyone how they could obtain (if it could be done) the data they wanted using Google Analytics.  The problem for me was/is that I am at the complete bottom of the org chart here.  I don’t make business decisions with the data.  I know people do.  I’ve heard them talk.  I cringe when I hear GA numbers being thrown around.  Numbers that are clearly wrong.  Numbers that don’t mean what they think they mean.  Since I am not in touch with how the business wants to use these numbers, or what data is relevant to them, it’s difficult to answer GA questions which will determine the data available for everyone.  It’s tricky because there’s no going back.  The data cannot be parsed again or filtered differently.  Also, our site spans several applications and multiple domains and sub domains.  One development team’s poor handling is capable of wrecking the numbers for the others.

I’ve been trying to express this.  I’m a tester.  I’m not even a performance tester anymore.  I have no need for the numbers myself.  I have no direct line of communication to any important decision making people, but I control the data.  The numbers they see are the ones I configured it to show them.  I wrote a whole bunch of regular expressions that move a bunch of strings around.  Nobody else verified them.  Nobody else knows what they do.  I documented them and pointed people to them.  I tried to do presentations to explain what I’d learned so that someone else would be capable of maintaining the account.  At the very least they should double check the things that I’ve done.  The result of the presentation was pretty much “It sounds like you know what you’re talking about.  How about you just keep it?”  D’oh!

As my new testing responsibilities have been more consistent than my old ones, I’ve had no time for analytics.  I tried to bring it up, but I was told that they’d be switching to Coremetrics because GA doesn’t do what they want (like they have any clue what GA does).  Allegedly there exists someone in the organization qualified and willing to be the steward of Coremetrics.  Months, then years go by and people keep dumping more and more into Google Analytics.  Coremetrics is still not a significant part of their data collection.  I peek in every now and then and see the once tidy organization falling back into the state of chaos that existed when I first found it.  In one place they are using it as a security audit log.  I hope your eyes didn’t roll right out of your head when you read that.  Mine almost did.  The head of product development is ok with that though.  I spoke with him directly about my data concerns.  They’re switching to Coremetrics… I don’t need to worry about it.

Recently I got a new manager.  On my behalf he found a new home for Google Analytics maintenance.  Supposedly, I’ll finally be handing that responsibility to a more suitable person in the near future.  As though they sensed it coming, one of the dev teams I’m supposed to do testing for has requested a bunch of changes to the filtering in GA.  I haven’t had any time to do the knowledge transfer, so I’ll have to do it.  Just when I thought I was one step from done, the end moves back from me.

Nobody wants to own it.  Everyone wants the numbers.  Nobody knows what they mean.  I can’t believe this is a real business and not The Office, Office Space or Dilbert.


FMK vs Selenium

Motivation:

Recently I decided that it was time that I give Selenium RC a shot.  I was attracted to the idea of recording scripts with Firefox and then working with them in a language of my choosing.  All the cool kids at Defcon seem to think Python is the answer to everything, but since that’s still on my ToLearn list, I figured I’d default to the more familiar Java where I’m already comfortable.  The option to do that was a big selling point for me.  Since I’m looking at a JavaScript (using Strophe.js) jabber client, I was also happy to let Firefox deal with that.  Another coworker is developing LoadRunner scripts for performance tests on the ejabberd server.  Essentially his scripts must duplicate the basic functionality of the JavaScript code in LoadRunner’s C.  Gross.  Initially I thought I might borrow his scripts and tweak them for my own testing, but I realized that I could jump well ahead of him if the Selenium idea panned out.  No redoing the JavaScript logic.  No C code.  Win.

A few days of puzzled stares at Eclipse and a hundred Google searches later, I think I’m still winning.  It was not quite the effortless leap I had envisioned however.  What follows is my adventure in getting Selenium to a point where I could start producing some actual tests.  I found answers to many of these problems all over the place.  When I can recall where I found them, I will link to the page, but I viewed countless pages wandering around and pleading to the Googles for help and don’t plan to retrace all of that.  Hopefully someone else on a similar quest comes across this post and explains to me how I’m a total idiot, where to find these answers clearly written in one place, and how I could’ve accomplished the same thing in a smarter, easier way.  If not that, perhaps some other lost chump like myself will be grateful to find a bunch of solutions in one place.

Problem:

The problems started early.  I recorded a simple script.  I converted it to Java (please don’t bother to tell me that this was my problem).  I tried to run it with Selenium RC.  At this point I found that Firefox opened up in a strange configuration, barfed a bunch of popups at me and then the test killed itself.  Some poking around revealed that *chrome as the browser mode was at fault.  Certain extensions were being loaded that just didn’t want to play nice and the profile would not remember the login for our company’s Blue Coat Proxy.

Solution:

Selenium allows for a *custom browser mode.  This allows you to get your browser configured manually.  http://www.borngeek.com/firefox/profile-tutorial/ helped me get my Firefox profile straightened out.  I created a profile with minimal extensions (Firebug and McAfee’s SiteAdvisor were particularly rude to my Selenium tests).  The profile (named selenium) is also configured to use the Selenium RC server as a proxy (localhost:4444).  I made sure to disable warnings, popup blocking, etc., as these features may interfere with the test scripts.  Since it’s a separate profile, I can easily return to my normal, more securely configured settings by restarting Firefox in its regular profile.  The browser string in my Selenium setUp call now looks like: “*custom C:\\Program Files\\Mozilla Firefox\\firefox.exe -P selenium”
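To spell out what that browser string is made of: it is just the *custom prefix followed by the command line that launches Firefox with the named profile.  Here is a small sketch of putting it together.  The helper class and method names are hypothetical, and the commented-out constructor line assumes the Selenium RC Java client is on the classpath and uses a placeholder start URL.

```java
// Hypothetical helper: assembles a Selenium RC "*custom" browser string
// from a Firefox executable path and a profile name.
class BrowserStringSketch {
    static String customBrowser(String executablePath, String profileName) {
        // "-P <name>" tells Firefox which profile to start with.
        return "*custom " + executablePath + " -P " + profileName;
    }

    public static void main(String[] args) {
        // Backslashes are doubled only because this is a Java string literal.
        String browser = customBrowser(
                "C:\\Program Files\\Mozilla Firefox\\firefox.exe", "selenium");
        System.out.println(browser);

        // With the Selenium RC client jar available, the string would be passed
        // as the browser start command (start URL below is a placeholder):
        // new DefaultSelenium("localhost", 4444, browser, "https://sub1.testenvironment/");
    }
}
```

The doubled backslashes collapse to single ones at runtime, so what the server actually receives is an ordinary Windows path.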

Problem:

Having defeated the popups, extensions and proxy settings I hit some SSL certificate issues.

Solution:

First, I added the cybervillainsCA certificate to Firefox as a trusted authority (only for the selenium profile!).  That didn’t quite do the trick.  It turns out the Selenium RC server doesn’t care for the certs in our test environment.  Some more Googling revealed that restarting the Selenium server with -trustAllSSLCertificates gets around that.

At this point I’m starting the server with

java -jar selenium-server.jar -trustAllSSLCertificates

That’s actually in a .bat file which has a shortcut in my Quick Launch bar so that it’s started with a single click.

Problem:

I figured by this point I’d done the configuring that I’d need to and now the coding party would start.  When I recorded the script I had done a simple login, navigated to a second page, logged in to the test chat client and then stopped.  This failed and I noticed that the script had not recorded the second open, so it was looking for the second login form on the wrong page.  That was weird I thought, but I added it in.  The test still broke.  Here I learned that I was not allowed to start on one domain and then open another.  I had logged in at sub1.testenvironment and the second page was at sub2.testenvironment.  The problem relates to cross domain restrictions browsers enforce on JavaScript.  Selenium controls the actions with JavaScript so it has a meltdown here.  It turns out that the *chrome browser mode operates in a way that avoids this restriction.  Here “chrome” is a Firefox mode, not a reference to Google’s browser.  That’s a bitch when you’re searching to understand what it is and how it works.  I already found back at the beginning that the *chrome mode didn’t work for me, so then what?

Solution:

Some google wandering turned up this very useful document: Selenium Tutorial.  It indicates that this problem can be addressed by running the Selenium server in proxyInjectionMode.  Ok, fine.  Now I’m starting the server with:

java -jar selenium-server.jar -proxyInjectionMode -trustAllSSLCertificates

Problem:

That victory celebration was cut very short as I learned that this mode was broken.  Before making that change I was able to have my script open the browser, load a page, login, load a page and then fail on a cross domain navigation attempt.  After the change, the whole test failed right at the start.  A step back!  Bummer.  “onXhrStateChange.bind is not a function on session”  WTF does that mean?  A couple resources essentially said “it means you’re using proxyInjectionMode, don’t do that.”  Well shit, I kinda have to, don’t I?

Solution:

http://groups.google.com/group/selenium-users/browse_thread/thread/bc7030cd44730d4f

A clue!  Put “true” as an extra argument!  There’s a slight problem with this however…  Java doesn’t really care for additional arguments.  You can’t go around forcing extra values on methods.  Previously I was just using the provided selenium-java-client-driver.jar.  Thankfully, the source jar is also there, so digging into that we get down to DefaultSelenium.java.  There’s an array that I can put “true” into.  Let’s see if it works:

public void open(String url) {
    commandProcessor.doCommand("open", new String[] {url,});
}

becomes

public void open(String url) {
    commandProcessor.doCommand("open", new String[] {url, "true"});
}

Booya!  Cross domain problem is gone for me.

Problem:

Now the script is logging in on sub1.testenvironment, navigating to sub2.testenvironment and apparently loading the chat client page, but the script cannot find the elements on the page.  Actually worse than that, it claims it can’t find the window.  Somehow after the second selenium.open() call, it completely loses track of the window and can’t do anything else.

Solution:

Rather than open() for the chat test page, I used openWindow() which allows me to name the new window which Selenium is now able to find.  This seems to be a rather inelegant solution, to just start popping up new windows, but the second should be as far as I go so it’s not a huge deal.

Problem:

IT IS STILL NOT SUCCESSFULLY SUBMITTING VALUES ON THE CHAT PAGE!!!!!

Solution:

Timing.  You can’t take anything for granted about Selenium’s timing.  While to my eyes, I’d see the browser window load with all of the necessary fields, Selenium would claim that they were not there.  I used waitForPopUp() after openWindow(), but that does not ensure that the JavaScript client has successfully executed its initialization.  Some additional code was added to wait for Selenium to find the element before trying to type in it.

System.out.print("Waiting on login form");
while (!browser.isElementPresent("username")) {
    System.out.print(".");
}
browser.type("username", "testUser2");

Sometimes it spins for awhile, other times it goes almost immediately, but it always succeeds now.  After all of that I don’t have any actual tests to show for it, but in my naive view, I’ve finally cleared the hard part and can actually throw in the test cases from here.
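One caveat about that wait loop: it spins flat out and will hang forever if the element never shows up.  A gentler variant sleeps between checks and gives up after a timeout.  The sketch below is a generic polling helper, not part of Selenium; the class name, method name, and the timeout and poll values are assumptions for illustration, and the lambda in main merely stands in for a call like browser.isElementPresent("username").

```java
import java.util.function.BooleanSupplier;

// Hypothetical polling helper: waits for a condition with a timeout
// instead of spinning forever.
class WaitSketch {
    // Returns true as soon as the condition holds, or false if the
    // timeout passes (or the thread is interrupted) first.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: condition never became true in time
            }
            try {
                Thread.sleep(pollMillis); // pause between checks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true after roughly 200 ms.
        boolean found = waitUntil(
                () -> System.currentTimeMillis() - start >= 200, 5_000, 50);
        System.out.println(found ? "login form is present" : "timed out waiting");
    }
}
```

In the test itself, a false return would be the cue to fail with a clear message rather than let the script sit there printing dots.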


Cyberwar?

Today I came across a couple of conflicting posts about the word “Cyberwar” and whether it accurately describes the state of things.  First was Bruce Schneier’s opinion that “The Threat of Cyberwar Has Been Grossly Exaggerated.”  He concedes that he lost a debate on this very subject, but points out that the debate, and the ongoing discussion, has more to do with a disagreement on the definition than a disagreement on the current threats in cyberspace.  Sometime later in the day, I was promised by Matt Olney that “Yes, Virginia, There is Cyberwar.”

I agree with Schneier on this one.  It’s not that I don’t see potential for something like a “Cyberwar” but to suggest that the attacks happening now constitute a war really stretches the concept of war.

Olney states: “It exists as certainly as espionage, defacing and cybercrime exist, and you know that they abound and are a threat.”

I absolutely agree that espionage, defacing and cybercrime exist, but I don’t agree that their existence is anywhere near sufficient to constitute a war.  The non-cyber equivalents of those things do not indicate war, so why should they in cyberspace?  The United States and other nations have spies in other countries.  Espionage is a concern for the security of every nation, but that doesn’t mean everyone is at war.  Valuable property is stolen in the real world, and kids have defaced almost any hard physical surface you can think of with spray paint, markers, or carvings, and yet that has never been classed as a war.

“But we know that networks can be penetrated, servers can be compromised and we even know that generators can be destroyed simply by instructions from control servers.  We also know that there are those who would seek to harm us.  So yes, Virginia, there is cyberwar.”

We know that houses and offices can be burglarized, sensitive documents and property can be stolen and we even know that power lines can be taken down by a drunk driving his car into a pole.  We know there are mean people out there, so we’re at war?  Nah.  I’m not buying it.

Certainly security is an issue.  There are definitely threats from many different sources that need to be addressed.  If you want to say “Cyberwar” like we talk about the “War on Drugs”,  “War on Obesity” or “Battle of the Sexes” then fine, but when you degrade the language into Newspeak, then we’ll all be worse off.


#1 Fraud

I’ve been wanting to write something Information Security related since that’s what I do all day and that’s currently a major interest of mine. I can’t talk about what I find at work and I don’t have much to say on more general topics, but the Gregory Evans debacle is cracking me up. Articles have been coming up frequently in the security related news feeds that I read. This article from The Register gathered the best quotes.

Basically, the guy is a convicted felon who is going around posing as a software security expert (I mean, “The World’s #1 Hacker”).  He claims that everyone calling him out on his bullshit is motivated by racism, but his charade is pretty obvious to anyone who pays any attention.  Ben Rothke’s analysis of the plagiarism in Evans’ book is enough to discredit Evans as an expert.  In an attempt to boost his hacker cred, he also made up some stuff about his relationship with famous hacker (and felon) Kevin Mitnick.

He went to prison for fraud.  He is still a con artist.  Legal problems are likely to resurface in his life, but before they put him away again, let’s all spend some time laughing at him.

First, tell us about Ligatt Security, Greg.

http://www.youtube.com/watch?v=Wy9LELlwbZs

Oh dear.

Hey Greg, you know how I know you’re full of shit?  You’re on FOX & Friends:

http://www.youtube.com/watch?v=QpOuACC3g4o&NR=1

China is our biggest threat.  India is our biggest threat.  Got it.  It’s funny when the anchor tells him “you do your homework” and sounds impressed.  He managed to poorly articulate some things he’s read on the internet.  Impressive Greg!

As an investor, I’m concerned Ligatt stock is only worth 0.0002 cents per share.  Can you make me feel better?

http://www.youtube.com/watch#!v=uPsB9756HIA

What could he be doing better?  Nothing!  He’s already perfection!  He doesn’t sound like a hustler at all!

Some uplifting quotes to take away when you’re having a bad day:

“I got the news this morning on my way to work, got here late because I caused an accident when I was reading my email and I saw it and I started screaming and I swerved and then this tractor trailer fell over and hit this bus full nuns and it was just a mess, but I took off real quick because I got a fast car. They didn’t know it was me, so I’m here doing this video blog.”

“You could have bought Google’s years ago. Just imagine if you bought Google’s at a penny or less than a penny how trillionaire you’d be today. I’m trying to give you that same vision.”

Incipiat Turba readers, your asses are spoiled.  I’m done bustin’ my butt for now.

– Blarg!


I can’t make up my mind

I’ve been trying these new CMS dealies because I wanted to make it easier to add content rather than spending all my time cleaning up the PHP mess that I created on the old site. Now I finally have that all set to go, so of course I then decided to spend the last few hours working on the old stuff. The administration for Lost and Found items is now easier than it’s ever been. I don’t care that much about the rest of the old site, as we’re clearly not motivated to do that much with the other things, but the Lost and Found has long been one of my favorite parts. It’s so simple. Plus the search strings provide a starting point, since I’m too dumb to come up with my own ideas. Agitator, if you ever have any free time, you should totally check for some entries on Google Analytics and add some things. It’ll be totally fun, I swear.

I even added a new one myself. Who knew?
sadaam hussein banjo


Malware and Analytics

So I removed Joomla because of all of the security vulnerabilities I was seeing reported… and installed WordPress which is apparently getting raped by malware lately.  GoDaddy said it was outdated installations that caused the problems, but I’ve seen others suggest that’s not the case.  I’m not sure what the real deal is right now, but if you get redirected from my site to some other site, don’t click anything or download anything.



Joomla scares me

Following exploit-db and bugtraq, I see a ton of Joomla entries every day.  I decided to give that up and try something else.  Also, the integration with Google Analytics sucked.


Copyright © 1996-2010 Freaky Metal Words. All rights reserved.