Analyzing ~425 days of Hacker News posts with standard shell commands

(About) 425 days ago (at the time of this writing) I started scraping Hacker News via its shiny new API. And then I promptly forgot about it. That is, until I noticed my cronjob had been throwing errors constantly for a few weeks:

Traceback (most recent call last):
  File "/home/dummy/projects/hn-cron/hn.py", line 62, in <module>
    main()
  File "/home/dummy/projects/hn-cron/hn.py", line 53, in main
    log_line = str(details['id']) + "\t" + details['title'] + "\t" + details['url'] + "\t" + str(details['score']) + "\n"
KeyError: 'url'

Instead of fixing anything, I just commented out the cronjob. But now I feel somewhat obligated to do at least a rudimentary analysis of this data. In keeping with my extreme negligence/laziness throughout this project, I hacked together a few bash commands to do just that.

A few notes about this data, and the (in)accuracy thereof:

  1. The script ran once every 40 minutes, collecting the 30 most popular stories (i.e. those on the front page), and adding them to the list if they were new.
  2. I only know I started roughly 425 days ago because the first link in log.txt was this one right here (Who needs timestamps? I have IDs!)
  3. A not-insignificant percentage of the time (probably ~10%), the script would fail because the stupid(, stupid, stupid) Python 2 script I banged out in 10 minutes didn’t know how to handle Unicode characters properly (oops).
  4. I saved everything to a tab-delimited flat file. I probably should’ve used something else, but I didn’t, so here we are.
  5. I only saved the score from the first time a story was found, so theoretically any given post had at most an arbitrary 40-minute window to accumulate points. This is probably not strictly true for a number of reasons, but I’m going to pretend it is.
  6. These bash commands grew organically (often with much help from StackOverflow), so they made sense to me at the time, but YMMV.
  7. The data is probably inaccurate in a million small ways, but overall, it’s at least worth poking at.
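
The traceback above is what you get when a story comes back without a url key, which (as far as I can tell) happens for text-only posts like Ask HN threads. Here is a minimal sketch of a more forgiving log line, assuming the same four fields and the same tab-delimited layout. The names format_log_line and append_if_new are hypothetical, not what the original hn.py used, and this is written for Python 3, which also sidesteps most of the Unicode pain:

```python
import io


def format_log_line(details):
    """Build one tab-delimited log line; missing fields become empty strings."""
    fields = [
        str(details.get("id", "")),
        details.get("title", ""),
        details.get("url", ""),        # absent on some stories, hence the KeyError
        str(details.get("score", "")),
    ]
    # A tab or newline inside a field would corrupt the flat file, so flatten them.
    return "\t".join(f.replace("\t", " ").replace("\n", " ") for f in fields) + "\n"


def append_if_new(details, seen_ids, log):
    """Write the story to `log` only the first time its id is seen."""
    if details["id"] in seen_ids:
        return False
    seen_ids.add(details["id"])
    log.write(format_log_line(details))
    return True


if __name__ == "__main__":
    log, seen = io.StringIO(), set()   # stand-ins for log.txt and the known IDs
    story = {"id": 1, "title": "Ask HN: Who needs timestamps?", "score": 5}
    append_if_new(story, seen, log)
    append_if_new(story, seen, log)    # second sighting is ignored
    print(repr(log.getvalue()))
```

The only real change is dict.get with a default instead of indexing, so a missing field ends up as an empty column rather than killing the cronjob.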

Okay, let’s get down to it!

Read More

Constructing an XSS vector, using no letters

At the risk of spoiling a somewhat-well-known XSS game, I want to share an XSS vector I had never thought of before it forced me to. The premise of this level was, essentially, that you couldn’t use any letters whatsoever in the attack vector, and you had to call alert(1).

So, without further ado, here it is:

""[(!1+"")[3]+(!0+"")[2]+(''+{})[2]][(''+{})[5]+(''+{})[1]+((""[(!1+"")[3]+(!0+"")[2]+(''+{})[2]])+"")[2]+(!1+'')[3]+(!0+'')[0]+(!0+'')[1]+(!0+'')[2]+(''+{})[5]+(!0+'')[0]+(''+{})[1]+(!0+'')[1]](((!1+"")[1]+(!1+"")[2]+(!0+"")[3]+(!0+"")[1]+(!0+"")[0])+"(1)")()

What a mess, right?! What the hell are we doing here? Let’s take it piece-by-piece.
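
As a rough preview, the whole thing leans on three strings that JavaScript will hand you without typing a single letter: !1+"" is "false", !0+"" is "true", and ''+{} is "[object Object]". The sketch below just replays the index lookups from the vector in Python to show which words they spell. The FUNC string is an assumption about how a native function stringifies (the exact text varies by engine, but it starts with "function"), and pick is a helper made up for illustration:

```python
FALSE = "false"                             # JS: !1 + ""
TRUE = "true"                               # JS: !0 + ""
OBJ = "[object Object]"                     # JS: '' + {}
FUNC = "function sub() { [native code] }"   # JS: ""["sub"] + "" (assumed text)


def pick(*pairs):
    """Join the characters found at the given (string, index) pairs."""
    return "".join(source[index] for source, index in pairs)


# ""[ ... ] : the name of a method that exists on every string
method = pick((FALSE, 3), (TRUE, 2), (OBJ, 2))

# [ ... ] : the property looked up on that method
prop = pick(
    (OBJ, 5), (OBJ, 1), (FUNC, 2), (FALSE, 3), (TRUE, 0), (TRUE, 1),
    (TRUE, 2), (OBJ, 5), (TRUE, 0), (OBJ, 1), (TRUE, 1),
)

# ( ... ) : the source code handed to the resulting function constructor
body = pick((FALSE, 1), (FALSE, 2), (TRUE, 3), (TRUE, 1), (TRUE, 0)) + "(1)"

print(method, prop, body)
```

In other words, the vector reads as ""["sub"]["constructor"]("alert(1)")(): grab any function, climb to the Function constructor, build a new function from a string, and call it.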

Read More

dot-man

I recently hacked together a little 300-line bash script called dot-man to manage my dotfiles. Basically, it lets you keep your dotfiles in a git repository, and you can run it every so often to keep your local / remote dotfiles up to date.

Install is as simple as:

git clone git@github.com:cneill/dot-man.git
OR
git clone https://github.com/cneill/dot-man.git

Let me know what you think! You can find me on Twitter @ccneill.

A tale of lost entropy

Recently, while looking at a JavaScript function intended to generate a cryptographically secure random IV to be used in AES-GCM, I noticed something interesting which I immediately suspected was not unique to this project. Sure enough, Matt, my awesome colleague, sent me a link to a how-to article describing the process of generating random values in Node.js that included the exact same quirk.

Here is their example (with minor edits so as not to call out the author of that how-to post too explicitly):

Do you notice anything fishy?

Read More

Announcing DefectDojo v1.0.2!

I’m happy to announce the latest version of a project that the Security Engineering team at Rackspace has been working on: DefectDojo! DefectDojo is an open source defect tracking system that was created by our team to keep up with security engagements, but it can be useful for tracking any type of application testing. It supports features such as Finding templates, PDF report generation, metrics graphs and charts, and some self-service tools (for running port scans, for example).

Checking out DefectDojo

A view of the DefectDojo dashboard

To get the latest version, you can download a zip file or view the source on GitHub. Want to check out a demo before installing it on your machine? We have you covered.

Log in as admin:

Log in as product owner / non-staff user:

Read More

Using GNTP for remote notifications? I wouldn’t

Earlier today I wanted to explore using Growl / GNTP to listen for notifications from a remote server. I checked out the Growl developer bindings page, found the Python implementation, and started working on a simple app to send me notifications about various things.

I was planning on running this on my server so I could also interface with Twilio and accept callbacks, without having to expose a webserver on my local machine to the internet. To do this, I was going to accept remote notifications in Growl using a password. I realized pretty quickly this was an even worse idea.

I started poking around in the source code, and found that the password is hashed using MD5 by default. In fact, it’s quite a pain to change from the default, since there is no configuration option to change the algorithm within the basic helper methods that are actually documented. This appears to be the case for all the other language bindings as well. This isn’t necessarily the end of the world, but it’s definitely not great.

More poking revealed that the packet contents are not encrypted with the password, but the password is merely used to determine whether the listening Growl instance will accept notifications from the remote source. A notification will actually come across the wire looking like this:

19:32:20.244428 IP (tos 0x0, ttl 64, id 53700, offset 0, flags [DF], proto TCP (6), length 13716, bad cksum 0 (->359d)!)
 localhost.60465 > localhost.23053: Flags [P.], cksum 0x3389 (incorrect -> 0xe50a), seq 1:13665, ack 1, win 12759, options [nop,nop,TS val 266111329 ecr 266111329], length 13664
x.L$......1.3........1Z
...a...aGNTP/1.0 NOTIFY NONE MD5:B80803CFA6C2F303266DC99501ED837D.D89A5B677CDA639FDF3305D233FA0487
Application-Name: poke
Origin-Software-Name: gntp.py
Notification-Sticky: True
Notification-Name: Timer
Notification-Text: Derp?
Origin-Platform-Version: ...
Origin-Software-Version: ...
Origin-Machine-Name: ...
Notification-Icon: x-growl-resource://fcaeca33ea9ee6fa902f79aa47f980f0
Notification-Title: Timer Alert
Origin-Platform-Name: Darwin

...

As you can see, the name of the application, the name of the notification, the actual contents of that notification, and the title of the notification are easily readable (in blue).

What about that weird string starting with “MD5” (in red)?

The meat of the password hashing algorithm can be seen here. Basically, they use a hash of the system’s time as a salt (which they call a “seed”), and include it with messages sent to the server (D89A…0487 above). The other component of the string is a hash of the concatenation of the password and the salt’s hash (B808…837D above).
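
To make the two halves concrete, here is a small sketch that pulls that first line apart. It assumes the information line is laid out as protocol/version, message type, encryption algorithm, and then an optional hash-algorithm:key-hash.salt field; parse_info_line is a made-up name, not part of gntp.py:

```python
def parse_info_line(line):
    """Split a GNTP information line into its space-separated components."""
    parts = line.strip().split(" ")
    protocol, _, version = parts[0].partition("/")
    info = {
        "protocol": protocol,
        "version": version,
        "message_type": parts[1],
        "encryption": parts[2],      # NONE means the body travels in the clear
        "hash_algorithm": None,
        "key_hash": None,
        "salt": None,
    }
    if len(parts) > 3:
        # Assumed layout of the last field: <algorithm>:<key hash>.<salt>
        algorithm, _, rest = parts[3].partition(":")
        key_hash, _, salt = rest.partition(".")
        info.update(hash_algorithm=algorithm, key_hash=key_hash, salt=salt)
    return info


if __name__ == "__main__":
    captured = ("GNTP/1.0 NOTIFY NONE "
                "MD5:B80803CFA6C2F303266DC99501ED837D."
                "D89A5B677CDA639FDF3305D233FA0487")
    for name, value in parse_info_line(captured).items():
        print(name, "=", value)
```

Note the NONE in the third position: that is the encryption setting, which lines up with the notification body being readable in the capture.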

To see if it was really as easy as it appeared to crack these hashes, I wrote a quick script called Growl Crack that will first brute-force the “seed” (timestamp/salt), then the “secret” (password + salt). Obviously the difficulty of cracking the password depends on its complexity, but the seed is usually cracked pretty much instantly.
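
The general shape of that two-stage attack looks something like the sketch below. This is a toy model, not the actual Growl Crack script and not the real gntp.py code: it assumes the seed is a time.ctime() string, that the salt is a single MD5 of it, and that the key hash is a single MD5 of password + salt. The real binding may encode things differently or add hashing rounds, and all the function names here are invented for illustration:

```python
import hashlib
import time


def md5(data):
    return hashlib.md5(data).digest()


def make_auth(password, when):
    """Toy version of the scheme described above (assumed, simplified)."""
    seed = time.ctime(when).encode("utf-8")      # assumed seed format
    salt = md5(seed)
    key_hash = md5(password.encode("utf-8") + salt)
    return key_hash.hex().upper(), salt.hex().upper()


def crack_seed(salt_hex, around, window=3600):
    """Stage 1: try every second within `window` seconds of the capture time."""
    target = bytes.fromhex(salt_hex)
    for offset in range(-window, window + 1):
        candidate = time.ctime(around + offset)
        if md5(candidate.encode("utf-8")) == target:
            return candidate
    return None


def crack_password(key_hash_hex, salt_hex, wordlist):
    """Stage 2: hash each candidate password with the (public) salt."""
    salt = bytes.fromhex(salt_hex)
    target = bytes.fromhex(key_hash_hex)
    for word in wordlist:
        if md5(word.encode("utf-8") + salt) == target:
            return word
    return None


if __name__ == "__main__":
    sent_at = 1400000000                          # pretend capture time
    key_hash, salt = make_auth("hunter2", sent_at)
    print(crack_seed(salt, around=sent_at + 90))
    print(crack_password(key_hash, salt, ["password", "letmein", "hunter2"]))
```

A window of an hour either side is only 7,201 MD5 calls, which is why the seed falls over instantly; the password stage is an ordinary dictionary attack whose cost depends entirely on the wordlist.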

In short, if you’re using Growl remotely, you should probably stop unless you want all your notifications being read, or want to expose your password for easy cracking to anyone listening to your communications.

25 Node.js Nuggets

My last Nuggets post, “50 Linux Resources for Developers,” was pretty well-received, so I figured I’d try to do the same thing I did there for Node.js. Hopefully something here gives you some inspiration to make the next great JavaScript app. It’s not meant to be an all-inclusive guide to learning Node, but more of a look at my journey with Node and some things I’ve found useful which you might find useful as well.

For a little background, here’s the synopsis of Node.js from their website:

Node.js is a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.

Read More

50 Linux Resources For Developers

I try to always bookmark interesting things I find as I bumble around the internet. I’ve collected thousands of bookmarks over the years, and I want to share some of the cool stuff I’ve found. I call these Nuggets.

Today, I want to bring you a list of links that might help you on your path to understanding and appreciating Linux. I don’t consider myself some wizened Linux guru, but I have spent many, many hours looking for guides and tools to make my life easier while using it.

If you’ve ever struggled to find information about Linux basics, or you just want to polish up your skills, there’s probably something here for you. This guide will be particularly focused on developers, but there will be information here that’s applicable to many other Linux users. Some of it is specific to Ubuntu users, but much of it is applicable across the board.

I’ve by no means covered everything, so comment or tweet to me if you have any you think I should include.

Read More

Am I evil, or is killing patents just plain fun?

The other day I re-discovered this post by Joel Spolsky on Hacker News, entitled “Victory Lap for Ask Patents.” I saw it when he originally posted it a while back, but it didn’t resonate with me at the time.

But re-reading it today, I realized how great an opportunity we, as software developers, have to force patent reform by actively contributing to this project. Ask Patents, if you haven’t heard of it, is a StackExchange site where you can ask questions about patents or, as I do, respond to requests for prior art that would invalidate an overly broad patent. I focus on software patents.

I can hear what you’re thinking.

That sounds fucking boring

I know, right? But actually, I’ve found it to be quite a fun little puzzle to decipher the legalese used by patent lawyers to try to get away with ridiculous patents. Here’s an example patent claim:

“A method comprising:

  1. generating, using a processor, time-based event boundaries detected in a plurality of images;
  2. computing inter-event durations;
  3. grouping events into clusters based on the inter-event durations; and
  4. validating, using a rule-based system, that each event belongs to an associated cluster based on event level content based features.”

Short version: a photo album that groups your photos by the time they were taken.
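
To give a sense of how little there is to it, here is a rough sketch of the first three steps of that claim. The two-hour gap threshold and the name group_by_time are my own assumptions for illustration; the claim itself doesn’t specify either, and the “rule-based validation” step is left out:

```python
from datetime import datetime, timedelta


def group_by_time(timestamps, max_gap=timedelta(hours=2)):
    """Cluster photo timestamps: a gap longer than `max_gap` starts a new group."""
    groups = []
    for taken_at in sorted(timestamps):
        # Compare against the most recent photo in the current group.
        if groups and taken_at - groups[-1][-1] <= max_gap:
            groups[-1].append(taken_at)
        else:
            groups.append([taken_at])
    return groups


if __name__ == "__main__":
    photos = [
        datetime(2013, 6, 1, 10, 0),
        datetime(2013, 6, 1, 10, 30),
        datetime(2013, 6, 2, 9, 0),
    ]
    for number, group in enumerate(group_by_time(photos), start=1):
        print("Album", number, [t.isoformat() for t in group])
```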

How hard do you think it was to find examples of prior art? (Hint: it wasn’t)

If you’re still wondering what I’m going on about, then perhaps a different motivator is called for. If you think this shit is boring and pedantic, how do you think someone at the USPTO feels when they have to read it day in and day out, and formally parse and research it to decide whether it should stand?

Let me put this another way – wouldn’t you rather those working for the USPTO were spending their time on legitimate patents? On getting a bunch of those “patent pending” labels off of everything we buy? On crippling the patent trolls, who raise the cost of doing business for anyone who gets successful enough to trespass on one of their dubious “works of genius”?

Well, you can help. Every minute you save the USPTO is another minute they can spend doing things that actually matter. I’m going to start doing it every day. I’ve already done 6 in the last hour. Time will tell whether my contributions actually do anything, but I suspect that, given how unglamorous the work is and how few people generally comment, even a little bit will be appreciated.

So how does this lead to patent reform? My hope is that the community can shred a lot of these useless patents before they take any brain cycles away from a qualified researcher. And if it happens enough, it will start to become clear to everyone involved that the vast majority of software patents are bullshit.

It might sound like a bad, or at least contradictory, idea coming from a programmer, but I genuinely hope (and have some reasons to believe) software patents go the way of the dodo in the next decade.

In fact, I would go so far as to wager the following. I will bet, on pain of writing an entire blog post dedicated to why patents are good, that no one reading this article can find a software patent granted in the last year that actually should exist. The requirements for a good patent are:

  1. Novelty
  2. Non-obviousness

Some software patents may technically be novel, but I’ve yet to find one that I thought was non-obvious. Maybe someone will be able to enlighten me.

Want to help some more? Take it to Twitter with the hashtag #patentreform!

So I want to learn web development. Now what?

You might want to grab a cup of coffee

My last article about the importance of getting started on your programming education is my most-read article on Medium so far. Like anything in my life, my writing is an experiment. When I see as many people getting excited about programming as I have because of this, it excites me too, and tells me I’ve hit a nerve.

I think there’s a little more to the story that I didn’t fully flesh out. So here, I want to set you on the path to writing your first line of code as quickly as possible. I don’t want to delude you: there is no getting over the fact that programming is an iterative process. I love this article, which describes the process of programming through the allegory of cooking. The author describes the frustration of “just getting started” when there isn’t a clear picture of what “getting started” means. I can’t just yell at you to “GO FORTH AND CODE” without at least helping you understand what you need in order to do that.

Read More