My Dream App

Welcome to My Dream App!

The event where 24 finalists compete for a chance to have their dream app made into reality.


Jason Harris

ShapeShifter/Chicken of the VNC

Jason Harris has been coding up spiffiness and silliness for about ten years, working on such diverse projects as a solid-state quantum computing simulator for electron waves in GaAs semiconductors and a Monte Carlo simulator for electron transport in nanostructure devices. He also wrote insane, down-to-the-metal microcontroller assembly language code for Octofungi, a robotic sculpture. In the Mac world, he's the primary author of ShapeShifter, Mighty Mouse, ThemePark, and heads the open-source Chicken of the VNC and Paranoid Android projects. He digs mountain biking, skateboarding, art, martinis, loud music, and creating oddly euphonious phrases. He never wears shoes if he can help it and can dance like a mofo!


Okay, as promised, here’s my quick’n’dirty Hijack Proof of Concept.

You almost certainly want to download the app to find out how it doesn’t really work well and then complain to me about it. Well, click that link. I’ve tested it on this site, forums.macnn.com, and the forums on the phpBB homepage. It only works on pages of posts, not listings of threads. There’s no way to post. If the site admin even considers changing the HTML, if they even think about it, this build will break.

In other words, it’s not in any way, shape, or form “useful”. You don’t really want to download it is what I’m saying. I know you will anyway.

You can add support for other sites by playing around in the Windows menu if you can understand my incredibly arcane string crap I elaborate on below.

---

Codegeeks: Here’s what you want.

Basically, it loads your request into a hidden WebView so that it can get WebKit to generate a DOM for us. It hands that DOM off to something that creates a “tagSoup” out of it. tagSoup is sort of like a giant XPath of the entire document, but it shows sibling and parent relationships. The idea behind tagSoup was to give a really quick way of finding matches for a page that Hijack has never seen before, to identify what scraping scheme to use. More on tagSoup in a bit (hint, tagSoup sux).

Along the way, it looks to see if it can figure out through keywords what forum package the page belongs to. If it can, it’ll try to use those to figure out how to do the scraping.
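Roughly, a keyword check like that could look as follows. This is a Python sketch, not the Cocoa code from the build, and the keyword table and function name are made up for illustration; the real build may look for entirely different strings.

```python
# Hypothetical keyword table: package name -> substrings that hint at it.
# These strings are assumptions, not what Hijack actually checks.
PACKAGE_KEYWORDS = {
    "phpBB": ["powered by phpbb"],
    "vBulletin": ["powered by vbulletin"],
}


def guess_forum_package(page_html):
    """Return the first package whose keyword appears in the page, else None."""
    haystack = page_html.lower()
    for package, keywords in PACKAGE_KEYWORDS.items():
        if any(keyword in haystack for keyword in keywords):
            return package
    return None
```

A page with no recognizable keyword simply returns None, and the scheme lookup has to work without the hint.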

Then, it tries to match a ScrapeScheme to the DOM that got loaded. ScrapeSchemes are stored in a Core Data database (binary format because the SQLite ones don’t support beginsWith, which I needed for some reason I have forgotten). This happens inside of ScrapeSchemeManager. It tries to do this as lazily as it can, but if lazy doesn’t work, it comes down to doing string matching against the tagSoup. More on tagSoup in a bit (hint, tagSoup sux).
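The “lazy first, string matching last” idea can be sketched like this in Python. Everything here is an assumption for illustration: the Scheme fields, the find_scheme name, and the guess that “lazy” means checking the host and the detected forum package before touching the tagSoup. It is not how ScrapeSchemeManager is actually written.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scheme:
    # All fields are hypothetical stand-ins for whatever a ScrapeScheme stores.
    name: str
    host: Optional[str]      # site the scheme was built for, if any
    package: Optional[str]   # forum package, if known
    soup_fragment: str       # piece of tagSoup that identifies the layout


def find_scheme(schemes, host, package, tag_soup):
    # Cheapest check first: a scheme already tied to this host.
    for s in schemes:
        if s.host is not None and s.host == host:
            return s
    # Next: a scheme for the detected forum package.
    if package is not None:
        for s in schemes:
            if s.package == package:
                return s
    # Last resort: plain substring matching against the tagSoup.
    for s in schemes:
        if s.soup_fragment and s.soup_fragment in tag_soup:
            return s
    return None
```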

If it can find a match, it sucks the data out of the DOM and builds a new generic html/css page out of it in DOMRenderer. This is then rendered in a visible WebView by way of what appears to be a generic NSEnumerator but is actually 85000 lines of code pain, making you say “ooh” and “ahh” and “wow, this program totally fucking blows goats.”

A scrape consists of an (optional) prefix, an identifier or regular expression, and an (optional) suffix. The scrape is used to find something in the DOM that matches the scrape. The found item is then matched to a “significance”, which indicates what the matched thing actually is. Your avatar, for example.
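As a rough picture of that structure, here is a Python sketch. The class name, field names, and the find method are hypothetical, and treating the prefix and suffix as literal text that must sit directly around the match is an assumption; the build may combine them differently.

```python
import re
from dataclasses import dataclass


@dataclass
class Scrape:
    significance: str       # what the match means, e.g. "avatar"
    pattern: str            # identifier, or a regular expression if is_regex
    prefix: str = ""        # optional literal text right before the match
    suffix: str = ""        # optional literal text right after the match
    is_regex: bool = False

    def find(self, text):
        """Return the part of text matched by pattern, or None."""
        middle = self.pattern if self.is_regex else re.escape(self.pattern)
        full = (re.escape(self.prefix)
                + "(?P<middle>" + middle + ")"
                + re.escape(self.suffix))
        m = re.search(full, text)
        return m.group("middle") if m else None
```

For example, `Scrape("avatar", "td$avatar", prefix="tr{", suffix="-td")` only matches a `td$avatar` that directly follows `tr{` and is directly followed by `-td`.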

The syntax, as I said, is similar to XPath, but not the same, as it’s a bit fuzzier. A child is denoted by “{”, a sibling is denoted by “-”, and a parent is denoted by “}”. So, if you want to find the second “tr” of a table with 2 data cells (and no tags inside the data cells), you would use “table{tbody{tr{td-td}tr}}”.

ClassNames are prefixed with “$” (I’d love to have used a period, but it’s a legal part of a className according to the XML spec, which I find baffling). idNames are prefixed with “#”. So, if that second “tr” was actually <tr id="uniqueRow" class="firstClass secondClass">, the match string would be “table{tbody{tr{td-td}tr$firstClass secondClass#uniqueRow}}”.
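Here is one reading of that notation as a Python sketch. The real build walks the WebKit DOM in Cocoa; this uses xml.etree.ElementTree on well-formed markup instead. The function names are made up, and the rule for dropping the “-” after a closing “}” is inferred from the example strings above, not confirmed.

```python
import xml.etree.ElementTree as ET


def node_label(el):
    """Tag name, then $classes and #id when present."""
    label = el.tag
    if el.get("class"):
        label += "$" + el.get("class")
    if el.get("id"):
        label += "#" + el.get("id")
    return label


def tag_soup(el):
    """Serialize an element tree using { for child, - for sibling, } for parent."""
    out = node_label(el)
    children = list(el)
    if children:
        out += "{"
        prev_had_children = False
        for i, child in enumerate(children):
            # A sibling that follows a closing } gets no extra - separator.
            if i > 0 and not prev_had_children:
                out += "-"
            out += tag_soup(child)
            prev_had_children = len(child) > 0
        out += "}"
    return out


markup = ('<table><tbody><tr><td>a</td><td>b</td></tr>'
          '<tr id="uniqueRow" class="firstClass secondClass"/></tbody></table>')
soup = tag_soup(ET.fromstring(markup))
```

With that markup, `soup` comes out as the match string from the example: `table{tbody{tr{td-td}tr$firstClass secondClass#uniqueRow}}`.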

If the match portion has an innerHTML, that is what gets scraped; otherwise, the outerHTML gets scraped. Unless, of course, you specified a regex, in which case the regex match is scraped, unless the regex has at least one parenthesized group, in which case the first parenthesized match is scraped.
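Those rules boil down to something like the following Python sketch. The function name is hypothetical, and running the regex against the outerHTML is an assumption, since the text above doesn’t say what the regex is applied to.

```python
import re


def scraped_value(inner_html, outer_html, regex=None):
    """Pick what gets scraped from a matched element."""
    if regex is None:
        # No regex: take innerHTML, or outerHTML when innerHTML is empty.
        return inner_html if inner_html else outer_html
    m = re.search(regex, outer_html)  # assumption: regex runs on outerHTML
    if m is None:
        return None
    # With at least one group, take the first group; otherwise the whole match.
    return m.group(1) if m.re.groups >= 1 else m.group(0)
```

So a pattern like `src="([^"]+)"` pulls out just the URL of an image, while `src="[^"]+"` without the parentheses returns the whole attribute.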

Fun, huh? :)

---

But you can pretty much ignore everything I just said because it’s all waaaay too brittle to be useful.

The tagSoup was a nice first attempt, but it’s no good for actual matching, as I discovered when adding this forum to the mix. This forum uses multiple class names on a given tag, which might be in any arbitrary order, so naive string matching is useless. tagSoup needs to go away.
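The class-order problem is easy to show in a few lines of Python. The helper name is made up, and comparing classes as sets is just one possible fix, not something the build does.

```python
def classes_match(wanted, actual):
    """True if every wanted class is present, regardless of order."""
    return set(wanted.split()) <= set(actual.split())


# Naive string comparison breaks as soon as the order changes...
naive_ok = "firstClass secondClass" == "secondClass firstClass"   # False
# ...while comparing the class names as sets does not care.
set_ok = classes_match("firstClass secondClass", "secondClass firstClass")  # True
```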

Here is a Core Data schema that makes more sense. I began implementing this and got bogged down in writing an editor for it. I’ve since discovered XQuery, which supports regular expressions, which are completely vital (you’ll know why when you look at nassssty phpBB). That’s probably a better route to take.

Reskinning this is easy, just change the html/css in the “HTML” folder.

Finally, this code is the result of under 20 hours of work, done in a rush because I was annoyed at people telling me Hijack wasn’t feasible. So the code is pretty gross. Apologies for that, c’est la vie, yo!

And enjoy!


Copyright © 2006, 2007 - My Dream App. All Rights Reserved
