Anything Geeky Goes

Saturday, April 26. 2008

Marketing and the Art of Being Difficult

I don't intentionally try to be difficult. I just am.

Connor had a baseball game today and I had to do third base coach because the other dad was gone. It's an interesting role to play for the team. I have to admit I kind of dig it.

But I'm aggressive. Really aggressive.

We had 2 outs, and were down 0-4. The kid up to bat is short, and had a full count on him. Bottom of the batting order. Another kid is on 2nd, Connor is on 3rd, and he's aggressive too. He's leading off by at least 20 feet. I made the call, and told him to steal.

Normally you'd ask the kid to square up to bunt, but this is a casual league, and they are a mix of 6th thru 8th graders, so this kid doesn't know how to bunt. What do you do?

You go for it.

Sometimes you don't make it.

Life is like that too - if you want something, you go for it and fight hard for what you want. If you made a bad move, admit it and move on. If you are wronged, fight back. It's everyone for themselves.

I manage a few other OS projects on Google Code besides Zoto Server. Ironically I've seen a bare fraction of the traffic on them that I've seen with Zoto:



191 visits from ReadWriteWeb. Maybe being a difficult ass pays after all!




Friday, April 25. 2008

The Art of the Open Source Snark

Edited 4/26/08: I've removed some of the more inflammatory remarks in my post. It's my Highlander temper getting the best of me. I understand the reasoning for OS "bringing the hammer" when they deal with crap - personally I'd hate to battle the likes of M$ and others that may try to bend the rules. Again, I'm not trying to do that here, or be intentionally inflammatory about the whole mess. I just want to get Zoto out there and being used. That's all.

-----------------

I've been having a running dialog with a bunch of OS guys over the past few weeks, and it's been exceedingly frustrating due to the way they handle themselves when talking about my project. In fact, it reminds me of dealing with Gary Denny's snarkiness in my AP science class in high school.

Here's what has happened so far. At around 5:30AM on Sunday the 23rd of March, the morning after pushing all the code up to Google Code, I write the wiki page for the project and end up saying something along the lines of "companies using this in a commercial install should contact me regarding licenses", "and it's free for non-commercial use". A few days of being busy in the office go by and then Spring Break comes up. I go to the Redwoods with the family.

Keep in mind that at this point I've only worked on the site for about a day. The site isn't getting traffic - why would it? And no, people don't read my blog - I can show you the numbers. It's like me, some kid in Albania, and 2 search engine bots that someone forgot to shut off. Seriously - even my mom doesn't read it.

Anyway, I get back from vacation, and sit down to read my work email. There's an email from someone I used to work with a while back. He's posted on Delicious about the project, and is all fired up about it. I also have one comment on the wiki, and a fucking bug ticket opened on me - both for having the wrong wording on the main page of the project. Talk about premature - I can't even get our SQL schema installed on the DB, and people are coming by to visit - and opening tickets on *me*. Sigh.

I'll admit that the text didn't sound like it was Open Source when I went back and read it. I responded to the posts, followed a blog link the one guy sent, commented on the blog post, and went back and edited the pages with what I thought was more satisfactory wording. Yes, it's BSD. Yes, it's free. Have fun.

I went back to my real job.

Saturday comes, and I flop down in the office to work on it some more. The Google Code page is gone.

WTF.

After floundering around a bit, I get around to checking my Gmail account. Hello. Chris DiBona. I know that name.

Turns out Chris was trying to get me on the horn about the text on the page the same day people saw it on Delicious. Some guy reported it to Google, and Chris in turn emailed me. As I didn't respond to him (after three pings I might add), he turned the site off. As I've been in a similar situation before with my ISP, I understand this approach. Still, I was pissed about it, and it showed through when I started dealing with him about getting it turned back on. Regardless, without going into details, we're cool, Chris turned me back on, and all was good.

So I thought.

Here's what I got yesterday in my Gmail account. I've really got to start checking it every day I guess.


Date: April 24, 2008 6:55:38 AM PDT
From: xxxxxxx@opensource.org
Subject: Zoto's use of New BSD license
To: kordxxxxx@gmail.com, chrisxxxxx@google.com
Cc: xxx@opensource.org

Kord,

I'm contacting you because people on the OSI's mailing lists have alerted us to a possible misuse or misunderstanding of open source software. This error seems present both in postings you have made on Google's Code site as well as other sites.

On readwriteweb.com you say "The BSD license is one of the few licenses that actually allows a separate license to be placed on the same code. That means I could put a separate commercial license on Zoto later, as I mention on the page." Actually, there are several ways one can put separate licenses on code, such as the way MySQL puts a special license on their database which, in the public sphere, is covered by the GPL. But regardless of the mechanism by which you apply some new license, the question as to whether or not a given distribution of software is, in fact, open source, is entirely dependent on whether you choose an open source license or not. By choosing the New BSD license, which is an open source license, the distribution you make under that license is open source, and as such, YOU MAY NOT ADD ARBITRARY TERMS TO THAT LICENSE WITHOUT CREATING A NEW, NON-OPEN SOURCE LICENSE.

Of course you may choose to create a new license that says "All Your Base Belong To Me", which is not an open source license. And that's fine. Just don't call that distribution open source, because it is not. You can say "the software content is also available under and open source license >here<". And you can refer to that software as being under an open source license. But the idea that you can force somebody who is using open source software to appeal to you for permission to use in a commercial context IN ORDER TO COMPLY WITH THE LICENSE is wrong.

Thus, when you say:

FWIW, it makes sense for someone to license the code from me if they
are going to be using it in a commercial application. They might
need install services, support, extra features added, etc., and it
would be a requirement for them to use it.

You are either talking about a license that is not open source, or you are misspeaking about a license that is open source. Red Hat, for example, has terms of service that govern the /service agreement/, but they, and all others who abide by open source terms, do not say that the rights to use the software depend upon executing the service agreement. Instead, it is the services that depend on executing the service agreement.

I have copied Chris DiBona on this email because of the statements made on the google website, which are potentially misleading:

Remember, the BSD license basically says any BSD code can be sold or
included in proprietary products without any restrictions on the
availability of the code or someone's future behavior. As such, the
BSD license also allows for the addition of extra commercial
licenses. Some companies may desire or require these extra licenses
to be able to run the software in their organization.

If you require extra licenses to run Zoto's software in a commercial
environment, require custom features, or need commercial support,
you can contact Kord Campbell at kordless@gmail.com, to inquire about the different
licensing and payment options. If you don't, then download it and
have fun!

The first sentence, that BSD code can be sold or included w/o restrictions on future freedoms is correct. The second sentence is a non-sequitur. The third sentence is potentially misleading because it implies that licenses are required in order to comply with the terms governing the software itself, not because of policies a company may impose upon itself about running commercially supported software in production.

True open source software can be run in a commercial environment without payment of fees or without special permission. Period. If somebody has a policy that they can only run software backed by a third-party vendor, that's their choice, and while they might choose you as their support agent, open source does not allow you to be that exclusive agent. If you choose to offer services for a version of software where you can mandate your exclusivity, more power to you, but that's not open source.

Chris, I hope that Google has a policy about what to do if it receives reports that claims made about licenses hosted on your site are false.

M



Two things strike me here about this email. First, I'm not given the benefit of the doubt that I'll handle it properly when he emails me. He CCs Google on it AND then proceeds to talk directly to Chris in the email. Second, he proceeds through an analysis of my comment on ReadWriteWeb, which has ZERO to do with what I say on the Google Code page. While I'm sure I'm semantically off a bit in the comment, what I basically say is that anyone can place additional licenses on the code if they want. The BSD license allows this, and more than several companies have done exactly this with their software.

Essentially what he's saying is he's not happy with the wording of the text on the page for the project. If you're confused by this point, welcome to the club.

As best as I can tell, there are a bunch of Open Source heavies running around policing the Internets. It feels (to me) they are blissfully ignorant of a source code's success or failure, and they have absolutely nothing constructive to contribute, other than snarky remarks about licensing concerns. They hang on every word, every byte, to ensure you and the rest of the world are protected from evil OS doers.

It's a crock of crap really.

These licenses are licenses of use. If you put a BSD license on code, it's licensed under that license. Period. The license is not affected by rain, snow, or bad breath. They are there to protect the users of the software from the person or company that originally wrote it. They give a license of use per the text of the license, not the text of some stupid page somewhere.

BTW, these licenses don't go the other way. They don't say that because you put the license on your code, some self-elected organization gets to tell you what you say and how you say it on your web pages. (Google does however, as you are required to follow their TOS if you use their site and servers!)

Still, in the interest of moving forward past all this madness, I've gone and plagiarized Wikipedia for the BSD license wording and replaced what I wrote the second time around. In the morning, I'm going to start applying the BSD licenses to each of the directories in the source tree so this will be settled once and for all.

As for Zoto's happy entrance into the OS world, it's been bitter-sweet. I'm hoping to move past this and get on with doing some cool stuff with the software.

Thanks for listening.

Saturday, March 22. 2008

Open Sourcing Expensive Proprietary Software for Fun and Profit

I'll keep this simple. I'm uploading the Zoto 3.0 source code to its new Google Code project tonight, and placing the BSD free software license on it. Version 2.0 of Zoto will follow in a couple of days (as soon as I find where we put it). We've also been working on a new site called Fotofluff, and its code is going up there as well.

This may seem to be a radical idea, but I think it's EXACTLY what needs to be done with the software. I've done about as much as I can with the project, and it just didn't work out the way that I thought it would with the users and marketing. Something has to give, and it's going to be me - giving everyone the code to view.

Open Source software isn't necessarily free to use - it can cost money to install, develop for, and maintain. I'll have to figure out some type of revenue model for this. Maybe we'll do Paypal donations, or revenue share like PHPGallery.

Post more later when I've got it up!

Edit: I've updated this text to be clearer about the fact people are allowed to use the software free of charge, as much as they like. As you guys have pointed out, the original posting was misleading and incorrect. Thanks.

Monday, January 14. 2008

Pathetic Excuse for a Blog Post

No time to post. No time to post.

Macworld is tomorrow. A little bird told me that you'll see a full-featured debugging kit released for better developing of web applications in Safari. Think Firebug for the Mac crowd.

And no, it's not this.

Thursday, December 6. 2007

Splunk's Preview Launches

I've been busying myself over at Splunk since September, which explains my complete lack of attention to my blog postings. Working on getting a bit of exposure to their dev team, and starting to work on their developer API site to help developers get going with their REST APIs.

Zoto now uses Splunk for weblog analysis, and I'm doing things like charting the number of hits a particular image gets in a given day, signups, visits in general, etc. It's not necessarily built for web stats, but it definitely does the job with a little tinkering. While Google Analytics has nice graphs and all, Splunk ends up being a bit more customizable ongoing.



Zoto is getting ready to launch a new service, Fotofluff.com, which will be providing on-demand, multi-authored media content. There won't be an auth system in the traditional sense with Fotofluff, so we are implementing a 'cleanup' method which will monitor the logfiles with Splunk, then delete media and containers when they haven't been used in more than a month.
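
A rough sketch of what that 'cleanup' pass could look like, in JavaScript. This is not the Fotofluff code and it doesn't talk to Splunk at all: it assumes the logfile search has already been boiled down to a plain object mapping each media or container id to the timestamp of its most recent hit. The function name, the object shape, and the 30-day default are all made up for illustration.

```javascript
// Hypothetical helper: pick out ids that haven't shown up in the logs lately.
// `lastUsed` maps an id to the time (in ms since the epoch) of its latest log hit.
function findStale(lastUsed, nowMs, maxAgeDays = 30) {
  const cutoff = nowMs - maxAgeDays * 24 * 60 * 60 * 1000;
  // Anything last touched before the cutoff is a candidate for deletion.
  return Object.keys(lastUsed).filter((id) => lastUsed[id] < cutoff);
}

// Example data (invented): one old container, one recently viewed photo.
const lastUsed = {
  "container-1": Date.UTC(2007, 9, 1),  // October 1, 2007
  "photo-2": Date.UTC(2007, 11, 1),     // December 1, 2007
};
const stale = findStale(lastUsed, Date.UTC(2007, 11, 6));
console.log(stale);
```

The actual deleting would be a separate step that walks the returned list, which keeps the "what is stale" decision easy to check on its own.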

If you are interested in Splunk, check out the Preview over at Splunk.com! Happy logfile eating!


Wednesday, October 10. 2007

Why are There Afghan SAM Launchers All Around?

Josh and I had to make some concessions today to move forward with publishing photos to the Charlotte Bobcats' website from their install of Zoto. The Bobcats' website is housed on servers controlled by the NBA, and they have strict requirements regarding where content is stored.

The requirements? The content has to be stored on their servers. All of it. That's it. End of requirements.

Well, that's not so bad, you say. Maybe they run some LAMP stack or something, where we can push out our nifty code that does nifty things with pictures? No. They wrote their own proprietary content management system, and it serves up basically static content, with a bit of XML sprinkled in for good measure. All content needs to be on their servers. No databases here buddy, just CSS, XML, and good old wholesome HTML. And no, we don't have an API.

As you can imagine, the programmers at the NBA have great job security. Who in their right mind would fire the only guy that can write the upgrade for software which runs your multi-million dollar business website? A fool without a job, that's who.

Content management systems aren't hard to implement. This blog uses a simple CMS system called Serendipity, and I can pretty much change whatever part of the code I need to make the site do what I want. Sure, the NBA site is a bit more complicated than my dumb blog, but it's still just a CMS for christsakes! Surely these guys that wrote it were writing features around 'job security' instead of 'expandability'. These guys are crafty, right?

Or maybe they aren't. Maybe they found themselves in a situation where the managers forced a pile of conflicting requirements on them that in turn forced the developers to find less than elegant solutions that still solved the entire feature set, but made extensibility impossible. Maybe the developers didn't fight back, and had to build something custom - for their uniquely custom social situation. This is the NBA after all - some of the managers are 6'8" and weigh 240 lbs or more. You sure as shit aren't going to argue with these guys when they ask for a button here and a flashy thing there, are you?

Which brings me to Dwain. Dwain is both a manager, and a pussy.

Found at 118 King Street stairwell on 10/8, (1) sheet of quad-folded paper, labeled File: C:\Documents and Settings\dwain\My Documents\GMnotes.txt. Discussion of what product is uncertain, but is as follows with spelling/grammar errors preserved:

- why are there afghan SAM launchers all around?
- why are all units sunk into ground / buildings?
- when is it scheduled to actually get worked on by art?
- is the greyblock 100% signed off? from my looking over the level it has not even had a 2nd pass terrian nor any real attention to block layout
- heroic choice just seems like a timer for the mission more then an active choice
- boss fight does not cover just why the cannon will not take damage from IM's chest beam
- #47-49 references sea units which we do not have
- i find it unclear what these charging stations have to do with keeping IM away from the actual gun
- #89 does this only call if missiles are fired? what about RT & Unibeam?
- the order doesn't make sense, there is a whole convo at the end that seems like an alternative to the middle. how/why?

Let me shed some light for you Dwain. Programmers that code up SAM installations from mother-fucking Afghanistan aren't going to take this type of shit off you. If you asked the developers I work with such ambiguous questions of 'how?' and 'why?', they'd sure as hell tell you where to sink your units. In your ass Dwain. In your ass.

On the other hand, your counterparts at the NBA would sooner die than ask their developers such stupid questions. They would show no respect for their developers. "What the fuck were you guys thinking when you broke RT and Unibeam's ability to take damage from IM's chest beam? Christ-Fucking-Almighty! What are you guys, idiots? Now get in there and keep fucking IM away from the actual gun! Break!".

Yeah.

There's got to be a happy medium. A level of respect between both parties. Ideally, your managers should be doing a little development, and your developers should be doing a little management, ultimately giving the entire team a better understanding of each other's positions on the project's requirements.

And, if you can understand other people's positions, you'll tend to work toward minimizing your impact on them. Chances are, they'll do the same for you and help get things done better and faster ongoing.

Or, hell - maybe managers just suck.

Friday, August 10. 2007

Stewart Butterfield and Et al. Need to Get the Fuck Out of Yahoo

Last month I was shocked MySpace paid a third of a BILLION dollars for the crappy-ass photo sharing site, PhotoBucket. As I said the week it went down, "Hell, guys I'd have taken 1/100th of that for Zoto!". Yes, I am envious of the founders' success, but at this point I'm definitely used to the feelings acquisitions like this elicit.

I only have one thing keeping me sane through all this - I haven't sold Zoto yet. When you haven't been bought yet, you really don't envy one success much more than another. It's a bit like being flat broke, and trying to decide which you envy more, Bill Gates or Donald Trump. Personally, I was more envious of the Flickr acquisition than the PhotoBucket one because I think Flickr is a better product. I'm really envious that they were able to come up with such a great product, not how much they made on the transaction. With the PhotoBucket acquisition, I was just like "WTF did they do that for?".

If I were Stewart however, I'd be having a fucking cow right now. If he and Caterina were lucky, and assuming a typical angel/seed investment round before being acquired, they would have walked with about $9 million before taxes in stock and cash. With Yahoo's crappy performance in the stock market over the past few years, they'd have $7 million or so left - assuming they kept the stock. That pales in comparison to what the PhotoBucket guys just got for their crap. The worst part is that Flickr has way more potential to generate revenue than PB does. Jesus.

Stewart got screwed.

And to make matters worse, now he's in corporate hell. I worked for LookSmart for about five months in 2003, and I hated it. There were simply no allowances for innovation in the over-the-top process they practiced. I know that Yahoo has been diligent about keeping things real with their acquisitions (remember GeoCities) but, with the recent announcement of Flickr replacing Yahoo Photos, you can bet your ass that Yahoo is starting to get their meaty process hooks farther and farther into the Flickr property. It's inevitable Yahoo will eventually ruin Flickr just like they did all their other acquisitions. Just give it time.

Unfortunately, it would appear that Stewart and team don't see the merits of striking out on their own again. They probably assume they are still masters of Flickr's fate, but I doubt that attitude is long for this world.

Stewart has a resource that I can only look at longingly - the Flickr fan base. If he, Caterina and Cal were to start a new company, their fans would flock to them in unison, and go about to all corners of the Internet telling everyone how great the new service was. They'd enjoy a virtual overnight success with adoption, and would immediately start benefiting from community feedback. Early mistakes in the new model would be easily forgiven, and partners would lie down in the street to be happily trampled by the crowds of users.

Stewart, you can make another success dude - you just have to get happy with walking first. Flickr is going to become what it will become regardless of what you do now. The wheels of fate began grinding the second you signed that contract. Get out now while you still can.

Get out and make something fresh and new again - for your fans, for your peeps, for yourself.

Friday, July 27. 2007

Grub is Back

Seems pretty weird to be working on Grub again. I'm not sure how much I'll be working on it ongoing - if at all - but it sure does seem right. It's as if 'things' have been pointed in the correct direction now that Wikia is doing it again.

Pretty exciting stuff, but it also means lots and lots to do!

Friday, June 22. 2007

Resizable Photos with AJAX and ImageMagick

I've been wanting to put together a technical implementation post again for a while now, and because I run Zoto, it makes the most sense that I do one based around manipulating images for use on a web page. I've used these techniques to improve Zoto's image detail page and a preview of that functionality is available here, and it will soon be pushed to our production site.

One of the common problems with showing images on a web page is downsizing the original image to fit in a post column, or to fill the whole screen when displaying a larger size. Because your visitors are using a variety of different sized browser windows, it would be nice if you could tailor your images to fit nicely inside their browser without having to generate all the different rendered sizes they might need beforehand.

Of course, some browsers already do this for you when you view a larger sized original image. Some, like Safari, don't downsize, and others like IE downsize with artifacts. Either way, scaling a larger sized image to a smaller view is going to require the overhead of sending the entire original over, and dealing with some users seeing artifacts on the image when the browser is doing the downsizing.

What is really needed is a way for the browser to figure out what the best size would be to show to the user, and then have the images downsized on the fly server-side.

The following example requires a few external tools. You'll need to have PHP5 installed on Apache 2 running mod_rewrite, and optionally need mod_proxy installed to cache your images after rendering them. You'll also need the command line version of ImageMagick installed on your system, and the MochiKit library installed in your web path. You will also need a directory in your root web path called images.

Here's a live example of what we're going to accomplish with this guide. You can click on the image to alternate between landscape and portrait image examples. You can also resize the browser to force fetching a new image that fits inside the browser. This should work flawlessly in all JS enabled browsers. Notice the image source on the page is the same size as what is being displayed.

The example uses three files: a .htaccess rule, a PHP script, and an HTML file with JS.

The .htaccess file does a simple rewrite allowing you to alias image requests to an image directory and subdirectory of the different sizes:

RewriteEngine On
RewriteBase /
RewriteRule ^images/(.*)x(.*)/(.*) http://www.geekceo.com/image.php?filename=$3&width=$1&height=$2 [P]

You'll want to put the rule in your existing .htaccess file in your web root directory, or if you don't have one already, create a new .htaccess file with the rewrite lines in it. Note: If you don't have mod_proxy installed on your version of Apache, you'll want to remove the [P] from the end of the rewrite line. The flag hands the rewritten request to mod_proxy; paired with Apache's caching modules, that lets the rendered images be cached, which is nice if you are serving a ton of different sized images.

Once you have this rule in place, you can use it to request a URL in the following format, with width and height being specified as a directory:

http://www.geekceo.com/images/200x200/connor.jpg

This rule maps all requests under the images directory to the file image.php, along with parameters for the filename and the desired width and height. Note: The link to image.php uses MochiKit's Source Viewer utility, which will show the filename as image.html. You should put that code into a file called image.php in your web root directory so it can be executed correctly by your Apache server.
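
If you want to see what the pattern in the rule actually captures, the same regular expression can be tried out in JavaScript. This is only a sketch for poking at the pattern, not part of the example's three files; parseImagePath is a made-up name, and the path is given without a leading slash because that is how a rule in a per-directory .htaccess file sees it.

```javascript
// The same pattern as the RewriteRule, written as a JavaScript regex.
const rule = /^images\/(.*)x(.*)\/(.*)/;

// Hypothetical helper: returns what $1, $2 and $3 would hold, or null on no match.
function parseImagePath(path) {
  const match = rule.exec(path);
  if (match === null) return null;
  return { width: match[1], height: match[2], filename: match[3] };
}

console.log(parseImagePath("images/200x200/connor.jpg"));
console.log(parseImagePath("about/index.html"));
```

Note that (.*) will happily capture things that aren't numbers, so the PHP side still has to validate the width and height it receives. A stricter pattern such as (\d+)x(\d+) would be one way to tighten that up.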

Based on your current setup, you may need to change a few things inside the PHP file, including the paths to your image directory, and the command line convert utility from ImageMagick. Note: You could just as easily use a PHP ImageMagick library, but I just elected to run it from an exec() call to make it simple.

Take notice of the following line in the PHP file:

$sharp_sat = "-modulate 100x105x100 -unsharp 1.2x1x0.50x0.02";


This increases the sharpness of the smaller images, and increases the saturation slightly making the images 'pop' more. Both Zoto and Flickr use this technique, and for the most part it ends up making your photos look better than just downsizing them alone.

Finally, you need the HTML file, image_resize.html, that will display the larger sized images full screen - as big as the user's browser supports. This file should also be located in your web root directory, as should the packed version of the MochiKit.js JS library.

When the HTML file finishes loading, it fires an event that calls main_load(). main_load's job is to connect the browser's resize event to browser_resize(), and the click event on the image to swap_image(). The last thing main_load() does is call update_image_url(), which sets the image URL for the first time once the page finishes loading.

When a user resizes the browser window, browser_resize() gets called. This function contains some madness required for dealing with Safari's habit of constantly firing resize events while you drag the resize handle around. We've tried a variety of methods to deal with this, including disconnecting the signal, but none of them work very well except where you track how many times the event is fired, and only deal with the last one that comes back from MochiKit's callLater function. It gets complicated, but basically when you are dealing with asynchronous events you have to consider nothing comes back in an expected order.
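
The "only deal with the last one" idea can be sketched in a few lines. This is not the code from image_resize.html: it swaps MochiKit's callLater for plain setTimeout, and lastEventOnly plus its parameters are names invented for this sketch. The scheduler is passed in as an optional argument purely so the logic can be exercised without waiting on real timers.

```javascript
// Hypothetical helper: wrap `handler` so that, in a burst of events,
// only the most recent one actually does any work.
function lastEventOnly(handler, delayMs, schedule) {
  // Defaults to setTimeout; injectable so the counting logic can be tested.
  const later = schedule || ((fn, ms) => setTimeout(fn, ms));
  let fired = 0; // how many events have been seen so far

  return function onEvent(...args) {
    fired += 1;
    const ticket = fired; // remember which event this callback belongs to
    later(() => {
      // By the time the callback runs, newer events may have arrived.
      // Only the holder of the newest ticket gets to call the handler.
      if (ticket === fired) handler(...args);
    }, delayMs);
  };
}
```

In a browser this would be hooked up with something like window.addEventListener("resize", lastEventOnly(update_image_url, 250)), where the 250ms delay is just a guess at a comfortable pause.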

The update_image_url() call figures out the current width and height of the user's browser, and then resets the image_detail URL to whatever will fit inside the browser nicely. Some simple inline CSS centers the image in the window.
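
Here is roughly what the URL-building half of that step could look like. It is a sketch rather than the function from the HTML file: imageUrlFor and its margin parameter are invented here, and it assumes the server scales the image to fit inside whatever box is requested through the /images/WIDTHxHEIGHT/filename format shown earlier.

```javascript
// Hypothetical helper: build the image URL for a given viewport size.
// `margin` is the breathing room (in pixels) to leave on each side.
function imageUrlFor(viewportWidth, viewportHeight, filename, margin = 0) {
  // Never ask for less than 1 pixel, even in a tiny window.
  const width = Math.max(1, Math.floor(viewportWidth - 2 * margin));
  const height = Math.max(1, Math.floor(viewportHeight - 2 * margin));
  return "/images/" + width + "x" + height + "/" + encodeURIComponent(filename);
}

console.log(imageUrlFor(1024, 768, "connor.jpg"));
console.log(imageUrlFor(1024, 768, "connor.jpg", 20));
```

In the page itself the two sizes would come from the browser (window.innerWidth and window.innerHeight in most browsers), and the result would be assigned to the image's src.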

These functions could be taken out and integrated into a liquid layout so that the image resizes inside a blog post or a modal popover. Implementation of that functionality is left to the reader! Still left to do: include a spinner in the background while the image loads, and use snap sizes that increment the image size by 10 or so, instead of single pixels.
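
The snap sizes idea is small enough to sketch. snapDown is a made-up name and the step of 10 simply follows the "10 or so" mentioned above. Rounding down rather than to the nearest step is a choice made here so the snapped image never ends up bigger than the window.

```javascript
// Hypothetical helper: round a pixel size down to the nearest multiple of `step`,
// but never below one full step.
function snapDown(pixels, step = 10) {
  return Math.max(step, Math.floor(pixels / step) * step);
}

console.log(snapDown(1023), snapDown(768), snapDown(7));
```

Snapping also means a dragged window asks for far fewer distinct sizes, so more requests land on an image that has already been rendered.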

Pulling it all together, I end up using the PHP file to give me a resized version of my daughter Lily, and link it to a larger version, which uses the JS to show the larger size:

Monday, April 9. 2007

Zoto 3.0 Launches

Well, Zoto 3.0 is finally out!

That explains the complete lack of attention to my blog, including the fact that I had to take it down when we reconfigured the load balancer on Zoto's network over a month ago.

I've got it back up, but still haven't had a lot of time to post here about anything worthwhile.

I think my next post will be a HOWTO on how to configure Apache to serve dynamically generated image content from another type of webserver using only a .htaccess rule. We recently had to do this for Zoto for image serving speed, and I learned a lot about what Apache can do with its rules.

Anyway, at least I have a new post up, and I'm back in action with the blog! ;-)

Tuesday, January 2. 2007

How to Build a Better Search Engine

I went into Starbucks yesterday to get some coffee before heading to work to repair our mail server that crashed (sigh) on Friday morning. While I was there a friend of mine, Buddy Haydel, showed me an article that was in the New York Times titled "In Silicon Valley, the Race Is On to Trump Google". The reason he showed it to me was because he knew a few years back I used to be in the "business" of search myself, and well, that's what he does - show me things he's read.

I hit Digg this morning (for lack of a better site for nerd news) and saw the same article. For the past two hours I've been researching these semantic search companies in the article, and trying out their "new" technologies for myself. Let me summarize by saying that most of them fall flat on making a decent effort - in effect having lost their left shoe somewhere just past the start line in the race.

Hakia, a semantic search engine, is currently down for maintenance, although it was up earlier and served up some interesting results - with highlights of text that is similar in context to your search. Chacha has an interesting twist to the semantic approach - you can choose to chat with a "search specialist" to help you refine your search. Assuming you get a hold of someone that knows your subject, they can probably refine your search and help parse out non-relevant search results. However, don't bother if your search is technical in nature.

I asked my specialist "how do i set up postfix to connect to a postgres database?". Unfortunately she wasn't familiar with any of the key terms, so she didn't know to ask the all-important refinement questions, like "what OS/distro are you running it on?". It turns out, after chatting with her for about 5 more minutes, that my "specialist" was a stay-at-home mom who simply cut and pasted the first result off the standard search from Chacha. At least she was honest about her specialty, which was the do-it-yourself section.

For comparison, I refined the query slightly on Google to "how do i set up my gentoo box with postfix and postgres?" and got the top result pointing to Gentoo's excellent wiki on how to do just that. In all fairness, if I had asked her something like "i'm having problems with my windows", she would probably have known to ask something like "microsoft windows, or car/house windows?", as opposed to Google who just assumes that you are talking about the OS. Stupid Google.

In theory the technique of having a human sitting in between your search engine and the user holds some promise. Basically it would allow you to parse the search terms better than any current real time semantic algorithm can (even with a stay at home mom at the helm), and the search specialist could ask you questions that help eliminate non-relevant results. It's too bad that this model falls apart at scale - there simply is no way to allocate a sufficiently skilled person for every single search term being processed by a busy search engine. And, at the end of the day, we are still all fallible humans who end up being several orders of magnitude slower than a machine would be - if it could do semantic parsing for the whole web.

And here's my point - computers can do a pretty decent job at semantic parsing already. The algorithms for doing it already exist today. The real problem is that they take FOREVER to do it for large data sets. For reference, FOREVER for a computer ~= 5-10 seconds of processor cycles X the number of pages on the Internet. That's a really, really big number.
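To put a rough size on FOREVER, here's a back-of-the-envelope sketch. The 5-10 seconds per page comes from the paragraph above; the page count is purely my placeholder (10 billion is an assumption, not a measured figure), so swap in whatever number you believe.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def cpu_years(pages, seconds_per_page):
    """Single-core CPU time, in years, to semantically parse every page once."""
    return pages * seconds_per_page / SECONDS_PER_YEAR


# ASSUMPTION: placeholder page count for illustration, not a figure from this post.
ASSUMED_PAGES = 10_000_000_000

low = cpu_years(ASSUMED_PAGES, 5)    # optimistic: 5 seconds per page
high = cpu_years(ASSUMED_PAGES, 10)  # pessimistic: 10 seconds per page
print(f"{low:,.0f} to {high:,.0f} CPU-years for one pass")
```

Whatever page count you plug in, the answer scales linearly, which is why the only way out is more machines rather than faster ones.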

Here's the real tear - to be better than Google in search is going to require a radical shift in the basic approach to search. FYR, the way search engines work today is that they a) crawl for content from a central location, b) take that content and run some algorithms on it on a really big cluster, c) insert it into a database, and then d) allow people to search the database for random search terms. The biggest problem in this "pipeline" is that you have to spend a LOT of processor time running algorithms, and inserting into a database (and making it searchable) is hard to do in real time.
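For the curious, the four stages of that pipeline fit in a toy sketch. This is a minimal illustration, not how any real engine is built: the "crawler" just reads a dict of made-up pages, the "algorithm" is plain tokenizing, and the "database" is an in-memory inverted index. All names and sample URLs here are hypothetical.

```python
import re
from collections import defaultdict


def crawl(pages):
    """Stage (a): stand-in for a crawler; yields (url, text) pairs."""
    yield from pages.items()


def analyze(text):
    """Stage (b): the expensive step in real life; here it only tokenizes."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(pages):
    """Stage (c): insert every term into an inverted index (term -> urls)."""
    index = defaultdict(set)
    for url, text in crawl(pages):
        for term in analyze(text):
            index[term].add(url)
    return index


def search(index, query):
    """Stage (d): return the urls that contain every term in the query."""
    terms = analyze(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


# Hypothetical sample documents, for illustration only.
PAGES = {
    "example.org/a": "Set up Postfix with a Postgres database",
    "example.org/b": "Postfix basics on Gentoo",
}
index = build_index(PAGES)
print(search(index, "postfix postgres"))
```

Swap the tokenizer in `analyze` for a real semantic parser and you can see where all the processor time goes: stage (b) runs once per page, every time the page changes.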

A lot of people don't realize that Google is where they are today because they (the big "they" - Sergey and Larry) wrote a distributed file system that allowed them to run a page rank algorithm they "borrowed" from someone else. With their fancy pants GFS in place, Google was able to distribute the job of PageRank across a lot more servers than anyone else ever had, and they suddenly found themselves with the ability to do what was impossible before - running a CPU intensive algorithm on millions of pages a day.

Just like PageRank, semantic search has high cost in terms of CPU usage. Even Google can't do a semantic parse on all the pages they crawl with the million or so servers they own. Yeah, it's that big of a problem.

Q: So how do you solve the problem, get rich, and make the girly boys at Google cry?

A: Distribute the job to millions of servers.

Step #1: Find lots of servers. According to Netcraft there are over 53 million active websites on the Internet. If you look at previous charts on Netcraft, you see a ratio of about 4:3 for sites to servers. That means, roughly speaking, there should be about 40 million servers on the Internet today. For convenience, those servers also contain the data that you need to index for your engine. How nice of them.

Step #2: Write some new server software. This software should distribute the "jobs" contained inside the search pipeline, including semantic text parsing. Write some client software that will listen to those requests, and then make it downloadable by the servers' admins from Step #1. Try, where possible, to give the jobs to the computers containing the data that needs to be accessed for that job. Make it a plugin for Apache and IIS so you get 90% of the server market covered.

Step #3: Find a compelling reason for these servers' admins to install the software. "It's free and minty fresh!"

Step #4: Profit.
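The Step #1 estimate is easy to check. This little sketch uses only the two numbers quoted there (53 million active sites and a 4:3 ratio of sites to servers); the variable names are mine.

```python
from fractions import Fraction

ACTIVE_SITES = 53_000_000          # Netcraft figure quoted in Step #1
SITES_PER_SERVER = Fraction(4, 3)  # the 4:3 sites-to-servers ratio

# Dividing sites by sites-per-server gives the server count.
servers = ACTIVE_SITES / SITES_PER_SERVER
print(f"{float(servers):,.0f} servers")  # prints "39,750,000 servers"
```

So "about 40 million" is the rounded version of 39.75 million.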

Note: Your plugin/module for the servers should be configurable - allowing it to access data inside databases, and/or on the filesystem, and have limit settings that define how many proc cycles can be used for indexing, and what to stay the hell out of. It should also be programmable via the master controlling servers - allowing algorithms to be tweaked specially for the data it indexes, or to have real-time queries sent in to process on the data set. Their server becomes, if you will, an extension of your new Google's Ass Pounding Search engine.
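Here's a minimal sketch of what the master side of that could look like, assuming a very simple model: each participating server advertises a CPU budget and a list of paths to stay out of, and the master only hands a job to the server that already holds the data. `NodeConfig`, `Job`, and `assign` are hypothetical names I made up for illustration; nothing like this exists as an Apache or IIS module that I know of.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class NodeConfig:
    """Hypothetical per-server plugin settings chosen by that server's admin."""
    host: str
    max_cpu_seconds: float                                   # budget per indexing cycle
    excluded_prefixes: list = field(default_factory=list)    # paths to stay out of


@dataclass
class Job:
    url: str                # page to parse
    est_cpu_seconds: float  # master's guess at the parsing cost


def assign(jobs, nodes):
    """Give each job to the server hosting its data, within that server's limits."""
    by_host = {node.host: node for node in nodes}
    used = {node.host: 0.0 for node in nodes}
    plan = {node.host: [] for node in nodes}
    deferred = []
    for job in jobs:
        parsed = urlparse(job.url)
        node = by_host.get(parsed.netloc)
        if node is None:
            deferred.append(job)  # nobody runs the plugin on that host
        elif any(parsed.path.startswith(p) for p in node.excluded_prefixes):
            deferred.append(job)  # admin said to stay out of this path
        elif used[node.host] + job.est_cpu_seconds > node.max_cpu_seconds:
            deferred.append(job)  # over this cycle's CPU budget
        else:
            used[node.host] += job.est_cpu_seconds
            plan[node.host].append(job)
    return plan, deferred


nodes = [NodeConfig("example.org", max_cpu_seconds=15.0, excluded_prefixes=["/private"])]
jobs = [
    Job("http://example.org/blog/1", 8.0),
    Job("http://example.org/private/x", 5.0),
    Job("http://example.org/blog/2", 8.0),
    Job("http://other.example/a", 5.0),
]
plan, deferred = assign(jobs, nodes)
print(len(plan["example.org"]), "assigned,", len(deferred), "deferred")
```

Deferred jobs are what the central cluster still has to chew through the old way, so the fewer of those the better.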

Powerset, besides having an apropos math nerd name, seems to be the most interesting of the lot in the NYT article. (BTW, I'd be pissed if the NYT interviewed me and then left my logo out of the copy graphic.) Perhaps they are thinking what I am here - they seem to be fishing for it at the very least.

{go}, {powerset}, {go, powerset}

Monday, November 27. 2006

The Best Image Detail Page Wins

I've been busy over the last month coding on the new Zoto. I've taken over the development for our photo editing features, and our new blog and forums. Keep in mind that I'm first and foremost a manager - I do a hell of a better job telling coders what to do than I do coding myself, but in a pinch I can do a decent job of getting code written. However, being a bit of a lazy ass, I like to utilize other people's code where possible, and allowed. In that vein, I've chosen PXN8 for our photo editing solution, Serendipity for our new blog, and Vanilla for our new forums. I still have quite a bit of work to do on these features, but should be finished up in a few weeks at the latest. BTW, I now happen to be single-handedly responsible for the support of PostgreSQL for Vanilla. That there is geek-cred, baby! :-)

To my point. I was doing some research on support for RAW formats when I ran across a post by Andy Atkinson regarding the comparison of Flickr and SmugMug, and why he was leaving SmugMug. It's impressive that Don MacAskill (CEO of SmugMug) got to the post as fast as he did - in under 24 hours. He must keep his feed reader maximized on his desktop and connected via XML-RPC to a small shock device embedded in his watch. I'm sure it also helps immensely not having ten SSH windows open to the development servers like I do.

Unlike Don's mostly agreeable attitude, I'm going to say that I strongly disagree with Andy's reasoning for leaving. I'm not saying that Flickr isn't a better solution for Andy, I'm just saying that he doesn't really know why he's leaving. Allow me to counter a few of his points, and then tell you why I think he left. It is, if you will, a bit of competitive research, which I do practice occasionally.

First point, regarding the fact there are a large number of tools available for connecting to Flickr's API: A large number of tools != better base service. In other words, making the argument that Flickr has a lot of programs using their API does not necessarily imply the service is better. If you want a good analogy of this statement, look at the comparison of Windows to OSX. Windows definitely has the market edge when it comes to the number of programs that run on it, but arguably costs more to operate long term than OSX does due to crashes, viruses and spyware. Yes, it's cool that so many people write software for Flickr, but it doesn't make Flickr's base service better.

Second point, regarding the desire for IPTC and/or XMP standard support: Both Flickr and SmugMug support IPTC, but neither of them supports exporting IPTC in the files when you download them from the site, unless it was already in the file to begin with. We're going to add IPTC import and export functions to our new release, but its support is still spotty when it comes to photo management applications. Hello Apple. IPTC for iPhoto!

It's not a strength of Flickr then if you tag your photos and they don't magically get that data written to the IPTC fields in the JPG you download. While it's cool that a service supports importing IPTC, it's lame that nobody (to my knowledge) supports exporting the fields as well. We'll be doing exactly that on Zoto 3.0, FWIW, for exactly the reason that Andy lists for leaving SmugMug - even though Flickr really couldn't do better.

Fourth point - there is no doubt that Flickr's tagging system rocks. While a hierarchy (like SmugMug's galleries) is nice conceptually, it is unwieldy for the user, and making a decent UI that supports it is quite difficult. We made the mistake a while back in implementing a hierarchy for our tags, and I've regretted it since. SmugMug doesn't really have the tagging feature down pat, presumably because it was an add-on, and never made a priority. Flickr could still use some tagging features, like the limiting you can do on delicious. A good mix of these features on a photo sharing site would be to have a flat namespace tagging system, with an album system that allows nesting of galleries.

Fifth point - permissions: Permissions are hard to implement correctly. Flickr has a way to build a list of contacts into contact lists named "friends" and "family", which make it convenient to limit to those two groups. However, there are limits to what you can do with this as you may have a set of contacts (like "crazy ass work people") that you'll share just about anything with - unlike "family" which has your mom in it, and thus shouldn't see those photos. SmugMug finds itself limited in the perms department because they require all accounts to be paid - thus eliminating your mom from having an account. A good solution would be to allow users with no upload permissions ("viewers") that can view, comment, tag, etc., but can't upload photos, and then allow the user to build contact lists as they see fit. I agree that Flickr's perms are currently the best solution for sharing photos online.
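To make the "viewers plus user-built contact lists" idea concrete, here's a minimal sketch of the data model. It's an illustration under my own assumptions, not how Flickr, SmugMug, or Zoto actually store permissions, and every class and function name here is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    name: str
    can_upload: bool = True  # False makes this a free "viewer" account
    contact_lists: dict = field(default_factory=dict)  # list name -> set of names


@dataclass
class Photo:
    owner: Account
    visible_to: set = field(default_factory=set)  # names of the owner's contact lists


def can_view(viewer, photo):
    """Owners see their own photos; others need to be on a permitted list."""
    if viewer.name == photo.owner.name:
        return True
    lists = photo.owner.contact_lists
    return any(viewer.name in lists.get(name, set()) for name in photo.visible_to)


def can_upload(account):
    """Viewers can look, comment, and tag, but never upload."""
    return account.can_upload


me = Account("kord", contact_lists={"family": {"mom"}, "crazy work people": {"bob"}})
mom = Account("mom", can_upload=False)  # viewer: free account, no uploads
bob = Account("bob")
party = Photo(owner=me, visible_to={"crazy work people"})

print(can_view(bob, party), can_view(mom, party), can_upload(mom))
```

The point of the arbitrary list names is that the owner isn't stuck with just "friends" and "family".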

Sixth point - community support: There is no doubt that Flickr has a rocking community section. However, this alone doesn't make a site more attractive to someone who is looking to upload, archive, organize and share their photos online. Many Flickr users never take part in the Groups/Pools area of Flickr. Just for reference, Flickr's groups are pools of photos with comments on them. Pools are just a set of photos. The majority of people looking to put their photos online only care that you have good features for building your own sets/galleries/albums. SmugMug rocks at allowing you to make custom sets of photos for sharing. Much better than Flickr's static white theme on all your sets.

Seventh point - uploading photos: Uploading photos to the Internet sucks. It sucks because users' connections are asymmetric in nature - slower uploading than downloading. Yes, an all Java uploader sucks - but I bet you a plugged nickel that is EXACTLY what Don is alluding to with SmugMug's "new feature" he mentions. Flickr's uploader is OK, but it still sucks because you have to wait on it to finish that set of photos to upload. It also sucks because there is a cap on the uploads (if you have a free account). What if someone wrote an uploader client that allowed for batching uploads? That way you could drag, tag, upload, repeat as desired, walk. Hmmmmm. I think I'll do that.
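The batching idea is basically a queue with a background worker. Here's a minimal sketch, assuming the actual transfer is some function you hand in (`send` below is a hypothetical stand-in, not a real Flickr or Zoto API call):

```python
import queue
import threading


class BatchUploader:
    """Accept batches immediately; a background worker sends them in order."""

    def __init__(self, send):
        self._send = send  # callable(path, tags); hypothetical transport
        self.failed = []   # (path, exception) for uploads that blew up
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def add_batch(self, paths, tags):
        """Queue a batch and return at once, so you can start tagging the next."""
        for path in paths:
            self._queue.put((path, tuple(tags)))

    def wait(self):
        """Block until everything queued so far has been attempted."""
        self._queue.join()

    def _run(self):
        while True:
            path, tags = self._queue.get()
            try:
                self._send(path, tags)
            except Exception as exc:  # keep the worker alive on a bad upload
                self.failed.append((path, exc))
            finally:
                self._queue.task_done()


sent = []
uploader = BatchUploader(lambda path, tags: sent.append((path, tags)))
uploader.add_batch(["a.jpg", "b.jpg"], ["beach"])  # drag, tag, upload
uploader.add_batch(["c.jpg"], ["work"])            # repeat, without waiting
uploader.wait()                                    # walk
print(sent)
```

A single worker keeps the uploads in the order you queued them, which is about all a slow upstream connection can handle anyway.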

Eighth point - geotagging. I'll say this again if you didn't know this already. I know the guy that was the first to implement this with photos. His name is Alex Jarret and his site over at the Confluence Project pioneered putting photos/blogs and geo data together. Zoto was the FIRST site to implement geotagging, and SmugMug was not far behind. If you want to fault them (or us) for anything, it's not continuing to apply updates to that feature. However, and this is important, NOBODY is ever going to use it extensively for organizing their own photos. That is, until every camera ships with a GPS in it. Unfortunately to date, nobody seems to be shipping a decent camera with a GPS in it. I wonder why? Again, it's cool and all that - I've shown that at the least I think it's VERY cool, but still nobody cares about this feature. Note: Nobody is defined as < 1% of your user base.

Ninth point - printing via Qoop. Yeah, Qoop is cool. Here's a shocker for you though - nobody prints photos online via Flickr. I know a guy that worked at Yahoo after they bought Flickr and implemented printing, and he said (take this FWIW) that they did less than 20 print orders a day. With 1 million users, that works out to less than 1% of their users printing in a given YEAR. Yes, I know people that have printed books on Qoop. However, I helped with integration of Qoop and Zoto, and even I haven't ordered a book yet. Printing is one of those things that everyone wants, and nobody uses. You have to have it if you have online photo sharing, but nobody uses it.
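If you want to check my math on that, here it is as a quick sketch. Both inputs are the secondhand numbers quoted above, and the result is an upper bound since it assumes every order comes from a different user.

```python
ORDERS_PER_DAY = 20  # "less than 20 print orders a day" (secondhand figure)
USERS = 1_000_000    # user count used above

orders_per_year = ORDERS_PER_DAY * 365
share = orders_per_year / USERS  # upper bound: one order per distinct user
print(f"{orders_per_year:,} orders a year, at most {share:.2%} of users")
```

That's 7,300 orders a year, or 0.73% of the user base at the very most.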

Tenth and last point - migration tools. Migration tools are important for one simple reason. Proprietary/closed systems suck, open systems rock. However, I really don't think that you can call Flickr an open system and SmugMug a closed system. Looking at both APIs, you have access to the photos, titles, tags, comments, etc. You are simply lacking the software that utilizes SmugMug's API decently. That said, a lot of Flickr apps don't work because they are free and unsupported. What you really want is a service that allows importing from one service and exporting to another - as you see fit. They are, after all, your photos.

So, I don't think that Andy has really stated a solid reason for moving to Flickr. Again, I agree that Flickr is more appealing to him, and others, but he hasn't really given a blow by blow account like Greg Reinacker did. All he did was list a bunch of excuses for the move to justify one overarching desire to move - Flickr appears to be superior to any other photo sharing site out there. (Yes, I did just type that.)

Now I've posted before about why Flickr sucks, but that was to give a dissenting view to counter all the fan-boy action going on with them. In this regard however, Andy is simply choosing Flickr as a solution because, somewhere deep down inside, he likes the way that Flickr shows his photos to him. Specifically, I'm talking about what we call the "image detail page" - the page that shows the photo to you when you click on a link or smaller image.

This is the real meat of this post. I maintain that Flickr has succeeded as much as they have because of one very important page on their site. The image detail page. To illustrate my point further, here's an image detail from Flickr, SmugMug, Zoto 2.0, and Zoto 3.0.

Notice the amount of links off the page on Flickr, and the speed at which you can move to the next photo. This is what makes Flickr appealing to its users - nothing more, nothing less. The ability to "link" to other photos makes it easy to get around and explore images. It makes the VIEWING experience nice for the user.

If you look at the SmugMug page, it's slow to load, and the links off to the gallery and tags are crammed down below where you aren't going to see them (esp. if they are below a page break). While the page looks nice, it's functionally a dinosaur. Small prev/next links make it impossible to navigate, and the modal popup requires clicking on an impossibly small "x" at the top. Looks great, but feels like poo to the user. If you had to look at your photos with this, you'd kill yourself in 10 minutes.

Speaking of killing yourself, Zoto 2.0 will make you kill yourself in half that time. While we have decent prev/next navigation, the pages are hella slow to load, and they don't have a decent way of getting to a larger sized image (click on "other sizes", click on size, wait, view, close window). We also fail miserably in encouraging good tagging, which eliminates links being created, and we don't show any related photos when you are looking at photos in a gallery. Boo Zoto 2.0.

With Zoto 3.0, things are better. Notice the speed of moving between images - there are no page loads occurring there - it's all AJAX. When we add groups and album support, there will be links to the right of the image, just like the good positioning going on with Flickr. We'll also add a "lightbox modal" that will do a popover on the page - working much like Flickr's popopens for sets/pools/groups, but much faster and with more images. We've also added a second type of "image detail" to the lightbox that we call "lightbox modal detail" that will provide rapid fire viewing.

At the end of the day people will choose to use a photo sharing site because of one thing: How fast can they view their images at decent size? The reason Flickr has done as well as they have is because they have excelled at this one thing. Their "image detail" page rocks. Hands down.

Anyway, I've obviously plugged Zoto on this post, but that's why this blog is here, after all. My primary purpose was to debunk all this crap about why one service is better than the next because of X feature. These comparison posts don't help the companies any, and they sure as hell don't help the end user any, because there is little to nothing someone like SmugMug is going to do about it when they read the post (other than freak out).

Why they prefer one site over another is a complicated affair, where the user themselves is probably not even aware of the real reasoning. That is left as an exercise for those of us (sort of) paying attention.


Thursday, September 28. 2006

Paypal Violates Privacy with DoubleClick Ad Filled Payment Emails

Oh the insanity of some people. Today I opened my email and was skimming over the Paypal transactions from last night when I noticed to my shock and awe that there is now DoubleClick driven ADVERTISING in my payment details email. Whoever decided it was a good idea to put ads served from DoubleClick in our payment emails seriously needs to be fired. Like right now.



Let me explain why this is a "very bad thing". Those ads you see are being served from DoubleClick. That means each time I get a payment from Paypal, DoubleClick knows about it. This is bad for several different reasons, the main one being that I don't have an agreement with DoubleClick like I do with Paypal to keep our transactions private. Here's one of the URLs from the email for reference:

https://paypalssl.doubleclick.net/jump/paypal.us/ReceivedMoneyRec=Receipt-email;lang=3DU.S.

There is more to the URL, but I don't want to reveal any details about Zoto's Paypal account here. God knows what is encoded in the rest of the URL parameters.

Besides being out-of-your-mind stupid with my privacy, Paypal, I'm upset because I'm already paying you for your damn service. You make a cut on each transaction we run, and then we pay a monthly fee on top of that. We are paying users, and we don't want your stupid advertising in your emails to us, regardless of whether you think it's secure or not (which it's clearly not). It's bad enough that everywhere we go on the Internet today is filled with mindless advertising, and that I already get a crapload of SPAM in my inbox (thanks Ken), but to start filling up the payment records with it is just fucking stupid.

If you agree with me, and you have a Paypal account, you should go complain here.

Stop it right now guys. Stop this insanity, and fire the loser that did this.




Tuesday, September 12. 2006

Apple Turnover

Apple is going to make some announcements today regarding movie downloads. I thought I should post a short blurb on what my predictions are regarding the announcement.

First, you are going to be able to start downloading movies via iTunes. That much seems pretty obvious.

Second, you need something to watch the movies on, and a regular iPod blows for watching movies, so we are going to see a new widescreen iPod released with more memory. I say fullscreen display with controls on the back, with a 100G drive in it. Something needs to give with the transfer interface, but I'm not sure it's going to be on this release.

Third, the TV. I've had a Mac Mini hooked up to my HD TV for a few weeks now, and it's awesome. The only problem with it is you can't record from the broadcasts, a la TiVo. So, you are going to see a modified Mini, or an addon for the Mini, that gives it PVR capabilities.

I don't really care about the iPod one way or the other. Movies are for watching at home on the big screen. The PVR however, is where it's at, and I'd buy one in a heartbeat. My wife will be the one that buys the movies for it.

Now let's see what Steve says....

Monday, September 11. 2006

Fooled by Acquisition

I'm currently reading Nassim Taleb's Fooled by Randomness. I picked it up in the investment section of the bookstore the other day, and have been reading it over the past week or so. The reason I picked it up was because I was writing a stock trading program in PyGene, and was looking for some good reference material. More on that later.

In simple terms, Nassim makes the assertion that all of us are making wild ass assumptions about the random stuff that happens in our life on a day to day basis.

I had a friend just tell me a story about an attorney friend that had a witness to a car crash fly into town to go out to the site and describe what happened. When the witness and the attorney got to the scene of the previous accident, the witness jumped out of the car and was immediately struck by an oncoming truck, killing him instantly. Talk about being at the wrong place at the wrong time.

What are the chances that this would happen to you? Just because it happened, does it mean that you "deserved" it, or that it was "fate", or that you were the direct cause of it? Unfortunately, most of us assume the answer to this question is a resounding "Yes!", when something as unlikely as this happens to us.

Nassim gives an example in the book of a lady that has won the Powerball lottery twice. When asked, most people refuse to believe that this is actually possible without some type of external influence (like God for example), as it is a 1 in 4 billion chance that it would happen to themselves. However, when you run the statistics on the probability of it happening to anyone sometime in the last 30 years, it's about 1 in 30 - a decent chance at the least, and not at all suspicious.

Now for the controversial bit.

I assert that most of the successes or acquisitions we see in the dotcom space are due primarily to raw, unadulterated luck. I'm not saying that you don't need a product to qualify you for entry into this arena, but I am maintaining that what comes after you enter is attributed to random chance.

Bill Gates' success should not be attributed to the fact that he's 100K times as smart as I am, or that he has come up with a better product than everyone else. It is attributed to the fact that he realized that he was on to something in the early days of Microsoft, and took action when adoption rates jumped. The fact that adoption jumped at all was just stupid luck. Ever hear of the Amiga?

Take MySpace for example. When Chris and Tom started MySpace, they made the assertion that people would like to have a social networking version of something akin to GeoCities. It was necessary that they develop the product and launch it, but past that, everything leading to the success of the product is attributed to being at the right place at the right time, and then executing on it after you realized you were actually in the right place. For MySpace's founders, nobody really knew if combining personal web pages with social networking would be appealing, it was just a lucky guess, and it turned out to be right.

Most sites' measure of success hinges on adoption by large numbers of users - AKA The Masses. Much like the success or failure of submitted stories on Digg, the adoption rates by The Masses are analyzed after the fact, and bold claims are made about why a particular approach worked to get The Masses to adopt in the first place. Of course most of the people doing this analysis are investors, press and bloggers - all of whom think they know what they are talking about. They don't, but they sound good with 20/20 hindsight vision at hand.

There are a ton of intelligent people in the world that are capable of thinking up and executing on an Internet startup. However, there are only a few guys that actually make headlines with their successful exits on a day-to-day basis. These few people aren't necessarily smarter or better informed than the rest of us - they were just in the right place at the right time, and realized it when they got the inertia given to them by mass adoption.

To ensure one's own success, one needs to make assertions, execute on them, gather feedback, and then repeat until a movement is made in the right direction. Without the repetition, and awareness, you could find yourself doomed to failure before you even begin.

Of course it always helps to believe you are smarter than everyone else too.