Interview with Powerset

Steve Newcomb is the founder and COO of Powerset, “the startup” that is going to revolutionize search as we know it through its breakthrough technology based on natural language processing.
If you haven't already, sign up for PowerLabs from the Powerset homepage to get a first hands-on look at Powerset before its official launch.
I look forward to Powerset going mainstream with your September launch, changing my home page to www.powerset.com, and exploring long-term career opportunities at Powerset.
In July 2007, your ranking feature was still in development. Can you comment on the progress you’ve made?
Sure. One of the things we try to do here at Powerset is be very fair and open with the communities we’re talking to. For example, we recently invited approximately 40 people to our demo day and started the meeting by saying, “Ask any question you want and we’ll try to answer it. We won’t try to spin it, avoid it, or dodge the question.” There are definitely some things we can’t talk about because they’re our secret sauce, but I can certainly address the ranking question.
First, it’s really important to understand how ranking works. For keyword-based search engines, pages are ranked using a number of criteria and features: PageRank [link graph analysis], keyword frequency, and keyword proximity, just to name a few. Google was clearly the innovator in that area. Significantly better ranking, in large part based on link-graph analysis, was one of the core inventions that they brought to the table. As a result, it made them better than their competitors in the search space.
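To make the link-graph idea concrete, here is a minimal PageRank sketch using power iteration on a tiny hypothetical link graph. It is not Google’s or Powerset’s production code; the damping factor of 0.85 is the value commonly cited from the original PageRank paper, and the function name, graph, and iteration count are illustrative assumptions.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Power-iteration PageRank over a dict of page -> outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: "b" is linked from both "a" and "c".
scores = pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]})
```

In a keyword engine, a score like this is only one input; it is combined with query-dependent features such as keyword frequency and proximity.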
Years later, Powerset is using semantic analysis to change the way ranking is performed, among other things. Although our approach is different, at the end of the day the core of every search engine is an index, and pages retrieved from that index must be ranked before they are returned to a user.
To build the index, you crawl the web and transform the text into information that is stored and optimized for fast access. At query time, you select the pages from the index that are potentially good matches (for a traditional search engine, these are all the pages that contain the query keywords). Then you apply ranking algorithms to that information to arrive at the “rank”, which is the order in which you show the pages to the user. One important thing to note in that process is that any information found in your index can be used to compute your rank.
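The two phases described above (build an index ahead of time, then select and rank candidates at query time) can be sketched with a toy inverted index. This is a hypothetical illustration of a traditional keyword engine, not Powerset’s system; ranking by raw keyword count is a placeholder assumption standing in for a real ranking function.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Indexing phase: map each term to {doc_id: occurrence count}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index: dict[str, dict[str, int]], query: str) -> list[str]:
    """Query phase: select pages containing every keyword, then rank them."""
    terms = tokenize(query)
    if not terms:
        return []
    # Selection: intersect the posting lists of all query terms.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))

    # Ranking: placeholder score = total keyword occurrences in the page.
    def score(doc_id: str) -> int:
        return sum(index[t][doc_id] for t in terms)

    # Highest score first; ties broken by doc_id for a stable order.
    return sorted(candidates, key=lambda d: (-score(d), d))


docs = {
    "d1": "search engines rank pages",
    "d2": "semantic search search",
    "d3": "cooking recipes",
}
idx = build_index(docs)
```

Anything stored in the index (here only term counts) is available to the ranking step, which is the point the interview makes: a richer index allows a richer ranking function.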
The way we store data in our index and compute rank is therefore quite different from the way a keyword index works.
To compute the ranking function, Powerset uses all the factors found in keyword-based search plus our own unique linguistic and semantic features. This gives us a number of extra features to rank on beyond the standard criteria used for keyword ranking. By building a model of the information contained in textual documents that is more elaborate than simple keyword sequences, you can achieve better ranking, especially when the queries themselves carry more context and more linguistic information. And we can take advantage of the best of both approaches, simple keyword-based features and semantic features, to deliver better relevance. We have a lot more data to help us out with ranking.
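Powerset has not published its ranking function, so the following is only a hypothetical sketch of the idea of ranking on keyword features and semantic features together. The feature names, the hand-set weights, and the representation of semantic content as (subject, relation, object) triples are all assumptions made for illustration.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]  # hypothetical (subject, relation, object) fact


@dataclass
class PageFeatures:
    # Keyword-style features named in the interview.
    link_score: float         # e.g. a PageRank-style value
    keyword_frequency: float
    keyword_proximity: float
    # Hypothetical semantic feature (see semantic_match below).
    semantic_match: float


def semantic_match(query_facts: set[Triple], page_facts: set[Triple]) -> float:
    """Fraction of the query's facts that were also extracted from the page."""
    if not query_facts:
        return 0.0
    return len(query_facts & page_facts) / len(query_facts)


# Hypothetical hand-set weights; not Powerset's actual values.
WEIGHTS = {
    "link_score": 1.0,
    "keyword_frequency": 0.5,
    "keyword_proximity": 0.5,
    "semantic_match": 2.0,
}


def rank_score(f: PageFeatures, weights: dict[str, float] = WEIGHTS) -> float:
    """Linear combination of keyword and semantic features."""
    return sum(w * getattr(f, name) for name, w in weights.items())


def rank(pages: dict[str, PageFeatures]) -> list[str]:
    """Return page ids ordered from best to worst combined score."""
    return sorted(pages, key=lambda p: rank_score(pages[p]), reverse=True)
```

A linear combination is the simplest way to mix the two kinds of features; in practice, weights like these would typically be learned from labeled relevance data rather than set by hand.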
This is quite fundamental. The primary value of our system lies in the fact that we capture more data in the index. We can have better recall and precision, as well as better relevance through ranking, because we have a more elaborate model of the information on the web.
Ultimately, that is how we derive better quality search results.
Do you foresee privacy concerns with Powerset’s semantic web approach? If so, how are you addressing those issues? That is, if Powerset is a “better” search engine, is it then easier to find private information?
If a website publishes private data and a search engine then indexes it, it is possible to find that data. Every search engine uses techniques to avoid returning private data such as social security or credit card numbers. We don’t track, record, or index any specifically private information, and we recognize the privacy issues related to user queries.
In essence, while I think privacy is a big issue, I do not think we’re any different from any other search engine on how we would approach it.
Where do you see Powerset in 3-5 years in the search space?
When we founded Powerset, one of the reasons we and our investors were so excited about it was that we were looking to create a company focused on building the next decade or two of search. We were addressing the fundamental question of what the next big breakthrough in search was going to be. I believe Google did the same thing when they got started.
When Google looked at search, it said, “the core of search is broken”. Directory-based ways of doing search were limiting in so many ways. They saw algorithmic keyword search as being the next decade of search similar to how we see natural language semantics and machine learning approaches as being the next decade of search. We strongly believe that is where the next leap will be.
We look at keyword search and believe that nobody does it better than Google. And probably nobody will do better than what Google has accomplished for keyword-type searches. But that ladder can only go so high. The other ladder, the one that will take over and go much higher, is about natural language, and about using a combination of machine learning and semantic approaches.
Powerset is working on the type of search that is going to dominate for the next decade or two in this space of search. We are the company that has the most resources, the most money behind it, and is the most focused on just that.
We are singly focused on changing the core of search yet again, whereas the other large search engines out there are focused on hundreds of peripheral things at the same time.
That’s one benefit that we experience here at Powerset - everyone is focused on the exact same thing.
You ask where we’ll be in 3 – 5 years in the search space. Will we be out of business? Will we be a mediocre player? Will we be a leader? Will we be dominant? Certainly, we focus on these questions somewhat, from a business development perspective, since we do want to capture our audience, but at the same time, we are all here to make search better.
We have a really good feeling based on our current comparative results that we are going to come out as the dominant leader in this space by combining keyword approaches with semantic approaches to produce better ranking.
We are already seeing how much better search results can be at these very early stages, and we believe it’s the tip of a giant iceberg.
What has been Powerset’s approach to hiring talent and building a great team?
“Talent begets talent.” An “A” player will also find other “A” players that they know in their peer groups. If you hire a “B” player or “C” player then you will get other “B” or “C” players on your team. We concentrate on only hiring the best of the best right from the beginning. You can see this in our teams. We have hired people that are well known in the search community, well known in the NLP community, well known in the data center community, or well known in the product and consumer experience community.
Two really good examples were Ron Kaplan and Chad Walters.
We knew that we needed to build a natural language team that was simply the best in the industry, so we hired Ron Kaplan, who had been the head of Xerox’s natural language group for 35 years. He was the inventor of the Xerox language technology that we have licensed. The German news magazine Der Spiegel has dubbed him the Albert Einstein of computational linguistics. Every single person in the field of natural language processing knows who Ron Kaplan is.
Similarly, on the search side, we hired Chad Walters, who was one of the lead search architects for Yahoo’s runtime. He is someone many search engine engineers have come to know and respect. He writes very elegant code, has a conservative mindset about what can and cannot be done, and really knows what he’s doing.
When you hire people like that, you start bringing in other “A” players into the groups. The best part is that for a long time we have not had to hire an outside recruiter nor have we posted our jobs on job boards.
However, we have just hired an in-house recruiter, and she also handles our recruiting for PowerLabs. She does all types of recruiting, be it recruiting for employees or for people for our beta testing programs.
But most of the people that we hire are coming from people who know people who know people, and that’s how you build a very strong team.
It’s hard to imagine that Google won’t see you as a threat. How do you think Google will respond to your launch in September? Do you see an acquisition as a possibility to eliminate the competition, so to speak?
First of all, we have the utmost respect for Google and what they have accomplished. We feel that we are on the same side as anyone in the search industry who is trying to make search better. We rarely see Google as the enemy here.
We actually get quite excited when we hear, for example, that Google is trying to change their message from “data, data, data, more data” to more model-based approaches. They are building their natural language teams. In fact, because our DNA is so intrinsically tied to the NLP community, we know many of the leads of their natural language teams very well.
That said, I think it’s going to take a long time for them to change anything about their core index, because after all, that’s where they derive their fundamental market value. However, I do expect them to use some natural language techniques. In turn, that has a couple of effects on Powerset.
We are already seen as a dominant player in natural language search. Our name is sort of synonymous with it. If Google gets into natural language search, number one, it validates our belief that natural language is going to be the next ten years of search. That may or may not mean there is an acquisition opportunity with Google. It may also prompt the other key players in search to say, “Oh my, if Google is getting into natural language search, then so should I,” and that sort of thing.
But I would say that all the key search engines are starting to invest heavily in natural language search. I think that over time, they will start to see and understand what Powerset is doing. There’s always a lot of confusion about what exactly we do and how we are different from everyone who has tried to do this in the past.
I think that in five years, people will adopt this combined approach of machine learning, symbolic natural language processing, and keyword indexing. As for what that means for our exit, whether we go public or get acquired by somebody, I don’t think that’s something we should concentrate on or analyze right now. We are focused on building a superior product.
We try to maintain close relationships with each of the search companies out there, and it’s quite common for some of them to talk with us on a weekly basis. In fact, some of the companies most people would consider our competitors came in just this week to chat with us.
What have been some of Powerset’s greatest obstacles to building your product?
I think that there were a couple of fundamental risks we had to consider. There was technology risk and there was execution risk.
From a technology risk perspective, the biggest obstacle we’ve overcome was the threat posed by our cost of indexing. Building a semantic index is many times more expensive than building a keyword index. In fact, until recently it was financially impossible to use our technology to build a search engine. Just a few years ago, all the computing power in the world wouldn’t have been enough to process that much information as deeply and as quickly as we need. The cost of doing that, measured in the number of machines you would have had to buy to index the web, would have made it impossible to build a sustainable business. We’ve now broken through, making it financially feasible to build a search engine based on our technology. We have succeeded in parsing sentences fast enough to make it possible to do this at large scale.
The second thing is actually physically scaling out. How do you get thousands of cores working at the same time to build a semantic index?
How do you scale technology built for a million document index to process 10 million, then 100 million, 500 million, a billion documents? How do you deal with spam? How do you build a crawler? How do you build a content repository? How do you build the indexer? How do you distribute the index at runtime? Some of these are known challenges and we have built a team that addresses and reduces this risk.
From an indexing perspective, we feel like we’re there. And right now, we’re working on our core runtime technology, and that’s going to be the next big hurdle we have to cross. But we have a pretty good team under Chad Walters to address that obstacle.
How many people are on the Powerset team today?
We’re 72 people right now.
Is there a message you’d like to leave us with?
One of the things nobody really talks about with Powerset is the ways we’ve broken the mold outside our technology and outside the search space.
This is my fifth startup so I’ve been here a few times before and Barney’s been around the block as well. Barney, Lorenzo, and I set out to really break the mold in several different ways in terms of how we create a company with the hope that other companies copy us.
For example, we cover everybody’s health care. For a startup, that is almost unheard of. We have great commuter programs: we pay for everyone’s commute tickets, and we pay for EVDO cards so everybody in the company can be online at all times. Every Thursday we have a meeting where we get the entire company in a room; people are allowed to ask the founders any question they want, and the founders have to answer it. Transparency like that is very important for building culture.
The stock programs we give employees are very innovative; they are the first of their kind. We structure our compensation packages very differently, so when people come on board they say, “Wow! That’s different. I’ve never seen that before. It’s wonderful!”
We’re trying to create a great culture, and sort of change the world one little bit at a time. Already, we’ve had 15 companies copy our exact infrastructure. In fact, we’re going to be releasing our non-secret sauce operational models with the intention of helping other startups.
We’re releasing our financial models and data center models to the public as well. We are sharing which companies we use for HR and accounting, and how we do our recruiting. Everybody always asks us, “Who is your press agent? Who is your press company? Who is your marketing company? I want to hire them.”
And the answer is we didn’t hire anybody. There is no press agency that represents Powerset.
One of my goals as an entrepreneur has always been to help other entrepreneurs. Between companies three and four, I left the venture world for two years and did nothing but non-profit work designed to help entrepreneurs. I helped create the “VC Breakfast Club”, where every Thursday in Silicon Valley we bring in six or so engineers with great ideas and introduce them directly to general partners and VCs. They get to pitch to the VCs, and we help evaluate them.
In our second year, I was the Managing Editor of a book entitled “The Entrepreneur’s Workbook.” I went out to engineers who weren’t used to creating companies and asked, “What are your basic questions about how to create a company?” Their questions were things like: How do I rent an office? How do I create a stock option plan? What is a cap table? How do I incorporate?
Then we took all these questions and had lawyers, VCs, and service companies write the chapters of that book. There are now more than 10,000 copies of it in circulation.
I’ve always felt that engineers have the best ideas, and the trick is putting great business people with them. That should never be information held tightly within a company; it should be shared as freely and as widely as humanly possible. So we’ve been doing that at Powerset, simply sharing how we do things.





