Things About Codase Source Code Search Engine

Friday, November 11, 2005

Codase Beta 1 Released

Today, we released the Beta 1 version of our Codase search engine and service. It now contains about 250 million lines of code in C, C++ and Java. In addition to expanding the code base, we have fixed many bugs and added several new features. These new features are based on the feedback we received during the two-month alpha testing period.
  1. Smart query interface. Many users want a simple interface for entering all kinds of queries, and we have finally come up with an innovative way of doing it. We provide a single text box in which users can enter code fragments similar to what they type every day in their programming work. Our parser then builds an internal query consisting of classes, methods, fields and variables and feeds it into the search engine. This simple interface is powerful enough to handle the most complex queries. You can read the help file here.

    The following are some sample queries:

    • To find any method that invokes fopen, fread and fclose, enter this query:
      fopen; fread; fclose;
      Click here to see the search results.

    • To find any main method that has a local variable named driver and invokes the getConnection method of the DriverManager class, enter this query:
      main() { var driver; DriverManager.getConnection;}
      Click here to see the search results.

  2. Java API documentation and sample code. We now host the Java 2 SDK and J2EE API documentation on our site. In addition, we have added links on each API so that users can quickly retrieve sample code from our search engine. This might be the best way to learn how to use the hundreds of thousands of Java classes and methods. Real sample code and documentation are at your fingertips.
You can read the press release here.
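
As a rough illustration of the smart query interface described in item 1, the sketch below parses the two sample queries into a structured form and matches them against a few made-up method records. It is only a guess at the general idea, written in Python for brevity: the names MethodQuery, parse_smart_query, matches, search and METHODS are hypothetical, the records are invented, and none of this reflects how the Codase parser or index is actually built.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class MethodQuery:
    """Structured form of a smart query (hypothetical, for illustration only)."""
    method_name: Optional[str] = None                  # enclosing method; None means any
    variables: Set[str] = field(default_factory=set)   # required local variable names
    calls: Set[str] = field(default_factory=set)       # required calls, e.g. "fopen"


def parse_smart_query(text: str) -> MethodQuery:
    """Parse queries such as 'fopen; fread; fclose;' or 'main() { var driver; ... }'."""
    query = MethodQuery()
    body = text.strip()
    # An optional 'name() { ... }' wrapper restricts the search to that method.
    wrapper = re.match(r"^(\w+)\s*\(\s*\)\s*\{(.*)\}\s*$", body, re.S)
    if wrapper:
        query.method_name = wrapper.group(1)
        body = wrapper.group(2)
    for term in body.split(";"):
        term = term.strip()
        if not term:
            continue
        if term.startswith("var "):
            query.variables.add(term[4:].strip())
        else:
            # Accept both 'fopen' and 'fopen()'.
            query.calls.add(term.rstrip("()").strip())
    return query


def matches(query: MethodQuery, method: dict) -> bool:
    """Check one method record against every constraint in the query."""
    if query.method_name is not None and method["name"] != query.method_name:
        return False
    return (query.variables <= set(method["variables"])
            and query.calls <= set(method["calls"]))


# Invented records standing in for indexed methods.
METHODS = [
    {"name": "copy_file", "variables": ["buf"], "calls": ["fopen", "fread", "fclose"]},
    {"name": "main", "variables": ["driver"], "calls": ["DriverManager.getConnection"]},
    {"name": "main", "variables": ["argc"], "calls": ["fopen"]},
]


def search(text: str) -> list:
    """Return the records that satisfy the parsed query."""
    query = parse_smart_query(text)
    return [m for m in METHODS if matches(query, m)]
```

With these invented records, the first sample query matches only copy_file, and the second matches only the main method that declares driver and calls DriverManager.getConnection.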

Monday, October 24, 2005

Lots of Java code coming soon

Per requests from many people, we will soon add more than 100M lines of Java source code to our Codase search engine. This new codebase covers over 10,000 open source projects. We will also provide a browsing interface so that people can easily find code samples for a specific Java API.

Please stay tuned :)

Wednesday, October 19, 2005

Smart Query Parser

Our current interface provides different forms for different kinds of searches. For example, method call search has its own form, whereas variable search has another. This interface requires extra clicks and is also a bit different from what programmers usually type every day.

After we released our Codase service, we received feedback requesting a free-style input box in which a user can simply type text to perform different searches. In other words, typing into the same input box will dispatch to different types of searches, such as method call, method definition, class, field definition, field reference, variables, and so on.

The good news is that we are implementing this feature, which should be ready for use in a couple of weeks. Basically, we have come up with a smart query parser, so a user can type something similar to the regular code that he or she types every day. The parser will generate different search queries and deliver them to our Codase search engine.

For example:
(1) To search for the definition of the main function, one can type:
int main(int argc, char** argv) {}

(2) To search for fopen and fseek function calls, one can type:
fopen(); fseek;

We will try our best to implement a flexible query syntax. Please tell us if you have suggestions.
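
To make the idea concrete, here is a minimal sketch of how one input box might decide which kind of search a typed statement belongs to. It is an illustration under assumptions rather than the actual parser: the regular expressions, the classify and to_searches helpers, and the "method_definition" / "method_call" / "keyword" labels are all invented for this example, and it is written in Python only for brevity.

```python
import re

# An optional return type, a name, a parameter list and a '{...}' body.
DEFINITION = re.compile(r"^(?:[\w\s*&:<>]+?\s+)?(\w+)\s*\(([^)]*)\)\s*\{.*\}$", re.S)
# A bare name, optionally followed by empty parentheses.
CALL = re.compile(r"^(\w+)\s*(?:\(\s*\))?$")


def classify(statement: str) -> dict:
    """Map one typed statement to a (hypothetical) search type."""
    s = statement.strip().rstrip(";").strip()
    m = DEFINITION.match(s)
    if m:
        return {"search": "method_definition", "name": m.group(1)}
    m = CALL.match(s)
    if m:
        return {"search": "method_call", "name": m.group(1)}
    # Anything else falls back to a plain keyword search in this sketch.
    return {"search": "keyword", "text": s}


def to_searches(query: str) -> list:
    """Split the typed text into statements and classify each one."""
    # Naive split: a ';' inside a '{...}' body is not handled here.
    return [classify(part) for part in query.split(";") if part.strip()]
```

Under these assumptions, example (1) is routed to a method definition search for main, and example (2) yields two method call searches, for fopen and fseek.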

Tuesday, October 18, 2005

eHub Interviews Codase

eHub: What is your web application/service about?

Codase: Codase is a new source code search engine. Rather than treating code as text, Codase understands programming languages and treats code as code. This unique, syntax-aware approach provides the best search results compared to other services available today. With Codase, developers can search functions, classes, strings, constants, macros, comments and other programming language constructs from billions of lines of validated open source code.

eHub: Why did you start this project?

Codase: As a developer, I’ve spent and wasted lots of time searching for sample code, and so have many other developers we know. Developers have built all kinds of projects for others; we ought to have a good search engine for ourselves. However, all of the available general search engines, such as Google, Yahoo or MSN, fail to address this need: they either don’t index enough source code, or simply search code as text and thus offer poor results. Therefore, it was quite natural for us to start Codase once we had invented an intelligent source code analysis engine and an efficient XML index and search system.

eHub: How much time do you devote to its growth? Do you have a day job?

Codase: This is my day job. We work full-time on Codase, day and night.

eHub: How large is your team and what are your backgrounds?

Codase: I worked at Oracle for a few years after graduating from Stanford in 1998 with Ph.D. and MSEE degrees. We have 3 people on our team, all with more than 10 years of software engineering experience at big-name companies. Our strengths are in programming languages, database and search technologies. We are graduates of Stanford and CMU. In addition to this core team, we have a few other folks helping us from time to time.

eHub: What is your design philosophy?

Codase: If there would be a better way, don’t do it now. We keep our technologies simple, easy to use, scalable and state-of-the-art, and we implement them based on the feedback of hundreds of prototype testers. We improve the system every day.

eHub: What technologies are you currently using?

Codase: We develop most of the code analysis, index and search technologies in house, but also leverage an open source stack such as Fedora and Apache. We have our own index and search system, which converts flat data into a hierarchy and saves the relationships as well. This way, relationships among elements can be searched in addition to keywords, whereas other search engines only perform flat keyword search.
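
The difference between flat keyword search and relationship search can be illustrated with a small sketch. It is not Codase's index format, which the interview does not describe: the sample rows, the class -> method -> callees nesting, and the build_hierarchy, flat_keyword_search and callers_of helpers are all assumptions made up for this example, written in Python for brevity.

```python
from collections import defaultdict

# Invented flat rows: (class name, method name, function called inside it).
ROWS = [
    ("FileCopier", "copy", "fopen"),
    ("FileCopier", "copy", "fclose"),
    ("Logger", "fopen_wrapper", "printf"),
]


def build_hierarchy(rows):
    """Nest the flat rows as class -> method -> set of called functions."""
    tree = defaultdict(lambda: defaultdict(set))
    for cls, method, callee in rows:
        tree[cls][method].add(callee)
    return tree


def flat_keyword_search(rows, keyword):
    """Text-style search: any row mentioning the keyword anywhere is a hit."""
    return sorted({(cls, method) for cls, method, callee in rows
                   if any(keyword in part for part in (cls, method, callee))})


def callers_of(tree, callee):
    """Relationship search: only methods that actually call `callee`."""
    return sorted((cls, method)
                  for cls, methods in tree.items()
                  for method, callees in methods.items()
                  if callee in callees)
```

On this toy data, flat_keyword_search(ROWS, "fopen") also returns Logger.fopen_wrapper because the keyword appears in its name, while callers_of(build_hierarchy(ROWS), "fopen") returns only FileCopier.copy.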

eHub: If your project is live, what are the most requested features from your users/community?

Codase: More code, more programming languages. Right now, we have about 110 million lines of code in C, C++ and Java. This is the alpha release. We will add much more code in the beta release.

eHub: Does your user base reside in a primary geographic location or is it distributed?

Codase: It’s truly global.

eHub: Where do you see the project heading in the next 6 months? The next 2 years?

Codase: Over the next 6 months, Codase will evolve and become a major site where developers can easily find code snippets. With this initial focus on code, we will then embrace other vertical areas of development-related search. In the long run, I see Codase becoming a significant service that developers can consult whenever they have questions about their development work.

eHub: What is the greatest challenge to your success?

Codase: Time and capital investment.

eHub: What is the one thing you need to get to the next phase of the project?

Codase: Time. We are working very hard on the beta release, which will cover more code and more programming languages.

eHub: Do you have a business model? If so, what is it?

Codase: Yes. In the short term, we will release an enterprise edition that allows software companies to search their own source code and thus greatly improve their developers’ productivity. Over the long run, we feel confident that Codase will attract lots of users and thus give us opportunities to generate revenue from sponsorship.

eHub: If you’re able to disclose this information, how much traffic or usage do you see on an average day?

Codase: Since we released our alpha version on Sep-09-05, more and more people have been coming to our source code search engine, and the traffic is growing.

eHub: What is the one thing you’re most proud of about the project?

Codase: Our state-of-the-art solution to a complex problem and the high quality of our service. In the area of source code search, Codase offers the BEST search results of any search engine available today. There are several blogs comparing our source code search with others, and they all rate Codase as better.

eHub: How would you describe the shift that’s occurring with the web right now to future generations?

Codase: I think the web will shift towards providing more accurate and relevant information, with a high level of interactivity that we are not seeing today. More desktop applications will move to the web, at large scale and with many users. General search engines are good for general purposes; more and more vertical search engines are coming out to give users more relevant, accurate information.

eHub: What site(s) do you visit everyday other than your own?

Codase: Google for general searches and a bunch of vertical search engines for shopping, travel, music, movies and learning; Yahoo for email and news; Microsoft for .NET technologies; Sun for Java platforms; NBA for sports, etc.

eHub: How many hours of sleep do you get a night?

Codase: 7-9 hours. I try to sleep as much as I can; this is something I don’t usually give up.

Codase Launches the First Comprehensive and Syntax-Aware Source Code Search Service

Fremont, CA - Sep 9, 2005 - Codase, Inc., a leading source code search company, announced today the release of the alpha version of its advanced source code search service, with the service being made immediately available through its website, Codase - Source Code Search Engine at

Codase is a new kind of search service for open source code. Rather than treating code as text, Codase understands programming languages and treats code as code, the way it's supposed to be. This unique, syntax-aware approach provides the most accurate and detailed search results, with fine-grained levels of control. With Codase, developers can search functions, classes, strings, constants, macros, comments and other programming language constructs.

Codase hosts a huge amount of open source code and provides much better coverage, as it covers code usually hidden inside compressed files and source control repositories, which general search engines, such as Google (goog), Yahoo (yhoo), MSN (msft), Baidu (bidu) and Ask Jeeves, fail to find and index. In addition, Codase only indexes and searches high-quality code, with every line of code literally validated and compiled by an intelligent and powerful source code analysis engine. This initial alpha release focuses mainly on Linux C/C++ code. Future releases will address other programming languages and platforms.

"Our service represents a major step towards intelligent and quick retrieval of useful and relevant information from the massive amount of unstructured open source codes." said Dr. Huihong Luo, founder of Codase, "By using our service, developers can fully take advantage of existing work and thus improve their own productivity for development."