Blog Posts

How to parse code-like elements from free-form text?

Partial programs, such as incomplete code snippets that do not compile, appear in discussion forums, emails, and other informal communication media. These sources hold a wealth of information, and we want to parse such partial programs out of informal documentation. Lightweight regular expressions, written from our knowledge of the naming conventions of API elements and other programming constructs, are one way to do this. Miler is a technique built on this regex idea, but its precision is only 33% and varies across programming languages.
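The regex idea can be sketched in a few lines of Python. The three patterns below are hypothetical, chosen from common Java naming conventions (qualified method calls, CamelCase type names, fully qualified class names); they are not Miler's actual patterns.

```python
import re

# Hypothetical patterns based on common Java-style naming conventions.
# These are illustrative assumptions, not Miler's actual patterns.
PATTERNS = {
    # Qualified method call, e.g. "list.add(" or "Foo.bar("
    "method_call": re.compile(r"\b[A-Za-z_]\w*\.[a-z_]\w*\("),
    # CamelCase type name with at least two capitalized parts, e.g. "ArrayList"
    "camel_case_type": re.compile(r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+\b"),
    # Fully qualified class name with at least two package parts, e.g. "java.util.List"
    "qualified_name": re.compile(r"\b[a-z]+(?:\.[a-z]\w*)+\.[A-Z]\w*\b"),
}


def extract_code_like(text):
    """Return (kind, matched_text) pairs for every pattern hit in free-form text."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((kind, m.group(0)))
    return hits
```

Running `extract_code_like("Try calling list.add(x) on an ArrayList from java.util.List instead.")` yields one hit per pattern. The same CamelCase pattern also fires on ordinary prose such as "PowerPoint", which illustrates why purely regex-based extraction struggles with precision.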

Practical challenges in computing recall

Most experiments are designed around a controlled corpus, i.e., one for which the correct answers are already known, established manually or through some means independent of the tool being evaluated. Such corpora are therefore smaller samples of the real corpus. Given this ground truth, an oracle can be implemented to compute recall. Sampling works in most cases, but it has limitations of its own. For example, sampling poses a serious threat to validity: a different sample could produce different results. Yet creating several large samples under varied circumstances may be infeasible.
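A minimal sketch of oracle-based evaluation, assuming the oracle is simply the set of code elements a human marked as correct. `precision_recall` and `sampled_recall` are hypothetical helper names; the second estimates recall from a random sample of the oracle, and re-running it with different seeds shows how the estimate can shift from one sample to another.

```python
import random


def precision_recall(predicted, oracle):
    """Precision and recall of a tool's output against an oracle (ground-truth set)."""
    predicted, oracle = set(predicted), set(oracle)
    true_positives = len(predicted & oracle)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(oracle) if oracle else 0.0
    return precision, recall


def sampled_recall(predicted, oracle, sample_size, seed):
    """Estimate recall from a random sample of the oracle, as one would when
    labeling the full ground truth is too expensive."""
    rng = random.Random(seed)
    # Sort first so the sample depends only on the seed, not on set ordering.
    sample = rng.sample(sorted(oracle), sample_size)
    predicted = set(predicted)
    found = sum(1 for item in sample if item in predicted)
    return found / sample_size


if __name__ == "__main__":
    oracle = {f"api_{i}" for i in range(100)}
    predicted = {f"api_{i}" for i in range(0, 100, 2)}  # tool finds every other element
    print("full-oracle:", precision_recall(predicted, oracle))
    for seed in (1, 2, 3):
        # Each seed draws a different sample, so the recall estimate can differ.
        print("seed", seed, sampled_recall(predicted, oracle, sample_size=10, seed=seed))
```

Here the true recall is 0.5, while each 10-item sample may report a somewhat different value, which is exactly the validity concern raised above.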

Searching for UI

Over the last few months, I have been studying code search. A beautiful application of code search is the work of Steven P. Reiss of Brown University on searching for user interfaces. His approach searches for UI structure and API usage, relying on knowledge of Java and Swing/AWT. He then applies transformations to eliminate duplicates and to score the search results. The basic idea is to extract UI code from the search results (from Ohloh, GitHub, etc.) and build a new class file that follows standard identifier naming conventions. Further transformations then clean up the code.
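A toy sketch of the duplicate-elimination idea, assuming that two snippets which differ only in identifier names and whitespace count as duplicates. Renaming identifiers to `v0`, `v1`, ... is a hypothetical stand-in for "standard identifier naming conventions"; the `KEEP` list and the regex tokenizer are simplifications, and this is not Reiss's actual implementation, which works from real Java and Swing/AWT knowledge.

```python
import re

# Java keywords and common Swing/AWT names left untouched during renaming.
# Hypothetical and deliberately incomplete; a real tool would parse the Java code.
KEEP = {
    "new", "this", "final", "int", "void", "public", "private", "return",
    "JFrame", "JPanel", "JButton", "JLabel", "BorderLayout", "add", "setLayout",
}

IDENT = re.compile(r"\b[A-Za-z_]\w*\b")


def normalize(snippet):
    """Rename user-chosen identifiers to v0, v1, ... in order of first use.
    Note: words inside string literals are renamed too (a simplification)."""
    mapping = {}

    def rename(m):
        name = m.group(0)
        if name in KEEP:
            return name
        if name not in mapping:
            mapping[name] = f"v{len(mapping)}"
        return mapping[name]

    collapsed = " ".join(snippet.split())  # collapse runs of whitespace
    return IDENT.sub(rename, collapsed)


def deduplicate(snippets):
    """Keep the first snippet from each group with identical normalized forms."""
    seen, unique = set(), []
    for s in snippets:
        key = normalize(s)
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique
```

For example, `JFrame frame = new JFrame(); frame.add(new JButton("OK"));` and the same code using `win` instead of `frame` both normalize to `JFrame v0 = new JFrame(); v0.add(new JButton("v1"));`, so `deduplicate` keeps only the first.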
