CloMan: A Clone Management Tool

We have been developing an IDE-based code clone management system to flexibly detect, manage, and refactor both exact and near-miss code clones. Detection uses a k-difference hybrid suffix tree algorithm, which finds both kinds of clones efficiently. We have implemented the algorithm as a plugin to the Eclipse IDE, and are extending it to support real-time code clone management with semi-automated refactoring during the actual development process.
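
The suffix tree algorithm itself is not reproduced here. As a rough illustration of what "near-miss within k differences" can mean, the following minimal sketch assumes that differences are counted as token-level insertions, deletions, and substitutions, and checks two token sequences with a plain dynamic-programming edit distance. The function name and the whitespace tokenization in the usage note are hypothetical and are not taken from the CloMan plugin.

```python
def within_k_differences(a, b, k):
    """Return True if token lists a and b differ by at most k token edits.

    Illustrative only: a quadratic edit-distance check, not the
    k-difference hybrid suffix tree algorithm used by the tool.
    """
    # A length gap larger than k already needs more than k insertions/deletions.
    if abs(len(a) - len(b)) > k:
        return False
    # prev[j] holds the edit distance between the first i-1 tokens of a
    # and the first j tokens of b.
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1] <= k
```

For example, with fragments split on whitespace, "x = a + b ;" and "y = a + c ;" differ in two tokens, so they would count as a near-miss pair for k = 2 but not for k = 1.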

The current prototype of the clone search tool is available here.

Code Clone Refactoring Scheduler

Duplicated code, also known as code clones, is one of the malicious 'code smells' that often need to be removed through refactoring to enhance maintainability. Among all the potential refactoring opportunities, the choice and order of refactoring activities may have a distinguishable effect on design and code quality. Moreover, there may be dependencies and conflicts among those refactorings, and the organization may impose priorities on certain refactoring activities. Given all these conflicts, priorities, and dependencies, manually formulating an optimal refactoring schedule is very expensive, if not impossible. An automated refactoring scheduler that maximizes benefit and minimizes refactoring effort is therefore necessary. We propose a refactoring effort model, and apply a constraint programming approach for conflict-aware optimal scheduling of code clone refactoring.
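
To make the trade-off concrete, here is a minimal sketch that picks a conflict-free, dependency-respecting set of refactorings with the highest net gain. It is an assumption-laden toy: it enumerates subsets by brute force instead of using constraint programming, it scores each candidate simply as benefit minus effort, and it ignores ordering and organizational priorities. The function name, data layout, and refactoring names are hypothetical and do not come from the OPL implementation.

```python
from itertools import combinations

def best_refactoring_set(candidates, conflicts, depends_on):
    """Pick the subset of refactorings with the highest total (benefit - effort).

    candidates: dict name -> (benefit, effort)      (hypothetical scores)
    conflicts:  iterable of frozenset({a, b}) pairs that cannot both be applied
    depends_on: dict name -> set of names that must also be selected
    """
    names = sorted(candidates)
    best, best_score = (), 0
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            chosen = set(subset)
            # Reject subsets that contain both members of a conflicting pair.
            if any(pair <= chosen for pair in conflicts):
                continue
            # Reject subsets that miss a prerequisite of a chosen refactoring.
            if any(not depends_on.get(n, set()) <= chosen for n in subset):
                continue
            score = sum(candidates[n][0] - candidates[n][1] for n in subset)
            if score > best_score:
                best, best_score = subset, score
    return set(best), best_score
```

Brute force is exponential in the number of candidates, which is one reason a constraint solver is the more realistic choice for schedules of practical size.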

The OPL implementation of the code clone refactoring scheduler and the normalized data can be found here.

Genealogical Study on Clone Removal

This is an empirical study, based on clone genealogies from a significant number of releases of open-source software systems, that characterizes the patterns of clone change and removal in evolving software. For this work, we significantly extended the basic gCad (clone genealogy extractor), which was originally developed by Ripon K. Saha.
The extended version of gCad can be downloaded from here. This distribution also includes the NiCad-2.6.3 clone detector that is necessary for gCad's operation.
A Study on API Usability

Software development today depends largely on the use of API libraries, frameworks, and reusable components. However, while writing client code using these APIs, developers often face difficulties that increase development cost (e.g., time, effort) and lower code quality. To investigate this, we study 1,513 bug-posts across five different bug repositories, using qualitative and quantitative analysis, including topic modelling techniques.

This work makes three main contributions. First, we identify the API usability issues that are reflected in the bug-posts from API users, and distinguish the relative significance of the usability factors. Second, from the lessons learned by manual investigation of the bug-posts, we propose recommendations for designing APIs with better usability. Third, we demonstrate how topic modelling techniques can be applied for concept localization in bug-reports, and explore avenues for automating similar studies at a larger scale.
ZibJana: A Sensor Fusion Framework

ZibJana is a localization application we developed (joint work with Farjana Zebin Eishita) on an HTC Magic smartphone running Android 1.6. ZibJana collects sensor information from the smartphone's built-in GPS receiver, camera, and accelerometer. It then applies a Kalman filter to combine the data from these sources, obtaining a more accurate estimate of the phone's location than is obtainable from GPS alone.
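
The fusion idea can be sketched with a one-dimensional Kalman filter. This is a simplified illustration and not ZibJana's actual model: it assumes a scalar position, treats the accelerometer-derived displacement as the prediction input, and treats GPS and camera readings as direct position measurements with known variances. All function names and numbers are hypothetical.

```python
def kalman_predict(estimate, variance, displacement, process_variance):
    """Move the estimate by the expected displacement and grow its uncertainty."""
    return estimate + displacement, variance + process_variance

def kalman_update(estimate, variance, measurement, meas_variance):
    """Blend one position measurement into the estimate (scalar Kalman update)."""
    gain = variance / (variance + meas_variance)
    new_estimate = estimate + gain * (measurement - estimate)
    new_variance = (1.0 - gain) * variance
    return new_estimate, new_variance

def fuse(initial, initial_variance, steps):
    """Run predict/update over a sequence of time steps.

    steps: list of (displacement, process_variance, measurements), where
    measurements is a list of (value, variance) pairs, e.g. one from GPS
    and one from a camera-based position fix.
    """
    est, var = initial, initial_variance
    for displacement, process_var, measurements in steps:
        est, var = kalman_predict(est, var, displacement, process_var)
        for value, meas_var in measurements:
            est, var = kalman_update(est, var, value, meas_var)
    return est, var
```

Each update shrinks the variance below that of the measurement just fused, which is the sense in which combining sources can outperform any single sensor under these assumptions.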
MSched: A University Course Timetabler

MSched is a university course timetabling application. It takes into account the available resources (teaching staff, classrooms, courses, etc.) and the associated constraints and preferences to produce a feasible timetable that optimizes the utilization of those resources. MSched applies a multiphase approach to the timetabling problem. The entire problem is decomposed into several subproblems, and each subproblem is mathematically modeled using constraint programming (CP) or integer programming (IP) techniques. Each of these models is solved in a separate phase, and the overall timetable is generated by accumulating the solutions from all phases.
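
As a rough sketch of the kind of constraints involved, the following toy solver assigns each course a (slot, room) pair so that no room and no teacher is used twice in the same slot. It is an illustration under strong assumptions: it uses plain backtracking search in a single phase, not MSched's CP/IP models or its multiphase decomposition, and it handles only hard constraints, with no preferences or utilization objective. The function name and data layout are hypothetical.

```python
from itertools import product

def build_timetable(courses, rooms, slots):
    """Return dict course -> (slot, room), or None if no feasible timetable exists.

    courses: dict course name -> teacher name
    rooms, slots: lists of room and time-slot identifiers
    """
    names = list(courses)
    assignment = {}

    def consistent(course, slot, room):
        # A placement clashes if the room or the teacher is already busy in that slot.
        for other, (s, r) in assignment.items():
            if s == slot and (r == room or courses[other] == courses[course]):
                return False
        return True

    def solve(i):
        if i == len(names):
            return True
        course = names[i]
        for slot, room in product(slots, rooms):
            if consistent(course, slot, room):
                assignment[course] = (slot, room)
                if solve(i + 1):
                    return True
                del assignment[course]  # undo and try the next placement
        return False

    return dict(assignment) if solve(0) else None
```

With three courses, one room, and two slots the function returns None, since only two placements exist; adding a second room makes the instance feasible.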