Thursday, December 26, 2013

Maintainable python code

The benefit of dynamically typed languages is that code is easy to write; the cost is that it is hard to understand.
Statically typed languages such as Haskell and Scala rely on a comprehensive static type system and a compiler that does all the dirty work. At any given point in the code one knows for sure that this particular variable has this type, that this function has that return type, and so on, so one can basically understand what is going on. In Python, with its duck typing, one can pass to a function any object that adheres to some contract; that function can pass it further and further, add or remove methods on the fly, and so on. So when looking at a piece of code full of variables and function applications, one can barely understand it and easily loses track of what is going on. A static analyzer can help to some degree; see, for instance, PySonar, a Deep Static Analyzer for Python:
Treatment of Python’s dynamism. Static analysis for Python is hard because it has many dynamic features. They help make programs concise and flexible, but they also make automated reasoning about Python programs hard. Fortunately, some of these features can be reasonably handled. For example, function or class redefinition can be handled by inferring the effective scope of the old and new definitions. For code that are really undecidable, PySonar uses a universal honest answer: “I don’t know.” Well, not quite so. It attempts to report all known possibilities. For example, if a function is “conditionally defined” (e.g., defined differently in two branches of an if-statement) and the condition is undecidable, then PySonar gives it a union type which contains all possible types it can possibly have. By doing that, PySonar reduces false negative rates.
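To make the "conditionally defined" case from the quote concrete, here is a small sketch of my own (not taken from PySonar's documentation). The names make_parse_port and parse_port and the PORT_AS_TEXT environment variable are hypothetical. The factory exists only so both branches can be exercised; at module level, the choice depends on the environment, which a static analyzer cannot see.

```python
import os


def make_parse_port(as_text):
    # parse_port is "conditionally defined": each branch binds a different
    # function to the same name, with a different return type.
    if as_text:
        def parse_port(value):
            return value.strip()  # returns str
    else:
        def parse_port(value):
            return int(value)     # returns int
    return parse_port


# The condition comes from a (hypothetical) environment variable that is
# unknown at analysis time, so statically parse_port may be either function.
parse_port = make_parse_port(bool(os.environ.get("PORT_AS_TEXT")))
```

An analyzer that works as the quote describes would have to give parse_port a union of both function types, i.e. a return type of "str or int".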
Side note: Scala has duck typing via structural types, but their use is generally discouraged because the implementation relies on reflection, which is slow. Still, unlike Python's duck typing, Scala's structural typing is type safe; see Structural typing vs. Duck typing.
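For contrast, a minimal Python sketch of the duck typing described earlier (all class and function names are hypothetical). The contract "has a quack() method" is never declared anywhere, the object is passed through several functions, and a method can even be attached at run time, so a broken contract is only discovered when the call actually happens.

```python
class Duck:
    def quack(self):
        return "Quack!"


class Robot:
    """Has no quack() method in its class definition."""


def make_noise(thing):
    # Implicit contract: thing must have a quack() method. Nothing declares it.
    return thing.quack()


def relay(thing):
    # The object is simply passed further; this signature says nothing either.
    return make_noise(thing)


robot = Robot()
# A method attached on the fly: this particular Robot now satisfies the
# contract, although the Robot class itself still does not.
robot.quack = lambda: "Beep!"

print(relay(Duck()))  # Quack!
print(relay(robot))   # Beep!
try:
    relay(Robot())    # a fresh Robot breaks the contract
except AttributeError as error:
    print("only detected at run time:", error)
```

Reading relay or make_noise alone tells you nothing about what thing must be; that information lives only in the call sites and in whatever ran before them.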
There is an example of a dynamically typed language that doesn't suffer from this readability problem: Erlang. One always knows what comes in and what goes out (and, as a result, what is in each line of code). It doesn't have a comprehensive type system; the closest thing is records, akin to structs in C. But it has Function Specifications and Dialyzer. Unlike in Python, when you call a function in Erlang you pass not an object with encapsulated state and exposed behavior but just plain data; the input data format is defined in the function spec along with the return data format. One doesn't need to pass behavior, because behavior is encapsulated in some other lightweight process, whose pid you may pass along within the data. Thanks to this elegant (if specific) implementation of encapsulation and polymorphism, Erlang solves the readability problem.
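A rough Python analogue of the Erlang style, as a sketch assuming Python 3.6+ (for the typing.NamedTuple class syntax). Plain data goes in and plain data comes out, and the signature states both formats roughly the way a -spec does. Point and translate are hypothetical names, and the Erlang lines in the comments are for comparison only. Unlike an Erlang spec checked by Dialyzer, these annotations are not checked by Python itself.

```python
from typing import NamedTuple


# Plain data with no behavior attached, roughly like an Erlang record:
#   -record(point, {x :: float(), y :: float()}).
class Point(NamedTuple):
    x: float
    y: float


# The signature plays the role of an Erlang function spec:
#   -spec translate(#point{}, float(), float()) -> #point{}.
# Python does not enforce these annotations at run time; they are there for
# human readers and for optional tools such as type checkers.
def translate(point: Point, dx: float, dy: float) -> Point:
    """Return a new Point shifted by (dx, dy); the input is not modified."""
    return Point(point.x + dx, point.y + dy)
```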

So, while PySonar is promising, can one do something simpler and better? Unit tests? Good to have, but covering every function is too much work. Docstrings? Too informal. After a while I came across the article Making Wrong Code Look Wrong, and ended up with a simple idea: name each variable and function (and each function argument) so that everybody understands what type it has or returns.
A simple example. Given the following information, kept somewhere aside:
a variable named user has type model.User
a variable named userid holds a user id and has type int
it is easy to tell what the line below does, no matter where in the code you see it:
user = user_by_userid(userid)
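A minimal runnable sketch of the convention, assuming Python 3.7+ (for dataclasses). The User dataclass is a hypothetical stand-in for model.User, and the in-memory _USERS table and the extra function username_by_userid are hypothetical too; what matters is that every name says what type it holds, takes or returns.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical stand-in for model.User."""
    userid: int
    name: str


# Hypothetical in-memory store, keyed by userid (int).
_USERS = {1: User(userid=1, name="alice"), 2: User(userid=2, name="bob")}


def user_by_userid(userid):
    # The name says it: takes a userid (int), returns a user (User).
    return _USERS[userid]


def username_by_userid(userid):
    # The name says it: takes a userid (int), returns a username (str).
    return user_by_userid(userid).name


userid = 2
user = user_by_userid(userid)          # user is a User
username = username_by_userid(userid)  # username is a str
```

With these names, a line such as `user = username_by_userid(userid)` looks wrong at a glance: the right-hand side produces a username, but it is stored in a variable named user.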
If this information about variables and functions is written down in some formal way, IDEs and static analyzers can also warn about variables that have no such spec and about expressions or statements that just look wrong (see Joel Spolsky's article again), navigate to type definitions, show variable and function descriptions on hover, etc.
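One way such a formal spec and a matching check could look, sketched with the standard ast module. The VAR_TYPES and FUNC_SPECS tables and the check function are hypothetical, not an existing tool. The checker only looks at simple assignments of the form name = function(args): it flags assigned names that have no spec, and it flags arguments or targets whose name denotes a different type than the function's spec says.

```python
import ast

# Hypothetical formal spec: the type each variable name denotes, and the
# argument types and return type each function name promises.
VAR_TYPES = {"user": "model.User", "userid": "int", "username": "str"}
FUNC_SPECS = {
    "user_by_userid": (["int"], "model.User"),
    "username_by_userid": (["int"], "str"),
}


def check(source):
    """Return warnings for assignments that break the naming spec."""
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        targets = [t for t in node.targets if isinstance(t, ast.Name)]
        # Every assigned variable should have an entry in the spec.
        for target in targets:
            if target.id not in VAR_TYPES:
                warnings.append(f"line {node.lineno}: no spec for '{target.id}'")
        call = node.value
        if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
            continue
        spec = FUNC_SPECS.get(call.func.id)
        if spec is None:
            continue
        arg_types, return_type = spec
        # An argument whose name denotes a different type looks wrong.
        for arg, expected in zip(call.args, arg_types):
            actual = VAR_TYPES.get(arg.id) if isinstance(arg, ast.Name) else None
            if actual is not None and actual != expected:
                warnings.append(
                    f"line {node.lineno}: '{arg.id}' is {actual}, "
                    f"{call.func.id} expects {expected}")
        # So does storing the result in a variable named for another type.
        for target in targets:
            actual = VAR_TYPES.get(target.id)
            if actual is not None and actual != return_type:
                warnings.append(
                    f"line {node.lineno}: '{target.id}' is {actual}, "
                    f"{call.func.id} returns {return_type}")
    return warnings
```

For example, `check("user = user_by_userid(userid)")` returns an empty list, while `check("username = user_by_userid(userid)")` reports that username is str but user_by_userid returns model.User. A real IDE integration would of course need far more than this, but the spec tables are exactly the "information kept aside" from the example.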
