Some good reasons for learning python

Python

President Obama thinks required programming language learning in high school is a great idea. So do I, and I think we should all start with python.

  • Writing code with it is very fast. When software engineers tell you “I can do it in 10 minutes”, in C/C++ they mean 4h, in java they mean 2h and in python they mean it.
  • You can really do anything, I’ve done some serial communication, bit level manipulation, network level event-based servers, multithreading, webservice providing and consuming, SQL and cassandra client faster than what I’ve been doing in any other language.
  • It’s easy to learn. You can start your first program right now and be good at it in 2 weeks.
  • It comes “batteries included”. You don’t have to install third-party libraries. Contrary to ruby, you don’t have to choose between the thousands of gems available, there’s almost always one official way to do things. Which leads to the next point:
  • It’s simple to read someone else’s code. This is because it’s high level language and you quickly know all the librairies.
  • It now has some IDEs. I know some people like to code in vi, but this is ugly and unproductive. Pydev is simple to install and supports a pretty good (or not so bad) auto-completion.

This leads me to two opposing ideas (but you’ll understand where I stand):

  • On a software architecture level, I think java (or C#) is the right choice for any complex or performance requiring system. IDE can really do there magic and most of the problems (mistakes, API change, etc.) are found at compilation.
  • But on a pragmatic/real-life level, I think python is especially relevant for companies (I’m really thinking about startups) who want to build and launch something quick, make it evolve easily and obviously don’t need performance. Your engineer brain might think “Yes, but it’s scripting, this sucks”. But who cares? In 3 years, your product will be probably obsolete, if not already dead and in noone’s hard-drive anyway.

If you feel python isn’t the right choice because you need to have a complex all-in-one-language architecture, you should have a look at message brokers. My favorite one is RabbitMQ. It works instantly (like any modern software should be), has client libraries in every language you can imagine and supports very interesting features like persistent queues, load balancing and replication. Load balancing means that if python happened to be a bottleneck in your system, you could just duplicate the instances and server two times more.

Opensourcing the content of this blog

Hi everyone,

During the last years, I launched the javacint google group which now has grown out to be a good community of professionnals working around the Cinterion (java enabled) chips. I also created a TC65 development document. And all the questions and feedbacks you gave me on the development around these chips helped me a lot to improve (what was) my document and (what was) my FAQ.

You helped me so much indeed that I believe this content should know be open to everyone to modify. That’s why I created the javacint wiki.

So from now on, for all your TC65i related questions and feedbacks, please go to the javacint discussion group or the javacint wiki. And please share your knownledge on the javacint wiki.

I still provide development services around the Cinterion chips through my company but I try to focus more on creating products with few partners.

xrdp and the ulimits / nofile issue

You might have noticed for xrdp on Debian (but quite possibly with a lot of other Linux tools and other Linux distributions) the user limits (described in /etc/security/limits.conf) are not enforced. Which meant in my case that any session open with xrdp was opened with a max number of open files (nofile) set to 1024.

To fix this, edit the file /etc/pam.d/common-session and add the following line:

session    required   pam_limits.so

Limiting number of connections per IP with ufw

This is a personal reminder post.

The easiest attack one can perform on a web server is opening all the connections and do nothing with it. iptables fortunately has a “connlimit” module to avoid this. If you’re using ufw like me you will want to keep your good integration with it.

In the /etc/ufw/before.rules file, after these lines:

1
2
3
4
5
6
7
# Don't delete these required lines, otherwise there will be errors
*filter
:ufw-before-input - [0:0]
:ufw-before-output - [0:0]
:ufw-before-forward - [0:0]
:ufw-not-local - [0:0]
# End required lines

You can add this to limit the number of concurrent connections:

1
2
# Limit to 10 concurrent connections on port 80 per IP
-A ufw-before-input -p tcp --syn --dport 80 -m connlimit --connlimit-above 10 -j DROP

And this to limit the number of connections:

1
2
3
# Limit to 20 connections on port 80 per 2 seconds per IP
-A ufw-before-input -p tcp --dport 80 -i eth0 -m state --state NEW -m recent --set
-A ufw-before-input -p tcp --dport 80 -i eth0 -m state --state NEW -m recent --update --seconds 2 --hitcount 20 -j DROP

This second rules might create some issues with http clients that don’t support keep-alive (is there any?).
If you want to do some benchmarks (with ApacheBench for example), you need to enable the keep-alive and set the max number of keep-alive requests per connection very high (or unlimited).
In apache config it is set with:

1
MaxKeepAliveRequests 0

Cassandra as registry

One of the biggest issue with distributed database is to find the right model to store your data. On a recent project, I decided to use a registry model.

The registry idea

The idea behind writing a registry is to have an easy way to both store and view data.

For a given device that has a {UUID} id:

  • I will access “/device/{UUID}/”.
  • Any properties will be stored in “/device/{UUID}/properties/“.
  • Deletion of the device will delete all the contents this device contains.

Classical column-families to index data

The problem comes with the data we need to index. We can store everything in a registry manner like having a path “/device/by-owner/{UUID}”:["{UUID1}","{UUID2}"]. But it’s just easier to use cassandra secondary indexes have each property of each entity written to the indexed columns of the column family.

Sample use case: file storage

So you get the basic “Registry” model. Storing file on top of that is quite easy. Then what I did is I just said files are chunks of data. So if I want to store a picture for a user, I could store like this:

  • “/user/{UUID}/picture/” becomes the path of the picture.
  • “/user/{UUID}/picture/type” describes the type of this file (“file” or “directory”)
  • “/user/{UUID}/picture/filetype” describes the content of this tile (“text/plain” per example)
  • “/user/{UUID}/picture/size” describes the size of the file
  • “/user/{UUID}/picture/chunk-size” describes the size of each chunk that we will save
  • Then we will save each chunk from “/user/{UUID}/picture/0″ to /user/{UUID}/picture/X.

Hector object mapper

I have to say I didn’t know this project existed not that long ago.

I think HOM is a much better option in pretty much all the cases. Still having a simple tree view of your data can be a very interesting feature to analyze what you are working on.

TINC – Simple P2P VPN

The world is full of good surprises.

If you joined the NoSQL gang like me, chose Cassandra to store your data and you distributed your system among different datacenters. Wouldn’t it be great to interconnect all your nodes on a virtual private network with no single point of failure? Well, TINC does just that. In fact, it does a little bit more because it’s able to establish a meshed network if hosts can’t directly contact each other (in case of a routing issue, a NAT firewall, etc).

One of the amazing things about this software is that it’s really simple to setup. I followed some setup instructions and it just worked. I didn’t have to increase the verbosity or check any log, it just worked everywhere.

Sources:

The Mystery of the Duqu Framework

Update 2012-03-25:
It turns out, it’s just some object oriented C:
Kaspersky Lab experts now say with a high degree of certainty that the Duqu framework was written using a custom object-oriented extension to C, generally called “OO C” and compiled with Microsoft Visual Studio Compiler 2008 (MSVC 2008) with special options for optimizing code size and inline expansion.
Source


If you missed it in the news, you should definitely read this: The Mystery of the Duqu Framework.

I’ve a little culture around languages and frameworks, mostly because I’ve worked with C, C++, Objective-C, C# .Net, java, javascript and PHP, but I’ve also read some things or even done few tests on languages like python, scala, erlang, caml, F#, VB or d language. It has always been a great pleasure to discover these new languages because it shows how some human beings decided to create a new way of organizing intelligence. What usually happens around these new languages (and frameworks), is that we speak about them and they get adopted by developers or nothing is done with them. But we usually speak, at least in the beginning, a lot more about the language than the projects done with it.

Here, security companies first discovered a virus (yet an other one) and then discovered it was embedding it’s own framework (and we can pretty much guess there’s a dedicated language for that one). This story uncovers a real mistery with its set of questions: why did some people decided to create a new framework? Why was it only used (or seen) in a virus? How could it be especially applicable to a virus? Why did they decided to use everything internally and not use standard C/C++ compilers?

They are few things that are very interesting in this framework: It’s a low-level framework (no standard library), but it’s totally eventful. This is quite innovative. It terms in modern sales-speech it means: very light and very scalable. “Function table is placed directly into the class instance and can be modified after construction”: You can change the behavior of any method of your object at any time. It’s quite a good idea (can easily be done in javascript thought but that’s because javascript is super-permissive).

The conclusions of this article are:

  • The Duqu Framework appears to have been written in an unknown programming language.
  • Unlike the rest of the Duqu body, it’s not C++ and it’s not compiled with Microsoft’s Visual C++ 2008.
  • The highly event driven architecture points to code which was designed to be used in pretty much any kind of conditions, including asynchronous commutations.
  • Given the size of the Duqu project, it is possible that another team was responsible for the framework than the team which created the drivers and wrote the system infection and exploits.
  • The mysterious programming language is definitively NOT C++, Objective C, Java, Python, Ada, Lua and many other languages we have checked.
  • Compared to Stuxnet (entirely written in MSVC++), this is one of the defining particularities of the Duqu framework.

Who could have done this framework? Well…
- Stuxnet, the last virus that mostly attacked Iran, was at least backed (maybe created) by the USA
- This one required the workforce of a pretty big organization (a lot of smart people put together to do evil things)
I hope we’ll discover who is behind this this someday.

Source: The Mystery of the Duqu Framework