Home
Self-Similar's Journal
 
[Most Recent Entries] [Calendar View] [Friends]

Below are the 20 most recent journal entries recorded in Self-Similar's LiveJournal:

    [ << Previous 20 ]
    Friday, July 4th, 2008
    7:47 pm
    [alexeilebedev]
    I just realized that when you sue a Doe (i.e. a defendant whose identify is unknown), it's like using a future in programming to reduce latency.

    Since the legal process is supposed to be independent of the defendant, we can go through the entire trial and even sentencing, and only adjust minor details later when the identity does become known.
    Thursday, June 5th, 2008
    3:52 am
    [alexeilebedev]
    Programming Contest
    For those who like a challenge...

    Here is one. Write a function with the following prototype:

    void FormatInteger(char *c, int i) {
    }


    c is a buffer that's large enough to hold the largest integer.

    The integer must be printed out in decimal (not hex).

    The winning entry is the one that performs the fastest. Post your entries in comments. I will test them and announce the winner.

    Comments won't be screened, but in the interest of fairness don't look inside.

    P.S. be creative
    Wednesday, May 28th, 2008
    12:06 am
    [alexeilebedev]
    Opteron 285 context switch twice as fast as Xeon E5345
    Here is a test I ran under Windows XP 64-bit on two machines: one with 2 dual-core opterons @ 2.6 Ghz, the other with two Xeon  E5345 (Core quad @ 2.33 ghz)

    Two thread, each sits in WaitForMultipleObjectsEx with an infinite timeout.

    We call QueueUserAPC(threadA, funcA)
    funcA() {
      QueueUserAPC(threadB, funcB);
    }
    funcB() {
       QueueUserAPC(threadA, funcA);
    }

    This basically measures context-switching speed. When run in a single thread on the AMD machine, each function gets invoked about 550K times/sec.

    With two threads, the AMD machine does 140K calls/sec for each function.

    The Intel machine can only pull in 62K calls/sec.

    Now, this is a very realistic kind of load for a multi-threaded app. A real-life app is not spending all of its time doing integer math, but in fact it does do a lot of context switching, if only to/from the OS. When every 62 context switches eat up a whole millisecond of your time, it's very hard to build something both multi-threaded and fast.

    By the way, in case you wonder, this IS the fastest way to force a context switch that I know of. I tried passing a 1-byte token between two anonymous pipes (i.e. each read does a blocking read of 1 byte, and then writes it back for the other thread to receive). The results are not inspiring -- 80K tokens each thread on AMD, even fewer on Xeon.

    More numbers: Dual AMD 285 (2x dual core, 2.6 ghz), Windows 32-bit: 105K calls/sec.
    Dual Xeon X5365 (2x quad core, 3.00ghz): 101K calls/sec

    The conclusion: something about Intel processors really makes context switching slow.
    Another conclusion: whenever possible, use x64 code. It's not the "large memory" benefit, it's the extra 8 registers that will make your code fly.
    Friday, November 16th, 2007
    6:58 am
    [alexeilebedev]
    Monday, October 22nd, 2007
    7:25 am
    [alexeilebedev]
    Fonts for programers:

    http://www.lowing.org/fonts/

    My favorite is Monaco. I still remember the look of Pascal source code printouts made using this font back from when I learned programming on a Mac in 1989.
    Monday, August 20th, 2007
    11:25 am
    [alexeilebedev]
    For those who don't think Microsoft is bloated, here is an example of a new API element, fully documented in five programming languages (Visual Basic, C#, C++, J# and JScript). First, its location in the hierarchy (12 deep):

    MSDN -> MSDN Library -> Development Tools and Languages -> Visual Studio 2005 -> Visual Studio -> Integrated Development Environment -> Customizing and Automating the Development Environment -> Automation and Extensibility for Visual Studio -> Automation and Extensibility Reference -> Visual C++ Automation -> Microsoft.VisualStudio.VCCodeModel -> MFCDialogStringVariableExtenderInterface

    So, what does it do? It's a special object that solves an important problem... It gives you access to the "maxChars" property of an edit box!
    Wednesday, March 14th, 2007
    2:51 pm
    [nchaly]
    So, what about Self-Similar itself?
    Studied all message from the beginning of the community. Looks like many things changed from 2002. For example, all old links to self-similar web site are weak (for example, http://www.self-similar.com/mtptr.html or http://www.self-similar.com/test_temporary.cpp). However, there is lagre amount of nice photos there, liked them.

    I'd propose summarizing some ideas on self-similar from time to time. That can be useful for persons who join community - they will be not forced to read all messages. And - we really can start now, if this is possible.

    "The following summarizes self-similar's features: it is a C++-like compiled self-extensible language with reflection and meta-programming".

    C++-like and compiled: that is clear.
    Self-extensible: does this mean some kind of dynamics? For example - addind/removing class methods in runtime.
    Reflection and metaprogramming: encoundered a couple of such thoughts - meta class of garbage collector and ref. counter; named function parameters.

    Also there are many message about editor/viewer features: background code check, visualization during debugging, show assembly insructions, etc.

    By the way, what editor do you use?
    Friday, February 16th, 2007
    1:15 pm
    [alexeilebedev]
    Why does pseudo-code have to be lame?
    Whenever programmers write documentation, they sooner or later run out of the precision available in the English language, and start using pseudo-code to increase rigour of the description. For example, in the ECMAScript manual, you find pseudo-code of the following kind:
    Date.prototype.setYear (year)
    NOTE
    The setFullYear method is preferred for nearly all purposes, because it avoids the “year 2000
    problem.”
    When the setYear method is called with one argument year the following steps are taken:
    1. Let t be the result of LocalTime(this time value); but if this time value is NaN, let t be +0.
    2. Call ToNumber(year).
    3. If Result(2) is NaN, set the [[Value]] property of the this value to NaN and return NaN.
    4. If Result(2) is not NaN and 0 ≤ ToInteger(Result(2)) ≤ 99 then Result(4) is ToInteger(Result(2))
    + 1900. Otherwise, Result(4) is Result(2).
    5. Compute MakeDay(Result(4), MonthFromTime(t), DateFromTime(t)).
    6. Compute UTC(MakeDate(Result(5), TimeWithinDay(t))).
    7. Set the [[Value]] property of the this value to TimeClip(Result(6)).
    8. Return the value of the [[Value]] property of the this value.
    


    I hope it is obvious to anyone reading this code that the above is simply a transcription of the actual implementation into pseudo-English. What is it about the original code that makes it unsuitable to use as documentation? Why should we prefer BASIC-like numbering of lines, as if this rigidity equals rigour?

    Why is "2. Call ToNumber(year)" and subsequent references to "result" better than saying "integer_year = ToNumber(string_year)"? Is "return the value of the [[value]] property of the this value" readable at all?

    What I'm suggesting is the true virtue of pseudo-code is its informal nature, not the property of not looking like any language in existence. C++ code works perfectly well as pseudo-code. However, C++ contains lots of details that hinder comprehension, details like casting, memory allocation, etc. So when you convert an algorithm to pseudo-code, simply strip those non-essential details that only the compiler needs. There is no need to fight curly braces, which do a good job of grouping statements, or replace "return this.Value" with "return the value of the [[Value]] property of the this value".

    If the 8 steps are hard to read (and they are, because we're not used to reading strict-English), then how about 26? Here is an excerpt:
    20. Let R be a new string value computed by concatenating the previous value of R and S.
    21. Increase j by 1.
    22. If j is equal to L, go to step 25.
    23. Go to step 18.
    24. Let R be a new string value computed by concatenating the previous value of R and S.
    25. Increase k by 1.
    26. Go to step 4.
    


    At this point, you can see the ridiculousness of the entire approach -- they took an implementation of JavaScript, translated it into an unreadable mess and enshrined as the standard. We're gone from a high-level implementation to assembly in the name of clarity. No one is actually going to implement these algorithms using gotos, so why use them in documentation?

    The original document: http://www.ecma-international.org/publications/files/ecma-st/ECMA-262.pdf
    Thursday, February 8th, 2007
    4:02 pm
    [alexeilebedev]
    Most people think that dividing by a power of two is free.
    That's true, but not really. Dividing unsigned numbers by powers of two is free. With signed numbers, -10 >> 2 == -3, not the same as -10 / 4 (-2).

    There are two useful types of shifts -- where the "new" bits become zero, and where the new bits take on the value of the highest bit of the original (this is called an arithmetic shift, because it keeps the sign of the number). Now, clearly we need to use the arithmetic shift for negative numbers. But this means that the number will always be at least -1, because there is no way to get a zero.

    So the compiler generates a few extra instructions:

    // dividing EAX by 4
    cdq           ; EDX = EAX<0 ? -1 : 0
    and edx, 3    ; EDX = EAX<0 ? 3 : 0
    add eax, edx  ; EAX += EDX
    sar eax,2 
    
    // compare this with the unsigned version
    shr i,2    
    


    The unsigned version version is 60% faster.

    Also, think about this: you have implicit division going on when you're subtracting pointers, so when you have some pointer X to the middle of an array, and you want to know its index, use this: (uint_ptr(X)-uint_ptr(base))/sizeof(T) instead of (X-base)
    Saturday, January 27th, 2007
    9:03 pm
    [alexeilebedev]
    WindowProc thunk on AMD64
    I recently switched all my code to AMD64.

    If you have a 64-bit processor, such as Opteron or one of the new Intel chips, you should too, because you will get an automatic performance boost of about 35%. The reason is that twice as many registers are now available for integer arithmetic (and for floating-point, too). The first four arguments of any function call are passed via registers, too, so memory traffic is reduced. Everyone keeps quoting the 2gig limit as the reason to switch, but I found that my programs never need that much -- I tend to write real-time software, which tends to be cpu-hungry and memory-efficient.

    One of the problems I encountered was with my WindowProc thunks. WindowProc thunks get created when you want to install your own WindowProc, but want the callback to invoke a member function. The question is where to store the 'this' pointer? MFC stores it in a global variable. A better solution is known -- allocate a piece of memory on the heap, put some assembly in it that has 'this' as an immediate value and invokes your function. On x86 this worked beautifully, because the standard WindowProc is stdcall, and all methods are thiscall. No registers are used by stdcall, and ECX is used by thiscall to pass 'this'. So all the thunk had to do was store 'this' in ecx and jump to the new location.

    On AMD64, all calling conventions are replaced by one new convention -- the Microsoft x64 convention (see wikipedia). The convention says that the first four arguments are passed via registers, RCX, RDX, R8 and R9. So it's no longer possible to create a universal stdcall->thiscall thunk, because the arguments would have to be shifted around, and the universal thunk doesn't know what arguments there are and what sizes they have. I came up with this solution:

    struct WithWindowProc {
    	virtual LRESULT WindowProc(HWND hwnd, UINT msg, WPARAM wparam, LPARAM lparam)=0;
    };


    Now we write the thunk that we want in C++:

    LRESULT WindowProcThunk2(HWND hwnd, UINT message, WPARAM wparam, LPARAM lparam) {
     	WithWindowProc *that = (WithWindowProc*)0x0f0f0f0f0f0f0f0f;
     	return that->WindowProc(hwnd,message,wparam,lparam);
     }


    Disassemble it and steal the bytecode:

    uint_8 thunkcode[] = {
    0x4c,0x89,0x4c,0x24,0x20,0x4c,0x89,0x44,0x24,0x18,0x89,0x54,0x24,0x10,0x48,0x89,0x4c,0x24,0x08,0x48,0x83,0xec,0x48,0x48,0xb8,//25
    0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x48,0x89,0x44,0x24,0x30,0x48,0x8b,0x44,0x24,0x30,0x48,0x8b,0x00,0x48,0x8b,0x4c,0x24,//50
    0x68,0x48,0x89,0x4c,0x24,0x20,0x4c,0x8b,0x4c,0x24,0x60,0x44,0x8b,0x44,0x24,0x58,0x48,0x8b,0x54,0x24,0x50,0x48,0x8b,0x4c,0x24,//75
    0x30,0xff,0x10,0x48,0x83,0xc4,0x48,0xc3};//83


    At offset 25 of this array, where you see eight bytes all equal to 0x0f, you should place a pointer to the WithWindowProc interface. You can then invoke the thunkcode as it it were a windowproc (with four arguments and all), and it will re-invoke the object that you want. The code is larger than it should be, because Microsoft's CL for some reason spills the four registers onto the stack, even though that can be avoided.

    Another thing to watch out for is that the thunkcode may be located on a non-executable page. You can steal a simple linked-list allocator from a file called atlthunk.cpp somewhere in the visual studio 2005 tree. This allocator uses VirtualAlloc to make executable pages and returns memory from those pages.
    Saturday, December 23rd, 2006
    1:07 am
    [alexeilebedev]
    Ideas for a code editor
    Programming environments of the future will blend the line between compile-time and run-time.

    When you are editing a module that comes with its own tests, the editing enviroment can not only compile the code in the background (to translate compiler warnings into markup), but also run it repeatedly in a sandbox, displaying the results of the run in various elegant ways.

    For example, assertions that fired during the run can be highlighted by the editor. Just imagine, you write an assertion and your text editor immediately flags it as invalid!

    Another feature is real-time profiling. The background of each expression is shaded in proportion to the time spent in it. Alternatively, each line is annotated with its relative weight (again, in order to make this impressive, it has to be done while you're editing the code).

    Again, imagine this: you are writing a loop, and it's shaded gray because it has quite a few cycles in it. But then you add a nested loop, and that one turns positively dark. This will make people avoid N^2 algorithms right in the editor!

    (Obviously, the sandbox in which the test code runs should silently prohibit side-effects such as establishing network connections and formatting disk drives)
    Sunday, September 3rd, 2006
    4:35 am
    [alexeilebedev]
    Large Swap
    How do you swap two arbitrary objects?
    Like this:
    template<class T> inline void Swap(T &a, T &b) throw() {
      struct u {char s[sizeof(T)]; };
      u tmp = (u&)a;
      (u&)a=(u&)b;
      (u&)b = tmp;
    }
    

    If your objects store pointers to themselves, they are considered unmovable, and cannot be swapped.

    How do you swap two large structures? Not using swap, because it makes more sense to swap the objects one cache line at a time.
    The following example illustrates:
    // TSwapLarge: Faster than TSwap on larger structures
    
    // 16 bytes:
    // TSwap 1000000000 times: 5.33238 secs
    // std::swap 1000000000 times: 5.38049 secs
    // TSwapLarge 1000000000 times: 5.29612 secs
    
    // 100 bytes:
    // TSwap 100000000 times: 15.3282 secs
    // std::swap 100000000 times: 15.294 secs
    // TSwapLarge 100000000 times: 4.88687 secs
    
    // Note that TSwap is orders of magnitude faster than std::swap on classes that have
    // constructors/destructors, because STL assumes all objects to be unmovable (a very damaging, in terms
    // of performance, assumption)
    
    template<class T> inline void TSwapLarge(T &a_, T &b_) throw() {
            if (sizeof(T) > 16) {
                    T *a=&a_, *b=&b_, *end=(T*)((uint_8*)a+sizeof(T)/8*8);
                    do {
                            struct u {char s[8]; };
                            u tmp = (u&)*a;
                            (u&)*a=(u&)*b;
                            (u&)*b = tmp;
                            ((u*&)a)++;
                            ((u*&)b)++;
                    } while (a < end);
                    {
                            struct u {char s[sizeof(T)%8]; };
                            u tmp = (u&)*a;
                            (u&)*a=(u&)*b;
                            (u&)*b = tmp;
                    }
            } else {
                    struct u {char s[sizeof(T)]; };
                    u tmp = (u&)a_;
                    (u&)a_=(u&)b_;
                    (u&)b_ = tmp;
            }
    }
    Monday, August 21st, 2006
    6:24 pm
    [alexeilebedev]
    Artifacts of lack of typing in Scheme
    When a language has strict typing, you basically think about what types your operation should work on, and never concern yourself with anything else. After all, none of the other types except the ones you explicitely specified, can be passed to your function.

    Not so with dynamic languages. A dynamic language is a language where the most that can be said about any symbol is that it either references an object or is undefined -- you can never tell, other than when the program is running , what type of object is represented by a symbol. In a dynamic language, any symbol can assigned an expression of any type at any time.

    So, in a dynamic language, most of your functions become so-called generics -- they work on as many types as possible, because there isn't a convenient way to reject any types to begin with. So if a certain type ALMOST work with the function, but not quite, you usually extend the function to support it.

    As an example, Scheme has a notion of a list. A "list" is defined a node of a tree (called "cons"), with two variables, left (called "car") and right (called "cdr"). These variables can be anything, because Scheme is a dynamic language, but typically, left contains a value, whereas right contains a pointer to another list.

    Now, because there is no notion of a strict list (i.e. a linked list of values), sometimes the last element of a "list" contains two values, instead of a value and a pointer. But since there are no assertions, no preconditions or postconditions (all notions from other, stricter languages), there is no way to detect that the list is "improper" other than by traversing it. So of course nobody bothers to traverse these lists just in case the last element is funny, so they just call them "improper lists" and try to support them as much as they can.

    An "association" list, or a list of pairs, is again defined by its structure. There is no way to check, or assert, if something is really a list of pairs. You just construct one and keep this fact in mind as you go along.

    It's interesting to read documents such as this from a perspective of object languages. So many functions float in the global namespace, seemingly without a good home. For example, the global function length returns the length of a list. But to compute the length of a vector, you use vector-length. In an object-oriented language, you'd make the function local to the object can call object.length; An appropriate version of the function would be selected based on the underlying object type.

    Operator /, when used with one argument, is defined to return the inverse of a number: (/ 3) = 1/3
    Much the same way as (- 3) = 0 - 3
    I'm not sure what the rationale is, why even support this shortcut, but it might have to do with argument checking (or lack of it)

    All fans of Scheme, including the authors of R5RS (Revised 5 times Reference of Scheme) will tell you that Scheme "supports" object-oriented style of programming, along with many other styles. They'll even "prove" it by arm-wrestling association lists into a construct resembling a class. Overall, this claim is a gross exaggeration. Scheme supports OO much in the same way as C supports C++. You can certainly write OO-ish programs in Scheme, but you don't get any of the benefits.
    Wednesday, May 31st, 2006
    5:16 pm
    [alexeilebedev]
    A challenge
    OK, so I haven't posted anything in more than two years, but that doesn't mean that I forgot about this journal! I just had nothing to post.

    Here is a little problem I've run into when writing some multithreaded code. I've solved it, but as I did, I realized how deceptive it can be. So I gave it as a test to three people I know, and what do you know, all of them got it wrong on the first try!

    Here is the problem.

    You are given the following function for writing a value to a memory slot.
    This function is thread-safe.

    void* AtomicExchange(void *&slot, void *new_value)

    You need to implement a thread-safe linked list-based stack, using the following two functions:

    struct Elem {
    Elem *next;
    };
    void Push(Elem *&head, Elem *elem);
    Elem* Pop(Elem *&head);

    Good luck. Comments will be screened so that everyone can enjoy writing a solution. Incorrect solutions will be revealed, and I'll explain why they are incorrect (hopefully I didn't get it wrong).

    update: my solution is inside
    Saturday, April 10th, 2004
    2:13 pm
    [alexeilebedev]
    Faking named parameters
    In C++ we don't have named parameters. This sometimes results in confusion, especially in functions that have several arguments of the same type. Generally such functions should be avoided, because the code that calls them is not readable. Here is an imaginary example:

    connection.Attach(channel,true,false,false).

    Just from reading this code, we get the idea that there's some connection, and it's being attached to a channel, but beyond that we're stuck -- what in the world is true,false,false? With named parameters, we would be able to write

    connection.Attach(channel, async:true, blocking:false, flowcontrol:false)

    Again, this is a completely imaginary example, but now we have a much better idea of what the flags mean. We can evaluate whether there are any bugs in the code just by reading and imagining what kind of model is being constructed. OK, so here is the solution:

    #define DEFINE_BOOL_TYPE(name)\
    	struct name {\
    		bool f;\
    		operator bool() const {return f;}\
    		explicit name(bool b):f(b){}\
    	};
    
    namespace connection{
        DEFINE_BOOL_TYPE(BlockingQ)
        DEFINE_BOOL_TYPE(AsyncQ)
        DEFINE_BOOL_TYPE(FlowControlQ)
        class Connection {
            ....
            void Attach(Channel *channel, BlockingQ blocking, AsyncQ async, FlowControlQ flow_control);
        };
    }
    using namespace connection;
    connection.Attach(channel, BlockingQ(true), AsyncQ(false), FlowControlQ(false));

    That's it. In Ada, this would be done like this:

    type BlockingQ is new Boolean;

    So we get a very similar functionality, for booleans only, and syntactically not-as-pretty, but it works very well. Note that the explicit constructor is the only thing that prevents you from passing "true" or "false" instead of the boolean object. This works similarly for integers and floats, however the arithmetic becomes limited, so you give up even more convenience than you do with the bools.
    Friday, April 2nd, 2004
    3:44 pm
    [chudov]
    loops
    Couple of years ago i started creating a "C++"-like scripting language, which would be easy to use for people with minimal coding experience. The first thing that i had changed were loops. "C" loops are powerful, but in 99% percent of cases they are used just to iterate through the set of elements, be it numbers or elements of some kind of list. Syntax of operator "for" appears to be too complex for the task.
    What people asked for was an operator like this:
    for (i @ list) {
      // 
    }
    
    Where list is any object, which is able to create an iterator; Compare to this:
    some_list_iterator list_iterator (list.iterate ());
    bool have_element;
    for ( have_element = list_iterator.Step(i); have_element;  have_element = list_iterator.Step(i)) {
      // 
    }
    
    People don't want to know which functions and classes implement enumeration of a certain kind of list, so stripping these technical details from the syntax makes programming much easier and protects from some common errors. Here are some examples of sets, frequently used in loops:
    • constant list of values: for (day@set (''mon", "tue", "thu", "wed", "fri", "sat", "sun")) {...}
    • range of integer values: for (i@range (1, 10) ) {...}
    • time intervals: for ( tick @ time_intervals (10, 50) ) {...}
      // will be executed 10 times with 50 miliseconds delay between iterations
    • directory contents: for (file@directory_contents ("%SYSTEMROOT%") ) {...}
    It looks like the only problem implementing this in C++ would be that an iterator's virtual method Step has to be able to accept any type of variable as an argument.
    Saturday, February 28th, 2004
    1:18 pm
    [alexeilebedev]
    Welcome, all the new subscribers!
    Please feel free to post any thoughts / questions / ideas on any topic related to programming, program design, programming methodologies, or language design.
    Tuesday, September 30th, 2003
    2:34 pm
    [dym]
    random question
    What do you consider to be core ideas in practical computer science? What are the patterns that are useful in diverse contexts?

    I would list caching (locality-based performance optimizations), enforced abstraction (type safety, interfaces), and instructions as data (from interpreters and code generation to self-modifying code.)
    Thursday, June 12th, 2003
    7:59 pm
    [alexeilebedev]
    A simple way to iterate over a binary heap
    Imagine you have a binary heap... a really common occurence indeed. You might have a scheduler, or a priority queue, a graph traversal algorithm, an order book, or something else that requires partial sorting. You know just the smallest element of the heap. The question is, how do you extract k smallest elements of the heap?

    The solution that comes to mind is to remove k elements from the heap by calling Remove, then process them, then add them back.

    For a number of reasons, this solution is not the most elegant one. First, you are destructively modifying the heap. You have to guarantee that while you are iterating in this way over a heap, no function that you call tries to access the same heap for some reason of its own. Second, it's slow. It takes O(k * log(N)) operations, where N is the size of the heap.

    Well, there is a way to do it in O(k * log(k)), of course. You begin by taking the top element of the heap and adding it to an auxiliary heap. Then, inside the loop, you do this:
    1. extract the top element of the auxiliary heap
    2. find the location of this element in the original heap, and add its one or two child elements to the auxiliary heap.

    That's it. At the end of the iteration, you can throw away the auxiliary heap.
    Here I am assuming two things:
    1. The main heap is indexed -- each element knows its location in the heap, because this location is stored inside the element itself. The indexed binary heap is usually organized as an array. The two children elements have indices index*2+1 and index*2+2.
    2. The auxiliary heap is not indexed -- it is a simple array where you sort elements using a comparison operator.

    This iteration mechanism is non-destructive, and uses O(k * log(k)) operations by construction.

    Here is another cool way to think about this: if we view a heap as a graph (it's a tree, after all), then we are basically visiting the nodes of that graph, starting with the root node, using breadth-first search, by means of another heap :-)
    Friday, March 28th, 2003
    5:16 pm
    [alexeilebedev]
    Round(1.5) == 2
    Round(0.5) == ?
    answer )
[ << Previous 20 ]
About LiveJournal.com

Advertisement