Happy New Year everyone!

2011 was a slow year for this blog. Early last year I joined Erik Meijer’s team in order to work on an incubation project that involved actors, distributed graph-based processing, a highly-scalable and reliable document store, and more. Given the non-public nature of the project, I couldn’t really talk about it. It was a lot of fun and we learnt a lot. The nice folks in the team are continuing the work but I decided to move on, following my dream to work on knowledge representation and reasoning at scale. Unfortunately more months of work-related “silence” are going to follow.

I’ve already talked (on this blog, in public presentations, and in articles) about the opportunities and challenges associated with the data-information-knowledge spectrum. Semantic Computing is at the “age of reason” and I believe that 2012 will represent its big breakthrough into commercial applications and experiences.

As I’ve been talking with others about the knowledge representation space, I have noticed how often the terms “information” and “knowledge” are misused. Since I expect to be writing more about information and knowledge in the coming months, I felt like sharing a few words on the subject.

Information vs Knowledge: I find that the two terms are often used interchangeably. While an authoritative definition doesn’t exist, Bellinger, Castro, and Mills offer an informative description in their article on “Data, Information, Knowledge, and Wisdom”.

Data is symbols (bits, numbers, characters). Information adds meaning to data through the introduction of relationships. It answers questions such as “who”, “what”, “where”, and “when”. Knowledge is a description of how the world works. It’s the application of data and information in order to answer “how” questions.

As an example… while “Savas likes Coldplay” and “Coldplay is a band” represent information facts, the statement “for each person X there exists a female person that gave birth to X” makes an assertion about a truth in the world, about how we perceive the world. Also, “Savas is an adult” is an inferred statement from our general understanding (i.e. knowledge) that “for each person X with Age(X) > 18, X is considered an adult” when it is combined with the information fact that “Savas is 38 years old”. Of course, one might argue that the inference about Savas’ adultness is erroneous because we haven’t accurately described the world in which Savas acts like a teenager on many occasions :-)
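For the programmers reading, here is a minimal C# sketch of the same distinction. The Person record and the AdultRule class are names made up purely for illustration (they are not part of any library), and the sketch assumes C# 9 or later for the record syntax. The fact is plain information, the rule stands in for knowledge, and the inferred statement is produced by applying one to the other.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Information: a fact about an individual (hypothetical record shape).
record Person(string Name, int Age);

static class AdultRule
{
    // Knowledge: "for each person X with Age(X) > 18, X is considered an adult".
    public static bool IsAdult(Person x) => x.Age > 18;

    // Inference: apply the rule to every known fact and emit the derived statements.
    public static IEnumerable<string> Infer(IEnumerable<Person> facts) =>
        facts.Where(IsAdult).Select(p => $"{p.Name} is an adult");

    static void Main()
    {
        // The information fact from the text: "Savas is 38 years old".
        var facts = new[] { new Person("Savas", 38) };

        foreach (var statement in Infer(facts))
        {
            Console.WriteLine(statement);   // prints "Savas is an adult"
        }
    }
}
```

Changing the rule (say, to account for teenage behavior) changes what gets inferred without touching the facts, which is roughly the separation the paragraph above describes.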

The above is just part of the basics of course. Once we incorporate temporal and probabilistic reasoning things get a lot more interesting :-)

I predict that 2012 is going to be the year where parts of Vannevar Bush’s vision are going to start becoming a reality. Experiences such as Siri are leading the way in incorporating natural language understanding in general-purpose computing. However, I believe that Siri is only scratching the surface of what we can achieve today. 2012 is going to be very exciting :-)

As always, feedback and recommendations are welcome!

 

If you are interested in this space, here are the books I recommend:

A crazy Savas trip :-)
10 Dec 2011
, Categories: Music-Festivals, Travel

When Lee Dirks asked me to represent the Microsoft Research Connections team at a Digital Public Library of America meeting at Harvard Law School in Boston, I immediately checked my calendar. The DPLA folks are starting a wonderful new journey of defining a platform for digital libraries so they needed input from industry Software Architects.

This is Tony Hey’s group so I am always willing to help if I can. Also, Lee is such a great guy so it’s difficult for me to say “no” :-) I checked my work calendar to make sure that I didn’t have any meetings before committing to the trip. Then the following two things happened:

  • I changed roles within Microsoft (yes, again! :-)*. My new boss was more than happy for me to help out MSR, so all was good.
  • I checked my personal calendar and realized that I had bought tickets for the Winter Warmth concert with Florence and the Machine on the night I was supposed to fly to Boston :-( I was really looking forward to seeing them live again but I couldn’t let Lee down. So what is a concert-going geek like me to do?

I knew I was already going to be pumped by the “Deck the Hall Ball 2011” the night before so I had to figure something out :-)

I now find myself at the Logan International Airport in Boston, Saturday 6am, waiting for a flight to San Diego. I got tickets for the “91X Wrex the Halls” festival over there. Oh, it’s going to be awesome.

My Wed-Sun schedule:

image

Next time I will try to cover all four corners of the US! Now I am looking for the right place to make a donation in order to offset the resulting carbon footprint for this trip.

 

Crazy eh? I love it! :-)

 

* I am super super excited about the new role but I am not going to say anything else until the new effort sticks and delivers on its huge potential. Given Microsoft’s investment and the caliber of people involved, I am extremely optimistic :-) I am having a blast so far!

Trip report - XLDB Conference

Venue: SLAC (Stanford Linear Accelerator Center)
Conference: XLDB 2011

Overview

I attended the XLDB 2011 conference earlier in the week. XLDB seems to be an emerging conference. There were 280-300 people in attendance and more had to be turned away. In fact, there were people who turned up anyway, even though they hadn’t secured a place, hoping that there would be cancellations.

My understanding is that XLDB emerged from the need to discuss technology, issues, and requirements for Big Data in the scientific community, especially when using supercomputing facilities. However, it is clear to everyone that Big Data is no longer specific to those who are doing supercomputing. Even though we are moving into the exascale supercomputing era (the fastest supercomputer today can do >8 PFlops), having to deal with GBs, PBs, or even EBs of data is not a problem exclusive to those who have access to supercomputing facilities. Hence, it was not a big surprise to find out that the conference attracted the interest of industry.

A number of industry leaders had a presence at the conference: Google, Microsoft, eBay, Netflix, LinkedIn, Facebook, Amazon (even though they decided at the last minute not to present), and IBM (to name a few). There were talks by scientists as well.

Most of the talks were great and I enjoyed them a lot. I chose to highlight the Netflix, Metamarkets, and Novartis ones as the driving examples for my observations. The conference organizers have promised to publish the slides and the videos of the presentations.

The value of data

In my mind, the Big Data space is not a niche any more. It’s not a space that any company offering enabling technologies, solutions, and services to its customers can afford to ignore. Many customers already have real problems, they already take advantage of Big Data processing infrastructures, and their competitiveness is based on their ability to extract value and insights from the data they collect.

Take Netflix for example. Their VP of Data Science and Engineering (highlighting “Data Science” in the title!!!) gave an excellent talk on how Netflix won the DVD-shipping game, how they became competitive. It was all because of the data they collected and then analyzed. They heavily instrumented their DVD-handling equipment. Every single aspect of a DVD’s route was recorded and sent to Netflix’s data warehouse. Decisions within msecs had to be made about how to best route each disc. Data was collected and then processed in order to optimize all aspects of their business. They had become so good at it that their only bottleneck became the post office. They had reached such a level of data-based business intelligence that they even went to the post office and started helping them optimize their operations.

The VP of Data Science and Engineering at Netflix was a happy person until Netflix decided to get into the streaming business. Their data collection and analysis requirements skyrocketed!

Here comes the cloud

Netflix needed to expand its ability to process data and make business decisions. They really wanted to move away from the business of managing infrastructure. They didn’t want to have to deal with operations, data centers, machines, and so on. They went through a migration period of progressively moving their entire data collection and processing infrastructure into Amazon’s cloud.

Granted, Netflix had to build their own pipeline based on open source technologies. They used the right tool for the job. They used a NoSQL solution for reliably gathering/recording their data at scale. They used an RDBMS where it made sense.

Netflix is a big company. They can build their own data processing infrastructure from the various pieces. However, what about all those smaller companies that want to collect and process data that is critical to their growth, competitiveness, survival? Wouldn’t they benefit from cloud solutions that are scalable, reliable, and NOT managed by them?

Take Metamarkets as an example. They are doing predictive analytics that help advertisers around the world. Apparently the advertising game is following that of the financial markets. Advertisers need to be able to make decisions within a few seconds. They need to analyze large amounts of data (billions of microtransactions per day) very fast.

Their need for a very fast engine for almost-real-time analytics was not addressed by any existing solution. Metamarkets was born in the cloud and continues to operate in the cloud. They didn’t have to transition to it like Netflix did. Nevertheless, they still had to build their own distributed, in-memory database (Druid) because none of the solutions they tried could meet their requirements. Given their domain of focus, that’s effort that could have been avoided. Rather than focusing on infrastructure, they could have directed their investment toward offering better services to their customers. As it turned out, they managed to build a very good infrastructure that serves them well today.

The data analytics ecosystem

Companies like Vertica provide solutions for companies like Metamarkets. The value proposition is obvious. If you want to build a service or a product that is based/depends upon the processing of data at scale, then you don’t have to build the infrastructure yourself.

This is not about deploying a database management system. This is not about just deploying Hadoop or a NoSQL store. This is about getting a complete solution for your big data analytics needs, tailored to your specific requirements (e.g. close-to-real-time processing, batch processing, scale, cloud, etc.).

Novartis happens to concentrate on providing solutions for the genomics/life sciences community. They utilize SciDB, an array-oriented parallel database. There are many companies like Novartis out there addressing different domains. We’ve all heard about them and are already monitoring them. The point is that such companies are offering solutions for real customer needs today. They reuse open source technologies in order to build an ecosystem of tools and services for their customers.

In my mind, a great opportunity resides in democratizing the data analytics ecosystem by offering scalable solutions at scale; that is, solutions that meet the compute- and data-processing scalability requirements of customers while doing so for 100s of millions of customers at the same time. An ecosystem that addresses all aspects of the Big Data space… data collection, management, processing, visualization, analysis, data mining, machine-based reasoning, and many more!

 

Isn’t it a great time to be in the cloud + big data space? :-)

 

XLDB was a great conference.

I moved to the US in September of 2005. I set a goal for myself to stay organized with regards to my finances and collect as much data as possible for data analysis. Since then, every single entry in my credit card statements is explicitly reviewed and categorized. I used Microsoft Money and then Quicken (after Money got discontinued) to collect and manage all the transactions.

A question that recently came into my mind was this…

Has the almost exclusive use of my motorcycle* for commuting had any positive impact on the consumption of gasoline over the years?

(* for reference: my BMW R1200GS :-)

I thought that I should be able to figure this out by doing some data analysis. Ok… it’s not a “big data” analysis problem but the exercise does incorporate the necessary steps one needs to undertake in order to get insight from some data, whether big or small.

Total spending per month

The first step is to figure out my total spending on gasoline per month, which should be easy. Indeed, Quicken allowed me to sort all the transactions from the last 6 years. I copied the ones that were under the “gasoline” category to Excel and voila…

image

It was easy to calculate the monthly spending using Excel’s grouping function. I did have to play a bit with the dates so that I could make grouping work.*
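For anyone who would rather do the grouping in code than in Excel, here is a rough C# sketch of the same step. It assumes the transactions are available as (date, amount) pairs; the MonthlySpending class and the sample rows are made up for illustration and are not my actual data.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

static class MonthlySpending
{
    // Sums transaction amounts per calendar month, keyed as "yyyy-MM"
    // so that the keys sort chronologically.
    public static SortedDictionary<string, decimal> ByMonth(
        IEnumerable<(DateTime Date, decimal Amount)> transactions)
    {
        var totals = new SortedDictionary<string, decimal>(StringComparer.Ordinal);

        var groups = transactions.GroupBy(
            t => t.Date.ToString("yyyy-MM", CultureInfo.InvariantCulture));

        foreach (var group in groups)
        {
            totals[group.Key] = group.Sum(t => t.Amount);
        }

        return totals;
    }

    static void Main()
    {
        // Made-up sample rows standing in for the exported "gasoline" transactions.
        var sample = new[]
        {
            (new DateTime(2011, 1, 5), 42.10m),
            (new DateTime(2011, 1, 19), 38.40m),
            (new DateTime(2011, 2, 2), 40.00m),
        };

        foreach (var entry in ByMonth(sample))
        {
            Console.WriteLine($"{entry.Key}: {entry.Value}");
        }
    }
}
```

Turning each date into a year-month key is the code equivalent of the date massaging I had to do before Excel would group the rows.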

There is definitely a trend towards less spending. Of course, the above doesn’t take into consideration the fluctuating price of gasoline.

Finding historical gasoline prices

So, I had to go find the gasoline prices over time for the state of Washington. It took me a while on Bing to find a free source. The Department of Energy maintains historical data. Since I always use premium gas (better for the environment), I didn’t have to worry about averaging between the different types. I downloaded the Seattle data.

Grouping the data

Now that I had the data, I had to bring it into the same shape as my monthly-spending data. After some more massaging of the dates and grouping, I had the monthly average price of premium gasoline in the Seattle area.

Comparing the data

Unfortunately, I haven’t been recording the actual miles that I did on a per-month basis. This makes it difficult to get an accurate view of the monthly spending in relation to the actual miles travelled and when compared to the prices of gasoline. Some months I travelled many more miles than others (e.g. road trips, visitors).

I do know that I have 63,000mi and 18,000mi on my car and motorcycle respectively. If I distribute these miles throughout the months, I find that on average I have been driving/riding 1,094.5mi per month. WOW!!!

With that information at hand, I can now calculate my monthly gallons consumption (again, given the even distribution of my total miles throughout the months**).
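The arithmetic behind this step is simple enough to sketch in C#. The 1,094.5 miles per month comes from the numbers above (63,000 + 18,000 = 81,000 miles, which at that rate corresponds to roughly 74 months). The FuelEstimate class is a made-up name, and the dollar and price figures in Main are invented for the example rather than taken from my data.

```csharp
using System;

static class FuelEstimate
{
    // From the post: total miles spread evenly across the months.
    public const double MilesPerMonth = 1094.5;

    // Gallons bought in a month = dollars spent / average price per gallon.
    public static double Gallons(double spentDollars, double pricePerGallon) =>
        spentDollars / pricePerGallon;

    // Rough combined efficiency for that month under the even-miles assumption.
    public static double MilesPerGallon(double spentDollars, double pricePerGallon) =>
        MilesPerMonth / Gallons(spentDollars, pricePerGallon);

    static void Main()
    {
        // Invented month: $150 spent at an average of $3.75 per gallon.
        double gallons = Gallons(150.0, 3.75);       // 40 gallons
        double mpg = MilesPerGallon(150.0, 3.75);    // about 27.4 miles per gallon

        Console.WriteLine($"{gallons:F1} gal, {mpg:F1} mpg");
    }
}
```

Because the miles are spread evenly, any month in which I actually drove more or less than average will show up as a misleadingly low or high efficiency, so the trend matters more than any single month.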

image

So, even though my car has been getting older, my miles/gallon efficiency has been going up. This is definitely a result of the heavier use of the motorcycle over the last two years. Also notice how the average price of gasoline has been going up in Seattle.

 

Lessons

Here are a few things I’ve learnt through this exercise…

  • Excel is a great tool for playing with small amounts of data, even though some necessary features (for this scenario) were difficult to discover.
  • The discovery of data that I didn’t have seemed to be the most difficult part.
  • Filtering and massaging the data took most of the time.
  • I felt that reporting and making sense of the data could be automated.
  • Visualizing the numbers makes all the difference in the world :-)

* Perhaps the most difficult part of the entire process was my inability to copy-paste only the results of the grouping. Finally I discovered the “Go to special…” feature in Excel that allows one to copy only the visible parts of a selection.

** I can probably figure out the miles per month if I first calculate the miles/gallon efficiency of my car and motorcycle.

Steve Jobs, 1955 - 2011
6 Oct 2011, Updated: 6 Oct 2011
, Categories: Technology

To me Steve represents the perfect example of a visionary, a creator, a leader, an artist, a world changer. In my eyes, the world is a better, more beautiful place, because of Steve.

Steve Jobs

Source: apple.com, Oct 5, 2011


Glastonbury 2011
1 Jul 2011, Updated: 1 Jul 2011
, Categories: Art, Music-Festivals, Travel

Another year, another Glastonbury!!! And as always, it didn’t disappoint. This year it was a different experience for me but it was still a lot of fun! :-) It was the first Glastonbury since 2004 that I didn’t experience on my own. Mary was with me, which meant that camping with a small tent, no showering for 5 days, and toilets shared with 200,000 other people were not even considered as options :-) But first things first.

We arrived in London in order to spend two nights at Jim’s place with him and his wife. Due to a miscommunication though, he was expecting us the week after Glastonbury so he ended up being at the Neo offices in Sweden while we spent time with his better half :-) Thanks K! We did end up seeing Jim and K for drinks when we got back from Glastonbury. It’s always a pleasure hanging out with my mate!

We took the bus to Glastonbury. As we were approaching, we started noticing the effects of the rain from the previous days. It was going to be another wet one!

Arriving at the Glastonbury site, thinking while playing Scramble (but still losing badly :-), waiting to be picked up.

 

So… We stayed in a yurt. I was very skeptical at the beginning but I must admit that it was much better than camping. It was spacious, clean, and we had to share a toilet with only one other couple. That made Mary happy, which meant I was happy too :-)

The yurt (inside, outside, and with its own toilet).

 

The site even had its own chill-out bar, which was another yurt, larger of course.
The yurt chill-out bar... substantial amounts of alcohol were consumed here :-)

 

Yes, Glastonbury was muddy again, especially on Friday and Saturday. Really muddy! Not as bad as the 2007 one but still bad enough so that Mary was often seen making expressions like these:

There was a LOT of mud! Usually these fields are covered with grass but we could only see mud :-(

 

This Glastonbury festival was all about exploring the site. I wanted Mary to get a feel of the arts, of the vibe, of everything that was happening. So we walked a lot, we went to different tents for comedy, dance, poetry, random performances. I think Mary really enjoyed the modern dance performances we saw. There is always something for everyone.

Performances at various stages/tents... Random drums, young kids doing circus (they were great), weird trio performing Mozart and Vivaldi, the Black Eagles were great as always, the theater/dance/acrobatics group Mimbre was amazing, drinks at the “Fluffy Rock Café” :-)

 

And of course there were the random “street” performers and acts.

A very small sample of the random “street” acts. Of note is the twister game, apparently only a small portion of the larger board that was deployed the previous day.

 

We also experienced the food at the various stands with various degrees of success, at least for Mary.

Mary enjoying a Mediterranean wrap, which of course was my choice but since she didn’t like hers... :-)

 

The yurt company also provided breakfast and dinner, which we thoroughly enjoyed.

Saturday’s pork roast... mmm!

 

Of course, it wouldn’t be Glastonbury without good music. This time I stayed away from the mosh pit (as I said, a different Glastonbury experience :-) It was still a lot of fun! We saw U2 in the rain, Coldplay, Elbow (they were fantastic!!!), Paul Simon, Don McLean, Queens of the Stone Age, and more. Not as many bands as other times.

“Pyramid” and “Other Stage” stages.

 

When the sun came out on Sunday, Glastonbury felt completely different. The paths dried out making it much easier to move around.

Sunday was a gorgeous day. We even met with Carole Goble and Dave De Roure, my usual partners in crime at each Glastonbury.

 

Mary was a great sport. It was her first festival experience of such scale and in such conditions but she handled it great. I was very proud of her :-) Not every moment was filled with smiles but we both managed to maintain our cool and make Glastonbury a really memorable experience.

Mary at Glastonbury

 

And then it was time to go...

The aftermath at the Pyramid stage (so much rubbish :-(. Mary was exhausted as we waited for the bus. She was full of energy the following day, discovering how addictive Angry Birds can be on the iPad during our flight back to Seattle.

 

Definitely another memorable Glastonbury. Too bad there isn’t going to be one next year. We are going to be there in 2013 though!!!

We are having fun at work with some investigation work in the NoSQL space. We’ve been playing with actors in our distributed system. There has been a lot of prior work in this space so we are not really doing anything new with actors but it’s still fun. Scala, F#, Erlang, and many others have successfully applied the model for managing concurrency.

Earlier today, Aaron Lahman attempted to concurrently send messages to 20,000 actors in the same process. My implementation of the actor model didn’t handle it. As we started investigating, we came up with an implementation pattern that we thought we would share in case others find it interesting. Lots of credit goes to Aaron for his insight and feedback.

Let’s start with what was wrong.

Asynchrony is baked into everything that we are building. Every message sent between components and every method invocation has to be asynchronous. We are heavy users of the Task Parallel Library (TPL) and C#’s upcoming async/await feature (Microsoft Visual Studio Async CTP). So, what happened with the actor?

Our actors look just like any other actor out there :-) There is a message queue and some optional state. Each message is processed in turn and is given a copy of the state (if any). Then, it is expected to produce an updated version of that state (if it wishes) and some output to be sent to the original sender of the message.

image

So, my original implementation used a blocking queue for the incoming messages (effectively, calls to Execute() are treated as messages). Each actor in the system would start a task (line 7) and then block waiting for a message to arrive (line 30). Each message was converted to a task (line 17), which was then queued.

(As you can tell from the simple implementation below, our actors don’t have any state. I just wrote these two simple actor implementations for illustration purposes.)

   1:  class BlockingActor
   2:  {
   3:      private BlockingQueue<Task> operationQueue = new BlockingQueue<Task>();
   4:   
   5:      public BlockingActor()
   6:      {
   7:          Task.Factory.StartNew(this.ProcessMessage);
   8:      }
   9:   
  10:      public Task<R> Execute<T, R>(Func<T, Task<R>> function, T arg)
  11:      {
  12:          if (!function.Method.IsStatic)
  13:          {
  14:              throw new ArgumentException("Function must be static");
  15:          }
  16:   
  17:          var task = new Task<R>(() => function(arg).Result);
  18:          lock (this.operationQueue)
  19:          {
  20:              this.operationQueue.Enqueue(task);
  21:          }
  22:   
  23:          return task;
  24:      }
  25:   
  26:      void ProcessMessage()
  27:      {
  28:          while (true)
  29:          {
  30:              var task = this.operationQueue.Dequeue();
  31:              task.RunSynchronously();
  32:          }
  33:      }
  34:  }

 

As you can see from line 17, I had to wrap the given function inside a Task so that I could control the timing of its execution. The use of Task.Result meant that when the task actually ran (line 31), the allocated thread would block. Here’s a sample program to demonstrate the behavior…

   1:  class Program
   2:  {
   3:      static int NoActors = 50000;
   4:   
   5:      static void Main(string[] args) // no await in Main, so it does not need to be async
   6:      {
   7:          var actors = new BlockingActor[NoActors];
   8:          var tasks = new List<Task>();
   9:   
  10:          for (int i = 0; i < NoActors; i++)
  11:          {
  12:              actors[i] = new BlockingActor();
  13:              tasks.Add(actors[i].Execute(Foo, i)); // keep each result so WaitAll has something to wait on
  14:          }
  15:   
  16:          Console.WriteLine("done queuing");
  17:          Task.WaitAll(tasks.ToArray());
  18:          Console.ReadLine();
  19:      }
  20:   
  21:      static async Task<string> Foo(int i)
  22:      {
  23:          await TaskEx.Delay(TimeSpan.FromSeconds(4));
  24:          Console.WriteLine("done " + i);
  25:          return "done";
  26:      }
  27:  }

 

The blocking calls make the Task scheduler’s life very difficult since it has to keep allocating threads. All our asynchrony-related effort goes to waste because the thread processing a message blocks on Task.Result instead of being released while the message awaits, as it does in line 23 above.

What Aaron and I came up with is an Actor implementation that uses Task continuations to simulate the message queue. Since the semantics of the actor model require us to process a message only after the processing of the previous one has finished, all we have to do is to chain the respective tasks. Here’s what our Actor looks like now…

   1:  class Actor
   2:  {
   3:      private Task lastTask = TaskEx.FromResult(0);
   4:      private object objLock = new object();
   5:   
   6:      public Task<R> Execute<T, R>(Func<T, Task<R>> function, T arg)
   7:      {
   8:          if (!function.Method.IsStatic)
   9:          {
  10:              throw new ArgumentException("Function must be static");
  11:          }
  12:   
  13:          var tcs = new TaskCompletionSource<R>();
  14:   
  15:          Task<R> task = null;
  16:          lock (this.objLock)
  17:          {
  18:              task = this.lastTask.ContinueWith(_ => function(arg)).Unwrap();
  19:              this.lastTask = task;
  20:          }
  21:   
  22:          task.ContinueWith(t => tcs.TrySetResult(t.Result), TaskContinuationOptions.OnlyOnRanToCompletion);
  23:          task.ContinueWith(t => tcs.TrySetException(t.Exception), TaskContinuationOptions.OnlyOnFaulted);
  24:          task.ContinueWith(t => tcs.TrySetCanceled(), TaskContinuationOptions.OnlyOnCanceled);
  25:   
  26:          return tcs.Task;
  27:      }
  28:  }

 

With the above implementation we are only restricted by the amount of available memory for the running process. We always maintain a reference to the last task in the chain so that we can attach incoming tasks to it. The first ever task in the chain is created to be in the “complete” state (line 3) so the first incoming message will run immediately. We only call the Task.Result blocking operation after a given task has been completed successfully (line 22).

We can now have thousands of concurrent actors together as long as the Tasks representing the processing of a message use the async/await pattern to yield the processor when necessary. We are making sure that the libraries we build on top do just that so that we can achieve maximum possible performance.
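To see the ordering guarantee in action, here is a small self-contained sketch. It is not the code we run at work: ChainedActor and Demo are made-up names, it targets the released Task APIs (Task.CompletedTask and Task.Delay, which assumes .NET 4.6 or later) rather than the Async CTP’s TaskEx, and it drops the static-method check and the TaskCompletionSource for brevity. Three messages are sent to one actor, with the earlier ones taking longer, so any overlap in processing would scramble the recorded order.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical minimal actor that uses the same continuation-chaining idea.
class ChainedActor
{
    private Task lastTask = Task.CompletedTask;
    private readonly object gate = new object();

    public Task<R> Execute<T, R>(Func<T, Task<R>> function, T arg)
    {
        lock (gate)
        {
            // Start this message only once the previous one has finished,
            // whether it succeeded or not.
            Task<R> task = lastTask.ContinueWith(_ => function(arg)).Unwrap();
            lastTask = task;
            return task;
        }
    }
}

static class Demo
{
    public static async Task<List<int>> RunAsync()
    {
        var actor = new ChainedActor();
        var order = new List<int>();
        var pending = new List<Task<int>>();

        // Earlier messages wait longer, so overlapping execution would reorder the log.
        for (int i = 0; i < 3; i++)
        {
            pending.Add(actor.Execute(async n =>
            {
                await Task.Delay(30 - 10 * n);  // yields the thread instead of blocking it
                order.Add(n);                   // safe: the actor runs one message at a time
                return n;
            }, i));
        }

        await Task.WhenAll(pending);
        return order;
    }

    static void Main()
    {
        var order = Demo.RunAsync().GetAwaiter().GetResult();
        Console.WriteLine(string.Join(",", order));   // prints "0,1,2"
    }
}
```

The messages complete in the order they were sent even though the first one sleeps the longest, and no thread sits blocked while a message is waiting.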

Task continuations are pretty cool.

New challenge
15 Mar 2011
, Categories: Personal

It has been more than 18 months since I moved to the Technical Computing organization inside Microsoft. I had joined a startup team that worked on building a platform for large-scale compute and data processing on top of Azure. It was an amazing experience. We worked hard, produced great results, and learnt a lot. I met great people along the process. Even though there isn’t a specific product to which I can point as the outcome of our effort, our learnings/ideas are now distributed throughout Microsoft and are gradually finding their way into other products.

I stayed with Technical Computing to help with a new product in the modeling space. I was responsible for the design of the Azure-hosted service of the product. Given that the team is executing great, I decided that it was time to move on.

I talked with many teams, inside and outside Microsoft. I met some great people along the way. It was a very educational experience which lasted around 2-3 months. I considered joining startups, moving to other larger companies, and even starting my own.* There were times of stress from not finding something interesting and times of being overwhelmed by the amazing offers I got. In the end, I decided that money was not the primary motivation for me. I chose to decline good offers in favor of joining a group of people next to whom I can learn a lot.

 

I am joining the team that David Campbell and Erik Meijer have started putting together. Their aspiration is to build something new and exciting in the data management and processing space (not necessarily relational :-). I’m humbled to be joining a team of really smart and talented folks. I am going to be responsible for one of the major areas of work and will be leading a team.

It’s an exciting opportunity and I can’t wait to start talking about all the things that we will be doing. The team wants to execute fast and release often so stay tuned.

 

I am very thankful to all the people with whom I talked throughout my search. They opened doors of opportunities for me and trusted me with their offers. I’d like to believe that I made some new friends!

 

* In fact, I did start a small company for fun and I plan to continue working on a couple of ideas when I have free time.

I hadn’t come across the “HOME” movie before. I watched it earlier today.

Our home, the planet that we call Earth, deserves at least 1 1/2h of our time. Please please please watch this movie in its entirety!!! The cinematography is spectacular; the narration really beautiful; the conveyed message powerful, inspiring, moving. Make sure you watch it all the way to the very end!

Watch “HOME” on YouTube.

“HOME” web site.

Squaw Valley skiing weekend
6 Feb 2011, Updated: 6 Feb 2011
, Categories: Personal, Travel

The last weekend of January I traveled to California to meet with Jim and Emil for a weekend of skiing at Squaw Valley. We were treated to excellent weather on Saturday and an amazing 20-30cm of new snow on Sunday. Soooooo much fun!

I hadn’t seen my pal Jim for a long time so I really wanted to catch up. Even though I had exchanged messages and talked on Skype with Emil, I was looking forward to meeting him in person. He’s a lot of fun! :-)

The weekend consisted of a lot of geek talk, especially around graphs, coffee breaks on the slopes, and skiing. Emil, Jim, and the rest of the gang over at NeoTechnology are doing some really really cool stuff. I am soooo tempted ;-)

Earlier today I decided to create a short movie from the videos and photos :-)

Enjoy...

Please note that the video might be blocked in some countries (e.g. Germany) because of the soundtrack. According to YouTube’s copyright notice, even though I own the music, I can’t use it on home videos that I publish online. Oh well.

“The Joy of Stats”
5 Jan 2011
, Categories: Science, Research

Happy New Year everyone!

The first blog entry of 2011 is about a 1-hour BBC documentary on the “Joy of Stats” (video on YouTube) with Hans Rosling as the presenter.

Hans Rosling is a great inspiration to me. He always talks about the power of data and its analysis, a subject about which I care a great deal. Most importantly, however, it’s the way he delivers his message, in talks and videos. He’s absolutely great!

In this particular documentary, Rosling advocates the power of statistics, visits some of its history, gives examples, mentions the data deluge, talks about data-intensive Science (Stephen Emmott talks about the transition from hypothesis-oriented to data-oriented science, which is of course Jim Gray’s fourth paradigm for science), and so on.

Ah… I think this documentary made me realize again how much I miss working with researchers around big data problems. Rosling talks about the Sloan Digital Sky Survey. I had the pleasure, at Jim Gray’s suggestion, of working with a small subset of the data when I was in Newcastle, and I met wonderful people in the process… Jim Gray, Alex Szalay, my good friend Maria Nieto-Santisteban (Happy New Year Maria!!!), and many others involved in the project. As you can hear in “Joy of Stats”, it took Sloan 8 years to complete a survey of 1/4 of our sky. It’ll take its replacement 3 days to complete a survey of the entire sky. Amazing progress! And there is going to be a LOT of data available for processing!

I also loved the fact that Rosling closed with a mention of the work behind the “We Feel Fine” web site. I very much love the philosophy behind the site and the premise it represents, as those who are close to me know very well given my recent endeavors and long nights :-)

As Rosling said in the documentary, “pretty neat, eh?” :-)

BTW... I had tweeted about a small clip from this documentary a month ago. This blog post is about the entire one-hour documentary “Joy of Stats”, which I definitely think is worth your time!

This is cool. O’Reilly had an ebook promotion for Cyber Monday here in the US. “REST in Practice” did very well. It was the fifth best-selling title :-) Tim O’Reilly published a blog entry about the top-selling books.

Not bad! :-)

If you have read the book and would like to share your views with the rest of the world, please consider submitting a review over at Amazon.

Diplomacy 2.0?
7 Dec 2010
, Categories: Politics, Web

I feel that I need to express my opinion on the recent Wikileaks-related events. I know that many of my friends or those who follow my thoughts on the Web might not agree with me but I always believe that it’s healthy to have differences of opinion since they make for interesting conversations :-)

I’ve been monitoring the news stories over the last few days. On the one hand, it’s a celebration of social media and the Internet. The speed with which information, opinions, and news are being distributed around the world is a great demonstration of the tremendous power that we have in our hands in the age of information. On the other hand, the reaction by our governments throughout the world shows, in my opinion, that there are ever-increasing efforts by those in power to control us, to keep the population of the planet blind about their actions, to filter the truth, to hide their dealings.

Governments are accountable for their actions. That’s at the heart of democracy. Politicians should only be afraid of the truth if they have something to hide, if their actions do not represent the will of the people. Those close to me know that I believe in “if you don’t want your actions to be discovered, you shouldn’t do them in the first place”. The truth about a government’s actions is not what puts people’s lives in danger; it’s the actions themselves, the greed that drives those actions, our inability as people to co-exist, to collaborate, and to co-evolve.

I have been unpleasantly surprised by many of the politicians’ reactions. Do calls to “hunt Wikileaks founder like a terrorist” belong in a democratic and civil society? Are we going to start hunting down all those who reveal information that governments want to keep secret? Are we going to stop journalism? Would the reaction have been the same if the released documents were from the Chinese or Russian governments? I suspect we might have been celebrating a “hero” now.

Please note that I am not supporting any type of illegal activity, by either side. If laws were broken, people should be punished. Our modern society has a legal system that should be used to decide blame and judge.

I am disappointed that many companies (Internet services and financial institutions) out there have taken action against Wikileaks under the pressure of governments around the world. Leaks of confidential documents have taken place for as long as the concept of “confidential document” has existed. And you know what? They will continue to take place.

Any action against Wikileaks is really an attempt to control information dissemination, an attack against the foundation of information access on which the Internet is based. Diplomacy really needs to evolve.

People talked about “privacy 2.0”, about how social media is transforming our approach to privacy. I wonder whether the world should transition to “diplomacy 2.0”, a world where countries are more transparent about their interactions and deals, where we use information sharing as a way to coordinate for our common goals, for co-existence.

I suspect that many of the world’s governments might be elevating Wikileaks and its editor-in-chief to “martyr” status because of the way they are reacting. Here’s what Julian Assange, the editor-in-chief, wrote earlier today: “Don’t shoot the messenger for revealing uncomfortable truths”.

I am a technologist. I am a geek. I know that.

I make use of all sorts of gadgets or read about them. I try to follow all the latest developments. I read books to educate myself about new things. At work and in my free time I try to generate new technology. Technology is a large part of my life.

So, it pains me to see technology getting in the way of doing things better, faster, easier. It’s always a human mistake of course. It’s either someone who doesn’t know how to apply the tools at hand or those who designed the tools in the first place.

Here’s a recent story that unpleasantly surprised me.

I love Apple’s hardware design. Their laptops are gorgeous. It was time for a new MacBook Pro.* (Yes, I do choose to run Windows 7 on it... beautiful hardware with beautiful software :-)

Last week, while I was traveling in New Orleans for SuperComputing 2010, I logged into Apple’s Online Store and placed my order, completely forgetting that Apple has a substantial discount for Microsoft employees. This was, of course, my mistake! I had to configure and order the laptop through the “Microsoft Employee Purchase” part of Apple’s web site, which automatically applies the discount. Well, I forgot to do so and ordered the laptop through their consumer website.

Earlier today I got back to the office and was going through my large mailbox when I noticed Apple’s receipt for my order. At that moment, I realized my mistake. Oh well! I thought that since the laptop was still en route, this should be an easy mistake to fix. Little did I know :-(

I called Apple Customer Service and explained the situation. The lady on the phone needed to talk to her supervisor before offering me a solution...

They offered to email me FedEx labels so that I can return the laptop at the moment of delivery. Once the laptop reaches their returns center, they will issue a refund. The refund will take around 5 business days to reach my credit card. I will then be able to order a new laptop, which is going to take more than a week to arrive.

I tried to ask them whether they could just refund me the difference, whether they could just fix my original mistake through the use of technology but, unfortunately, their systems couldn’t do that. It seems to me that either their systems weren’t designed correctly or they just don’t know how to use them.

The end result?

  • Apple incurs the costs associated with a returned laptop (even though I am sure that those costs are included in the Apple premium we all pay :-)
  • I don’t get to play with my latest toy for another 2-3 weeks
  • Apple gets a mostly perplexed and mildly dissatisfied (mildly since it was my mistake after all) customer

* Jim got a new one so I had, of course, to do the same :-) (an ongoing joke between us :-)

Zentity v2.0
17 Oct 2010
, Categories: Semantics, SciFi, Research, Microsoft

Ahhhh! One of my “children” is growing up and maturing!

I am so proud of the team over at Microsoft’s External Research. They have done an amazing job with Zentity. The new version packs a great set of cool new features, building on the extensible nature of the graph store which is at its heart. I know that they put a lot of emphasis on making it very easy to navigate through and visualize the information in the graph store by incorporating other Microsoft technologies.

Well done to the team and special congratulations to Oscar Naim who led the v2.0 effort. The other usual suspects are Lee Dirks, Alex Wade, and Derick Campbell of course.

Check out the new set of features:

  • New services (i.e. Pivot Collection Service and Zentity Data Service)
  • New client applications:
    • Pivot Viewer and OData Viewer, in collaboration with Microsoft Live Labs
    • Visual Explorer, in collaboration with MSR Asia
    • PowerShell admin console
  • .NET 4.0 support
  • OData support
  • Data model agnostic
  • Multi-tier application support
  • Zentity SDK
  • Improved deployment experience

There is also an introductory video featuring Oscar :-) And the code project site is coming soon! I know we promised to open source Zentity a long time ago but, despite Alex’s great efforts, we’ve been hitting one obstacle after the other. The source code is definitely coming though.

Zentity v2.0 download