24 July, 2015

Programming exercise: find palindromic numbers!

Here is the task:
"A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99. Find the largest palindrome made from the product of two 3-digit numbers." (originally posted by Joseph Farah)
Here is my quick solution. I know that the palindrome test may be improved, e.g., by excluding some of the numbers based on the arithmetic properties.
#!/usr/bin/env coffee
{floor} = Math
{now}   = Date

print   = console.log.bind console, "palindromes:"
assert  = (expr,msg) -> throw new Error "AssertionError: #{msg}" unless expr
bench   = (name,fn) -> t = now(); r = fn(); print "#{name} took #{now() - t}ms"; r

isPalindrome = (n) ->
  str  = "#{n}";        len  = str.length
  mid  = floor len / 2; last = len - 1
  for i in [0..mid]
    if str[i] != str[last - i] then return false
  return true

findPalindromes = (n=999) ->
  list = []
  for x in [0..n]
    for y in [x..n]
      list.push {x,y,palindrome: p} if isPalindrome p = x*y

byPalindrome = (a,b) -> a.palindrome - b.palindrome

testSuite1 = do ->
  f1 = (n) -> assert     isPalindrome(n), "'#{n}' must be a palindrome!"
  f2 = (n) -> assert not isPalindrome(n), "'#{n}' must not be a palindrome!"

  [9009, 919, 1, 11, 123321, "abba", "aba", NaN, "" ].forEach f1
  [991,9919,10,"01","abc",null,-1,undefined].forEach f2

main = (n) ->
  list = bench "findPalindromes(#{n})", -> findPalindromes n
  largest = list.sort(byPalindrome)[list.length - 1]
  print "found #{list.length} palindromes:", {last5: list[-5..], largest}

module.exports = {isPalindrome,findPalindromes,byPalindrom}

if process.mainModule == module then main 1 * process.argv[2]

Example output on my 2.5Ghz Macbook:
$ ./ 999
palindromes: findPalindromes(999) took 52ms
palindromes: found 3539 palindromes: { last5:
   [ { x: 894, y: 957, palindrome: 855558 },
     { x: 924, y: 932, palindrome: 861168 },
     { x: 916, y: 968, palindrome: 886688 },
     { x: 924, y: 962, palindrome: 888888 },
     { x: 913, y: 993, palindrome: 906609 } ],
  largest: { x: 913, y: 993, palindrome: 906609 } }

$ ./ 9999
palindromes: findPalindromes(9999) took 6893ms
palindromes: found 36433 palindromes: { last5:
   [ { x: 9631, y: 9999, palindrome: 96300369 },
     { x: 9721, y: 9999, palindrome: 97200279 },
     { x: 9811, y: 9999, palindrome: 98100189 },
     { x: 9867, y: 9967, palindrome: 98344389 },
     { x: 9901, y: 9999, palindrome: 99000099 } ],
  largest: { x: 9901, y: 9999, palindrome: 99000099 } }

Update: I just needed to take a break from writing my PhD thesis and optimized the search for the largest palindrome. Here is an improved version of the above script.

findPalindromes = (n=999,pruneSmallProducts=false) ->
  list = []
  pmax = x:0, y:0, palindrome:0
  for x in [n..0]
    break if x * n < pmax.palindrome and pruneSmallProducts
    for y in [n..x]
      p = x * y
      break if p < pmax.palindrome and pruneSmallProducts
      if isPalindrome p
        list.push o = {x,y,palindrome:p}
        pmax = o if o.palindrome > pmax.palindrome


main = (n) ->
  list = bench "findPalindromes(#{n})", -> findPalindromes n,true
  largest = list.sort(byPalindrom)[list.length - 1]
  print "found #{list.length} palindromes:", {last5: list[-5..], largest}

The speed up is significant.
$ ./ 999
palindroms: findPalindromes(999) took 1ms
palindroms: found 2 palindromes: { last5:
   [ { x: 924, y: 962, palindrome: 888888 },
     { x: 913, y: 993, palindrome: 906609 } ],
  largest: { x: 913, y: 993, palindrome: 906609 } }
$ ./ 9999
palindroms: findPalindromes(9999) took 1ms
palindroms: found 1 palindromes: { last5: [ ... ],
  largest: { x: 9901, y: 9999, palindrome: 99000099 } }
$ ./ 99999
palindroms: findPalindromes(99999) took 19ms
palindroms: found 1 palindromes: { last5: [ ... ],
  largest: { x: 99681, y: 99979, palindrome: 9966006699 } }
$ ./ 999999
palindroms: findPalindromes(999999) took 129ms
palindroms: found 1 palindromes: { last5: [ ... ],
  largest: { x: 999001, y: 999999, palindrome: 999000000999 } }
$ ./ 9999999
palindroms: findPalindromes(9999999) took 1402ms
palindroms: found 1 palindromes: { last5: [ ... ],
  largest: { x: 9997647, y: 9998017, palindrome: 99956644665999 } }

How fast does your browser render line charts?

Here is quick demo that tries to draw hundreds to millions of lines using your browsers drawing APIs of the HTML canvas. (Click on the "Result" of tab of the JSFiddle to show the app)

For instance, you can try to set the number of lines to 6 Million, for which I get the following results on my Macbook.

Firefox 39 Chrome 44
= 817 ms
= 3047 ms
= 5484 ms
= 55383 ms  
= 1490 ms
= 3066 ms
= 4116 ms
= 13789 ms  

You may need to increase tmax to 60k or more milliseconds on slower machines or browsers. Otherwise the processing will stop after  tmax ms

Feel free to post your results in the comments. Note that this demo is copy my former jsPerf test. However, since jsPerf is down due to spam activities, I moved it over to JSFiddle for everyone to experiment with it.

Cheers, Juve

23 July, 2015

Line Rendering Demo

Sorry, I had to remove this older post of my line rendering demo, to be able to modify the permalink for one of my publications.

Disable page numbers in table of contents in Lyx/LaTeX

I am using the KOMA-Script book document class for my thesis and was irritated that my


commands were ignored in LyX (LaTeX). Luckily, there is a soluton. Just add the following code after the TOC.


Hint: In "book" classes you may often also use \frontmatter and \mainmatter to indicate where the parts of your book start. This way you do not have to change the \pagenumbering manually.

13 July, 2015

Tiny linereader implementation (CoffeeScript)

Data processing in nodejs requires handling data in Buffer objects.
Here is a quick implementation for parsing + appending incoming buffer data to an array.

  readline = (trg) ->
    len0 = trg.length
    (buf) ->
      len    = trg.length
      lines  = "#{buf}".split "\n"
      if len - len0 > 0 then trg[len - 1] = trg[len - 1] + lines[0]
      else                   trg.push lines[0]
      trg.push line for line in lines[1..]

  selfTest = do ->
    input = []
    buf = (str) -> input.push new Buffer str
    buf "abc\nx"
    buf "yz\n"
    buf "12"
    buf "34"
    buf "\n"

    output = ["opq",""]
    input.forEach readline(output)

    assert.equal output.join(";"), "opq;;abc;xyz;1234;", "Messed up!"

22 April, 2015

Started using Spark + Scala this week. Very impressive!

As the data for my dissertation is growing to become really "big data" (several GB), I was looking for new tools, beyond my trusted relational databases (PostgreSQL, MonetDB, etc.).


I found Apache Spark, which provides Python, Java, and Scala APIs to define queries on big data files. The files are served via Hadoop (delivered with Spark) to parallelize operations on the data.

Starting a Spark cluster is very easy, once you have configured the master correctly. There are some pitfalls, as Spark is very picky regarding hostnames, i.e., you better always use the full hostname with correct domains in all start scripts, config files and your application code. I won't go into the details here.

The performance of Spark is really good. It can run an M4 query on 1×10M records (200MB) in 900ms, and easily handles large data volumes, e.g. 100×1M records (2GB, 8s) or 10k×100k records (20GB, 13min). Very nice for analytical workloads on big data sources. During query execution, Spark effectively uses all 8 cores of my Macbook and I plan to improve the query response times  by running my tests on a really big cluster to provide "near-interactive" response times.


Spark is nice, but what actually motivated me for this post was to praise Scala. As a big fan of CoffeeScript, I like short (but readable) notations instead of useless repetition of names and keywords, as required in many legacy programming languages.

Scala has everything that makes a programmers life easier. Here are my favorite features:
  • Implicit variable declarations (val obj = MyType())
  • Short notation for finals (val for final values, var for variables)
  • Lambda expressions (definition of short inline, anonymous functions)
  • List comprehension (returning loop results as lists)
  • Easily passing functions as objects (as in Javascript)
  • Implicit function calls (using obj.someFunc instead of obj.someFunc())
  • Everything is an expression (no return required)
  • Short function keyword (def or => instead of function)
Awesome, I can have all these features and still get the bonus of type-safety! The code-completion in Scala IDE works quite nicely.

Here are a few Scala code examples, implementing the subqueries of my visualization-driven data aggregation  (VDDA).

Example 1: M4 grouping function.
    val Q_g  = Q_f.keyBy( row =>
      ( Math.floor( w*(row(iT) - t1)/dt ) + row(iID) * w ).toLong

Example 2: M4 aggregation.
    def aggM4Rows ...
    def toRows4 ...
    val A_m4 ={case (k,row) => (k,toRows4(row))}).reduceByKey(aggM4Rows)   

Example 3: Counting the number of unique records.
    val recordCount = Q_m4.distinct.count  

Using Spark's Scala API makes these queries easy to define and to read, so that my Spark/Scala implementation of M4/VDDA is not much longer than the SQL queries in my research papers.

Spark + Scala = Big Data processing made easy!

Use rsync instead of scp to resume copying big data files!

For my dissertation I am conducting experiments on big data sources, such as 10k time series with 100k+ records each. The corresponding files comprise several gigabytes of data. Copying such files may take very long, since I work from a remote location, not sitting next to the data centers where the data is to be processed. Therefore, I need to be able to resume big data file uploads to the machines of the data centers.

I usually use scp to copy files between machines:
scp data/*.csv
Unfortunately, scp can't resume any previous file transfers. However, you can use rsync with ssh to be able to resume:
rsync --rsh='ssh' -av --progress --partial data/*.csv \ 
If you cancel the upload, e.g., via CTRL+C, yo can later resume the upload using the --partial option for rsync.

Very simple. No GUI tools required. Ready for automation.