
04 December, 2015

Useful command to help tune your WiFi antenna

WiFi throughput depends on the placement of your router and connected WiFi adapters, i.e., the position of their antennas.

I just tuned my WiFi by testing various placements and observing the detailed signal quality in real time, using the following bash command,

while sleep 0.5; do clear; iwconfig wlan1 | grep -iE "rate|quality"; done

which will continuously output the data rate and signal quality of your WiFi adapter wlan1. Note that, depending on your setup, your device name may be different, e.g., wlan0, wlan2, etc.

19 October, 2015

Enable sectioned bibliography in LaTeX/LyX under Linux

Today I switched my dissertation workplace from my corporate MacBook to my Ubuntu Linux PC at home. Nearly everything worked fine, since I already carefully defined all figures and child documents to use relative paths.

However, I got one error that puzzled me:
LaTeX Error: File `bibtopic.sty' not found
Obviously some LaTeX package was missing, which did not happen before under Linux. Now what to do -- in absence of a LaTeX package manager on Linux? 

Answer: Just use your Linux package manager!

So I did a search for "bibtex":
$ aptitude search bibtex
p   bibtex2html                - filters BibTeX files and translates
p   bibtexconv                 - BibTeX Converter                   
p   bibtexconv:i386            - BibTeX Converter                   
p   jbibtex-base               - make a bibliography for ASCII p(La)
p   kbibtex                    - BibTeX editor for KDE              
p   kbibtex:i386               - BibTeX editor for KDE              
p   libtext-bibtex-perl        - Perl extension to read and parse Bi
p   libtext-bibtex-perl:i386   - Perl extension to read and parse Bi
p   nbibtex                    - Powerful, flexible replacement for 
p   nbibtex:i386               - Powerful, flexible replacement for 
p   nbibtex-doc                - Documentation of source code for nb
p   python-bibtex              - Python interfaces to BibTeX and the
p   python-bibtex:i386         - Python interfaces to BibTeX and the
v   python2.7-bibtex           -                                    
v   python2.7-bibtex:i386      -                                    
p   texlive-bibtex-extra       - TeX Live: BibTeX additional styles 
OK, "texlive-bibtex-extra", "additional styles". This was what I was missing and installing this package fixed the problem.
$ sudo aptitude install texlive-bibtex-extra 
Done!

22 September, 2015

LaTeX \sloppy and \fussy line breaking

In my thesis, I use words like "visualization-related" a lot. Such words should not be hyphenated any further, since the result may look awkward.

Example: manual hyphenation

Words  break awkwardly

like visualization-rel-
ated or like visualiza-
tion-related.

The hyphenation for line breaking will interfere with the initial hyphenation and impair readability.

In LaTeX, the default line breaking mode is \fussy. This mode tries to set the words on each line very tightly, not allowing the space between two words to become too large. However, this causes problems with hyphenated words that should not be hyphenated any further. If the entire word were moved to the next line, the inter-word space on the current line would suddenly be too large. LaTeX then decides neither to hyphenate nor to break, but instead lets the word overflow the line.

Example: default \fussy mode

All words break normally but
may overflow like visualization-
related or visualization-related.


The overflow can be removed by setting \sloppy mode instead of \fussy mode in your preamble or locally using {\sloppy ... }. This forces overflowing words to move to the next line, but may lead to large inter-word spaces on the current line.

Example: alternative \sloppy mode

All  words  break  normally
and  without  overflow like 
visualization-related,  but
also   like   visualization-
related.   But   inter-word
space can be very large.

Getting the best of both worlds


Lately, I am writing my documents as follows:
  1. Start writing your document in \sloppy mode.
  2. Extend, rewrite, review, revise the document until it is nearly ready for publication/submission.
  3. Switch to \fussy mode and fix any problems manually.
Manual fixing will be mostly done by (1) rewriting a few sentences, or by (2) telling LaTeX exactly where to break.

For Option 1, just add a few words, such as a "the", or replace a short noun with a phrase that uses an "-ing" form. Exchanging verbs also helps.

Example: rephrasing to push over hyphenated words (before and after)

All words break normally but
may overflow like visualization-
related or visualization-related.


All words break normally but
also  suffer  from  overflow

like visualization-related or
visualization-related.

The problem is that the overflow may be pushed down, i.e., other words following long hyphenated words may not be allowed to wrap over, due to the prioritization of inter-word spaces. In this case, LaTeX can be told where to break and thus occasionally violate the inter-word spacing rule.

Therefore, you need Option 2, the \linebreak{} command, to manually wrap over while leaving the previous line justified, however awkward it may look thereafter. In LyX, this special line break can be inserted via Insert > Formatting > Justified Line Break; depending on your OS, it should have a hotkey that is worth remembering.

Example: additional line breaks (before and after)

All words break normally but
also  suffer  from  overflow
 
like visualization-related or
visualization-related.

All words break normally but
also  suffer  from  overflow
 
like   visualization-related \linebreak{}
or visualization-related.

The broken line may suffer from larger inter-word space. You just have to find a nice compromise here. In texts with more words per line than the shown examples, additional white-space is less noticeable and the fixes should be easier, i.e., the overflow will not be pushed down to following words.

There is also an Option 3: Instead of manually fixing, you can wrap a paragraph with {\sloppy ... } to temporarily allow large inter-word spaces. However, I like Options 1 and 2 better.








05 September, 2015

How to fix a broken Logitech mouse that is clicking multiple times on single click

My Logitech mouse was sending false mouse up and down events to my MacBook. I first thought it was a software problem or that the mouse, which I bought as used hardware on eBay, had some broken electronic component. Other users had similar problems. However, here is how I fixed it.






02 September, 2015

Copy & paste tabular data to tables in LyX

As it can be cumbersome to manually transfer data from an external source to a table in LyX, here is a trick to speed up the process.

Let's assume the data is stored comma-separated in a text file with the following content.
1920,1080,48,24,40,45,1920,1080,0%
1920,1080,46,23,41,46,1886,1058,3.8%
1920,1080,45,23,42,46,1890,1058,3.6%
1920,1080,44,22,43,49,1892,1078,1.6%
1920,1080,43,22,44,49,1892,1078,1.6%
1920,1080,42,21,45,51,1890,1071,2.4%
1920,1080,41,21,46,51,1886,1071,2.6%
1920,1080,40,20,48,54,1920,1080,0.0%
Then copy&pasting (CMD+SHIFT+V or CTRL+SHIFT+V) this text into a table in LyX will only fill the first table column. However, if the data were TAB-separated as follows,
1920 1080 48 24 40 45 1920 1080 0%
1920 1080 46 23 41 46 1886 1058 3.8%
1920 1080 45 23 42 46 1890 1058 3.6%
1920 1080 44 22 43 49 1892 1078 1.6%
1920 1080 43 22 44 49 1892 1078 1.6%
1920 1080 42 21 45 51 1890 1071 2.4%
1920 1080 41 21 46 51 1886 1071 2.6%
1920 1080 40 20 48 54 1920 1080 0.0%
then LyX will nicely fill your table. This should also work for data that is copied and pasted from spreadsheet applications.

If your spreadsheet does not copy&paste using TAB-separated data in the clipboard, then you can use some Vim magic. Paste the, e.g., space-separated, data into Vim and replace spaces with tabs, i.e., type :%s# #\t#gc (ENTER).
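If Vim is not at hand, the same conversion can be scripted. Here is a minimal Python sketch, assuming simple delimiter-separated text as in the example above. The function name to_tab_separated is made up for this illustration and is not part of LyX or any library.

```python
import csv
import io


def to_tab_separated(text: str, delimiter: str = ",") -> str:
    """Convert delimiter-separated text into TAB-separated text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Join the fields of every row with a TAB, one row per output line.
    return "\n".join("\t".join(row) for row in reader)


if __name__ == "__main__":
    sample = (
        "1920,1080,48,24,40,45,1920,1080,0%\n"
        "1920,1080,46,23,41,46,1886,1058,3.8%"
    )
    # Copy the printed text and paste it into the LyX table
    # with the format-preserving paste.
    print(to_tab_separated(sample))
```

For space-separated data, pass delimiter=" " instead of the default comma.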

Note: Don't forget to use format-preserving paste (CMD+SHIFT+V or CTRL+SHIFT+V) instead of the normal paste in LyX (CMD+V or CTRL+V).

PS: I know that LyX also offers to import external files, but I often like to have all my text editable inside the document, e.g., to add colors, change font sizes, add footnotes, or use Math environments in the data.

27 August, 2015

Using Skim as preview tool with LyX/LaTeX

OS X's Preview app does not support reloading of updated PDFs and has no PDFSync support.
Here is a quick guide to set up Skim, which is a better app for this purpose.

1. Installation

Simply download and set up Skim and LyX, following their instructions.

2. Set up PDFSync Support

Skim -> Prefs -> Sync
    [x] Check for file changes 
    Preset: [LyX]

LyX -> Prefs -> Output 
    PDF command: /Applications/Skim.app/Contents/SharedSupport/displayline $$n $$o $$t 

LyX -> Prefs -> File Handling -> File Formats -> Format: [PDF (pdflatex)]
    Viewer: [Custom] [open -a Skim.app $$i] (click Apply)

Repeat previous step for PDF (LuaTeX) and PDF (XeTeX)

3. Improve Output

The text in Skim can be blurry on some OS X versions. You can try messing around with the font smoothing to fix that.

defaults write -app Skim AppleFontSmoothing -integer 1

worked well for me.

Links

Skim hidden preferences: http://sourceforge.net/p/skim-app/wiki/Hidden_Preferences/#system-overrides
LyX SyncTeX help: http://wiki.lyx.org/LyX/SyncTeX

24 August, 2015

LaTeX/LyX positioning floating text on top of a page

To publish preprint versions of papers, you often need to add a remark on top of the first page. If you do not want to mess around with the final PDF, there are several ways to add floating text with absolute positioning.

Here is my current solution.

% optional box and link packages (LyX adds them automatically)
%\usepackage{framed}
%\usepackage{color}
%\usepackage{hyperref}

% import textpos in LaTeX preamble
\usepackage[absolute]{textpos}

% set default positioning parameters
\setlength{\TPHorizModule}{10mm}
\setlength{\TPVertModule}{\TPHorizModule}
\textblockorigin{0mm}{0mm} % start content at the top-left corner
\setlength{\parindent}{0pt}

\definecolor{shadecolor}{rgb}{1, 0.80078125, 0}

\begin{textblock}{2}(0,0)
\begin{minipage}[t]{1\paperwidth}%
\begin{shaded}%
\begin{center}
\textbf{\textcolor{blue}{Preprint version for self-archiving.}}\\
The final publication is available at [Name of Publisher] via
\textbf{\href{http://dx.doi.org}{http://dx.doi.org/[DOI Number]}.} 
\end{center}
\end{shaded}%
\end{minipage}
\end{textblock}

The output looks as follows.

15 August, 2015

Fun with functions on WolframAlpha

Trying to plot some functions on WolframAlpha I stumbled upon this one.

 f(x,y) = (x^2 + y^2) * sin( 1 / (x^2 + y^2) )

It creates a nice, apparently not fully symmetric contour plot, such as this one [1]:

Here are two different views [2,3] on the function:


Seeing such images always inspires me to start my own algorithmic fine arts projects.

Image Sources

[1] WolframAlpha plot of f(x,y) with -0.01 < x,y < 0.01
[2] WolframAlpha plot of f(x,y) with -1 < x,y < 1
[3] WolframAlpha plot of f(x,y) with -0.5 < x,y < 0.5


PS: I will let you know when I have found the time for such procrastination.

09 August, 2015

VLDB Journal paper on visualization-driven data aggregation

Dear readers,
I would like to inform you that the main paper about my research of the past 2-3 years has been published [1] in The VLDB Journal. It is based on my award-winning paper about the M4 aggregation [2] for line charts. It generalizes and extends the M4 approach to the most common chart types, such as bar charts, scatter plots, and space-filling visualizations, and also describes how to conduct visualization-driven data aggregation in chart matrices.

For more details, I suggest having a look at the papers [1, 2].

[1] VDDA: automatic visualization-driven data aggregation in relational databases
U Jugel, Z Jerzak, G Hackenbroich, V Markl
The VLDB Journal
2014, DOI 10.1007/s00778-015-0396-z

[2] M4: A Visualization-Oriented Time Series Data Aggregation
U Jugel, Z Jerzak, G Hackenbroich, V Markl
Proceedings of the VLDB Endowment 7 (10), 797 - 808, 2014, (best paper award!)

 
 

05 August, 2015

Listing code in LaTeX/LyX

In the past I used external tools to provide listings for my papers. However, with 90% of my listings being SQL today, I am regularly editing and tweaking my listings for readability, which is easier when having them directly inside my LyX document. I had the fear that this would impair code highlighting. It doesn't, since the default listings package provides a ton of options [1].

Current working mode and options

  1. Use default listings package
  2. Want floated listings (in LyX)? -> (Right-Click > Settings > Placement > Float [x])
  3. Float needs caption (in LyX)? -> (Menu > Insert > Caption)
  4. Want to add/change colors? -> use colors + listing options
    \definecolor{byzantium}{rgb}{0.44, 0.16, 0.39}
    commentstyle={\color{byzantium}\textit}
  5. Want more colors? -> go to http://latexcolor.com/
  6. Want to add keywords? -> use listing options 
    language=SQL
    morekeywords={WITH} 
  7. Want to emphasize words? -> use listing options
    emph={Q,Q_c,Q_r,Q_b,Q_d,Q_}
    emphstyle={\color{blue}\textbf}
LyX provides the LaTeX preamble (Document > Settings > LaTeX preamble) to add colors.
LyX also allows you to define default listing properties in (Document > Settings > Listings).

Here are my current defaults.
basewidth={0.5em}
basicstyle={\ttfamily\small}
breaklines=true
columns=flexible
commentstyle={\color{byzantium}\textit}
emph={Q,Q_c,Q_r,Q_b,Q_d,Q_}
emphstyle={\color{blue}\textbf}
keepspaces=true
keywordstyle={\color{darkmidnightblue}}
language=SQL
morekeywords={WITH}
tabsize=4

Here is some example output.


Links


[1] Carsten Heinz, Brooks Moses, Jobst Hoffmann. The Listings Package.
      http://mirror.unl.edu/ctan/macros/latex/contrib/listings/listings.pdf

24 July, 2015

Programming exercise: find palindromic numbers!

Here is the task:
"A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99. Find the largest palindrome made from the product of two 3-digit numbers." (originally posted by Joseph Farah)
Here is my quick solution. I know that the palindrome test may be improved, e.g., by excluding some of the numbers based on the arithmetic properties.
#!/usr/bin/env coffee
{floor} = Math
{now}   = Date

print   = console.log.bind console, "palindromes:"
assert  = (expr,msg) -> throw new Error "AssertionError: #{msg}" unless expr
bench   = (name,fn) -> t = now(); r = fn(); print "#{name} took #{now() - t}ms"; r

isPalindrome = (n) ->
  str  = "#{n}";        len  = str.length
  mid  = floor len / 2; last = len - 1
  for i in [0..mid]
    if str[i] != str[last - i] then return false
  return true

findPalindromes = (n=999) ->
  list = []
  for x in [0..n]
    for y in [x..n]
      list.push {x,y,palindrome: p} if isPalindrome p = x*y
  list

byPalindrome = (a,b) -> a.palindrome - b.palindrome

testSuite1 = do ->
  f1 = (n) -> assert     isPalindrome(n), "'#{n}' must be a palindrome!"
  f2 = (n) -> assert not isPalindrome(n), "'#{n}' must not be a palindrome!"

  [9009, 919, 1, 11, 123321, "abba", "aba", NaN, "" ].forEach f1
  [991,9919,10,"01","abc",null,-1,undefined].forEach f2

main = (n) ->
  list = bench "findPalindromes(#{n})", -> findPalindromes n
  largest = list.sort(byPalindrome)[list.length - 1]
  print "found #{list.length} palindromes:", {last5: list[-5..], largest}

module.exports = {isPalindrome,findPalindromes,byPalindrome}

if process.mainModule == module then main 1 * process.argv[2]


Example output on my 2.5 GHz MacBook:
$ ./palindrom.coffee 999
palindromes: findPalindromes(999) took 52ms
palindromes: found 3539 palindromes: { last5:
   [ { x: 894, y: 957, palindrome: 855558 },
     { x: 924, y: 932, palindrome: 861168 },
     { x: 916, y: 968, palindrome: 886688 },
     { x: 924, y: 962, palindrome: 888888 },
     { x: 913, y: 993, palindrome: 906609 } ],
  largest: { x: 913, y: 993, palindrome: 906609 } }

$ ./palindrom.coffee 9999
palindromes: findPalindromes(9999) took 6893ms
palindromes: found 36433 palindromes: { last5:
   [ { x: 9631, y: 9999, palindrome: 96300369 },
     { x: 9721, y: 9999, palindrome: 97200279 },
     { x: 9811, y: 9999, palindrome: 98100189 },
     { x: 9867, y: 9967, palindrome: 98344389 },
     { x: 9901, y: 9999, palindrome: 99000099 } ],
  largest: { x: 9901, y: 9999, palindrome: 99000099 } }


Update: I just needed to take a break from writing my PhD thesis and optimized the search for the largest palindrome. Here is an improved version of the above script.

findPalindromes = (n=999,pruneSmallProducts=false) ->
  list = []
  pmax = x:0, y:0, palindrome:0
  for x in [n..0]
    break if x * n < pmax.palindrome and pruneSmallProducts
    for y in [n..x]
      p = x * y
      break if p < pmax.palindrome and pruneSmallProducts
      if isPalindrome p
        list.push o = {x,y,palindrome:p}
        pmax = o if o.palindrome > pmax.palindrome
  list

#...

main = (n) ->
  list = bench "findPalindromes(#{n})", -> findPalindromes n,true
  largest = list.sort(byPalindrome)[list.length - 1]
  print "found #{list.length} palindromes:", {last5: list[-5..], largest}

The speedup is significant.
$ ./palindrom.coffee 999
palindroms: findPalindromes(999) took 1ms
palindroms: found 2 palindromes: { last5:
   [ { x: 924, y: 962, palindrome: 888888 },
     { x: 913, y: 993, palindrome: 906609 } ],
  largest: { x: 913, y: 993, palindrome: 906609 } }
$ ./palindrom.coffee 9999
palindroms: findPalindromes(9999) took 1ms
palindroms: found 1 palindromes: { last5: [ ... ],
  largest: { x: 9901, y: 9999, palindrome: 99000099 } }
$ ./palindrom.coffee 99999
palindroms: findPalindromes(99999) took 19ms
palindroms: found 1 palindromes: { last5: [ ... ],
  largest: { x: 99681, y: 99979, palindrome: 9966006699 } }
$ ./palindrom.coffee 999999
palindroms: findPalindromes(999999) took 129ms
palindroms: found 1 palindromes: { last5: [ ... ],
  largest: { x: 999001, y: 999999, palindrome: 999000000999 } }
$ ./palindrom.coffee 9999999
palindroms: findPalindromes(9999999) took 1402ms
palindroms: found 1 palindromes: { last5: [ ... ],
  largest: { x: 9997647, y: 9998017, palindrome: 99956644665999 } }
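The pruning idea of the improved version can also be sketched in a few lines of Python. This is only an illustration under my own naming (is_palindrome, largest_palindrome), not a port of the full script: it keeps just the current best result instead of a list, and it stops a row as soon as the products can no longer beat that result.

```python
def is_palindrome(n: int) -> bool:
    """A number is palindromic if its decimal string reads the same both ways."""
    text = str(n)
    return text == text[::-1]


def largest_palindrome(n: int = 999):
    """Return (x, y, product) for the largest palindromic product with x <= y <= n."""
    best = (0, 0, 0)
    for x in range(n, 0, -1):
        # x * n is the largest product in this row; if it cannot beat
        # the current best, no later (smaller) row can either.
        if x * n <= best[2]:
            break
        for y in range(n, x - 1, -1):
            product = x * y
            # Products only shrink as y decreases, so stop this row early.
            if product <= best[2]:
                break
            if is_palindrome(product):
                best = (x, y, product)
    return best


if __name__ == "__main__":
    print(largest_palindrome(999))
```

Iterating from large to small factors is what makes the pruning effective: good candidates are found early, so most of the search space is never visited.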

How fast does your browser render line charts?

Here is a quick demo that tries to draw hundreds to millions of lines using your browser's drawing APIs of the HTML canvas. (Click on the "Result" tab of the JSFiddle to show the app.)

For instance, you can try to set the number of lines to 6 million, for which I get the following results on my MacBook.

Firefox 39     Chrome 44
= 817 ms       = 1490 ms
= 3047 ms      = 3066 ms
= 5484 ms      = 4116 ms
= 55383 ms     = 13789 ms

You may need to increase tmax to 60k or more milliseconds on slower machines or browsers. Otherwise, the processing will stop after tmax ms.

Feel free to post your results in the comments. Note that this demo is a copy of my former jsPerf test. However, since jsPerf is down due to spam activities, I moved it over to JSFiddle for everyone to experiment with.

Cheers, Juve

23 July, 2015

Line Rendering Demo

Sorry, I had to remove this older post of my line rendering demo, to be able to modify the permalink for one of my publications.

Disable page numbers in table of contents in LyX/LaTeX

I am using the KOMA-Script book document class for my thesis and was irritated that my

\thispagestyle{empty}

commands were ignored in LyX (LaTeX). Luckily, there is a solution. Just add the following code after the TOC.

\addtocontents{toc}{\protect\thispagestyle{empty}}

Hint: In "book" classes you may often also use \frontmatter and \mainmatter to indicate where the parts of your book start. This way you do not have to change the \pagenumbering manually.

13 July, 2015

Tiny linereader implementation (CoffeeScript)

Data processing in Node.js requires handling data in Buffer objects.
Here is a quick implementation for parsing + appending incoming buffer data to an array.


  assert = require "assert"

  # Returns a function that appends the lines of incoming buffers to trg.
  readline = (trg) ->
    len0 = trg.length   # number of entries that existed before reading
    (buf) ->
      len    = trg.length
      lines  = "#{buf}".split "\n"
      # Continue the last (possibly incomplete) line, or start the first one.
      if len - len0 > 0 then trg[len - 1] = trg[len - 1] + lines[0]
      else                   trg.push lines[0]
      trg.push line for line in lines[1..]
      return

  selfTest = do ->
    input = []
    buf = (str) -> input.push Buffer.from str
    buf "abc\nx"
    buf "yz\n"
    buf "12"
    buf "34"
    buf "\n"

    output = ["opq",""]
    input.forEach readline(output)

    assert.equal output.join(";"), "opq;;abc;xyz;1234;", "Messed up!"
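For readers who do not use CoffeeScript, here is the same idea as a small Python sketch. The names make_line_reader and feed are made up for this example. Like the original, it decodes every chunk on its own, so it assumes that no multi-byte character is split across two chunks.

```python
def make_line_reader(target: list):
    """Return a function that appends the lines of incoming chunks to target."""
    start_len = len(target)  # entries that existed before reading started

    def feed(chunk: bytes) -> None:
        parts = chunk.decode("utf-8").split("\n")
        if len(target) > start_len:
            # Continue the last, possibly incomplete line.
            target[-1] += parts[0]
        else:
            # Nothing has been read yet: start the first line.
            target.append(parts[0])
        # Every further part starts a new line.
        target.extend(parts[1:])

    return feed


if __name__ == "__main__":
    output = ["opq", ""]
    feed = make_line_reader(output)
    for chunk in [b"abc\nx", b"yz\n", b"12", b"34", b"\n"]:
        feed(chunk)
    print(output)
```

The trailing empty string in the result marks that the last chunk ended with a newline, so the next chunk starts a fresh line.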

22 April, 2015

Started using Spark + Scala this week. Very impressive!

As the data for my dissertation is growing to become really "big data" (several GB), I was looking for new tools, beyond my trusted relational databases (PostgreSQL, MonetDB, etc.).

Spark

I found Apache Spark, which provides Python, Java, and Scala APIs to define queries on big data files. The files are served via Hadoop (delivered with Spark) to parallelize operations on the data.

Starting a Spark cluster is very easy, once you have configured the master correctly. There are some pitfalls, as Spark is very picky regarding hostnames, i.e., you should always use the full hostname with correct domains in all start scripts, config files, and your application code. I won't go into the details here.

The performance of Spark is really good. It can run an M4 query on 1×10M records (200MB) in 900ms, and easily handles large data volumes, e.g., 100×1M records (2GB, 8s) or 10k×100k records (20GB, 13min). Very nice for analytical workloads on big data sources. During query execution, Spark effectively uses all 8 cores of my MacBook, and I plan to improve the query response times by running my tests on a really big cluster to provide "near-interactive" response times.

Scala

Spark is nice, but what actually motivated me for this post was to praise Scala. As a big fan of CoffeeScript, I like short (but readable) notations instead of useless repetition of names and keywords, as required in many legacy programming languages.

Scala has everything that makes a programmer's life easier. Here are my favorite features:
  • Implicit variable declarations (val obj = MyType())
  • Short notation for finals (val for final values, var for variables)
  • Lambda expressions (definition of short inline, anonymous functions)
  • List comprehension (returning loop results as lists)
  • Easily passing functions as objects (as in Javascript)
  • Implicit function calls (using obj.someFunc instead of obj.someFunc())
  • Everything is an expression (no return required)
  • Short function keyword (def or => instead of function)
Awesome, I can have all these features and still get the bonus of type-safety! The code-completion in Scala IDE works quite nicely.

Here are a few Scala code examples, implementing the subqueries of my visualization-driven data aggregation (VDDA).

Example 1: M4 grouping function.
    val Q_g  = Q_f.keyBy( row =>
      ( Math.floor( w*(row(iT) - t1)/dt ) + row(iID) * w ).toLong
    )

Example 2: M4 aggregation.
    def aggM4Rows ...
    def toRows4 ...
    val A_m4 = Q_g.map({case (k,row) => (k,toRows4(row))}).reduceByKey(aggM4Rows)   

Example 3: Counting the number of unique records.
    val recordCount = Q_m4.distinct.count  

Using Spark's Scala API makes these queries easy to define and to read, so that my Spark/Scala implementation of M4/VDDA is not much longer than the SQL queries in my research papers.
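To make the grouping key of Example 1 easier to follow without a Spark setup, here is a plain Python sketch of the same arithmetic. The parameter meanings are my reading of the Scala snippet and should be treated as assumptions: w is the number of groups per time series, t1 the start time, dt the time span, and series_id the value of row(iID). The function names are hypothetical.

```python
import math
from collections import defaultdict


def m4_group_key(t: float, series_id: int, t1: float, dt: float, w: int) -> int:
    """Mirror of Example 1: floor(w * (t - t1) / dt) + series_id * w."""
    return int(math.floor(w * (t - t1) / dt) + series_id * w)


def group_rows(rows, t1: float, dt: float, w: int):
    """Group (series_id, t, v) rows by their key, similar in spirit to keyBy."""
    groups = defaultdict(list)
    for series_id, t, v in rows:
        groups[m4_group_key(t, series_id, t1, dt, w)].append((series_id, t, v))
    return dict(groups)


if __name__ == "__main__":
    rows = [(0, 10.0, 1.5), (0, 12.0, 2.5), (3, 250.0, 0.5)]
    print(group_rows(rows, t1=0.0, dt=1000.0, w=100))
```

Adding series_id * w shifts every series into its own key range, so rows of different series never share a group as long as t lies within [t1, t1 + dt).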

Spark + Scala = Big Data processing made easy!


Use rsync instead of scp to resume copying big data files!

For my dissertation I am conducting experiments on big data sources, such as 10k time series with 100k+ records each. The corresponding files comprise several gigabytes of data. Copying such files may take very long, since I work from a remote location, not sitting next to the data centers where the data is to be processed. Therefore, I need to be able to resume big data file uploads to the machines of the data centers.

I usually use scp to copy files between machines:
scp data/*.csv juve@machine.company.corp:/home/juve/data
Unfortunately, scp can't resume any previous file transfers. However, you can use rsync with ssh to be able to resume:
rsync --rsh='ssh' -av --progress --partial data/*.csv \
   juve@machine.company.corp:/home/juve/data 
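If you cancel the upload, e.g., via CTRL+C, you can later resume it by running the same command again: the --partial option tells rsync to keep the partially transferred file, so the next run can continue from it.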

Very simple. No GUI tools required. Ready for automation.

25 March, 2015

Readable and Fast Math in SQL

For my dissertation, I write a lot of SQL queries, doing some Math on the data. For instance, the following query computes the relative times from a numeric timestamp t, and scales the result up by 10000.

-- query 1
with Q    as (select t,v from csv_upload),
     Q_b  as (select min(t) as t_min, max(t) as t_max from Q)
select 10000 * (t - (select t_min from Q_b))
             / (select t_max - t_min from Q_b) as tr from Q

As you can see, I use CTEs to be able to read my code ;-). However, the select statements in the final subqueries, extracting scalar values from the computed relations with one record, impair the readability of the actual Math that is to be computed.

That is why modern SQL databases allow columns from parent subqueries to be used in nested child subqueries. The following query computes the same result.

-- query 2
with Q    as (select * from csv_upload),
     Q_b  as (select min(t) as t_min, max(t) as t_max from Q)
select (select 10000 * (t     - t_min)
                     / (t_max - t_min) from Q_b) as tr from Q

Finally, another, if not the best, way of writing such queries is the following.

-- query 3
with Q    as (select * from csv_upload),
     Q_b  as (select min(t) as t_min, max(t) as t_max from Q)
select 10000 * (t     - t_min)
             / (t_max - t_min) as tr from Q,Q_b

Even though all three queries are very similar, and yield the same result, I saw notable differences in query execution time. In general, query 2 was a bit slower, and query 3 was a bit faster than the others.

Conclusion
For my queries, using nested columns improves readability but decreases performance. If you have computed relations with one record, such as the boundary subquery Q_b, it is safe to join these relations with your data, since a join with a single-record relation does not change the number of result rows.
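If you want to convince yourself that the three variants are equivalent, the following Python sketch runs them against an in-memory SQLite database. SQLite is only a stand-in here, it is not the database I used for the measurements, and the sketch checks the results only, not the execution times. The table content is made up; the timestamps are stored as floating-point numbers to avoid integer division.

```python
import sqlite3

# Common CTE prefix shared by all three queries.
CTES = """
with Q    as (select t,v from csv_upload),
     Q_b  as (select min(t) as t_min, max(t) as t_max from Q)
"""

QUERIES = {
    "query 1": CTES + """
        select 10000 * (t - (select t_min from Q_b))
                     / (select t_max - t_min from Q_b) as tr from Q""",
    "query 2": CTES + """
        select (select 10000 * (t     - t_min)
                             / (t_max - t_min) from Q_b) as tr from Q""",
    "query 3": CTES + """
        select 10000 * (t     - t_min)
                     / (t_max - t_min) as tr from Q,Q_b""",
}


def run_all(timestamps):
    """Run all three queries and return their sorted results by query name."""
    con = sqlite3.connect(":memory:")
    con.execute("create table csv_upload (t real, v real)")
    con.executemany(
        "insert into csv_upload values (?, ?)",
        [(float(t), 0.0) for t in timestamps],
    )
    # Sort in Python, because the queries do not define a row order.
    results = {
        name: sorted(row[0] for row in con.execute(sql))
        for name, sql in QUERIES.items()
    }
    con.close()
    return results


if __name__ == "__main__":
    for name, values in run_all([100, 150, 200]).items():
        print(name, values)
```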

11 March, 2015

A Case for CoffeeScript: Object Composition

I have been using CoffeeScript for over four years now (since 2011) and will never go back. [1]
Here is a snippet that may tell you why. It uses several basic features of CoffeeScript that make code more readable and much shorter than the vanilla JavaScript version of the same code (on the right side).

# Use CoffeeScript and stay DRY! (Don't repeat yourself)     var $f, abc1, abc2, abc3, framework;
# For instance, by using short notation {a,b,c,...}
# for object composition from variables.                       $f = framework = (function() {
#                                                              var count;
# Here is a complete example, using the notation               count = 0;
# to reduce the number of lines of code (LoC)                  return {
# of an artificial object creation framework:                    createProp: function(name, n) {
                                                                   return "This is " + name + " no. " + n;
$f = framework = do ->                                           },
  count = 0                                                      enhanceProp: function(prop) {
  createProp:  (name,n) -> "This is #{name} no. #{n}"              return "" + prop + ", enhanced!";
  enhanceProp: (prop)   -> "#{prop}, enhanced!"                  },
  createAbcObject: ->                                            createAbcObject: function() {
    # 1. basic variable setup                                      var a, b, c;
    a = $f.createProp "a",count                                    a = $f.createProp("a", count);
    b = $f.createProp "b",count                                    b = $f.createProp("b", count);
    c = $f.createProp "c",count                                    c = $f.createProp("c", count);
                                                                   if (count === 0) {
    # 2. more fiddling with the variables ...                        a = $f.enhanceProp(a);
    if count == 0 then a = $f.enhanceProp a                        }
    count++                                                        count++;
                                                                   return {
    # 3. finally compose and return the a-b-c object                 a: a,
    {a,b,c}                                                          b: b,
                                                                     c: c
abc1 = $f.createAbcObject()                                        };
                                                                 }
abc2 = $f.createAbcObject()                                    };
                                                             })();
abc3 = $f.createAbcObject()
                                                             abc1 = $f.createAbcObject();
# You can also use it for DRY logging
# to avoid quoting var names                                 abc2 = $f.createAbcObject();

console.log "objects created", {abc1,abc2,abc3}              abc3 = $f.createAbcObject();

# OMG! Over 50% LoC saved. Even with all these               console.log("objects created", {
# comments, CoffeeScript is still shorter and more             abc1: abc1,
# readable than the JavaScript version of the code.            abc2: abc2,
#                                                              abc3: abc3
# Stay DRY! Use CoffeeScript!                                });


[1] Unless somebody pays me enough money to waste my time using vanilla JS ;-).

16 February, 2015

Showing the progress of awk scripts

When running awk scripts on big data files, you may want to know how long the process will take. Here is a simple script that will output the fraction of the data that has been processed and an estimate of how long the processing will take:
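BEGIN {
    ecat="cat >&2"       # pipe target: write progress to stderr
    clear="\33[2K\r"     # ANSI escape: erase the line, return the cursor
    start=systime()      # start time in seconds (systime() is a gawk extension)
    lines=18000000       # expected number of input lines; adjust to your file
}

{
    if(NR%1000  == 0) {
        frac = NR/lines
        elapsed = systime() - start
        eta = elapsed/frac/60    # estimated total runtime in minutes
        printf("%s %f%% (ETA: %i minutes)", clear, frac*100, eta)  | ecat
    }
}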
The script uses ANSI escape sequences to reset the last printed line, so that the fraction and ETA values are always on the same line in your shell. It outputs to stderr and does not interfere with the data output to stdout. Example output: 7.061% (ETA: 4 minutes)
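The arithmetic behind the estimate is simple enough to check by hand. Here is a small Python sketch of it, with a function name (progress) that is made up for this example. Note that elapsed/frac, as used in the awk script, extrapolates the total runtime; the sketch additionally derives the remaining time by subtracting the time already elapsed.

```python
def progress(lines_done: int, total_lines: int, elapsed_seconds: float):
    """Return (percent done, estimated total minutes, estimated remaining minutes)."""
    frac = lines_done / total_lines              # fraction processed, must be > 0
    total_minutes = elapsed_seconds / frac / 60  # same formula as in the awk script
    remaining_minutes = total_minutes - elapsed_seconds / 60
    return frac * 100, total_minutes, remaining_minutes


if __name__ == "__main__":
    # Half of 18 million lines processed after two minutes.
    percent, total, remaining = progress(9_000_000, 18_000_000, 120)
    print(f"{percent:.3f}% (total: {total:.0f} min, remaining: {remaining:.0f} min)")
```

The estimate assumes a constant processing speed, so it will drift if some parts of the file are more expensive to process than others.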