On Tao’s talk and the 3-dimensional Hilbert-Smith conjecture

May 6, 2012

Last Wednesday Terry Tao briefly dropped by our little town and gave a colloquium. Surprisingly this is only the second time I hear him talking (the first one goes back to undergrad years in Toronto, he talked about arithmetic progressions of primes, unfortunately it came before I learned anything [such as those posts] about Szemeredi’s theorem). Thanks to the existence of blogs, feels like I knew him much better than that!

This time he talked about Hilbert’s 5th problem, Gromov’s polynomial growth theorem for discrete groups and their (Breuillard-Green-Tao) recently proved more general analogy of Gromov’s theorem for approximate groups. Since there’s no point for me to write 2nd-handed blog post while people can just read his own posts on this, I’ll just record a few points I personally found interesting (as a complete outsider) and moving on to state the more general Hilbert-Smith conjecture, very recently solved for 3-manifolds by John Pardon (who now graduated from Princeton and became a 1-st year grad student at Stanford, also appeared in this earlier post when he gave solution to Gromov’s knot distortion problem).

Warning: As many of you know I never take notes during talks, hence this is almost purely based on my vague recollection of a talk half a week ago, inaccuracy and mistakes are more than possible.

All topological groups in this post are locally compact.

Let’s get to math~ As we all know, a Lie group is a smooth manifold with a group structure where the multiplication and inversion are smooth self-diffeomorphisms. i.e. the object has:

1. a topological structure
2. a smooth structure
3. a group structure

It’s not too hard to observe that given a Lie group, if we ‘forget’ the smooth structure and just see it as a topological group which is a (topological) manifold, then we can uniquely re-construct the smooth structure from the group structure. From my understanding, this is mainly because given any element in the topological group we can find a unique homomorphism of the group \mathbb{R} into the manifold, sending 0 to identity and 1 to the element. resulting a class of curved through the identity, a.k.a the tangent space. Since the smooth structure is determined by the tangent space of the identity, all we need to know is how to ‘multiply’ two such parametrized curves.

The way to do that is to ‘zig-zag’:

Pick a small \varepsilon, take the image of \varepsilon under the two homomorphisms, alternatingly multiplying them to obtain a sequence of points in the topological group. As \varepsilon \rightarrow 0 the sequence becomes denser and converges to a curve.

The above shows that given a Lie group to start with, the smooth structure is uniquely determined by the topological group structure. Knowing this leads to the natural question:

Hilbert’s fifth problem: Is it true that any topological group which are (topological) manifolds admits a smooth structure compatible with group operations?

Side note: I had a little post-colloquium discussion with our fellow grad student Sam Lewallen, he asked:

Question: Is it possible for the same topological manifold to have two different Lie group structures where the induced smooth structures are different?

Note that neither the above nor Hilbert’s fifth problem shows such thing is impossible, since they both start with the phase ‘given a topological group’. My *guess* is this should be possible (so please let me know if you know the answer!) The first attempt might be trying to generate an exotic \mathbb{R}^4 from Lie group. Since the 3-dimensional Heisenberg group induces the standard (and unique) smooth structure on \mathbb{R}^3, I guess the 4-dimensional Heisenberg group won’t be exotic.

Anyways, so the Hilbert 5th problem was famously solved in the 50s by Montgomery-Zippin and Gleason, using set-theoretical methods (i.e. ultrafilters).

Gromov comes in later on and made the brilliant connection between (infinite) discrete groups and Lie groups. i.e. one see a discrete group as a metric space with word metric, ‘zoom out’ the space and produce a sequence of metric spaces, take the limit (Gromov-Hausdorff limit) and obtain a ‘continuous’ space. (which is ‘almost’ a Lie group in the sense that it’s an inverse limit of Lie groups.)

Hence he was able to adapt the machinery of Montgomery-Zippin to prove things about discrete groups:

Theorem: (Gromov) Any group with polynomial growth is virtually nilpotent.

Side note: I learned about this through the very detailed and well-presented course by Dave Gabai. (I thought I must have blogged about this, turns out I haven’t…)

The beauty of the theorem is (in my opinion) that we are given any discrete group, and all that’s known is how large the balls are (in fact, not even that, we know how large the large balls grow), yet the conclusion is all about the algebraic structure of the group. To learn more about Gromov’s work, see his paper. Although unrelated to the rest of this post, I shall also mention Bruce Kleiner’s paper where he proved Gromov’s theorem without using Hilbert’s 5th problem, instead he used space of harmonic maps on graphs.

Now we finally comes to a point of briefly mentioning the work of Tao et.al.! So they adopted Gromov’s methods of limiting and ‘ultra-filtering’ to apply to stuff that’s not even a whole discrete group: Since Gromov’s technique was to take the limit of a sequence of metric spaces which are zoomed out versions of balls in a group, but the Gromov-Hausdorff limit actually doesn’t care about the fact that those spaces are zoomed out from the same group, they may as well be just a family of subsets of groups with ‘bounded geometry’ of a certain kind.

Definition: An K-approximate group S is a (finite) subset of a group G where S\cdot S = \{ s_1 s_2 \ | \ s_1, s_2 \in S \} can be covered by K translates of S. i.e. there exists p_1, \cdots, p_K \in G where S \cdot S \subseteq \cup_{i=1}^k p_i \cdot S.

We shall be particularly interested in sequence of larger and larger sets (in cardinality) that are K-approximate groups with fixed K.

Examples:
Intervals [-N, N] \subseteq \mathbb{Z} are 2-approximate groups.

Balls of arbitrarily large radius in \mathbb{Z}^n are C \times 2^n approximate groups.

Balls of arbitrarily large radius in the 3-dimensional Heisenberg group are C \times 2^4 approximate groups. (For more about metric space properties of the Heisenberg group, see this post)

Just as in Gromov’s theorem, they started with any approximate group (a special case being sequence of balls in a group of polynomial growth), and concluded that they are in fact always essentially balls in Nilpotent groups. More precisely:

Theorem: (Breuillard-Green-Tao) Any K-approximate group S in G is covered by C(K) many translates of subgroup G_0 < G where G_0 has a finite (depending only on K) index nilpotent normal subgroup N.

With this theorem they were able to re-prove (see p71 of their paper) Cheeger-Colding’s result that

Theorem: Any closed n dimensional manifold with diameter 1 and Ricci curvature bounded below by a small negative number depending on n must have virtually nilpotent fundamental group.

Where Gromov’s theorem yields the same conclusion only for non-negative Ricci curvature.

Random thoughts:

1. Can Kleiner’s property T and harmonic maps machinery also be used to prove things about approximate groups?

2. The covering definition as we gave above in fact does not require approximate group S to be finite. Is there a Lie group version of the approximate groups? (i.e. we may take compact subsets of a Lie group where the self-product can be covered by K many translates of the set.) I wonder what conclusions can we expect for a family of non-discrete approximate groups.

As promised, I shall say a few words about the Hilbert-Smith conjecture and drop a note on the recent proof of it’s 3-dimensional case by Pardon.

From the solution of Hilbert’s fifth problem we know that any topological group that is a n-manifold is automatically equipped with a smooth structure compatible with group operations. What if we don’t know it’s a manifold? Well, of course then they don’t have to be a Lie group, for example the p-adic integer group \mathbb{Z}_p is homeomorphic to a Cantor set hence is not a Lie group. Hence it makes more sense to ask:

Hilbert-Smith conjecture: Any topological group acting faithfully on a connected n-manifold is a Lie group.

Recall an action is faithful if the homomorphism \varphi: G \rightarrow homeo(M) is injective.

As mentioned in Tao’s post, in fact \mathbb{Z}_p is the only possible bad case! i.e. it is sufficient to prove

Conjecture: \mathbb{Z}_p cannot act faithfully on a finite dimensional connected manifold.

The exciting new result of Pardon is that by adapting 3-manifold techniques (finding incompressible surfaces and induce homomorphism to mapping class groups) he was able to show:

Theorem: (Pardon ’12) There is no faithful action of \mathbb{Z}_p on any connected 3-manifolds.

And hence induce the Hilbert-Smith conjecture for dimension 3.

Discovering this result a few days ago has been quite exciting, I would hope to find time reading and blogging about that in more detail soon.


A train track on twice punctured torus

April 22, 2012

This is a non-technical post about how I started off trying to prove a lemma and ended up painting this:

One of my favorite books of all time is Thurston‘s ‘Geometry and Topology of 3-manifolds‘ (and I just can’t resist to add here, Thurston, who happen to be my academic grandfather, is in my taste simply the coolest mathematician on earth!) Anyways, for those of you who aren’t topologists, the book is online and I have also blogged about bits and parts of it in some old posts such as this one.

I still vividly remember the time I got my hands on that book for the first time (in fact I had the rare privilege of reading it from an original physical copy of this never-actually-published book, it was a copy on Amie‘s bookshelf, which she ‘robbed’ from Benson Farb, who got it from being a student of Thurston’s here at Princeton years ago). Anyways, the book was darn exciting and inspiring; not only in its wonderful rich mathematical content but also in its humorous, unserious attitude — the book is, in my opinion, not an general-audience expository book, but yet it reads as if one is playing around just to find out how things work, much like what kids do.

To give a taste of what I’m talking about, one of the tiny details which totally caught my heart is this page (I can’t help smiling each time when flipping through the book and seeing the page, and oh it still haunts me >.<):

This was from the chapter about Kleinian groups, when the term ‘train-track’ was first defined, he drew this image of a train(!) on moving on the train tracks, even have smoke steaming out of the engine:

To me such things are simply hilarious (in the most delightful way).

Many years passed and I actually got a bit more into this lamination and train track business. When Dave asked me to ‘draw your favorite maximal train track and test your tube lemma for non-uniquely ergodic laminations’ last week, I ended up drawing:

Here it is, a picture of my favorite maximal train track, on the twice punctured torus~! (Click for larger image)

Indeed, the train is coming with steam~

Since we are at it, let me say a few words about what train tracks are and what they are good for:

A train track (on a surface) is, just as one might expect, a bunch of branches (line segments) with ‘switches’, i.e. whenever multiple branches meet, they must all be tangent at the intersecting point, with at least one branch in each of the two directions. By slightly moving the switches along the track it’s easy to see that generic train track has only switches with one branch on one side and two branches on the other.

On a hyperbolic surface S_{g,p}, a train track is maximal if its completementry region is a disjoint union of triangles and once punctured monogons. i.e. if we try to add more branches to a maximal track, the new branch will be redundant in the sense that it’s merely a translate of some existing branch.

As briefly mentioned in this post, train tracks give natural coordinate system for laminations just like counting how many times a closed geodesic intersect a pair of pants decomposition. To be slightly more precise, any lamination can be pushed into some maximal train track (although not unique), once it’s in the track, any laminations that’s Hausdorff close to it can be pushed into the same track. Hence given a maximal train track, the set of all measured laminations carried by the train track form an open set in the lamination space, (with some work) we can see that as measured lamination they are uniquely determined by the transversal measure at each branch of the track. Hence giving a coordinate system on \mathcal{ML})(S).

Different maximal tracks are of course them pasted together along non-maximal tracks which parametrize a subspace of \mathcal{ML}(S) of lower dimension.

To know more about train tracks and laminations, I highly recommend going through the second part of Chapter 8 of Thurston’s book. I also mentioned them for giving coordinate system on the measured lamination space in the last post.

In any case I shall stop getting into the topology now, otherwise it may seem like the post is here to give exposition to the subject while it’s actually here to remind myself of never losing the Thurston type childlike wonder and imagination (which I found strikingly larking in contemporary practice of mathematics).


Filling and unfilling measured laminations

April 10, 2012

(images are gradually being inserted ~)

I’m temporarily back into mathematics to (try) finish up some stuff about laminations. While I’m on this, I figured maybe sorting out some very basic (and cool) things in a little post here would be a good idea. Browsing through the blog I also realized that as a student of Dave’s I have been writing surprisingly few posts related to what we do. (Don’t worry, like all other posts in this blog, I’ll only put in stuff anyone can read and hopefully won’t be bored reading :-P)

Here we go. As mentioned in this previous post, my wonderful advisor has proved that the ending lamination space is connected and locally connected (see Gabai’08).

Definition: Let S_{g,p} be a hyperbolic surface of genus g and p punctures. A (geodesic) lamination L \subseteq S is a closed set that can be written as a disjoint union of geodesics. i.e. L = \sqcup_{\alpha \in I} \gamma_\alpha where each \gamma_\alpha is a (not necessary closed) geodesic, \gamma is called a leaf of L.

Let’s try to think of some examples:

i) One simple closed geodesic

ii) A set of disjoint simple closed geodesics

iii) A non-closed geodesic spirals onto two closed ones

iV) Closure of a single simple geodesic where transversal cross-sections are Cantor-sets

An ending lamination is a lamination where
a) the completement S \backslash L is a disjoint union of discs and once punctured discs (filling)
b) all leaves are dense in L. (minimal)

Exercise: example i) satisfies b) and example iv) as shown satisfies both a) and b) hence is the only ending lamination.

It’s often more natural to look at measured laminations, for example as we have seen in the older post, measured laminations are natural generalizations of multi-curves and the space \mathcal{ML}(S) is homeomorphic to \mathbb{R}^{6g-6+2p} (Thurston) with very natural coordinate charts (given by train-tracks).

Obviously not all measured laminations are supported on ending laminations (e.g. example i) and ii) with atomic measure on the closed curves.) It is well known that if a lamination fully supports an invariant measure, then as long as the base lamination satisfies a), it automatically satisfies b) and hence is an ending lamination. This essentially follows from the fact that having a fully supported invariant measure and being not minimal implies the lamination is not connected and hence won’t be filling.

Exercise:Example iii) does not fully support invariant measures.

Scaling of the same measure won’t effect the base lamination, hence we may eliminate a dimension by quotient that out and consider the space of projective measured laminations \mathcal{PML}(S) \approx \mathbb{S}^{6g-7+2p}. Hence we may decompose measured laminations into filling and unfilling ones. i.e.

\mathcal{PML}(S) = \mathcal{FPML}(S) \sqcup \mathcal{UPML}(S)

where \mathcal{FPML}(S) projects to the ending laminations via the forgetting measure map \pi.

This decomposition of the standard sphere \mathbb{S}^{6g-7+2p} is mysterious and very curious in my opinion. To get a sense of this, let’s take a look at the following facts:

Fact 1: \mathcal{UPML} is a union of countably many disjoint hyper-discs (i.e. discs of co-dimension 1).

Well, if a measured lamination is unfilling, it must contain some simple closed geodesic as a leaf (or miss some simple closed geodesic). For each such geodesic C, there are two possible cases:

Case 1: C is non-separating. The set of measured laminations that missed C is precisely the set of projective measured laminations supported on S_{g-1, p+2}, hence homeomorphic to \mathbb{S}^{6g-13+2p+4} = \mathbb{S}^{(6g-7+2p)-2} we may take any such measured lamination, disjoint union with C, we may assign any ratio of wrights to C and the lamination. This corresponds to taking the cone of \mathbb{S}^{(6g-7+2p)-2} with vertex being the atomic measure on C. Yields a disc of dimension (6g-7+2p)-1.

Case 2: C is separating. Similarly, the set of measured laminations missing C is supported on two connected surfaces with total genus g and total punctures p+2.

To describe the set of projective measured laminations missing C, we first determine the ratio of measure between two connected components and then compute the set of laminations supported in each component. i.e. it’s homeomorphic to [0,1] \times \mathbb{S}^{d_1} \times \mathbb{S}^{d_2}/\sim where d_1+d_2 = 6g-2*7+2(p+2) = 6g-10+2p and (0, x_1, y) \sim (0, x_2, y) and (1, x, y_1) \sim (1, x, y_2).

Exercise: check this is a sphere. hint: if d_1 =d_2 = 1, we have:

Again we cone w.r.t. the atomic measure corresponding to C, get a hyper disc.

At this point you may think ‘AH! \mathcal{UPML} is only a countable union of hyper-discs! How complicated can it be?!’ Turns out it could be, and (unfortunately?) is, quite messy:

Fact 2: \mathcal{UPML} is dense in \mathcal{PML}.

This is easy to see since any filling lamination is minimal, hence all leaves are dense, we may simply take a long segment of some leaf where the beginning and end point are close together on some transversal, close up the segment by adding a small arc on the transversal, we get a simple closed geodesic that’s arbitrarily close to the filling lamination in \mathcal{PML}. Hence the set of simple closed curves with atomic measure are dense, obviously implying \mathcal{UPML} dense.

So how exactly does this decomposition look like? I found it very mysterious indeed. One way to look at this decomposition is: we know two \mathcal{UPML} discs can intersect if and only if their corresponding curved are disjoint. Hence in some sense the configuration captures the structure of the curve complex. Since we know the curve complex is connected, we may start from any disc, take all discs which intersect it, then take all discs intersecting one of the discs already in the set, etc.

We shall also note that all discs intersecting a given disc must pass through the point corresponding to the curve at the center. Hence the result will be some kind of fractal-ish intersecting discs:

(image)

Yet somehow it manages to ‘fill’ the whole sphere!

Hopefully I have convinced you via the above that countably many discs in a sphere can be complicated, not only in pathological examples but they appear in ‘real’ life! Anyways, with Dave’s wonderful guidance I’ve been looking into proving some stuff about this (in particular, topology of \mathcal{FPML}). Hopefully the mysteries would become a little clearer over time~!


The travelling salesman problem

March 19, 2012

This is one of those items I should have written about long ago: I first heard about it over a lunch chat with professor Guth; then I was in not one, but two different talks on it, both by Peter Jones; and now, finally, after it appeared in this algorithms lecture by Sanjeev Arora I happen to be in, I decided to actually write the post. Anyways, it seem to live everywhere around my world, hence it’s probably a good idea for me to look more into it.

Has everyone experienced those annoy salesman who keeps knocking on you and your neighbors’ doors? One of their wonderful properties is that they won’t stop before they have reached every single household in the area. When you think about it, in fact this is not so straight foreword to do; i.e. one might need to travel a long way to make sure each house is reached.

Problem: Given N points in \mathbb{R}^2, what’s the shortest path that goes through each point?

Since this started as an computational complexity problem (although in fact I learned the analysts’s version first), I will mainly focus on the CS version.

Trivial observations:

In total there are about N! paths, hence the naive approach of computing all their lengths and find the minimum takes more than N! \sim (N/e)^N time. (which is a long time)

This can be easily improved by a standard divide and concur:

Let V \subseteq \mathbb{R}^2 be our set of points. For each subset S \subseteq V, for each p_1, p_2 \in S, let F(S, p_1, p_2) = length of shortest path from p_1 to p_2 going through all points in S.

Now the value of this F can be computed recursively:

\forall |S| = 2, F(S, p_1, p_2) = d(p_1, p_2);

Otherwise F(S, p_1, p_2) =

\min \{ F(S \backslash \{p_2\}, p_1, q) + d(q, p_2) \ | \ q \in S, q \neq p_1, p_2 \}

What we need is minimum of F(V, p_1, p_2) where p_1, p_2 \in V. There are 2^N subsets of V, for any subset there are \leq N choices for each of p_1, p_2. F(S, p_1, p_2) is a minimum of N numbers, hence O(N) time (as was mentioned in this pervious post for selection algorithm). Summing that up, the running time is 2^N\times N^2\times N \sim O(2^N), slightly better than the most naive way.

Can we make it polynomial time? No. It’s well known that this problem is NP-hard, this is explained well in the wikipedia page for the problem.

Well, what can we do now? Thanks to Arora (2003), we can do an approximate version in polynomial time. I will try to point out a few interesting ideas from that paper. The process involved in this reminded me of the earlier post on nonlinear Dvoretzky problem (it’s a little embracing that I didn’t realize Sanjeev Arora was one of the co-authors of the Dvoretzky paper until I checked back on that post today! >.< ) it turns out they have this whole program about ‘softening’ classic problems and produce approximate versions.

Approximate version: Given N points in \mathbb{R}^2, \forall \varepsilon > 0, find a path \gamma through each point such that length l(\gamma) < (1+\varepsilon)l(\mbox{Opt}).

Of course we shall expect the running time T to be a function of \varepsilon and N, as \varepsilon \rightarrow 0 it shall blow up (to at least exponential in N, in fact as we shall see below, it will blow up to infinity).

The above is what I would hope is proved to be polynomial. In reality, what Arora did was one step more relaxed, namely a polynomial time randomized approximate algorithm. i.e. Given V and \varepsilon, the algorithm produces a path \gamma such that E(l(\gamma)-l(\mbox{Opt}) < \varepsilon). In particular this means more than half the time the route is within (1+\varepsilon) to the optimum.

Theorem (Arora ’03): T(N, \varepsilon) \sim O(N^{1/\varepsilon}) for the randomized approximate algorithm.

Later in that paper he improved the bound to O(N \varepsilon^{C/\varepsilon}+N\log{N}), which remains the best know bound to date.

Selected highlights of proof:

One of the great features in the approximating world is that, we don’t care if there are a million points that’s extremely close together — we can simply merge them to one point!

More precisely, since we are allowing a multiplicative error of \varepsilon, we also have trivial bound l(\mbox{Opt}) > \mbox{    diam}(V), Hence the length can be increased by at least \varepsilon \mbox{diam}(V) which means if we move each point by a distance no more than \varepsilon \mbox{diam}(V) / (4N) and produce a path \gamma' connecting the new points with l(\gamma')< (1+\varepsilon/2)l(\mbox{Opt}), then we can simply get our desired \gamma from \gamma', as shown:

i.e. the problem is “pixelated”: we may bound V in a square box with side length \mbox{diam}(V), divide each side into 8N/\varepsilon equal pieces and assume all points are in the center of the gird cell it lies in (for convenience later in the proof we will assume 8N/\varepsilon = 2^k is a power of 2, rescale the structure so that each cell has side length 1. Now the side length of the box is 8N/\varepsilon = 2^k):

Now we do this so-called quadtree construction to separate the points (reminds me of Whitney’s original proof of his extension theorem, or the diatic squares proof of open sets being countable) i.e. bound V in a square box and keep dividing squares into four smaller ones until no cell contains more than one point.

In our case, we need to randomize the quad tree: First we bound V in a box that’s 4 times as large as our grid box (i.e. of side length 2^{k+1}), shift the larger box by a random vector (-i/2^k,-j/2^k) and then apply the quad tree construction to the larger box:

At this point you may wonder (at least I did) why do we need to pass to a larger square and randomize? From what I can see, doing this is to get

Fact: Now when we pick a grid line at random, the probability of it being an ith level dividing line is 2^i/2^k = 2^{i-k}.

Keep this in mind.

Note that each site point is now uniquely defined as an intersection of no more than k nesting squares, hence the total number of squares (in all levels) in this quad tree cannot exceed N \times k \sim N \times \log{N/\varepsilon}.

Moving on, the idea for the next step is to perturb any path to a path that cross the sides of the square at some specified finite set of possible “crossing points”. Let m be the unique number such that 2^m \in [(\log N)/\varepsilon, 2 (\log N)/ \varepsilon ] (will see this is the best m to choose). Divide sides of each square in our quad tree into 2^m equal segments:

Note: When two squares of different sizes meet, since the number of equally squares points is a power of 2, the portals of the larger square are also portals of the smaller one.

With some simple topology (! finally something within my comfort zone :-P) we may assume the shortest portal-respecting path crosses each portal at most twice:

In each square, we run through all possible crossing portals and evaluate the shortest possible path that passes through all sites inside the square and enters and exists at the specified nodes. There are (2^{4 \times 2^m})^2 \sim (side length)^2 \sim (N/\varepsilon)^2 possible entering-exiting configurations, each taking polynomial time in N (in fact \sim N^{O(1/\varepsilon)} time) to figure out the minimum.

Once all subsquares has their all paths evaluated, we may move to the one-level larger square and spend another \log(N/\varepsilon) \times (N/\varepsilon)^2 operations. In total we have

N \times \log{N/\varepsilon} \times (N/\varepsilon)^2 \times N^{O(1/\varepsilon)}

\sim N^{O(1/\varepsilon)}

which is indeed polynomial in N/\varepsilon many operations.

The randomization comes in because the route produced by the above polynomial time algorithm is not always approximately the optimum path; it turns out that sometimes it can be a lot longer.

Expectation of the difference between our random portal respecting minimum path \mbox{OPT}_p and the actual minimum \mbox{OPT} is bounded simply by the fact that minimum path cannot cross the grid lines more that \mbox{OPT} times. At each crossing, the edge it crosses is at level i with probability 2^{i-k}. to perturb a level i intersection to a portal respecting one requires adding an extra length of no more than 2 \times 2^{k-i}/2^m \sim 2^{k+1-i}/(\log N / \varepsilon):

\displaystyle \mathbb{E}_{a,b}(\mbox{OPT}_p - \mbox{OPT})

\leq \mbox{OPT} \times \sum_{i=1}^k 2^{i-k} \times 2^{k+1-i} / (\log N / \varepsilon)

= \mbox{OPT} \times 2 \varepsilon / \log N < \varepsilon \mbox{OPT}

P.S. You may find images for this post being a little different from pervious ones, that’s because I recently got myself a new iPad and all images above are done using iDraw, still getting used to it, so far it’s quite pleasant!

Bonus: I also started to paint on iPad~

–Firestone library, Princeton. (Beautiful spring weather to sit outside and paint from life!)


Back to the drawing (painting) board

March 12, 2012

OK, so recently I made some attempts in painting on computer (meaning open a blank canvas in Photoshop and paint on a tablet). After spending just a few nights on it, I can’t help wondering ‘why haven’t I done this earlier?!’. Anyways, to help saving fellow mathematicians from making the same mistake, I’m writing this post. (okay you got me, I just want to post because I think it’s cool >..< Oh well, I guess travelling salesman has to wait q few days.)

This one was painted last night (you should blame it if you can’t wait to read about the travelling salesman problem). After reviewing pervious attempts, I found somehow I tend to use less saturated colors when shifting from canvas to digital. Hence I picked a circus scene for color training (the scene was from Disney’s Hunchback of Notre Dame album), seem to work~ Also, before this I have been sketching at a smaller size hence details a little smoother on this one, click to enlarge.

OK, here comes my complete digital painting history to date:

The very first sketch I did the night my tablet arrived (by the way, I’m using a Wacom Intuos, small). This is a scene I saw back in Evanston, when I used to take mid-night long walks along the shore of lake Michigan, there were days when moon rise from the lake, creating a shiny silver region that silently sparkles in the mideast of the otherwise dark water. My picture certainly doesn’t do the justice, but this was one of the views I have been longing to capture but can’t do with a camera. (low light + moving water) People, if you every visit Northwestern, I highly recommend checking the moonrise schedule!

After the first try (which is mainly black and white), I decided that I should get familiar with color mixing techniques, since the eye-dropper tool is non-existant in the real world. So I did a ‘master study’ i.e. pulled out a hard copy of Pixar concept drawing (originally in colored chalk I believe) and tried to get all the colors working ^_^ So here comes my version of radiator springs from Cars~

The next day I went out of my office, sat on the stairs for a couple hour with a computer and tablet, did this quick sketch of “the only interesting piece of architecture around the math department” – Lewis library by Frank Gary. (never liked the green and orange blocks on the building hence I intentionally deleted them :-P) Can’t get into any details because it was past midnight and that day was fairly cold.

In conclusion, digital vs. traditional:

Digital:
1. Much better at anything to do with color (mixing, getting color from another part of the picture, etc);
2. Faster (used to take me weeks to finish a painting)
3. Unlimited resources (creating custom brushes, color won’t run out)

Traditional:
1. More personal satisfactory (well, I guess nothing compares to holding canvas in you hands)
2. Can actually see the brush and stroke at the same time (took me some time to get used to not seeing where my hands are, but in fact not as bad as one might think)

Anyways, I’m about a week old in this field, so don’t believe anything I said :-P. By the way, a side effect of doing this is, I have now started to think about the RGB value each time I see a colored object in real life >.<.

Hope you had fun! (I certainly did)


Adding faulted data to produce accurate results, faster

March 5, 2012

In computer algorithms, there is this thing called ‘soft heap‘, due to Princeton computer scientist Bernard Chazelle (2000), which is basically corrupting a certain percentage of data when building the data structure. When first heard of that, I can’t imagine how on earth could such a thing be anywhere useful.

Indeed it is, and the reason it works is quite interesting to me. Hence I’m writing this post about finding the (real) k-th smallest element in a set via this corrupted data structure. By the way, this was presented to me in a very interesting lecture by Robert Tarjan. For you computer scientists: be warned, I’m completely ignorant in the subject of algorithms.

Part 1: Heap

Naively sorting a set (i.e. compare the first two elements, put the smaller first; compare the 2nd and 3rd, move the third past the 2nd if it’s smaller; then compare the 3rd and the first, etc.) could take \sum_{i=1}^N \sim O(N^2) many comparisons. One can of course do better, the current known fastest sort takes O(N \log N) operations.

One way to achieve O(N \log N) is the so-called heap sort:

A heap is a binary tree with a number (key) in each node and the child of a node always has key larger than that of the node. (i.e. the minimum element is at the root)

Now the problem of sorting breaks into two parts:

i) build a heap with our given keys
ii) sort elements in a heap

As we can see, if we manage to build a heap with all branches having roughly the same length (computer scientist call that a balanced binary tree), there will be 2^n elements at level n. The height of the heap is \log_2 (N+1) \sim O(\log N). Main idea of the sort is to go through all elements and aim for doing a constant times the height many operations at each element (then we would get our N \log N).

Building the heap: Suppose we already have the first n element put in a heap, occupying all nodes in the first k levels and some nodes in the k+1-th level. We put the n+1-th element at the first empty node on the k+1-th level; compare it to its parent, swap if it’s smaller than it’s parent; compare to its parent again… Do this until it’s larger than it’s parent. Note that 1) swap does not change the shape of the tree. 2) when the process stops, we end up with a new heap. Hence inserting this new element takes at most twice the height many operations (compare and swap once at each level).

In total, building the heap takes no more than N \times 2 \log N \sim O(N \log N) operations. (in fact one can do this in O(N) time, but for our purpose it’s not needed)

Sort from the heap: Take the root as the minimum element, delete it from the heap; now compare the two elements in level 1, put the smaller into the root, now compare the two children of the empty node, etc. (if at some stage the empty node has only one child then just move it, stop if the empty node has no child.) At the end, delete the leaf node that’s empty. Again this process takes no more than
twice the height of the tree many operations.

Combining the above two steps, we have N \log N operations.

As one can guess heaps does much more than sorting stuff, once we spent our time to build data into a heap, we can save such structure and update it once the data set changes:

Inserting element: \sim O(\log N) (put the element at the next available leaf and move upwards)
Find minimum: \sim O(1) (well, the min is at the root)
Delete minimum: \sim O(\log N) (Delete, move up the smaller level-2 element, etc.)

OK~ so much for that, let’s move to the more interesting business:

Part 2: Error

The idea is: given a set of data with keys, instead of putting them into a heap, we put them into another structure (i.e. ‘soft heap’) such that the structure assigns a fake key every element and treat it as the real key. It does so in a way that:

i) All elements have fake keys larger than or equal to their real key.
ii) Only a certain percentage (\varepsilon) of elements have fake key different (larger than) real key.

The advantage of doing that is, unlike heap, it now takes only \sim O(1) (amortized) time to delete the minimum element, while building it still takes O(N) time. (Explaining exactly how this is done requires another post which I might write…for now, check out Chazelle ’00.)

Meanwhile, I guess the most curious thing is, how can we get any uncorrupted results with a corrupted data structure? As an example, let’s see how to find the (genuine) medium with data structure having \varepsilon precent of corrupted keys:

The mean observation is that, since all fake keys are larger than real keys, by deleting the smallest 1/2-\varepsilon of elements (according to fake keys), the genuine medium will be larger than all deleted elements (in real key) and hence surely remains in the set.

Now we have reduced the problem to finding the N\varepsilon/2-th smallest element in a set of cardinality (1/2+\varepsilon)N. Of course we can solve this by putting the elements in a reversed soft heap (i.e. all real keys are changed to their negative value), delete the smallest (1/2+\varepsilon) (fake key) elements, etc.

At each step, we reduce the size of the data set by at least a factor of (1/2+\varepsilon), and the amortized running time at each step is O(n) (building the soft heap) + O(n) (deleting about half of the elements) which is, in total \sim O(n).

Since the size shrinks in constant factor, the total running time is simply a geometric series, hence \sim O(N).

Remarks:

1. In the above argument, \varepsilon can be anything less than 1/2 hence in fact quite a lot of data are allowed to be corrupted!

2. It doesn’t matter whether we are finding the medium or the k-th smallest element, indeed, after the first step in the above argument we are literally finding the k-th smallest element. The general rule being simply taking the negative of the keys before building the soft heap whenever k<n/2.

3. The problem of finding the k-th smallest element is very well studied and algorithms that do this are called selection algorithms. Obviously our O(N) running time is optimum since it takes that much time to read off the data set! However, there are many other ways to achieve this linear running time, some are not even amortized, but as far as I know, one cannot do this with a real heap.


Simulating paintings on computer

February 27, 2012

Recently I’ve been playing around in our CS department and discovered there are actually people working in this field called computer graphics. Given that I’ve been playing with photoshop even before knowing what complex numbers are, this is quite exciting!

I guess I got a little too carried away in this homework item about coding your own non-photorealistic filter.

Anyways, I was never quite satisfied with any of those ‘artistic filters’ that came with commercial software. For example, here’s the photoshop ‘colored pencil’:

(original)

becomes:

and dry brush:

Well, just believe I have tried to adjust the parameters to produce reasonable results…

I head down to MoMa and Metropolitan museum on the weekend and looked at some classical paintings with the intension of figuring out how might one go about programing it. (Since color mixes quite differently on canvas and on screen, there are actually some quite interesting problems to solve in order to make the computer “paint”.)

Anyways, here’s what I ended up with: (I would really like to make a Van Gogh filter…although haven’t gotten anywhere convincing yet…)

Pointillism: (I found this being the obvious one to start with):

real painting (Georges Seurat):

my pointillism filter:

(applied to a fairly ordinary photo:)

detail:

classical oil painting portrait:

filtered:

Just for fun, I also did one for cubism:
(obviously inspired by Picasso’s violin)

photo:

my cubism filter:

Anyways, that was quite fun to program (especially for someone who haven’t touched code since high school).

In fact, I even found it rewarding in improving one’s painting skills. For example one thing I figured out was the distribution of luminance in painting is normally quite off from that of the real world, each artistic has their own ‘luminance curve’ and shifts the hue/saturation in a particular way. Umm…maybe coding filters should be added to all fine art curriculum :-P


A Chinese new year greeting~

January 23, 2012

So this is the first day of the Chinese new year, classes/seminars still haven’t resumed since December 17th, still two weeks to go before anything starts…

Anyways, I’m borrowing this occasion to thank everyone for continuously supporting this blog! We’ve had a great year, check the annual report at the end of the post for details.

As some of you may already know, I have decided that mathematics is not going to be my life-long career. So far it’s not clear in which direction will this blog go, it’s simply too important to abandon. I guess only time would tell. However, expect changes.

Finally, a sketch I doodled up last night~ Happy Chinese new year!

(btw, for those who have seen me in winter, yes I’m waring that Klein bottle hat in the picture!)

The WordPress.com stats helper monkeys prepared a 2011 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 15,000 times in 2011. If it were a concert at Sydney Opera House, it would take about 6 sold-out performances for that many people to see it.

Click here to see the complete report.


Sumsets, in discrete and continuous settings

December 12, 2011

In the past few years, there are 137 papers with the term ‘sumset’ in its title; and 50 more with ‘sum-set’. Ok the above statement was stolen from a talk I happen to be sitting in, by Noga Alon.

As usual, in the middle of the talk I got carried away by some ‘trivial facts’ he wrote down which are only slightly related to the main content. So I thought about that for during the rest of the talk (and also a little bit afterwards).

This time the curious point pops up when he was motivating why people should be studying chromatic number of random Cayley graphs by its application to sumsets.

Definition: In a vector space X, a sumset S \subseteq X is a set that can be represented as the Minkowski sum of another set with itself. i.e. S = A+A where A+A = \{ a_1+a_2 \ | \ a_1, a_2 \in A \}.

It was mentioned that given a large p, if we look at sets in \mathbb{Z}_p, then all sets with large cardinality must be sumsets.

This fact is first published by Ben Green who also raised the natural question:

Question: What’s the lower bound M(p) such that all subsets S \subseteq \mathbb{Z}_p with |S| > M(p) must be a sumset?

Let’s first see why all sets of large cardinality must be sumsets: Well as we all know \mathbb{Z}_p = \mathbb{Z}_p + \mathbb{Z}_p (can also be expressed as many other sums) is a sumset. Hence the first non-trivial case is \mathbb{Z}_p \backslash \{x\}:

The way I think of sumsets is that they are projections of product sets in the angle 3/4 \pi:

Hence our goal is to make up a set S_x such that the projection, after mod p, is everything except for the point x. Well, this isn’t too hard, first note the image of the ‘interval’ \{ x+1, x+2, \cdots, p+x-1 \} after mod p is exactly the set we wanted. Now we just need to make A \times A project to the interval! A = \{ x/2 +1, x/2 + 2, \cdots  (p+x)/2 -1 \} will do. (Here by C/2 we meant that it’s the integer C/2 when C is even and (C+p)/2 when it’s odd.

Note that since x \in \mathbb{Z}_p, A + A is actually a square in \{ 0, 1, \cdots, p-1 \}^2 and only “warped” in mod p after we project it to \{ 0, 1, \cdots, 2p-2 \}.

So we just showed all sets with one missing element is a sumset~ Now let’s move to sets of size p-2…(Don’t worry, I’m not going on to 3 and the length of this post will be finite :-P)

In contrast to the above argument which actually works in the continuous setting and shows the Lie group S^1 missing any interval is a sumset, this time we actually need to use the fact that our space is discrete and finite.

Note first that the property of being a sumset is invariant under scaling and translating by an integer. Indeed,

n\cdot S = \{ ns \ | \ s \in S \}

= \{n(a_1+a_2) \ | \ a_1, a_2 \in A\} = n \cdot A + n \cdot A

and of course S+t = (A+t/2) + (A+t/2).

Now no matter which two points a,b we delete, i.e. S = \mathbb{Z}_p \backslash \{a, b \}, we may let S' = (b-a)^{-1} \cdot (S-a). We have S' = \mathbb{Z}_p \backslash \{0, 1\}, which is the projection of the square I^2 = \{ 1,2, \cdots, (p-1)/2 \}^2.

Hence S = [(b-a) \cdot I + a/2] + [(b-a) \cdot I + a/2] is a sumset.

After playing around with the torus for a little bit, I believe in the continue case we can still write S^1 deleting two (small) intervals as a sumset A+A where A is a union of no more than 4 intervals. There are quite a few cases involved concerning the spacing between small intervals, hence I’ll just draw an example:

(By the way, this is about as far as I got daydreaming during the lecture, the rest came from the sources I looked up afterwards.)

Unfortunately since the scaling and translating gived only two degrees of freedom, the above argument fails when considering sets missing three points. (Playing with torus as I first tried, however, might still work)

Back to our question, so now we know at least M_p < p-2 the natural thing to study is of course how does it grow with p. Recall that we denoted M_p = p - f(p), hence the problem is equivalent to giving asymptotic lower bounds for f(p).

So this is quite curious, what do you think? Is f(p) just a counstant? (such as 2, or 3?) Or is it always more than a fixed proportion of \mathbb{Z}_p? (say any set containing 99% of the elements must be a sumset?) Or something in-between?

As one might have expected, f(p) is more than a constant,

Theorem: (Green) f(p) \geq \frac{1}{9} \log_2(p).

It’s also not as large as a fixed percentage:

Theorem: (Gowers-Green) f(p) \leq C \cdot p^2/3 \log(p)

So it’s something ‘in-between’. Interesting…So what is this number? This is an open question in general, by applying methods of Cayley graphs and their spectrums, the speaker (of the talk) was able to improve the above bounds (in this other paper):

Theorem: (Alon) c \cdot \sqrt{\frac{p}{\log{p}}} \leq f(p) \leq C \cdot \frac{ p^2/3}{ \log(p)^{1/3}}

However one should expect that

Conjecture: (Green) f(p) \sim p^{1/2+O(1)}.

Having knowing absolutely nothing about the subject (or combinatorics in general), my first reaction about this is that perhaps, if we look at \mathbb{Z}_p where p is roughly N^2, let S be the set missing N equally spaced points, now if we want to write S as a sumset that’s like finding a product A \times A \subseteq \mathbb{T}^2 that misses all N diagonal circles and have at least N points in each thin strip to ‘block’ every diagonal circle in the strip.

I think this set looks (by the same philosophy as when we play around with the two intervals on the tori case) fairly hard to express as a sum. i.e. when we are exactly in \mathbb{Z}_{N^2}, we can probably ‘just’ do it by choosing one representative from each mod N class and place them in the N strips. But looks as if the ‘resolution’ is any lower, (i.e. p is a little smaller and we still have about N equally spaced holes), we would not have enough freedom to place the points.

Anyways, that last remark might be completely nonsense~ The conjecture is interesting, tho.


The Schwartz lantern

November 28, 2011

It’s thanksgiving, let’s have some fun in lantern-making!

This thing called Schwartz lantern initially came up in a talk some years ago, I vaguely remembered it as ‘a cool example where the refining triangle approximation of a smooth surface fail to converge in area’. Anyways, it came up again recently as I was talking to some German postdoc working in ‘discrete differential geometry’. As the example was mentioned, he took a napkin and started folding…and I just realized this lantern can actually be made with a piece of flat paper!

Given a compact, smooth surface (possibly with boundary) embedded in \mathbb{R}^3, we can approximate it by a PL surface with all vertices on the surface and having only triangular faces. (just like what’s done in many computer graphic softwares nowadays).

Question: When the vertices of the faces gets denser and denser and the diameter of triangles converge to 0, does the area of the PL surface converge to the area of the surface?

Ok, to explain why I found this being a quite curious little question, let’s prove a couple of trivial observations:

Trivial fact #1: The sequence of PL surfaces as described above does Hausdorff converge to the smooth surface.

Proof:Since our surface is smooth, as the diameter of the triangles converge to 0, the length of the geodesic between two vertices is roughly the length of the edge in the 1-skeleton of the PL surface. In particular, less than twice its length.

Now by compactness we have a positive injectivity radius, implying that for small enough length c, the diameter (on the surface) of region enclosed by any loop of length most c is controlled by c (say it’s \leq f(c) which converge to 0 as c \rightarrow 0). Now the \mathbb{R}^3 diameter of the region is of course even smaller than its surface diameter.

In conclusion, when all triangles have small diameter \varepsilon (hence all its sides have length at most \varepsilon), the geodesic triangle on the surface has parameter less than 6 \varepsilon . So \mathbb{R}^3 diameter of the geodesic triangle is no more than f(6 \varepsilon). Hence the surface is contained in the f(6\varepsilon)-neighbourhood of the vertex set.

Obviously the PL surface is also contained in this neighbourhood. Hence the Hausdorff distance is at most f(6\varepsilon), which converges to 0.

Trivial fact #2: For curves in \mathbb{R}^2 (in fact, or \mathbb{R}^n), the length converges.

Proof: Well…what can I say…see any undergrad calculus book? (well, all we need is that smooth curves are rectifiable. Of course they are…

So from the above observations, does it kinda smell like the area would converge? (If you know the answer, you should pretend you don’t and nod at this point :-P) Well, the fact is they don’t have to converge! (otherwise why are we making counter-examples here?) Furthermore, this is first discovered by a super-cool dude – Schwartz! He even wrote a paper about it back in 1880.

How can this be possible? You might have already observed that with some simple curvature bounding, we can push the argument for trivial fact #1 to show that area of the curved surface is controlled above by the straight surface, The point being (perhaps to one’s surprise) that the ‘straight surface’ can be a lot LARGER than the curved one!

So the example is a sequence of ‘lanterns’ converging to a standard cylinder (say of height and circumference both equal to 1), i.e. PL surfaces with triangular faces, vertices on the cylinder, with smaller and smaller ‘grids’, and the sum of areas of the triangle blows up to infinity.

As shown above, if we have put N^4 points on the cylinder, N points along the circumference and N^3 in the vertical direction (picture is not to scale); connected to form triangles in the above way.

Now all triangles are isosceles and identical. Doing some middle-school geometry shows that they have base length \geq \frac{1}{2N} and height \geq \frac{1}{4N}\tan^{-1}(\frac{\pi}{2N}) (this calculates the distance between the midpoint of the base to the cylinder surface).

Having 2N^4 triangles means the area A_N of the PL surface is at least N^4 \times \frac{1}{2N} \times  \frac{1}{4N}\tan^{-1}(\frac{\pi}{2N}) when N large, \tan^{-1}(\frac{\pi}{2N}) \sim \frac{\pi}{2N} \geq \frac{1}{N}. Hence the A_N \geq \frac{N}{8} blows up to infinity.

Is this pretty cool? This lantern also have an interesting feature that, if we define ‘curvature’ on vertices to be the sum of angles attached to that vertex, (and of course the curvature on the edge between two flat faces shall be 0), then all lanterns have curvature 0 everywhere, just as in the smooth cylinder! i.e. it can be made by folding a single piece of flat paper.

Let’s note that although the triangles are getting uniformly smaller, they do become ‘thinner and thinner’ in the example. In fact this is the only way it can go wrong, i.e. it can be shown that if we further require the triangles to have bounded eccentricity then the area does converge.

Add-on: I actually made the lantern! They are interesting to fold, aesthetically pleasing and even functional! (you’ll see light flaring out in an interesting way)

Trying it out while one thinks about problems is highly amusing and recommended~

All one needs to do is:

Tips on folding:

1. Be sure to make all diagonal lines positive fold and horizontal lines negative.

2. Make the diagonals cross an even number of horizontals or else after you finish all diagonals, you’ll end up with left and right-facing diagonals not crossing on the horizontal (i.e. you’ll need to double the number of horizontals to make it work again)

3. After finishing all lines, it might be hard at first to make the whole thing ‘fold up’. The trick being to make sure all ‘crosses’ are ‘poped-out’ on the whole surface. The final folding process does not work locally!

4. Although theoretically you can take an arbitrarily long strip of paper with unit width to make unit-sized lantern, but in order to not make a million folds and have super-sharp angles between the diagonal and horizontal; I recommend not being too aggressive on the length :-P (square-ish papers are good enough)

Have fun!~

A not-very-good picture of my lantern (larking light bulb…>.<)


Follow

Get every new post delivered to your Inbox.

Join 51 other followers