Building a realtime eye tracking experience with Supabase and WebGazer.js
December 17, 2024


Long story short:

Another Supabase Launch Week hackathon, another experimental project, this time called Stare into the Abyss. It ended up being one of the simplest yet most complex projects I've done. Luckily I've been really into Cursor lately, so I had some help finishing it! I also wanted to answer a question I had in my mind: can you build something using only Supabase's Realtime features, without any database tables? The (perhaps somewhat obvious) answer is: yes, you can (love you, Realtime team ♥️). So let's take a deeper look at the implementation.

One day I randomly thought about Nietzsche's famous quote about the abyss and figured it would be nice (and cool) to actually visualize it in some way: you're staring at a dark screen, and something is staring back at you. Nothing more!

Initially my idea was to build this with Three.js, but I realized that would mean creating or finding free assets for 3D eyes. That felt like a bit too much, especially since I didn't have much time to work on the project itself, so I decided to go with 2D SVGs instead.

I also didn't want it to be purely visual: it would have been a better experience with some audio, too. I had an idea where participants could speak into a microphone and others would hear it as an unintelligible whisper or wind noise. However, this turned out to be very challenging, and I couldn't get the Web Audio API and WebRTC to play nicely together, so I dropped it completely. There's a leftover component in the repository if you want to take a look: it listens to the local microphone and triggers a "wind sound" for the current user. Maybe it'll make it in at some point?


Realtime rooms

Before working on anything visual, I wanted to test out the realtime setup I had in mind. Since the Realtime functionality has some limitations, I wanted it to work so that:

  • A channel has at most 10 participants at a time
    • This means that if a channel is full, you need to join a new one
  • You should only see the eyes of the other participants

For this I came up with a useEffect setup that recursively joins a realtime channel; a sketch of the idea is shown below.
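Here's a minimal sketch of that idea (an approximation rather than the project's exact code), assuming a Supabase client named supabase, channels named room_1, room_2, and so on, plus the userId ref and currentChannel state used later in the post:

const MAX_ROOM_PARTICIPANTS = 10

const joinRoom = useCallback(async (roomIndex: number) => {
  const room = supabase.channel(`room_${roomIndex}`, {
    config: { presence: { key: userId.current } }
  })

  room
    .on('presence', { event: 'join' }, () => {
      // currentPresences comes back empty here, so read the full presence
      // state manually to get the participant count
      const participantCount = Object.keys(room.presenceState()).length

      // Room is full: leave it and recursively try the next one
      if (participantCount > MAX_ROOM_PARTICIPANTS) {
        room.unsubscribe()
        joinRoom(roomIndex + 1)
      }
    })
    .subscribe(async (status) => {
      if (status === 'SUBSCRIBED') {
        // Start tracking our own presence once connected
        await room.track({ joinedAt: Date.now() })
        setCurrentChannel(room)
      }
    })
}, [])

useEffect(() => {
  joinRoom(1)
}, [joinRoom])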



This joinRoom lives inside a useEffect hook and is called when the room component mounts. One caveat I found while working on this feature was that the currentPresences param of the join event doesn't contain any values, even though it should. I'm not sure whether that's a bug in the implementation or it works as intended. Because of this, a manual room.presenceState fetch is needed to get the number of participants in the room whenever a user joins.

We check the participant count and either unsubscribe from the current room and try joining another one, or continue with the current room. We do this in the join event, since sync would be too late (it triggers after the join or leave events).

I tested this implementation by opening a bunch of tabs in my browser and everything looked great!

I then wanted to debug the solution by broadcasting mouse position updates, and quickly ran into issues with sending too many messages over the channel! The solution: throttle the calls.

/**
 * Creates a throttled version of a function that can only be called at most once 
 * in the specified time period.
 */
function createThrottledFunction<T extends (...args: unknown[]) => unknown>(
  functionToThrottle: T,
  waitTimeMs: number
): (...args: Parameters<T>) => void {
  let isWaitingToExecute = false

  return function throttledFunction(...args: Parameters<T>) {
    if (!isWaitingToExecute) {
      functionToThrottle.apply(this, args)
      isWaitingToExecute = true
      setTimeout(() => {
        isWaitingToExecute = false
      }, waitTimeMs)
    }
  }
}


Cursor came up with this little throttle function creator, which I use for the eye tracking broadcasts like this:


const throttledBroadcast = createThrottledFunction((data: EyeTrackingData) => {
  if (currentChannel) {
    currentChannel.send({
      type: 'broadcast',
      event: 'eye_tracking',
      payload: data
    })
  }
}, THROTTLE_MS)

throttledBroadcast({
 userId: userId.current,
 isBlinking: isCurrentlyBlinking,
 gazeX,
 gazeY
})

This helped! Also, in the initial version I was sending the eye tracking messages via presence; however, broadcast allows more messages per second, so I switched the implementation to that instead. This is especially important with eye tracking, since the camera captures data all the time.
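For context, the receiving side can listen for the same broadcast event roughly like this (a sketch; eyeTrackingState and its setter are the state later used for rendering the eyes):

// Sketch of the receiving side: update local state whenever another
// participant broadcasts their gaze data.
currentChannel.on(
  'broadcast',
  { event: 'eye_tracking' },
  ({ payload }) => {
    const data = payload as EyeTrackingData
    setEyeTrackingState((previous) => ({ ...previous, [data.userId]: data }))
  }
)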


Eye tracking

I had come across WebGazer.js not long before I got the idea for this project. It's a really fun library and it worked surprisingly well!

The whole eye tracking functionality is handled inside one useEffect hook:

    window.webgazer
      .setGazeListener(async (data: any) => {
        if (data == null || !currentChannel || !ctxRef.current) return

        try {
          // Get normalized gaze coordinates
          const gazeX = data.x / windowSize.width
          const gazeY = data.y / windowSize.height

          // Get video element
          const videoElement = document.getElementById('webgazerVideoFeed') as HTMLVideoElement
          if (!videoElement) {
            console.error('WebGazer video element not found')
            return
          }

          // Set canvas size to match video
          imageCanvasRef.current.width = videoElement.videoWidth
          imageCanvasRef.current.height = videoElement.videoHeight

          // Draw current frame to canvas
          ctxRef.current?.drawImage(videoElement, 0, 0)

          // Get eye patches
          const tracker = window.webgazer.getTracker()
          const patches = await tracker.getEyePatches(
            videoElement,
            imageCanvasRef.current,
            videoElement.videoWidth,
            videoElement.videoHeight
          )

          if (!patches?.right?.patch?.data || !patches?.left?.patch?.data) {
            console.error('No eye patches detected')
            return
          }

          // Calculate brightness for each eye
          const calculateBrightness = (imageData: ImageData) => {
            let total = 0

            for (let i = 0; i < imageData.data.length; i += 16) {
              // Convert RGB to grayscale
              const r = imageData.data[i]
              const g = imageData.data[i + 1]
              const b = imageData.data[i + 2]
              total += (r + g + b) / 3
            }
            return total / (imageData.width * imageData.height / 4)
          }

          const rightEyeBrightness = calculateBrightness(patches.right.patch)
          const leftEyeBrightness = calculateBrightness(patches.left.patch)
          const avgBrightness = (rightEyeBrightness + leftEyeBrightness) / 2

          // Update rolling average
          if (brightnessSamples.current.length >= SAMPLES_SIZE) {
            brightnessSamples.current.shift() // Remove oldest sample
          }
          brightnessSamples.current.push(avgBrightness)

          // Calculate dynamic threshold from rolling average
          const rollingAverage = brightnessSamples.current.reduce((a, b) => a + b, 0) / brightnessSamples.current.length
          const dynamicThreshold = rollingAverage * THRESHOLD_MULTIPLIER
          // Detect blink using dynamic threshold
          const blinkDetected = avgBrightness > dynamicThreshold

          // Debounce blink detection to avoid rapid changes
          if (blinkDetected !== isCurrentlyBlinking) {
            const now = Date.now()
            if (now - lastBlinkTime > 100) { // Minimum time between blink state changes
              isCurrentlyBlinking = blinkDetected
              lastBlinkTime = now
            }
          }

          // Use throttled broadcast instead of direct send
          throttledBroadcast({
            userId: userId.current,
            isBlinking: isCurrentlyBlinking,
            gazeX,
            gazeY
          })

        } catch (error) {
          console.error('Error processing gaze data:', error)
        }
      })
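For completeness, WebGazer itself also has to be started for the listener above to do anything; a minimal bootstrapping sketch (assuming the library is loaded globally as window.webgazer) could look like this:

// Sketch: enable the red prediction dot for debugging and start the
// webcam-based tracking loop (this would live in the same useEffect).
window.webgazer
  .showPredictionPoints(true)
  .begin()

// Stop the tracker when the component unmounts
return () => {
  window.webgazer.end()
}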

Getting the information about where the user is looking is as easy as getting the mouse position on the screen. However, I also wanted to add blink detection as a (cool) feature, which required jumping through some hoops.

When you google for information about WebGazer and blink detection, you can find remnants of an initial implementation; there is even commented-out code for it in the source. Unfortunately, no such functionality exists in the library, so you need to do it manually.

After a lot of trial and error, Cursor and I came up with a solution that calculates pixel brightness levels from the eye patch data to determine when the user blinks. It also has some dynamic lighting adjustment, since I noticed that (at least for me) webcams don't always register blinks depending on the lighting: the brighter my picture/room was, the worse it worked, and it performed better in darker lighting.

While debugging the eye tracking functionality (WebGazer has a very handy showPredictionPoints option that displays a red dot on the screen to visualize where you're looking), I noticed that the tracking isn't very accurate unless you calibrate it, which is something the project asks you to do before joining any room.

const startCalibration = useCallback(() => {
    const points: CalibrationPoint[] = [
      { x: 0.1, y: 0.1 },
      { x: 0.9, y: 0.1 },
      { x: 0.5, y: 0.5 },
      { x: 0.1, y: 0.9 },
      { x: 0.9, y: 0.9 },
    ]
    setCalibrationPoints(points)
    setCurrentPoint(0)
    setIsCalibrating(true)

    window.webgazer.clearData()
  }, [])

  const handleCalibrationClick = useCallback((event: React.MouseEvent) => {
    if (!isCalibrating) return

    // Record click location for calibration
    const x = event.clientX
    const y = event.clientY
    window.webgazer.recordScreenPosition(x, y, 'click')

    if (currentPoint < calibrationPoints.length - 1) {
      setCurrentPoint(prev => prev + 1)
    } else {
      setIsCalibrating(false)
      setHasCalibrated(true)
    }
  }, [isCalibrating, currentPoint, calibrationPoints.length])

          
  {isCalibrating && (
    <div
      className="fixed inset-0 z-50 cursor-pointer"
      onClick={handleCalibrationClick}
    >
      {/* Calibration overlay (markup reconstructed): show the current point as a red dot */}
      {calibrationPoints.map((point, index) => (
        index === currentPoint && (
          <div
            key={index}
            className="absolute w-4 h-4 -translate-x-1/2 -translate-y-1/2 rounded-full bg-red-500"
            style={{ left: `${point.x * 100}%`, top: `${point.y * 100}%` }}
          />
        )
      ))}

      <p className="fixed bottom-8 inset-x-0 text-center text-white">
        Click the red dot to calibrate ({currentPoint + 1}/{calibrationPoints.length})
      </p>
    </div>
  )}


Basically, we render 5 points on the screen: one near each corner and one in the center. Clicking them records the screen position in WebGazer so that it can better adjust its model of where you're looking. You might be wondering what the click actually does: the thing is, when you click a point you're also looking at it, so WebGazer gets a known pairing of gaze and screen position to learn from, which leads to more accurate predictions. Very cool!


The eyes

I added a simple SVG implementation for the eyes, hooked it up to the tracking, and then kept styling it further. Here's what it ended up looking like; the inspiration was the Alucard eyes by MIKELopez.

This is an early version of the eye, but it's about 95% there. I sent a video of it to my friends and they thought it was pretty cool, especially knowing that it actually follows your eye movements! You can also see WebGazer's prediction points moving across the screen.

The eye component itself is an SVG with some path animation implemented through Motion.

      <svg
        className={`w-full h-full self-${alignment} max-w-[350px] max-h-[235px]`}
        viewBox="-50 0 350 235"
        preserveAspectRatio="xMidYMid meet"
      >
        {/* Definitions for gradients and filters */}
        <defs>
          <filter id="pupil-blur">
            <feGaussianBlur stdDeviation="0.75" />
          </filter>
          <radialGradient id="eyeball-gradient">
            <stop offset="60%" stopColor="#dcdae0" />
            <stop offset="100%" stopColor="#a8a7ad" />
          </radialGradient>
          <radialGradient 
            id="pupil-gradient"
            cx="0.35"
            cy="0.35"
            r="0.65"
          >
            <stop offset="0%" stopColor="#444" />
            <stop offset="75%" stopColor="#000" />
            <stop offset="100%" stopColor="#000" />
          </radialGradient>
          <radialGradient 
            id="corner-gradient-left"
            cx="0.3"
            cy="0.5"
            r="0.25"
            gradientUnits="objectBoundingBox"
          >
            <stop offset="0%" stopColor="rgba(0,0,0,0.75)" />
            <stop offset="100%" stopColor="rgba(0,0,0,0)" />
          </radialGradient>

          <radialGradient 
            id="corner-gradient-right"
            cx="0.7"
            cy="0.5"
            r="0.25"
            gradientUnits="objectBoundingBox"
          >
            <stop offset="0%" stopColor="rgba(0,0,0,0.75)" />
            <stop offset="100%" stopColor="rgba(0,0,0,0)" />
          </radialGradient>

          <filter id="filter0_f_302_14" x="-25" y="0" width="320" height="150" filterUnits="userSpaceOnUse" colorInterpolationFilters="sRGB">
            <feGaussianBlur stdDeviation="4.1" result="effect1_foregroundBlur_302_14"/>
          </filter>
          <filter id="filter1_f_302_14" x="-25" y="85" width="320" height="150" filterUnits="userSpaceOnUse" colorInterpolationFilters="sRGB">
            <feGaussianBlur stdDeviation="4.1" result="effect1_foregroundBlur_302_14"/>
          </filter>
          <filter id="filter2_f_302_14" x="-50" y="-30" width="400" height="170" filterUnits="userSpaceOnUse" colorInterpolationFilters="sRGB">
            <feGaussianBlur stdDeviation="7.6" result="effect1_foregroundBlur_302_14"/>
          </filter>
          <filter id="filter3_f_302_14" x="-50" y="95" width="400" height="170" filterUnits="userSpaceOnUse" colorInterpolationFilters="sRGB">
            <feGaussianBlur stdDeviation="7.6" result="effect1_foregroundBlur_302_14"/>
          </filter>
          <filter id="filter4_f_302_14" x="0" y="-20" width="260" height="150" filterUnits="userSpaceOnUse" colorInterpolationFilters="sRGB">
            <feGaussianBlur stdDeviation="3.35" result="effect1_foregroundBlur_302_14"/>
          </filter>
          <filter id="filter5_f_302_14" x="0" y="105" width="260" height="150" filterUnits="userSpaceOnUse" colorInterpolationFilters="sRGB">
            <feGaussianBlur stdDeviation="3.35" result="effect1_foregroundBlur_302_14"/>
          </filter>
        </defs>

        {/* Eyeball */}
        <ellipse
          cx="131"
          cy="117.5"
          rx="100"
          ry="65"
          fill="url(#eyeball-gradient)"
        />

        {/* After the main eyeball ellipse but before the eyelids, add the corner shadows */}
        <ellipse
          cx="50"
          cy="117.5"
          rx="50"
          ry="90"
          fill="url(#corner-gradient-left)"
        />

        <ellipse
          cx="205"
          cy="117.5"
          rx="50"
          ry="90"
          fill="url(#corner-gradient-right)"
        />

        {/* Corner reflections - repositioned diagonally */}
        <circle
          cx={45}
          cy={135}
          r="1.5"
          fill="white"
          className="opacity-60"
        />
        <circle
          cx={215}
          cy={100}
          r="2"
          fill="white"
          className="opacity-60"
        />

        {/* Smaller companion reflections - repositioned diagonally */}
        <circle
          cx={35}
          cy={120}
          r="1"
          fill="white"
          className="opacity-40"
        />
        <circle
          cx={222}
          cy={110}
          r="1.5"
          fill="white"
          className="opacity-40"
        />

        {/* Pupil group with animations */}
        <motion.g
          variants={pupilVariants}
          animate={isBlinking ? "hidden" : "visible"}
        >
          {/* Pupil */}
          <motion.ellipse
            cx={131}
            cy={117.5}
            rx="50"
            ry="50"
            fill="url(#pupil-gradient)"
            filter="url(#pupil-blur)"
            animate={{
              cx: 131 + pupilOffsetX,
              cy: 117.5 + pupilOffsetY
            }}
            transition={{
              type: "spring",
              stiffness: 400,
              damping: 30
            }}
          />

          {/* Light reflections */}
          <motion.circle
            cx={111}
            cy={102.5}
            r="5"
            fill="white"
            animate={{
              cx: 111 + pupilOffsetX,
              cy: 102.5 + pupilOffsetY
            }}
            transition={{
              type: "spring",
              stiffness: 400,
              damping: 30
            }}
          />
          <motion.circle
            cx={124}
            cy={102.5}
            r="3"
            fill="white"
            animate={{
              cx: 124 + pupilOffsetX,
              cy: 102.5 + pupilOffsetY
            }}
            transition={{
              type: "spring",
              stiffness: 400,
              damping: 30
            }}
          />
        </motion.g>

        {/* Upper eyelid */}
        <motion.path 
          custom={true}
          variants={eyelidVariants}
          animate={isBlinking ? "closed" : "open"}
          fill="#000"
        />

        {/* Lower eyelid */}
        <motion.path 
          custom={false}
          variants={eyelidVariants}
          animate={isBlinking ? "closed" : "open"}
          fill="#000"
        />

        {/* Top blurred lines */}
        <g filter="url(#filter0_f_302_14)">
          <motion.path
            custom={true}
            variants={blurredLineVariants}
            animate={isBlinking ? "closed" : "open"}
            stroke="#2A2A2A"
            strokeWidth="5"
            strokeLinecap="round"
          />
        </g>
        <g filter="url(#filter2_f_302_14)">
          <motion.path
            custom={true}
            variants={outerBlurredLineVariants}
            animate={isBlinking ? "closed" : "open"}
            stroke="#777777"
            strokeWidth="5"
            strokeLinecap="round"
          />
        </g>
        <g filter="url(#filter4_f_302_14)">
          <motion.path
            custom={true}
            variants={arcLineVariants}
            animate={isBlinking ? "closed" : "open"}
            stroke="#838383"
            strokeWidth="5"
            strokeLinecap="round"
          />
        </g>

        {/* Bottom blurred lines */}
        <g filter="url(#filter1_f_302_14)">
          <motion.path
            variants={bottomBlurredLineVariants}
            animate={isBlinking ? "closed" : "open"}
            stroke="#2A2A2A"
            strokeWidth="5"
            strokeLinecap="round"
          />
        </g>
        <g filter="url(#filter3_f_302_14)">
          <motion.path
            variants={bottomOuterBlurredLineVariants}
            animate={isBlinking ? "closed" : "open"}
            stroke="#777777"
            strokeWidth="5"
            strokeLinecap="round"
          />
        </g>
        <g filter="url(#filter5_f_302_14)">
          <motion.path
            variants={bottomArcLineVariants}
            animate={isBlinking ? "closed" : "open"}
            stroke="#838383"
            strokeWidth="5"
            strokeLinecap="round"
          />
        </g>
      </svg>
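The pupilOffsetX and pupilOffsetY values used above are derived from the normalized gaze coordinates broadcast by the other participant; the mapping could be roughly along these lines (a sketch, the maximum offsets are assumptions):

// Hypothetical mapping from normalized gaze coordinates (0..1) to pupil
// offsets inside the eyeball.
const MAX_PUPIL_OFFSET_X = 35 // how far the pupil can travel horizontally
const MAX_PUPIL_OFFSET_Y = 20 // and vertically

const pupilOffsetX = (gazeX - 0.5) * 2 * MAX_PUPIL_OFFSET_X
const pupilOffsetY = (gazeY - 0.5) * 2 * MAX_PUPIL_OFFSET_Y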

Cursor works surprisingly well with SVG paths, too. For example, the eyelid closing animation is basically done by straightening a curved path. I just highlighted the path in the editor, pasted it into Composer, and asked it to add an animation that straightens out the points so that the eye looks like it's closing/blinking.

  // Define the open and closed states for both eyelids
  const upperLidOpen = "M128.5 53.5C59.3 55.5 33 99.6667 28.5 121.5H0V0L261.5 0V121.5H227.5C214.7 65.1 156.167 52.6667 128.5 53.5Z"
  const upperLidClosed = "M128.5 117.5C59.3 117.5 33 117.5 28.5 117.5H0V0L261.5 0V117.5H227.5C214.7 117.5 156.167 117.5 128.5 117.5Z"

  const lowerLidOpen = "M128.5 181C59.3 179 33 134.833 28.5 113H0V234.5H261.5V113H227.5C214.7 169.4 156.167 181.833 128.5 181Z"
  const lowerLidClosed = "M128.5 117.5C59.3 117.5 33 117.5 28.5 117.5H0V234.5H261.5V117.5H227.5C214.7 117.5 156.167 117.5 128.5 117.5Z"

  // Animation variants for the eyelids
  const eyelidVariants = {
    open: (isUpper: boolean) => ({
      d: isUpper ? upperLidOpen : lowerLidOpen,
      transition: {
        duration: 0.4,
        ease: "easeOut"
      }
    }),
    closed: (isUpper: boolean) => ({
      d: isUpper ? upperLidClosed : lowerLidClosed,
      transition: {
        duration: 0.15,
        ease: "easeIn"
      }
    })
  }

It was a really cool experience to see it in action! I applied the same method to the surrounding lines and instructed Cursor to "collapse" them towards the center: it worked almost in one go!
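As a rough idea of what those look like, one of the blurred line variants could be defined something like this (a sketch; the actual path data in the project differs):

// Sketch of a "collapse" variant: a curved line above the eye flattens
// onto the horizontal center line when the eye closes.
const blurredLineVariants = {
  open: {
    d: "M30 80 C 80 30, 180 30, 230 80", // curved line above the eye
    transition: { duration: 0.4, ease: "easeOut" }
  },
  closed: {
    d: "M30 117.5 C 80 117.5, 180 117.5, 230 117.5", // flattened onto the center line
    transition: { duration: 0.15, ease: "easeIn" }
  }
}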

The eyes are then rendered inside a simple CSS grid, with the cells aligned so that a full room looks like one big eye.

<div className="fixed inset-0 grid grid-cols-3 grid-rows-3 gap-4 p-8 md:gap-2 md:p-4 lg:max-w-6xl lg:mx-auto">
          {Object.entries(roomState.participants).map(([key, presences]) => {
            const participant = presences[0]
            const eyeData = eyeTrackingState[key]
            if (key === userId.current) return null

            return (
              <div 
                key={key}
                className={`flex items-center justify-center ${getGridClass(participant.position)}`}
              >
                <Eyes
                  isBlinking={eyeData?.isBlinking ?? false}
                  gazeX={eyeData?.gazeX ?? 0.5}
                  gazeY={eyeData?.gazeY ?? 0.5}
                  alignment={getEyeAlignment(participant.position)}
                />
              </div>
            )
          })}
        </div>

// Helper function to convert position to Tailwind grid classes
function getGridClass(position: string): string {
  switch (position) {
    case 'center': return 'col-start-2 row-start-2'
    case 'middleLeft': return 'col-start-1 row-start-2'
    case 'middleRight': return 'col-start-3 row-start-2'
    case 'topCenter': return 'col-start-2 row-start-1'
    case 'bottomCenter': return 'col-start-2 row-start-3'
    case 'topLeft': return 'col-start-1 row-start-1'
    case 'topRight': return 'col-start-3 row-start-1'
    case 'bottomLeft': return 'col-start-1 row-start-3'
    case 'bottomRight': return 'col-start-3 row-start-3'
    default: return 'col-start-2 row-start-2'
  }
}

function getEyeAlignment(position: string): 'start' | 'center' | 'end' {
  switch (position) {
    case 'topLeft':
    case 'topRight':
      return 'end'
    case 'bottomLeft':
    case 'bottomRight':
      return 'start'
    default:
      return 'center'
  }
}


Final touches

Then add some nice intro screens and background music, and the project is ready to go!

Audio always improves the experience with something like this, so I used Stable Audio to generate background music that plays when the user "enters the abyss". The prompt I used for the music was the following:

ambient, creepy, background music, whispers, wind, slow tempo, eerie, abyss
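Playing the generated track when the user enters could then be as simple as this (a sketch; the file name and trigger are assumptions):

// Hypothetical playback: loop the generated ambient track once the user
// clicks through the intro (browsers require a user gesture to start audio).
const abyssAmbience = new Audio('/audio/abyss-ambience.mp3')
abyssAmbience.loop = true
abyssAmbience.volume = 0.5

function handleEnterAbyss() {
  abyssAmbience.play()
}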

I also thought a pure black screen was a bit boring, so I added some animated SVG filters to the background. Additionally, I added a dark, blurry circle in the center of the screen to create a nice fade effect. I probably could have done that with an SVG filter as well, but I didn't want to spend too much time on it. Then, to get some more movement in there, I made the background rotate around its axis. Animating SVG filters can be a bit finicky at times, so I went with this approach instead.

    <div style={{ width: '100vw', height: '100vh' }}>
      {/* Background Elements */}
      <svg className="fixed inset-0 w-full h-full -z-10">
        <defs>
          <filter id="noise">
            <feTurbulence 
              id="turbFreq"
              type="fractalNoise" 
              baseFrequency="0.01"
              seed="5"
              numOctaves="1"
            >
            </feTurbulence>
            <feGaussianBlur stdDeviation="10">
              <animate
                attributeName="stdDeviation"
                values="10;50;10"
                dur="20s"
                repeatCount="indefinite"
              />
            </feGaussianBlur>
            <feColorMatrix
              type="matrix"
              values="1 0 0 0 1
                      0 1 0 0 1
                      0 0 1 0 1
                      0 0 0 25 -13"
            />
          </filter>
        </defs>
        <rect width="200%" height="200%" filter="url(#noise)" className="rotation-animation" />
      </svg>
      <div className="fixed inset-0 w-[95vw] h-[95vh] bg-black rounded-full blur-[128px] m-auto" />

So there you have it: a fairly straightforward look at how to build a realtime eye tracking experience on top of Supabase's Realtime capabilities. Personally, I found this to be a very interesting experiment and didn't run into too many problems along the way. Surprisingly, I didn't even have to stay up late the night before submitting the project!

Feel free to check out the project or the demo video to see the result. There might be some issues if a bunch of people use it at the same time (it's hard to test properly, since doing it right requires multiple devices and webcams), but I guess that's fashionable for hackathon projects? If you do try it out, remember: if you see an eye, someone somewhere is staring back at you through the network!

