SMART LEDs

Part #4 of the "Roger Writes" series - Jan 2023

Background

Everyone loves a flashing LED; it's the embedded programmer's version of "Hello World". Just look at the "Blink" app for the Arduino range.
But for more than a few LEDs, or for a wider range of colours, things get a lot more complicated. The common way to reduce the number of pins required is to multiplex the display, but this reduces the available brightness and adds complexity of its own.

WS2811

World-Semi came up with the WS28 range of controller chips, which solve many of these problems. Each device has a single data input and a single data output. The separate chips handle the R/G/B PWM control, so once the data has been sent to the chips, the controlling device doesn't have to do anything else. Because each LED has a separate PWM channel, the LEDs are not multiplexed, so they can run at full brightness. As each chip buffers the signal, it doesn't degrade too much over a large number of devices (as long as the distance between devices is limited to around 6 metres/20 feet).

The WS2812 is logically the same as this, but with the LED and chip combined in a single 4-pin package (PWR+GND+IN+OUT).
There are several clones of the WS series, notably the SK6812 (a direct replacement for the WS2812 by Shenzhen LED Color), and the PL9823 (a 5mm standard 4-pin LED package, by BaiCheng).

Signalling

There are a number of WS devices, as well as a number of clones, but they all use the same basic timing. Each LED is sent 8 bits for each of the colour channels. Watch out: some of the WS2812 devices use GRB order, not RGB order! There are also RGBW devices that need 4x 8 bits per LED.
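
To make the byte order concrete, here is a minimal sketch of packing one LED's colour into the three bytes that go on the wire. The names `ws_order_t` and `ws_pack` are made up for this example, and it assumes each byte is later sent most-significant bit first; check the datasheet for your particular device before relying on either assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical colour orders for this example only. */
typedef enum { ORDER_RGB, ORDER_GRB } ws_order_t;

/* Pack one LED's colour into the 3 bytes sent on the wire, in wire order. */
static void ws_pack(ws_order_t order, uint8_t r, uint8_t g, uint8_t b,
                    uint8_t out[3])
{
    assert(out != NULL);
    if (order == ORDER_GRB) {
        out[0] = g;   /* green goes first on GRB parts */
        out[1] = r;
        out[2] = b;
    } else {
        out[0] = r;
        out[1] = g;
        out[2] = b;
    }
}
```

An RGBW device would need a fourth byte per LED, which this sketch does not handle.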

For all bits, the total 0 and 1 bit times (T0H+T0L and T1H+T1L) are about the same, at around 1250ns.
The short pulses (T0H and T1L) are usually around 375ns (WS2811 claims 220ns...380ns), and the longer pulses (T0L and T1H) are around 850ns (WS2811 claims 580ns...1000ns). The reset pulse should be at least 280us (watch out, some non-World-Semi devices need several milliseconds!).
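
As a rough illustration of the timing above, this sketch turns a byte into eight high/low pulse durations. The 375ns and 850ns values are just the nominal figures quoted here, the names are hypothetical, and MSB-first bit order is assumed; a real driver would take its values from the datasheet of the actual part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nominal timings from the text; placeholders, not datasheet values. */
#define WS_SHORT_NS 375u   /* T0H and T1L */
#define WS_LONG_NS  850u   /* T1H and T0L */

typedef struct {
    uint32_t high_ns;   /* time the line is held high */
    uint32_t low_ns;    /* time the line is held low  */
} ws_pulse_t;

/* A '1' is long-high/short-low, a '0' is short-high/long-low. */
static ws_pulse_t ws_bit_pulse(int bit)
{
    ws_pulse_t p;
    p.high_ns = bit ? WS_LONG_NS : WS_SHORT_NS;
    p.low_ns  = bit ? WS_SHORT_NS : WS_LONG_NS;
    return p;
}

/* Expand one byte (MSB first, an assumption) into 8 pulses.
   Returns the total duration of the byte in nanoseconds. */
static uint32_t ws_byte_pulses(uint8_t value, ws_pulse_t out[8])
{
    uint32_t total = 0;
    assert(out != NULL);
    for (int i = 0; i < 8; i++) {
        out[i] = ws_bit_pulse((value >> (7 - i)) & 1);
        total += out[i].high_ns + out[i].low_ns;
    }
    return total;
}
```

With these nominal values every bit takes 1225ns, which is within the "around 1250ns" bit time.
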
Each LED will accept the first 24 bits and store them in temporary latches; any further bits are passed on to the output pin.
For two (or more) LEDs, we send 24 bits of 0/1, which get stored in the 1st LED's latches.
The next 24 bits of 0/1 get passed directly to the 1st LED's output pin, and the 2nd LED will take them and store them in its latches.
(We can repeat this for as many LEDs as we like.)
We then wait at least 280us for the reset. At some point the LEDs will time out, and they will transfer their latch data to their PWM controller registers, so the new RGB levels will be used.
Because all of the LEDs see the data stop at the same time, they all change at the same time (barring propagation delays and small internal timing variations).
This can be extended to a large number of LEDs, and the refresh rates are pretty good: at 30us per LED (24 x 1250ns = 30us), a 500 LED strip can be refreshed at about 60Hz!
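
The refresh-rate arithmetic can be sketched as below. The function names are hypothetical, and the model only counts the data bits plus one reset gap; it ignores any gaps between bytes that a real driver might add.

```c
#include <stdint.h>

/* Frame time in microseconds: 24 bits per LED at bit_ns each,
   plus one reset gap of reset_us at the end. */
static uint32_t ws_frame_us(uint32_t leds, uint32_t bit_ns, uint32_t reset_us)
{
    uint64_t data_ns = (uint64_t)leds * 24u * bit_ns;
    return (uint32_t)(data_ns / 1000u) + reset_us;
}

/* Refresh rate in Hz for a given frame time. */
static double ws_refresh_hz(uint32_t leds, uint32_t bit_ns, uint32_t reset_us)
{
    return 1e6 / (double)ws_frame_us(leds, bit_ns, reset_us);
}
```

With 500 LEDs, 1250ns per bit and a 280us reset, `ws_frame_us` gives 15280us, which is a little over 65Hz and so in line with the "about 60Hz" figure.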

Generating the signal

There are many ways to generate the signals; for the Arduino, the FastLED library is a common choice.
Generally, for any device with a high enough CPU clock, it's simple to toggle a GPIO line with the right timing.

Optimised PIC code

For the PIC microcontroller range, there is an application note, AN1606 (for older devices).
Inspired by the AN, I wrote the following code for the newer devices. This sets up CCP1 and CCP2 as fixed PWM sources: CCP1 with a high period of 3 clock intervals (375ns), and CCP2 with the extra part of the long pulse (125ns or 150ns). The first configurable logic cell (CLC1) is configured as a D-type flip-flop with set/reset, and is used to stretch CCP2 from the start to the final end. The code then uses CLC2 to merge the CLC1 and CCP1 outputs based on the SPI data bit. Once this is set up, writing to SSP1BUF will output a stream of 8 bits to the CLC2OUT pin using the timing set by CCP1/CCP2. Most devices will have PPS, so the CLC2 output can be moved around (after waiting for the current byte to send!), which means multiple strings can be driven (in sequence) from a single PIC device.

'C' code (the equivalent ASM code follows)
#ifdef SLOW_SEND
#define SSP_BIT_CLK_COUNT    6        // 6x125=750ns, x2=1.5us
#define SSP_BYTE_CLOCK_NS    15000    // 15us per byte
#else
#define SSP_BIT_CLK_COUNT    5        // 5x125=625ns, x2=1.25us
#define SSP_BYTE_CLOCK_NS    12500    // 12.5us per byte
#endif

    // Set TMR2 to 625ns/750ns
#ifdef T2HLT
    T2HLT=0;
#endif
#ifdef T2CLKCON
    T2CLKCON=1;    // 1=Fosc/4 (8MHz=125ns)
#endif
    PR2=SSP_BIT_CLK_COUNT-1;    //    5*125=625ns, x2=1.25us per bit, 6*125=750ns, x2=1.5us per bit
    T2CON=1<<_T2CON_TMR2ON_POSITION;    //    T2CON = TMR2ON, POS=00, if not CLKCON, then CLKbits = FOSC/4

    // Point CCP1/CCP2 to use TMR2 (they do by default, but just in case)
#ifndef CCPTMRS
#ifdef CCPTMRS0
#define CCPTMRS CCPTMRS0
#endif
#endif
#ifdef CCPTMRS
    CCPTMRS&=0b11110000;    // Preserve whatever CCP3/4 are using
    CCPTMRS|=0b00000101;    // Set CCP2/CCP1 to use TMR2
#endif
    // Setup CCP1 for the '0' pulse duration, so 375ns high time
    CCP1CON=0b10001100;     // EC.?.OUT.FMT.MMMM, so PWM Mode, Right aligned format, Mode=1100=PWM
    CCPR1H=0;               //High period 3*125=375
    CCPR1L=8+3;             //8+3=375ns

    // Setup CCP2 for the '1' pulse duration, so 750ns high time
    // BUT: That's longer than TMR2, so set it to the extra part (750-625)==125ns (so 625+125=750ns, or 750+125=875ns)
    CCP2CON=0b10001100;     //EC.?.OUT.FMT.MMMM, so PWM Mode, Right aligned format, Mode=1100=PWM
    CCPR2H=0;               //High period
    CCPR2L=4;               //4=125ns    
    // Watch out: Both CCP1 and CCP2 seem to have a few clocks delay to the SPI, and are probably synchronised to Fosc/4

    // Set SPI to TMR2/2, so 1250ns/1500ns per bits
    SSP1CON1=(1<<_SSP1CON1_SSPEN_POSITION)|0x03;    //0x03=TMR2/2
    SSP1CON3=(1<<_SSP1CON3_BOEN_POSITION);

    // This needs the enhanced version of the CLC, that has 4 full registers, not the 2 x nibble registers...
    // It uses CLC1 to generate the complicated '1' signal (it's over a TMR2 period, so needs to be TMR2+CCP2)
    // Select the 3 inputs we need
    CLC1SEL0=CLCSEL_SCLK;   //MSSP1_clk_out (SCLK)
    CLC1SEL1=CLCSEL_SDO;    //MSSP1_data_out (SDO)
    CLC1SEL2=0;             //None
    CLC1SEL3=CLCSEL_CCP2;   //CCP2_out (short pulses)

    // To stretch the output over a TMR2 period (and handle the CCP2 offset relative to SDO/SCLK)
    // This uses D-type flipflop with Set and Reset
    // If SDO==1, at start of CCP2 the output is set, it's cleared by clocking SCLK on the trailing edge of second CCP2
    // CLC1GLS0=Clk, CLC1GLS1=D, CLC1GLS2=R, CLC1GLS3=S
    CLC1GLS0=0b01000000;    // !CCP2    This will clock the data through on both the first and second halves of the SCLK
    CLC1GLS1=0b00000101;    // SCLK && SDO = !(!SCLK || !SDO), we merge this with SDO, so that if SDO is '1' it has no effect during the 1/2 half of SCLK.
    CLC1GLS2=0;             // None
    CLC1GLS3=0b01000101;    // CCP2 && SCLK && SDO = !(!CCP2 || !SCLK || !SDO) If SDO==1, then set the output at the start of the CCP2 (gate on SCLK to avoid triggers between bytes)
    CLC1POL=0b1010;         // Inverts: The first level blocks are OR, so we do Y=(A && B) with Y=!(!A || !B)
    CLC1CON=(1<<_CLC1CON_LC1EN_POSITION)|4;    // Output enabled, Mode=4==Dtype - this gives a 750ns or 875ns '1' pulse when SDO=='1' and the SCLK is active, this is the long input to CLC2

    // Merge in the long '1' pulse from CLC1, and the short '0' pulse from the gated CCP1 depending on SDO
    // Use dual 2 input AND + single OR = device type of 0
    // Select the 4 inputs we need
    CLC2SEL0=CLCSEL_SCLK;   // MSSP1_clk_out (SCLK)
    CLC2SEL1=CLCSEL_SDO;    // MSSP1_data_out (SDO)
    CLC2SEL2=CLCSEL_CCP1;   // CCP1_out (short pulses, for '0' output)
    CLC2SEL3=CLCSEL_CLC1;   // CLC1_out (stretched CCP2 pulses gated on SDO/SCLK, for '1' output)
    CLC2GLS0=0b10000000;    // CLC2GLS0=And1.1 - CLC1 - The long '1' pulse (already gated on SDO/SCLK)
    CLC2GLS1=0b00001000;    // CLC2GLS1=And1.2 - SDO- Gate based on SDO==1 (already done but we have spare logic)
    CLC2GLS2=0b00010001;    // CLC2GLS2=And2.1 - CCP1 && SCLK = !(!CCP1 || !SCLK) - CCP1 is the short '0' pulse, gate this with SCLK to avoid fake pulses between bytes
    CLC2GLS3=0b00000100;    // CLC2GLS3=And2.2 - !SDO Gate based on SDO==0
    CLC2POL=0b0100;         // Invert (!CCP1 || !SCLK) to get CCP1 && SCLK
    CLC2CON=(1<<_CLC2CON_LC2EN_POSITION)|0;    // Output enabled, 0=dual 2 input ANDs feeding into single OR - This is the WS28 output

ASM code

    ; Set TMR2 to 625ns/750ns
    BANKSET  T2CON
#ifdef T2HLT
    clrf     T2HLT
#endif
#ifdef T2CLKCON
    movlw    1
    movwf    T2CLKCON     ; 1=Fosc/4 (8MHz=125ns)
#endif
    movlw    SSP_BIT_CLK_COUNT-1    ; 5*125=625ns, x2=1.25us per bit, 6*125=750ns, x2=1.5us per bit
    movwf    PR2
    movlw    1<<TMR2ON
    movwf    T2CON        ; T2CON = TMR2ON, POS=00, if not CLKCON, then CLKbits = FOSC/4

    ; Point CCP1/CCP2 to use TMR2 (they do by default, but just in case)
#ifndef CCPTMRS
#ifdef CCPTMRS0
#define CCPTMRS CCPTMRS0
#endif
#endif
#ifdef CCPTMRS
    BANKSET  CCPTMRS
    movfw    CCPTMRS
    andlw    b'11110000'  ; Preserve whatever CCP3/4 are using
    iorlw    b'00000101'  ; Set CCP2/CCP1 to use TMR2
    movwf    CCPTMRS
#endif

    ; Setup CCP1 for the '0' pulse duration, so 375ns high time
    BANKSET  CCP1CON
    movlw    b'10001100'  ; EC.?.OUT.FMT.MMMM, so PWM Mode, Right aligned format, Mode=1100=PWM
    movwf    CCP1CON
    clrf     CCPR1H       ; High period 3*125=375
    movlw    8+3          ; 8+3=375ns
    movwf    CCPR1L

    ; Setup CCP2 for the '1' pulse duration, so 750ns high time
    ; BUT: That's longer than TMR2, so set it to the extra part (750-625)==125ns (so 625+125=750ns, or 750+125=875ns)
    BANKSET  CCP2CON
    movlw    b'10001100'  ; EC.?.OUT.FMT.MMMM, so PWM Mode, Right aligned format, Mode=1100=PWM
    movwf    CCP2CON
    clrf     CCPR2H       ; High period
    movlw    4            ; 4=125ns
    movwf    CCPR2L
    ; Watch out: Both CCP1 and CCP2 seem to have a few clocks delay to the SPI, and are probably synchronised to Fosc/4

    ; Set SPI to TMR2/2, so 1250ns/1500ns per bits
    BANKSET  SSP1CON1
    movlw    (1<<SSPEN)|0x03    ; 0x03=TMR2/2, 0x0A=FOSC/((SSP1ADD + 1) *4)
    movwf    SSP1CON1
    movlw    (1<<BOEN)
    movwf    SSP1CON3

    ; This needs the enhanced version of the CLC, that has 4 full registers, not the 2 x nibble registers...
    ; It uses CLC1 to generate the complicated '1' signal (it's over a TMR2 period, so needs to be TMR2+CCP2)
    ; Select the 3 inputs we need
    BANKSET  CLC1SEL0
    movlw    CLCSEL_SCLK   ; MSSP1_clk_out (SCLK)
    movwf    CLC1SEL0
    movlw    CLCSEL_SDO    ; MSSP1_data_out (SDO)
    movwf    CLC1SEL1
    clrf     CLC1SEL2      ; None
    movlw    CLCSEL_CCP2   ; CCP2_out (short pulses)
    movwf    CLC1SEL3
    ; To stretch the output over a TMR2 period (and handle the CCP2 offset relative to SDO/SCLK)
    ; This uses D-type flipflop with Set and Reset
    ; If SDO==1, at start of CCP2 the output is set, it's cleared by clocking SCLK on the trailing edge of second CCP2
    ; CLC1GLS0=Clk
    movlw    b'01000000'   ; !CCP2
    movwf    CLC1GLS0      ; This will clock the data through on both the first and second halves of the SCLK
    ; CLC1GLS1=D
    movlw    b'00000101'   ; SCLK && SDO = !(!SCLK || !SDO)
    movwf    CLC1GLS1      ; We merge this with SDO, so that if SDO is '1' it has no effect during the 1/2 half of SCLK.
    ; CLC1GLS2=R=
    clrf     CLC1GLS2      ; None
    ; CLC1GLS3=S=
    movlw    b'01000101'   ; CCP2 && SCLK && SDO = !(!CCP2 || !SCLK || !SDO)
    movwf    CLC1GLS3      ; If SDO==1, then set the output at the start of the CCP2 (gate on SCLK to avoid triggers between bytes)
    movlw    b'1010'
    movwf    CLC1POL       ; Inverts: The first level blocks are OR, so we do Y=(A && B) with Y=!(!A || !B)
    movlw    (1<<LC1EN)|4  ; Output enabled, Mode=4==Dtype
    movwf    CLC1CON       ; this gives a 750ns or 875ns '1' pulse when SDO=='1' and the SCLK is active

    ; Merge in the long '1' pulse from CLC1, and the short '0' pulse from the gated CCP1 depending on SDO
    ; Select the 4 inputs we need
    BANKSET  CLC2SEL0
    movlw    CLCSEL_SCLK   ; MSSP1_clk_out (SCLK)
    movwf    CLC2SEL0
    movlw    CLCSEL_SDO    ; MSSP1_data_out (SDO)
    movwf    CLC2SEL1
    movlw    CLCSEL_CCP1   ; CCP1_out (short pulses, for '0' output)
    movwf    CLC2SEL2
    movlw    CLCSEL_CLC1   ; CLC1_out (stretched CCP2 pulses gated on SDO/SCLK, for '1' output)
    movwf    CLC2SEL3

    ; Use dual 2 input AND + single OR = 000
    ; CLC2GLS0=And1.1
    movlw    b'10000000'   ; CLC1
    movwf    CLC2GLS0      ; The long '1' pulse (already gated on SDO/SCLK)
    ; CLC2GLS1=And1.2
    movlw    b'00001000'   ; SDO
    movwf    CLC2GLS1      ; Gate based on SDO==1 (already done but we have spare logic)
    ; CLC2GLS2=And2.1
    movlw    b'00010001'   ; CCP1 && SCLK = !(!CCP1 || !SCLK)
    movwf    CLC2GLS2      ; CCP1 is the short '0' pulse, gate this with SCLK to avoid fake pulses between bytes
    ; CLC2GLS3=And2.2
    movlw    b'00000100'   ; !SDO
    movwf    CLC2GLS3      ; Gate based on SDO==0
    movlw    b'0100'
    movwf    CLC2POL       ; Invert (!CCP1 || !SCLK) to get CCP1 && SCLK
    movlw    (1<<LC2EN)|0  ; Output enabled, dual 2 input ANDs feeding into single OR
    movwf    CLC2CON       ; This is the WS28 output
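
To help follow the CLC2 configuration, here is a plain C model of the logic that the comments above describe: one AND gate passes the stretched CLC1 pulse when SDO is 1, the other passes the CCP1 pulse (gated by SCLK) when SDO is 0, and the two are ORed together. This is only a reading of the gate comments, written as host-side code with a hypothetical function name; it does not model the D-type flip-flop in CLC1 or any of the timing.

```c
#include <stdbool.h>

/* Host-side model of the CLC2 AND-OR merge (an illustration only).
   sclk, sdo : SPI clock and data out
   ccp1      : short '0' pulse from CCP1
   clc1      : stretched '1' pulse from CLC1 */
static bool clc2_out(bool sclk, bool sdo, bool ccp1, bool clc1)
{
    bool and1 = clc1 && sdo;              /* long pulse, only when SDO==1  */
    bool and2 = (ccp1 && sclk) && !sdo;   /* short pulse, only when SDO==0 */
    return and1 || and2;                  /* single OR feeding the pin     */
}
```

For example, `clc2_out(true, false, true, false)` is true (a '0' bit passes the short CCP1 pulse), while `clc2_out(false, false, true, false)` is false, because gating on SCLK blocks CCP1 pulses between bytes.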

Output

Here is some example output from this code, showing 0x7F/00/00 0x00/7F/00 0x00/00/7F.
It shows the raw SPI clock and data, the intermediate CLC1 output, and the final CLC2 output (which goes to the LEDs).



The Roger Writes series

I research / dabble with lots of things, and figured that if I write my notes here, I can quickly reference them, also, sometimes, they are useful to others!
This page was last updated on Sunday, 24-Mar-2024 12:53:19 GMT
